dlangBugzillaToGithub / migration_test

0 stars 0 forks source link

"real" appender missing #786

Open dlangBugzillaToGithub opened 11 years ago

dlangBugzillaToGithub commented 11 years ago

verylonglogin.reg (denis-sh) reported this on 2013-09-29T03:23:28Z

Transfered from https://issues.dlang.org/show_bug.cgi?id=11138

CC List

Description

Lets call "real" an appender which will create the one and only array, i.e. it will not randomly create a copy of array on appending. Such copy is bad as:
1. Almost no use.
    In 99% appender is used as "real" appender and the copy is not needed.
2. Performance.
    It will cause calls of postblits and then destructors on collection of copy array.
3. Needless restriction on element type.
    One may want to create and array of uncopyable type which non-"real" appender can't do by definition.

Obviously, "real" appender is uncopyable and don't give safe access to its array until it finished.
dlangBugzillaToGithub commented 11 years ago

monarchdodra commented on 2013-09-29T04:03:25Z

(In reply to comment #0)
> 3. Needless restriction on element type.
>     One may want to create and array of uncopyable type which non-"real"
> appender can't do by definition.

Appender supports immutable construction (mostly bug free, AFAIK). I don't think a "real" appender would do any better than appender could anyways, so this is a non-argument (IMO).

> 2. Performance.
>     It will cause calls of postblits and then destructors on collection of copy
> array.

Technically, the destructors don't need to be called, since the GC will leak them anyways :/ See below about "postblits"

> Lets call "real" an appender which will create the one and only array, i.e. it
> will not randomly create a copy of array on appending. Such copy is bad as:
> 1. Almost no use.
>     In 99% appender is used as "real" appender and the copy is not needed.

I've thought about this before. The *only* case this would really change anything, is on structs that have an elaborate CC. Arguably, these things have no business being in a GC managed array, since it never finalizes its elements (and arguably, if you have a CC, you have a destructor). If we ever get a finalizing GC, then things will be different.

One "workaround" I've though about is that when the type has a CC, Appender could carry an extra "ownsArray" boolean. If appender must relocate its array, and "owns" the array, then postblit is not necessary. In the future, if the GC is finalizing, it may also have to re-initialize to T.init, if there is an elaborate destructor.

> Obviously, "real" appender is uncopyable

Appender having ref semantics, there would be absolutely no problem "copying" it (read: pass by value).

> and don't give safe access to its
> array until it finished.

I can see the motivation behind this "real" appender. Another (dis)advantage it has is that it wouldn't be initialize-able from an existing array. This means it would have 100% ownership of its internals, but on the other hand, it also means, it also means it can't extend an exiting array :/

So, "at the end of the day", I'm not really sure a "real" appender would pull its weight compared to the standard appender (IMO).
dlangBugzillaToGithub commented 11 years ago

verylonglogin.reg commented on 2013-09-29T08:12:23Z

(In reply to comment #1)
> (In reply to comment #0)
> > 3. Needless restriction on element type.
> >     ...
> 
> Appender supports immutable construction (mostly bug free, AFAIK). I don't
> think a "real" appender would do any better than appender could anyways, so
> this is a non-argument (IMO).

Mutability have nothing to do with copyability. Struct with disabled postblits are meant here. And current appender design can't accept them.

> > 2. Performance.
> >     It will cause calls of postblits and then destructors on collection of copy
> > array.
> 
> Technically, the destructors don't need to be called, since the GC will leak
> them anyways :/ See below about "postblits"
> 
> > Lets call "real" an appender which will create the one and only array, i.e. it
> > will not randomly create a copy of array on appending. Such copy is bad as:
> > 1. Almost no use.
> >     In 99% appender is used as "real" appender and the copy is not needed.
> 
> I've thought about this before. The *only* case this would really change
> anything, is on structs that have an elaborate CC. Arguably, these things have
> no business being in a GC managed array, since it never finalizes its elements
> (and arguably, if you have a CC, you have a destructor). If we ever get a
> finalizing GC, then things will be different.

So, you are talking that GC bug must force us to do bad design? I will accept it only for an iron-WONTFIXed one.

> One "workaround" I've though about is that when the type has a CC, Appender
> could carry an extra "ownsArray" boolean. If appender must relocate its array,
> and "owns" the array, then postblit is not necessary. In the future, if the GC
> is finalizing, it may also have to re-initialize to T.init, if there is an
> elaborate destructor.

Such flag will not work as Appender API allows safe access to the array so the array can never be owned.

> > Obviously, "real" appender is uncopyable
> 
> Appender having ref semantics, there would be absolutely no problem "copying"
> it (read: pass by value).

Yes. And such ref semantic is not generally needed and just results in needless GC allocation in stuff aimed for performance-critical tasks.

> I can see the motivation behind this "real" appender. Another (dis)advantage it
> has is that it wouldn't be initialize-able from an existing array. This means
> it would have 100% ownership of its internals, but on the other hand, it also
> means, it also means it can't extend an exiting array :/

It of course can't initialize from existing array if element isn't copyable. Otherwise it will create its own copy of array. Where is the problem? Also we talk here about very rare usage of the appender which I'd rather ignore just like the case when its content is needed in the middle of appending as non-general.

> So, "at the end of the day", I'm not really sure a "real" appender would pull
> its weight compared to the standard appender (IMO).

Exactly what I expect for each my proposal. :) So as always I will just create one for my own use.
dlangBugzillaToGithub commented 11 years ago

monarchdodra commented on 2013-09-29T09:41:29Z

(In reply to comment #2)
> (In reply to comment #1)
> > (In reply to comment #0)
> > > 3. Needless restriction on element type.
> > >     ...
> > 
> > Appender supports immutable construction (mostly bug free, AFAIK). I don't
> > think a "real" appender would do any better than appender could anyways, so
> > this is a non-argument (IMO).
> 
> Mutability have nothing to do with copyability. Struct with disabled postblits
> are meant here. And current appender design can't accept them.

Right. Sorry.

Wait. How do you append something that can't be copied? Is this even relevant?

> > I've thought about this before. The *only* case this would really change
> > anything, is on structs that have an elaborate CC. Arguably, these things have
> > no business being in a GC managed array, since it never finalizes its elements
> > (and arguably, if you have a CC, you have a destructor). If we ever get a
> > finalizing GC, then things will be different.
> 
> So, you are talking that GC bug must force us to do bad design? I will accept
> it only for an iron-WONTFIXed one.

You have a point.

> > One "workaround" I've though about is that when the type has a CC, Appender
> > could carry an extra "ownsArray" boolean. If appender must relocate its array,
> > and "owns" the array, then postblit is not necessary. In the future, if the GC
> > is finalizing, it may also have to re-initialize to T.init, if there is an
> > elaborate destructor.
> 
> Such flag will not work as Appender API allows safe access to the array so the
> array can never be owned.

Not really, the appender API is rich enough for appender to "know" if anybody else has yet accessed the array, either through arr, or via construction. If no one else has a handle on the array, and appender is about to relocate, then it can pretty much do anything it wants.

Maybe the word "owns" is inadequate. The bool would be called "arrayHasExternalView".

> > > Obviously, "real" appender is uncopyable
> > 
> > Appender having ref semantics, there would be absolutely no problem "copying"
> > it (read: pass by value).
> 
> Yes. And such ref semantic is not generally needed and just results in needless
> GC allocation in stuff aimed for performance-critical tasks.

I think appender was mostly to offset how ridiculously slow ~= is. I'm not sure it wasn't design with *absolute* performance in mind.

BTW, you might want to know about:
http://d.puremagic.com/issues/show_bug.cgi?id=10864

Off topic: I'd like to try to make a push for some templatization in druntime. I thought you might be interested in the project. I'll keep you posted.

> > I can see the motivation behind this "real" appender. Another (dis)advantage it
> > has is that it wouldn't be initialize-able from an existing array. This means
> > it would have 100% ownership of its internals, but on the other hand, it also
> > means, it also means it can't extend an exiting array :/
> 
> It of course can't initialize from existing array if element isn't copyable.
> Otherwise it will create its own copy of array. Where is the problem?

You are creating a copy. Appender can be used to append to an existing array, whith lots of capacity, without relocating it. This can be very useful for say, processing lines in IO, where you are re-using the same buffer for each line.

> Also we
> talk here about very rare usage of the appender which I'd rather ignore just
> like the case when its content is needed in the middle of appending as
> non-general.

Well very rare remains to be proven. Still, I don't think it's that big of a problem if a "real" appender simply duplicated before allocating: To each their own behavior. You can't get "super speed" if you aren't willing to give appended total ownership of the passed data.

> > So, "at the end of the day", I'm not really sure a "real" appender would pull
> > its weight compared to the standard appender (IMO).
> 
> Exactly what I expect for each my proposal. :) So as always I will just create
> one for my own use.
dlangBugzillaToGithub commented 11 years ago

dmitry.olsh commented on 2013-10-20T03:26:03Z

(In reply to comment #1)
> (In reply to comment #0)
> > 3. Needless restriction on element type.
> >     One may want to create and array of uncopyable type which non-"real"
> > appender can't do by definition.
> 
> Appender supports immutable construction (mostly bug free, AFAIK). I don't
> think a "real" appender would do any better than appender could anyways, so
> this is a non-argument (IMO).
> 

Construct in place. See also C++11 emplace_back and friends. Even these guys see the benefit.

> > 2. Performance.
> >     It will cause calls of postblits and then destructors on collection of copy
> > array.
> 
> Technically, the destructors don't need to be called, since the GC will leak
> them anyways :/ See below about "postblits"

Who told you that a long-standing bug allows us to basically do even worse job then we have to ? ;)

> > Lets call "real" an appender which will create the one and only array, i.e. it
> > will not randomly create a copy of array on appending. Such copy is bad as:
> > 1. Almost no use.
> >     In 99% appender is used as "real" appender and the copy is not needed.
> 
> I've thought about this before. The *only* case this would really change
> anything, is on structs that have an elaborate CC. Arguably, these things have
> no business being in a GC managed array, since it never finalizes its elements
> (and arguably, if you have a CC, you have a destructor). If we ever get a
> finalizing GC, then things will be different.
> 
> One "workaround" I've though about is that when the type has a CC, Appender
> could carry an extra "ownsArray" boolean. If appender must relocate its array,
> and "owns" the array, then postblit is not necessary. In the future, if the GC
> is finalizing, it may also have to re-initialize to T.init, if there is an
> elaborate destructor.

Sadly Appender leaks refernces to its internal array so it's can own the data at all.

> > Obviously, "real" appender is uncopyable
> 
> Appender having ref semantics, there would be absolutely no problem "copying"
> it (read: pass by value).

IMHO it could just be uncopyable type and RefCounted!T should solve the issue of passing it as reference anywhere. 

> > and don't give safe access to its
> > array until it finished.
> 
> I can see the motivation behind this "real" appender. Another (dis)advantage it
> has is that it wouldn't be initialize-able from an existing array. This means
> it would have 100% ownership of its internals, but on the other hand, it also
> means, it also means it can't extend an exiting array :/

You miss a big use case. Frankly patching Appender on top of an array is a minor thing that buys us a tiny bit over assumeSafeAppend.

Doing a lot of reallocations in the process of building and not owning data is what makes Appender very poor at constructing ... array-like containers.

> So, "at the end of the day", I'm not really sure a "real" appender would pull
> its weight compared to the standard appender (IMO).

Then call it a Builder! Such a builder has a chance at good API design unlike the current abomination. In particular it could accumulate data in chunks and never reallocate until the very end. Secondly while assembling the final array/container it may move data freely both supporting non-copyable stuff _and_ being more efficient.
dlangBugzillaToGithub commented 11 years ago

dmitry.olsh commented on 2013-10-20T03:32:42Z

(In reply to comment #0)
> Lets call "real" an appender which will create the one and only array, i.e. it
> will not randomly create a copy of array on appending. Such copy is bad as:
> 1. Almost no use.
>     In 99% appender is used as "real" appender and the copy is not needed.
> 2. Performance.
>     It will cause calls of postblits and then destructors on collection of copy
> array.
> 3. Needless restriction on element type.
>     One may want to create and array of uncopyable type which non-"real"
> appender can't do by definition.

4. Support building stuff other then built-in array on GC heap.

5. Deterministic memory management. There is no point in keeping discarded memory flaoting around after the final array is produced.

6. (sky is the limit) Use the allocators if such an ephemeral thing ever comes to materialize.

> Obviously, "real" appender is uncopyable and don't give safe access to its
> array until it finished.

Honestly the thing current Appender is used for is a Builder. 
As in StringBuilder of C#/Java minus the madness of extra methods. 

With rebranding it as Builder and few extra points I'm all for it :)
dlangBugzillaToGithub commented 11 years ago

verylonglogin.reg commented on 2013-10-20T04:45:49Z

(In reply to comment #5)
> With rebranding it as Builder and few extra points I'm all for it :)

And you will get it. Thanks for the positive response. As always I will create it in my Unstandard library first and make an inclusion proposal. But not sure I will do it fast.

Everybody if free to reassign the issue to him, I will provide all my related code and ideas.
dlangBugzillaToGithub commented 10 years ago

dlang-bugzilla commented on 2014-01-11T06:47:22Z

For the record, IIRC from my appender performance experiments I found that extra indirection (required for a copyable ref-counted appender) has a big impact on performance. Most uses of an appender do not involve passing it around by value.

Andrej linked to this bug from https://github.com/D-Programming-Language/phobos/pull/502, more discussion there and at Issue 5813.