eclipse-archived / ceylon

The Ceylon compiler, language module, and command line tools
http://ceylon-lang.org
Apache License 2.0
396 stars 62 forks source link

building a sequence with a comprehension is wasteful #1337

Open CeylonMigrationBot opened 10 years ago

CeylonMigrationBot commented 10 years ago

[@gavinking] This one has got to be an easy fix.

IntArray intArray = IntArray(10);
Integer[] sequence = intArray.array.sequence;

Results in an ArraySequence backed by an Object[] array of size 15. That's crazy.

UPDATE: in fact this is a problem with how we hande this syntax:

[ for (x in xs) f(x) ]

[Migrated from ceylon/ceylon-compiler#1337]

CeylonMigrationBot commented 10 years ago

[@gavinking] Mybad, the problem is in Array.sequence, I suppose.

CeylonMigrationBot commented 10 years ago

[@gavinking] So in fact the issue here seems to be with how we build sequences from comprehensions. I have worked around it in Array by using a custom implementation of getSequence(). But indeed the syntax [ for (x in xs) f(x) ] should be much better at estimating the size of the resulting ArraySequence.

CeylonMigrationBot commented 10 years ago

[@tombentley] Just to be clear about what this is about, currently a comprehension argument to a tuple enumeration results in us creating an AbstractIterable according to the comprehension, and then calling its AbstractIterable.getSequence() method, which uses a SequenceBuilder using the default initial capacity (currently 5), which then has to be resized. This over-allocates the size of the backing array and it is not subsequently trimmed back to the perfect size.

In general we can't just use xs.size as the initial capacity because for an arbitrary Iterable xs that would cause an iteration over all the elements, just to compute the size (then we need another iteration to fill the sequence). Until we implement #5076 we can't even be sure that the Sequential.size is efficiently computable.

For now we can change AbstractIterable.getSequence() to test for things like ArraySequence, Range, Tuple and Array etc and only then use the size as the initial capacity of the SequenceBuilder. Obviously that doesn't benefit other implementors of Iterable whose size is efficiently computable. Aside from using some kind of marker interface I'm not sure how the runtime could know.

CeylonMigrationBot commented 10 years ago

[@gavinking] @tombentley Well, certainly all Sequentials are finite, which covers almost all the cases you just listed. But indeed, I think the idea is that actually all Collections are finite.

CeylonMigrationBot commented 10 years ago

[@FroMage] I hate when we spend time on optimisations before 1.1

CeylonMigrationBot commented 10 years ago

[@tombentley] This is actually trivial once #1351 is fixed, at least for a comprehension with only for clauses. What should we do about comprehensions featuring if clauses though?

CeylonMigrationBot commented 10 years ago

[@gavinking]

should we use the size estimate anyway, even though the conditions may mean the sequence is ultimately smaller than that

Yes, I guess so, but we should limit it to a reasonable number.

not excluding the above options, should we trim the SequenceBuilders array before we obtain the Sequence?

This is a tough one. Not certain.

CeylonMigrationBot commented 10 years ago

[@tombentley] This is now fixed. I used an upper limit of 128 in there were if comprehensions present. That may be a little large. I didn't do any trimming because SequenceBuilder doesn't currently expose API for that.

CeylonMigrationBot commented 10 years ago

[@gavinking] thanks.

CeylonMigrationBot commented 10 years ago

[@tombentley] Reverted because of #1382. Now scheduled for 1.1

CeylonMigrationBot commented 10 years ago

[@FroMage] Moving to 1.2