CeylonMigrationBot commented 11 years ago

[@gavinking] Our for loop is slow:

for (i in min..max) can be optimized to for (int i=min; i<=max; i++) (but watch out for reverse ranges!)
for (i in min:len) can be optimized to for (int i=min; i<min+len; i++)
for (x in arrayOrArraySequence) can be optimized to use a C-style for and direct access to the underlying java array by index

[Migrated from ceylon/ceylon-compiler#882] [Closed at 2014-05-02 10:39:53]

CeylonMigrationBot commented 11 years ago

[@quintesse] I'll suggest something else that will make Gavin want to pluck out my eyes with his teeth, but I think it must be mentioned. One of the reasons we can't make Iterator more efficient is because it has only a single next() method. This means that the detection of the end of the iteration must be handled with a special value like exhausted, which means we are obliged to erase the return value of the method to Object which in turn means we'll always have to do boxing.

I know there is resistance to have Iterator with a hasNext() but for those streams where peeking is not possible it's easy to use read-ahead and remember the element until next() is called, it's hardly rocket science. But it does allow us to generate iterators that a specifically optimized for primitive types.

I think it's worth considering becuase once we reach 1.0 we're stuck with this decision forever.

CeylonMigrationBot commented 11 years ago

[@tombentley]

But it does allow us to generate iterators that a specifically optimized for primitive types.

But presumably next() is declared to return the type parameter Element, thus requiring boxing. Or am I missing something about your proposal.

To explain what I had in mind about the RandomAccessList thing. The compiler could generate an overloaded Element item(long index) method (that is, with a primitive index) when it was compiling something that satisfies RandomAccessList<Element>, which contained the transformed code for the Ceylon item() declaration. The item(Integer) version would box and invoke item(long). With this in place we can transform for (v in randomAccess) using a C-style for loop, avoiding the instantiation of an Iterator and the boxing of all the indices. Obviously in the case of Array and String we still have to box the return values.

This is the kind of specialization based on type arguments which we've generally ruled out so far. However, perhaps there's an argument to be made for this as a special case, precisely because iterating Lists is such a common thing to do. Obviously we could do further specialization to deal with primitive return vaules (with a RandomAccessList::item$character() method, say, or Iterator::next$character()) but that's just further down this slippery slope.

CeylonMigrationBot commented 11 years ago

[@quintesse]

But presumably next() is declared to return the type parameter Element, thus requiring boxing

No, if the iterator is of type Iterator<Integer> for example (not <Integer?>) and we add a next$int() (and next$float() etc) to the Java implementation of Iterator, we could easily generate a call to that method instead of the generic one. This is an optimization we've been discussing on and off on the mailing list and it should be possible to do, but only for types that don't return a union or intersection.

Reading the rest of your message: that's exactly what I propose and it's something we could even do for user types (in a limited way to start with, for example only taking into account types with a single type parameter)

The same can be done btw for parameters, not only return values

CeylonMigrationBot commented 11 years ago

[@quintesse] Imagine something like this:

class Id<T>() {
    T id(T value) { return value; }
}

getting turned into something like:

class Id<T> {
    T id(T value) { return value; }
    long id$int$$int(long value) { return value; }
    double id$float$$float(double value) { return value; }
    .
    . etc
}

CeylonMigrationBot commented 11 years ago

[@tombentley] We'd need RandomAccessList specifically, I think, in order to know whether a Iterator or indexed iteration is best (we don't want to go iterating a linked list with an integer index). But in this case it's also serves as an indication to the compiler to generate type-specialised bytecode, without having to open the can of worms of promising to do such specialisation generally. But I wish there was a way the JIT could figure this stuff out for itself.

CeylonMigrationBot commented 11 years ago

[@quintesse] Thing is the RandomAccessList will only work for for-loops, it doesn't help at all for the other frequent case where we'll just work with chained Iterable. So it might indeed be nice to have, but I don't think we should focus on that specifically, because it will only help in certain cases.

And if like you already pointed out item() itself boxes we are still in the same situation. So instead of making a highly specific improvement only for item() in classes marked with RandomAccessList we could make it generic for all those cases.

Combine it with my other suggestion of changing the design of Iterator and I think we could make the Iterator almost as fast as a for loop (never exactly as fast of course, but at least the difference won't be a nasty surprise).

CeylonMigrationBot commented 11 years ago

[@tombentley] After some very rough trial runs, I can find no substantial difference between an optimized ceylon calculation of pi and the same calculation done in Java on my machine. With an iteration count of 1000000000 for all I got the following timings:

Optimized ceylon: 11.790257755s
Unoptimized ceylon: 11.958480138s
Java: 11.750422546s

(Yes, it would appear the ceylon optimization didn't help all that much). Looking at the Java benchmark in that github repo I note that the iteration counter is an int, whereas in Ceylon it's a long. I wonder if that might have made a big difference on the machine/JVM his benchmarks were run on?

CeylonMigrationBot commented 11 years ago

[@FroMage] That is good to know indeed. Original tweet was there: https://twitter.com/russel_winder/status/269128268320473089 Can you take it up with him?

CeylonMigrationBot commented 11 years ago

[@RossTate] How about having iterator() return a union type? Specifically Iterator<T>|Triple<RandomAccess<T>,Integer,Integer>. RandomAccess<T> then refines this return type to be just Triple<RandomAccess<T>,Integer,Integer>.

I've gotta go, but I'll get to specialization later.

CeylonMigrationBot commented 11 years ago

[@RossTate] For specialization, the problem is that it's really hard/expensive to ensure that every instantiation of a generic class with a primitive type argument has specialized implementations for that primitive. Nonetheless, we may be able to ensure that many such instantiations do. So maybe the best thing to do would be to dynamically check whether an instance we've been given has a specialized implementation, and if so use that one. This may have some slight overhead, and will result in some bytecode duplication, but it's completely transparent so we're free to change this as we see fit in practice.

CeylonMigrationBot commented 11 years ago

[@tombentley] I've now been able to replicate the result of the Ceylon version being almost twice as slow as the Java version. It seems related to Integer being a long, but I'm not yet sure if it's a manifestation of how I'm running the benchmark.

CeylonMigrationBot commented 11 years ago

[@tombentley] Current status:

for (i in min..max): implemented
for (i in min:len):held up by #5060
for (x in arrayOrArraySequence): Not yet started.

CeylonMigrationBot commented 11 years ago

[@FroMage] Moving to M6

CeylonMigrationBot commented 11 years ago

[@gavinking] The big part of this is optimizing Arrays. The general pattern for optimizing that is as follows:

void iterateInts(Iterable<Integer> ints) {
    {
    boolean isArray = ints instanceof Array;
    Iterator<Integer> iter = isArray ? null : ints.iterator();
    Object arr = isArray ? ((Array) ints).toArray() : null;
    int index = 0;
    Object next = null;
    while (isArray ? index++<java.lang.reflect.Array.getLength(arr) : 
                     (next=iter.next()) instanceof Integer) {
        int i = isArray ? java.lang.reflect.Array.getInt(arr,index) : 
                            ((Integer) next).intValue;
        ...
    }
    }
}

(Actually I'm not certain that java.lang.reflect.Array.getInt will actually do unboxing, I need to test that.)

CeylonMigrationBot commented 11 years ago

[@quintesse] Why use the reflection API? Why not just arr.length and arr[index]?

CeylonMigrationBot commented 11 years ago

[@gavinking] If we want to also handle ArraySequence, it gets a little more complex:

void iterateInts(Iterable<Integer> ints) {
    {
    boolean isArray = ints instanceof Array;
    boolean isArraySequence = ints instanceof ArraySequence;
    Iterator<Integer> iter = isArray|isArraySequence ? 
            null : ints.iterator();
    Object arr = null;
    int index = 0;
    int length = 0;
    if (isArray) {
        arr = ((Array) ints).toArray();
        length = java.lang.reflect.Array.getLength(arr);
    }
    if (isArraySequence) {
        arr = ((ArraySequence) ints).array;
        index = ((ArraySequence) ints).first;
        length = ((ArraySequence) ints).length+index;
    }
    Object next = null;
    while (isArray|isArraySequence ? 
            index++<length : 
            (next=iter.next()) instanceof Integer) {
        int int = isArray|isArraySequence ? 
                java.lang.reflect.Array.getInt(arr,index) : 
                ((Integer) next).intValue;
        ...
    }
    }
}

CeylonMigrationBot commented 11 years ago

[@gavinking]

Why use the reflection API? Why not just arr.length and arr[index]?

I'm taking into account the fact that it might be an int[] or an Integer[]. I'm assuming the reflection API is more efficient than using instanceof and then casting, which might not in fact be true.

CeylonMigrationBot commented 11 years ago

[@gavinking] After making this optimization:

5195

Together with the plan to reimplement ArraySequence in terms of Array, this issue becomes even more critical. Currently any access to an array element of type Integer, Float, Boolean, or Character will result in boxing followed by unboxing.

Another possibility might be to handle the get() method specially, generating special-purpose get$int(), get$char(), etc methods that do the unboxing on the callee side.

CeylonMigrationBot commented 10 years ago

[@tombentley] I've been experimenting with this today. I've started off with a simplified optimization as a PoC, where we use a C-style for loop when we know the static type is Array. Results so far:

It turns out that getting the length and item using java.lang.reflect.Array is slower than using our own utility methods which use instanceof and typecasts. At least on my JVM and with my benchmark.
When iterating a Java array (e.g. LongArray javaArray iterated like for (x in javaArray.array)) it's worthwhile avoiding the creation of the IntArray wrapper on which we immediately call toArray() and then throw away.
Using a simple microbenchmark (summing the first 100 longs held in a precomputed long[] 1,000,000 times), on my machine I get:
- A pure java version takes about 28ms
- The unoptimized ceylon version takes about 793ms
- The optimized ceylon version takes aboout 31ms

Obviously this is fairly encouraging, so I plan to start working on the dynamic version next. There are a few extra things worth checking for that one:

Does it slow things down for the non-Array iteration case? By how much?
How much does is increase the size of the compiled code?

I suspect the answer to both of those will be "negligable", but it's worth checking.

CeylonMigrationBot commented 10 years ago

[@tombentley] I've now implemented the same C-style for loop when iterating (static) ArraySequences. The equivalent benchmark (summing the first 100 ceylon.language.Integers held in a precomputed ArraySequence<Integer> 1,000,000 times) results are:

Unoptimized ceylon takes about 318ms
Optimized ceylon takes about 70ms

(It's not what this issue is about but just FTR: the benchmark doesn't see the GC effects of having a Java array of wrapped longs, so we can't really conclude anything about the relative perf of Array vs ArraySequence).

The main issue with what I've implemented so far is the loss of encapsulation of the fields of ArraySequence, as alluded to in #5190. For the moment I'm just using @Ignored getters on ArraySequence.

CeylonMigrationBot commented 10 years ago

[@gavinking] @tombentley excellent!

CeylonMigrationBot commented 10 years ago

[@tombentley] I've now got the dynamic version of this optimization working (i.e. where all we know at compile time is that we have an Iterable and so we translate to a loop like @gavinking proposed above so that we use indexed access on values whose runtime type is Arrays or ArraySequences).

Unsurprisingly the results (with a single microbenchmark on my machine with my JVM) are mixed.

JVM can optimize away the type checks

The "best case" scenario is where the JVM is sufficiently warmed up that it's had time to profile and optimize the loop, and the loop is only executes with a single type at runtime. This means that the JVM probably noticed that the loop only executed with the given runtime type and optimized for that case (i.e. all those conditionals got optimized out). Each run was performed in a separate JVM.

If neither optimization is enabled (i.e. we just generate a vanilla loop like we would for 1.0):
- With an Array at runtime: 657ms
- With an ArraySequence at runtime: 222ms
- With a Range at runtime: 1254ms
If just the Array optimization is enabled (i.e. we don't generate code to specialized the loop for ArraySequence):
- With an Array at runtime: 28ms (wow! I think the JVM realised on this run we it was only being called with an array argument and the loop)
- With an ArraySequence at runtime: 221ms (not sure why this is as fast as the no-optimizations + ArraySequence case)
- With a Range at runtime: 1323ms
If just the ArraySequence optimization is enabled (i.e. we don't generate code to specialized the loop for Array):
- With an Array at runtime: 631ms
- With an ArraySequence at runtime: 80ms
- With a Range at runtime: 1332ms
If both optimizations are enabled:
- With an Array at runtime: 29ms
- With an ArraySequence at runtime: 71ms
- With a Range at runtime: 1333ms

JVM cannot optimize away the type checks

The other common case is where a loop gets optimized but executes with various types of Iterable, so cannot be optimized as effectively (or has to be deoptimized...). I ran a modified microbenchmark where during warmup the loop was executed on four different types of Iterable: Array, ArraySequence, Range and a comprehension.

If neither optimization is enabled (i.e. we just generate a vanilla loop like we would for 1.0):
- With an Array at runtime: 993ms
- With an ArraySequence at runtime: 678ms
- With a Range at runtime: 1557ms
If both optimizations are enabled then the worst results I saw were
- With an Array at runtime: 760ms
- With an ArraySequence at runtime: 367ms
- With a Range at runtime: 1399ms

There's also the third case: When the JVM doesn't compile the loop at all, but interprets it. I've not attempted to benchmark that. Nor have I looked at the effect of code size.

Obviously these results are just for some microbenchmarks which were just just on my machine with 1 JVM. It would be nice if we could have a little more information about how broadly applicable these numbers are.

CeylonMigrationBot commented 10 years ago

[@gavinking]

Unsurprisingly the results (with a single microbenchmark on my machine with my JVM) are mixed.

When I read that, I thought you were going to tell me that in some cases it got slower. But in fact the optimized code was in general much faster! That is to say, your oprimizations work.

If both optimizations are enabled then the worst results I saw were

With an Array at runtime: 760ms

With an ArraySequence at runtime: 367ms

With a Range at runtime: 1399ms

Did your new optimizations break the optimization we already used to do for for (i in 0..n)? Or is that still there? Then what kind of Range was this one being tested?

CeylonMigrationBot commented 10 years ago

[@gavinking] I wonder if a 3rd optimization would make sense, especially for Ranges, to use the successor operation instead of created in Iterator?

CeylonMigrationBot commented 10 years ago

[@tombentley]

Did your new optimizations break the optimization we already used to do for for (i in 0..n)?

No

Then what kind of Range was this one being tested?

one we don't currently optimize

I wonder if a 3rd optimization would make sense, especially for Ranges

I wondered about that too. Let's try it!

CeylonMigrationBot commented 10 years ago

[@tombentley] A static type specialization for Range seems to be worthwhile:

for (item in range) runs in about half the time using the optimization than without it.
in theory we could also support for (item in range.by(step)), but that one's a bit complicated:
- Range.by() has its own optimization for the case of a Range<Integer> which seems to run faster than the optimized loop that would be generated by the compiler.
- For a non-Integer type argument the compiler optimization works out faster.

CeylonMigrationBot commented 10 years ago

[@tombentley] I'm pausing work on this for the time being to work on another issue. Still to do on this issue:

benchmark to find out whether we really need to have an optimized transformation for for (i in min:len) -- it could be that that RangeIterationStatic optimization is sufficient. If that is the case we may be able to remove the RangeOpIteration optimization.
the ArraySequenceIterationStatic optimization should be applicable to Tuples as well.
Tidy up the selection of loop optimizations
Possibly: Add a command line option to control which optimizations are used (there's already one which is used for testing purposes, it's whether we really need to expose it at the tool level).

CeylonMigrationBot commented 10 years ago

[@FroMage] @tombentley is this done or what?

CeylonMigrationBot commented 10 years ago

[@tombentley] Or what.

There's still work we should try to do for 1.1 here. Optimized Tuple iteration is the main one we should try to get done, plus #1467. I'll look at it next.

CeylonMigrationBot commented 10 years ago

[@tombentley] @gavinking's rework of arrays means that the array attribute of an IntArray is now Array<java.lang.Integer>. Before it was a Array<ceylon.language.Integer> and I had code like:

void arrayOfInts(IntArray array) {
    for (element in array.array) {
        print(element+5);
    }
}

An optimization understood that iterating like this could use a C-style for loop over the underlying int[] and avoid all boxing. I can't do that now because the compiler doesn't erase a j.l.Integer to an int, so it has to be boxed (and it has to be boxed as the completely useless j.l.Integer, the loop will most likely then need to unbox it or rebox it).

Unless we add another attribute on IntArray for getting an Array<ceylon.language.Integer> I can't see how to keep this optimization.

CeylonMigrationBot commented 10 years ago

[@quintesse] But wasn't the idea that Array<java.lang.Integer> would get erased to int[] and Array<java.lang.Integer?> to java.lang.Integer[]? Edit: sorry, not erased of course, only IntArray gets erased. So an extra attribute seems necessary, right?

CeylonMigrationBot commented 10 years ago

[@gavinking] But an IntArray already is an int[]. You don't need any special extra attribute!

Sent from my iPhone

On 8 Apr 2014, at 5:00 am, Tom Bentley notifications@github.com wrote:

@gavinking's rework of arrays means that the array attribute of an IntArray is now Array. Before it was a Array and I had code like:

void arrayOfInts(IntArray array) { for (element in array.array) { print(element+5); } } An optimization understood that iterating like this could use a C-style for loop over the underlying int[] and avoid all boxing. I can't do that now because the compiler doesn't erase a j.l.Integer to an int, so it has to be boxed (and it has to be boxed as the completely useless j.l.Integer, the loop will most likely then need to unbox it or rebox it).

Unless we add another attribute on IntArrayfor getting anArray` I can't see how to keep this optimization.

— Reply to this email directly or view it on GitHub.

CeylonMigrationBot commented 10 years ago

[@tombentley] @gavinking I don't know if that's directed at me or @quintesse.

The use case I'm thinking about is where you have an IntArray and you want to iterate it efficiently. People will expect to use for to do that, but IntArray is not Iterable, so they either have to give up using for and use while (which will be efficient, but unnatural). They're quite likely to try to get some Iterable thing from the IntArray so they can use for (which unless we optimize it, will be natural, but inefficient).

CeylonMigrationBot commented 10 years ago

[@gavinking] But it's an int[] so you can just use a C-style for on it directly!

Sent from my iPhone

On 8 Apr 2014, at 9:31 am, Tom Bentley notifications@github.com wrote:

@gavinking I don't know if that's directed at me or @quintesse.

The use case I'm thinking about is where you have an IntArray and you want to iterate it efficiently. People will expect to use for to do that, but IntArray is not Iterable, so they either have to give up using for and use while (which will be efficient, but unnatural). They're quite likely to try to get some Iterable thing from the IntArray so they can use for (which unless we optimize it, will be natural, but inefficient).

— Reply to this email directly or view it on GitHub.

CeylonMigrationBot commented 10 years ago

[@tombentley] Yes, I know that!! But you can't write for (i in intArray) {} in Ceylon because it's not Iterable, which is what people would expect. You'd have to write some nasty while loop. If that's what you expect people to do, that's fine, but please say so!

We badly need to document this array stuff.

CeylonMigrationBot commented 10 years ago

[@gavinking] Oh OK, now I understand. Then Takos comment is correct. An Array now always wraps a int[].

Sent from my iPhone

On 8 Apr 2014, at 9:40 am, Tom Bentley notifications@github.com wrote:

Yes, I know that!! But you can't write for (i in intArray) {} in Ceylon because it's not Iterable, which is what people would expect. You'd have to write some nasty while loop. If that's what you expect people to do, that's fine, but please say so!

We badly need to document this array stuff.

— Reply to this email directly or view it on GitHub.

CeylonMigrationBot commented 10 years ago

[@tombentley] But how to I get a Array<ceylon.language::Integer> from an IntArray, because the obvious place to look (the array attribute) gives me an Array<java.lang::Integer>, which is in not the same thing!

CeylonMigrationBot commented 10 years ago

[@gavinking] That's the thing you need, I think.

Sent from my iPhone

On 8 Apr 2014, at 9:54 am, Tom Bentley notifications@github.com wrote:

But how to I get a Arrayceylon.language::Integer from an IntArray, because the obvious place to look (the array attribute) gives me an Arrayjava.lang::Integer, which is in not the same thing!

— Reply to this email directly or view it on GitHub.

CeylonMigrationBot commented 10 years ago

[@tombentley] What do we call this Array<ceylon.language::Integer> attribute? ceylonArray sounds really lame. To me the one called array should be Array<ceylon.language::Integer> and the less useful Array<java.lang::Integer> should be boxedArray or javaBoxedArray or something like that.

CeylonMigrationBot commented 10 years ago

[@quintesse] If I remember things right, one of the reasons @gavinking made these changes was because often we were "lying" about the underlying array, pretending that it was of c.l.Integer when in fact the type was smaller (int, short or byte) which then gave those weird problems when someone puts a larger number in it but gets a smaller out of it.

So can't we just say that the code that does the optimization checks for Array<j.l.Xxxx> (where Xxxx is one of the wrapper classes for basic types) and then just uses the toArray() method and casts the result to the corresponding native array type?

Edit: The idea being that an array like that can never be backed by anything else than a native array of primitive types. This way we don't have to introduce a new attribute.

CeylonMigrationBot commented 10 years ago

[@gavinking] @tombentley I don't think you need a new attribute. @quintesse has it right.

CeylonMigrationBot commented 10 years ago

[@tombentley] I think you're saying we only need to worry about optimizing this case:

Array<ceylon.language.Integer> array = ...
for (ceylon.language.Integer i in array) {
    ...
}

And if we have a plain IntArray then:

import ceylon.interop.java{toIntegerArray}

IntArray ints = ...
for (ceylon.language.Integer i in toIntegerArray(ints)) {
    ...
}

Is that right?

CeylonMigrationBot commented 10 years ago

[@quintesse] No, the idea was that somehow we could do:

IntArray ints = ...
for (c.l.Integer i in int.array) {
    ...
}

And then you could just detect that we're dealing with an Array of a "primitive" type, call toArray() on it, cast it to it's Java primitive type array and loop over that.

Thing is we can't do c.l.Integer i in int.array.

That would leave c.l..Integer i in toIntegerArray(ints) as you say @tombentley , but that makes a copy so that sucks.

And this is because Array can be changed.

What we need is a way to treat an IntArray, ShortArray or ByteArray as if it's a Array<c.l.Integer> without having to worry that somebody might go and make changes using that latter representation. We could change Array to either always throw an exception if somebody tries to do a set(c.l.Integer) when the underlying type is smaller, or only when the value is too large.

@gavinking , in the end I think Tom is right and we can't get around the fact that we would need an extra attribute. Because in reality arrays using java.lang types are pretty useless to us.

CeylonMigrationBot commented 10 years ago

[@gavinking] Ah, alright, then the only way to do this would be to make IntArray an Iterable<ceylon.language::Integer>. Is there a good reason it is not one already?

CeylonMigrationBot commented 10 years ago

[@tombentley] Isn't the fact that it's erased to int[] enough?

CeylonMigrationBot commented 10 years ago

[@gavinking]

Ah, alright, then the only way to do this would be to make IntArray an Iterable<ceylon.language::Integer>.

Or even just add a function with this signature to ceylon.interop.java:

{Integer*} integerElements(IntegerArrayLike array)

I'm not sure which is better.

CeylonMigrationBot commented 10 years ago

[@gavinking]

Isn't the fact that it's erased to int[] enough?

No, because as you just pointed out, I have no way to write a for loop for it in Ceylon.

CeylonMigrationBot commented 10 years ago

[@tombentley]

No, because as you just pointed out, I have no way to write a for loop for it in Ceylon.

I'm saying the fact that it's erased to int[] is the reason IntArray isn't Iterable.

CeylonMigrationBot commented 10 years ago

[@gavinking]

I'm saying the fact that it's erased to int[] is the reason IntArray isn't Iterable.

I would have thought that the autoboxing infrastructure is smart enough to box the int[] where necessary, but perhaps not?

CeylonMigrationBot commented 10 years ago

[@tombentley] Oh, I see what you mean: We box it whenever we assign an IntArray to an {Integer*}. Well, maybe we could do that. But it's totally out side the scope of this issue.

eclipse-archived / ceylon

Optimize for loop for Range, ArraySequence, and Array #882

5195

JVM can optimize away the type checks

JVM cannot optimize away the type checks