Closed CeylonMigrationBot closed 8 years ago
[@quintesse] I'll suggest something else that will make Gavin want to pluck out my eyes with his teeth, but I think it must be mentioned. One of the reasons we can't make Iterator
more efficient is because it has only a single next()
method. This means that the detection of the end of the iteration must be handled with a special value like exhausted
, which means we are obliged to erase the return value of the method to Object
which in turn means we'll always have to do boxing.
I know there is resistance to have Iterator
with a hasNext()
but for those streams where peeking is not possible it's easy to use read-ahead and remember the element until next()
is called, it's hardly rocket science.
But it does allow us to generate iterators that a specifically optimized for primitive types.
I think it's worth considering becuase once we reach 1.0 we're stuck with this decision forever.
[@tombentley]
But it does allow us to generate iterators that a specifically optimized for primitive types.
But presumably next()
is declared to return the type parameter Element
, thus requiring boxing. Or am I missing something about your proposal.
To explain what I had in mind about the RandomAccessList
thing. The compiler could generate an overloaded Element item(long index)
method (that is, with a primitive index
) when it was compiling something that satisfies RandomAccessList<Element>
, which contained the transformed code for the Ceylon item()
declaration. The item(Integer)
version would box and invoke item(long)
. With this in place we can transform for (v in randomAccess)
using a C-style for
loop, avoiding the instantiation of an Iterator
and the boxing of all the indices. Obviously in the case of Array
and String
we still have to box the return values.
This is the kind of specialization based on type arguments which we've generally ruled out so far. However, perhaps there's an argument to be made for this as a special case, precisely because iterating List
s is such a common thing to do. Obviously we could do further specialization to deal with primitive return vaules (with a RandomAccessList::item$character()
method, say, or Iterator::next$character()
) but that's just further down this slippery slope.
[@quintesse]
But presumably next() is declared to return the type parameter Element, thus requiring boxing
No, if the iterator is of type Iterator<Integer>
for example (not <Integer?>
) and we add a next$int()
(and next$float()
etc) to the Java implementation of Iterator
, we could easily generate a call to that method instead of the generic one. This is an optimization we've been discussing on and off on the mailing list and it should be possible to do, but only for types that don't return a union or intersection.
Reading the rest of your message: that's exactly what I propose and it's something we could even do for user types (in a limited way to start with, for example only taking into account types with a single type parameter)
The same can be done btw for parameters, not only return values
[@quintesse] Imagine something like this:
class Id<T>() {
T id(T value) { return value; }
}
getting turned into something like:
class Id<T> {
T id(T value) { return value; }
long id$int$$int(long value) { return value; }
double id$float$$float(double value) { return value; }
.
. etc
}
[@tombentley] We'd need RandomAccessList
specifically, I think, in order to know whether a Iterator
or indexed iteration is best (we don't want to go iterating a linked list with an integer index). But in this case it's also serves as an indication to the compiler to generate type-specialised bytecode, without having to open the can of worms of promising to do such specialisation generally. But I wish there was a way the JIT could figure this stuff out for itself.
[@quintesse] Thing is the RandomAccessList
will only work for for-loops, it doesn't help at all for the other frequent case where we'll just work with chained Iterable
. So it might indeed be nice to have, but I don't think we should focus on that specifically, because it will only help in certain cases.
And if like you already pointed out item()
itself boxes we are still in the same situation. So instead of making a highly specific improvement only for item()
in classes marked with RandomAccessList
we could make it generic for all those cases.
Combine it with my other suggestion of changing the design of Iterator
and I think we could make the Iterator
almost as fast as a for loop (never exactly as fast of course, but at least the difference won't be a nasty surprise).
[@tombentley] After some very rough trial runs, I can find no substantial difference between an optimized ceylon calculation of pi and the same calculation done in Java on my machine. With an iteration count of 1000000000 for all I got the following timings:
(Yes, it would appear the ceylon optimization didn't help all that much). Looking at the Java benchmark in that github repo I note that the iteration counter is an int
, whereas in Ceylon it's a long
. I wonder if that might have made a big difference on the machine/JVM his benchmarks were run on?
[@FroMage] That is good to know indeed. Original tweet was there: https://twitter.com/russel_winder/status/269128268320473089 Can you take it up with him?
[@RossTate] How about having iterator()
return a union type? Specifically Iterator<T>|Triple<RandomAccess<T>,Integer,Integer>
. RandomAccess<T>
then refines this return type to be just Triple<RandomAccess<T>,Integer,Integer>
.
I've gotta go, but I'll get to specialization later.
[@RossTate] For specialization, the problem is that it's really hard/expensive to ensure that every instantiation of a generic class with a primitive type argument has specialized implementations for that primitive. Nonetheless, we may be able to ensure that many such instantiations do. So maybe the best thing to do would be to dynamically check whether an instance we've been given has a specialized implementation, and if so use that one. This may have some slight overhead, and will result in some bytecode duplication, but it's completely transparent so we're free to change this as we see fit in practice.
[@tombentley] I've now been able to replicate the result of the Ceylon version being almost twice as slow as the Java version. It seems related to Integer
being a long
, but I'm not yet sure if it's a manifestation of how I'm running the benchmark.
[@tombentley] Current status:
for (i in min..max)
: implementedfor (i in min:len)
:held up by #5060for (x in arrayOrArraySequence)
: Not yet started.[@FroMage] Moving to M6
[@gavinking] The big part of this is optimizing Array
s. The general pattern for optimizing that is as follows:
void iterateInts(Iterable<Integer> ints) {
{
boolean isArray = ints instanceof Array;
Iterator<Integer> iter = isArray ? null : ints.iterator();
Object arr = isArray ? ((Array) ints).toArray() : null;
int index = 0;
Object next = null;
while (isArray ? index++<java.lang.reflect.Array.getLength(arr) :
(next=iter.next()) instanceof Integer) {
int i = isArray ? java.lang.reflect.Array.getInt(arr,index) :
((Integer) next).intValue;
...
}
}
}
(Actually I'm not certain that java.lang.reflect.Array.getInt
will actually do unboxing, I need to test that.)
[@quintesse] Why use the reflection API? Why not just arr.length
and arr[index]
?
[@gavinking] If we want to also handle ArraySequence
, it gets a little more complex:
void iterateInts(Iterable<Integer> ints) {
{
boolean isArray = ints instanceof Array;
boolean isArraySequence = ints instanceof ArraySequence;
Iterator<Integer> iter = isArray|isArraySequence ?
null : ints.iterator();
Object arr = null;
int index = 0;
int length = 0;
if (isArray) {
arr = ((Array) ints).toArray();
length = java.lang.reflect.Array.getLength(arr);
}
if (isArraySequence) {
arr = ((ArraySequence) ints).array;
index = ((ArraySequence) ints).first;
length = ((ArraySequence) ints).length+index;
}
Object next = null;
while (isArray|isArraySequence ?
index++<length :
(next=iter.next()) instanceof Integer) {
int int = isArray|isArraySequence ?
java.lang.reflect.Array.getInt(arr,index) :
((Integer) next).intValue;
...
}
}
}
[@gavinking]
Why use the reflection API? Why not just
arr.length
andarr[index]
?
I'm taking into account the fact that it might be an int[]
or an Integer[]
. I'm assuming the reflection API is more efficient than using instanceof
and then casting, which might not in fact be true.
[@gavinking] After making this optimization:
Together with the plan to reimplement ArraySequence
in terms of Array
, this issue becomes even more critical. Currently any access to an array element of type Integer
, Float
, Boolean
, or Character
will result in boxing followed by unboxing.
Another possibility might be to handle the get()
method specially, generating special-purpose get$int()
, get$char()
, etc methods that do the unboxing on the callee side.
[@tombentley] I've been experimenting with this today. I've started off with a simplified optimization as a PoC, where we use a C-style for
loop when we know the static type is Array
. Results so far:
java.lang.reflect.Array
is slower than using our own utility methods which use instanceof
and typecasts. At least on my JVM and with my benchmark.LongArray javaArray
iterated like for (x in javaArray.array)
) it's worthwhile avoiding the creation of the IntArray
wrapper on which we immediately call toArray()
and then throw away.long[]
1,000,000 times), on my machine I get:
Obviously this is fairly encouraging, so I plan to start working on the dynamic version next. There are a few extra things worth checking for that one:
Array
iteration case? By how much?I suspect the answer to both of those will be "negligable", but it's worth checking.
[@tombentley] I've now implemented the same C-style for
loop when iterating (static) ArraySequence
s. The equivalent benchmark (summing the first 100 ceylon.language.Integer
s held in a precomputed ArraySequence<Integer>
1,000,000 times) results are:
(It's not what this issue is about but just FTR: the benchmark doesn't see the GC effects of having a Java array of wrapped longs, so we can't really conclude anything about the relative perf of Array
vs ArraySequence
).
The main issue with what I've implemented so far is the loss of encapsulation of the fields of ArraySequence
, as alluded to in #5190. For the moment I'm just using @Ignore
d getters on ArraySequence
.
[@gavinking] @tombentley excellent!
[@tombentley] I've now got the dynamic version of this optimization working (i.e. where all we know at compile time is that we have an Iterable
and so we translate to a loop like @gavinking proposed above so that we use indexed access on values whose runtime type is Array
s or ArraySequence
s).
Unsurprisingly the results (with a single microbenchmark on my machine with my JVM) are mixed.
The "best case" scenario is where the JVM is sufficiently warmed up that it's had time to profile and optimize the loop, and the loop is only executes with a single type at runtime. This means that the JVM probably noticed that the loop only executed with the given runtime type and optimized for that case (i.e. all those conditionals got optimized out). Each run was performed in a separate JVM.
Array
at runtime: 657msArraySequence
at runtime: 222msRange
at runtime: 1254msArray
optimization is enabled (i.e. we don't generate code to specialized the loop for ArraySequence
):
Array
at runtime: 28ms (wow! I think the JVM realised on this run we it was only being called with an array argument and the loop)ArraySequence
at runtime: 221ms (not sure why this is as fast as the no-optimizations + ArraySequence case)Range
at runtime: 1323msArraySequence
optimization is enabled (i.e. we don't generate code to specialized the loop for Array
):
Array
at runtime: 631msArraySequence
at runtime: 80msRange
at runtime: 1332msArray
at runtime: 29msArraySequence
at runtime: 71msRange
at runtime: 1333msThe other common case is where a loop gets optimized but executes with various types of Iterable
, so cannot be optimized as effectively (or has to be deoptimized...). I ran a modified microbenchmark where during warmup the loop was executed on four different types of Iterable
: Array
, ArraySequence
, Range
and a comprehension.
Array
at runtime: 993msArraySequence
at runtime: 678msRange
at runtime: 1557msArray
at runtime: 760msArraySequence
at runtime: 367msRange
at runtime: 1399msThere's also the third case: When the JVM doesn't compile the loop at all, but interprets it. I've not attempted to benchmark that. Nor have I looked at the effect of code size.
Obviously these results are just for some microbenchmarks which were just just on my machine with 1 JVM. It would be nice if we could have a little more information about how broadly applicable these numbers are.
[@gavinking]
Unsurprisingly the results (with a single microbenchmark on my machine with my JVM) are mixed.
When I read that, I thought you were going to tell me that in some cases it got slower. But in fact the optimized code was in general much faster! That is to say, your oprimizations work.
- If both optimizations are enabled then the worst results I saw were
- With an
Array
at runtime: 760ms- With an
ArraySequence
at runtime: 367ms- With a
Range
at runtime: 1399ms
Did your new optimizations break the optimization we already used to do for for (i in 0..n)
? Or is that still there? Then what kind of Range
was this one being tested?
[@gavinking] I wonder if a 3rd optimization would make sense, especially for Range
s, to use the successor
operation instead of created in Iterator
?
[@tombentley]
Did your new optimizations break the optimization we already used to do for for (i in 0..n)?
No
Then what kind of Range was this one being tested?
one we don't currently optimize
I wonder if a 3rd optimization would make sense, especially for
Range
s
I wondered about that too. Let's try it!
[@tombentley] A static type specialization for Range
seems to be worthwhile:
for (item in range)
runs in about half the time using the optimization than without it. for (item in range.by(step))
, but that one's a bit complicated:
Range.by()
has its own optimization for the case of a Range<Integer>
which seems to run faster than the optimized loop that would be generated by the compiler.Integer
type argument the compiler optimization works out faster.[@tombentley] I'm pausing work on this for the time being to work on another issue. Still to do on this issue:
for (i in min:len)
-- it could be that that RangeIterationStatic
optimization is sufficient. If that is the case we may be able to remove the RangeOpIteration
optimization.ArraySequenceIterationStatic
optimization should be applicable to Tuple
s as well.[@FroMage] @tombentley is this done or what?
[@tombentley] Or what.
There's still work we should try to do for 1.1 here. Optimized Tuple
iteration is the main one we should try to get done, plus #1467. I'll look at it next.
[@tombentley]
@gavinking's rework of arrays means that the array
attribute of an IntArray
is now Array<java.lang.Integer>
. Before it was a Array<ceylon.language.Integer>
and I had code like:
void arrayOfInts(IntArray array) {
for (element in array.array) {
print(element+5);
}
}
An optimization understood that iterating like this could use a C-style for
loop over the underlying int[] and avoid all boxing. I can't do that now because the compiler doesn't erase a j.l.Integer to an int
, so it has to be boxed (and it has to be boxed as the completely useless j.l.Integer, the loop will most likely then need to unbox it or rebox it).
Unless we add another attribute on IntArray
for getting an Array<ceylon.language.Integer>
I can't see how to keep this optimization.
[@quintesse] But wasn't the idea that Array<java.lang.Integer>
would get erased to int[]
and Array<java.lang.Integer?>
to java.lang.Integer[]
?
Edit: sorry, not erased of course, only IntArray
gets erased.
So an extra attribute seems necessary, right?
[@gavinking] But an IntArray already is an int[]. You don't need any special extra attribute!
Sent from my iPhone
On 8 Apr 2014, at 5:00 am, Tom Bentley notifications@github.com wrote:
@gavinking's rework of arrays means that the array attribute of an IntArray is now Array
. Before it was a Array and I had code like: void arrayOfInts(IntArray array) { for (element in array.array) { print(element+5); } } An optimization understood that iterating like this could use a C-style for loop over the underlying int[] and avoid all boxing. I can't do that now because the compiler doesn't erase a j.l.Integer to an int, so it has to be boxed (and it has to be boxed as the completely useless j.l.Integer, the loop will most likely then need to unbox it or rebox it).
Unless we add another attribute on IntArrayfor getting anArray` I can't see how to keep this optimization.
— Reply to this email directly or view it on GitHub.
[@tombentley] @gavinking I don't know if that's directed at me or @quintesse.
The use case I'm thinking about is where you have an IntArray
and you want to iterate it efficiently. People will expect to use for
to do that, but IntArray
is not Iterable
, so they either have to give up using for
and use while
(which will be efficient, but unnatural). They're quite likely to try to get some Iterable
thing from the IntArray
so they can use for
(which unless we optimize it, will be natural, but inefficient).
[@gavinking] But it's an int[] so you can just use a C-style for on it directly!
Sent from my iPhone
On 8 Apr 2014, at 9:31 am, Tom Bentley notifications@github.com wrote:
@gavinking I don't know if that's directed at me or @quintesse.
The use case I'm thinking about is where you have an IntArray and you want to iterate it efficiently. People will expect to use for to do that, but IntArray is not Iterable, so they either have to give up using for and use while (which will be efficient, but unnatural). They're quite likely to try to get some Iterable thing from the IntArray so they can use for (which unless we optimize it, will be natural, but inefficient).
— Reply to this email directly or view it on GitHub.
[@tombentley] Yes, I know that!! But you can't write for (i in intArray) {}
in Ceylon because it's not Iterable
, which is what people would expect. You'd have to write some nasty while
loop. If that's what you expect people to do, that's fine, but please say so!
We badly need to document this array stuff.
[@gavinking] Oh OK, now I understand. Then Takos comment is correct. An Array
Sent from my iPhone
On 8 Apr 2014, at 9:40 am, Tom Bentley notifications@github.com wrote:
Yes, I know that!! But you can't write for (i in intArray) {} in Ceylon because it's not Iterable, which is what people would expect. You'd have to write some nasty while loop. If that's what you expect people to do, that's fine, but please say so!
We badly need to document this array stuff.
— Reply to this email directly or view it on GitHub.
[@tombentley] But how to I get a Array<ceylon.language::Integer>
from an IntArray
, because the obvious place to look (the array
attribute) gives me an Array<java.lang::Integer>
, which is in not the same thing!
[@gavinking] That's the thing you need, I think.
Sent from my iPhone
On 8 Apr 2014, at 9:54 am, Tom Bentley notifications@github.com wrote:
But how to I get a Arrayceylon.language::Integer from an IntArray, because the obvious place to look (the array attribute) gives me an Arrayjava.lang::Integer, which is in not the same thing!
— Reply to this email directly or view it on GitHub.
[@tombentley] What do we call this Array<ceylon.language::Integer>
attribute? ceylonArray
sounds really lame. To me the one called array
should be Array<ceylon.language::Integer>
and the less useful Array<java.lang::Integer>
should be boxedArray
or javaBoxedArray
or something like that.
[@quintesse] If I remember things right, one of the reasons @gavinking made these changes was because often we were "lying" about the underlying array, pretending that it was of c.l.Integer
when in fact the type was smaller (int
, short
or byte
) which then gave those weird problems when someone puts a larger number in it but gets a smaller out of it.
So can't we just say that the code that does the optimization checks for Array<j.l.Xxxx>
(where Xxxx is one of the wrapper classes for basic types) and then just uses the toArray()
method and casts the result to the corresponding native array type?
Edit: The idea being that an array like that can never be backed by anything else than a native array of primitive types. This way we don't have to introduce a new attribute.
[@gavinking] @tombentley I don't think you need a new attribute. @quintesse has it right.
[@tombentley] I think you're saying we only need to worry about optimizing this case:
Array<ceylon.language.Integer> array = ...
for (ceylon.language.Integer i in array) {
...
}
And if we have a plain IntArray
then:
import ceylon.interop.java{toIntegerArray}
IntArray ints = ...
for (ceylon.language.Integer i in toIntegerArray(ints)) {
...
}
Is that right?
[@quintesse] No, the idea was that somehow we could do:
IntArray ints = ...
for (c.l.Integer i in int.array) {
...
}
And then you could just detect that we're dealing with an Array
of a "primitive" type, call toArray()
on it, cast it to it's Java primitive type array and loop over that.
Thing is we can't do c.l.Integer i in int.array
.
That would leave c.l..Integer i in toIntegerArray(ints)
as you say @tombentley , but that makes a copy so that sucks.
And this is because Array
can be changed.
What we need is a way to treat an IntArray
, ShortArray
or ByteArray
as if it's a Array<c.l.Integer>
without having to worry that somebody might go and make changes using that latter representation. We could change Array
to either always throw an exception if somebody tries to do a set(c.l.Integer)
when the underlying type is smaller, or only when the value is too large.
@gavinking , in the end I think Tom is right and we can't get around the fact that we would need an extra attribute. Because in reality arrays using java.lang types are pretty useless to us.
[@gavinking] Ah, alright, then the only way to do this would be to make IntArray
an Iterable<ceylon.language::Integer>
. Is there a good reason it is not one already?
[@tombentley] Isn't the fact that it's erased to int[]
enough?
[@gavinking]
Ah, alright, then the only way to do this would be to make
IntArray
anIterable<ceylon.language::Integer>
.
Or even just add a function with this signature to ceylon.interop.java
:
{Integer*} integerElements(IntegerArrayLike array)
I'm not sure which is better.
[@gavinking]
Isn't the fact that it's erased to
int[]
enough?
No, because as you just pointed out, I have no way to write a for
loop for it in Ceylon.
[@tombentley]
No, because as you just pointed out, I have no way to write a for loop for it in Ceylon.
I'm saying the fact that it's erased to int[]
is the reason IntArray
isn't Iterable
.
[@gavinking]
I'm saying the fact that it's erased to
int[]
is the reasonIntArray
isn'tIterable
.
I would have thought that the autoboxing infrastructure is smart enough to box the int[]
where necessary, but perhaps not?
[@tombentley] Oh, I see what you mean: We box it whenever we assign an IntArray
to an {Integer*}
. Well, maybe we could do that. But it's totally out side the scope of this issue.
[@gavinking] Our
for
loop is slow:for (i in min..max)
can be optimized tofor (int i=min; i<=max; i++)
(but watch out for reverse ranges!)for (i in min:len)
can be optimized tofor (int i=min; i<min+len; i++)
for (x in arrayOrArraySequence)
can be optimized to use a C-stylefor
and direct access to the underlying java array by index[Migrated from ceylon/ceylon-compiler#882] [Closed at 2014-05-02 10:39:53]