gingerwizard closed this issue 8 years ago
I'd be all for it if we could do it without a runtime cost but I'm fairly sure this'd have to have one. I wonder how much of a cost it is though.
Actually now that I think about it it may not have a runtime cost. I expect the JVM already inlines the array's range checking and it's possible it might be able to merge this into that. It is worth checking.
@jdconrad I played with this a bit and I think we can get this without much runtime cost for folks that don't use it. I wrote a quick and dirty performance test that looks like:
@Repeat(iterations = 10)
public void testArrayCompoundInt() {
    assertEquals(0, exec("int[] x = new int[5]; x[0] = 1; for (int i = 0; i < 1000000000; i++) {x[0] += x[0]} return x[0]",
            emptyMap(), singletonMap("max_loop_counter", "2147483647"), null, true));
}
The second and subsequent executions of that test run in around 0.6s whether or not I support indexing into the array with a negative index. I suspect the trivial conditional needed to support it is being hidden by branch prediction or swamped by something else. Either way, the cost seems fine. This is the implementation of flipping the index that I made:
/**
 * Writes the opcodes to flip a negative array index (meaning slots from the end of the array) into a 0-based one (meaning slots from
 * the start of the array).
 */
static void writeIndexFlip(MethodWriter writer) {
    Label noFlip = new Label();
    // The stack after each instruction:     array, index
    writer.dup();                         // array, index, index
    writer.ifZCmp(Opcodes.IFGE, noFlip);  // array, index
    writer.swap();                        // index, array (index is always negative)
    writer.dupX1();                       // array, index, array
    writer.arrayLength();                 // array, index, length
    writer.visitInsn(Opcodes.IADD);       // array, index (index is flipped, so > 0 if the index was less than the size)
    writer.mark(noFlip);
}
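For readers who don't speak JVM bytecode, here is a plain-Java sketch of the same logic (the class and method names below are mine for illustration, not anything in Painless):

```java
public class IndexFlip {
    /**
     * Plain-Java equivalent of what writeIndexFlip emits: non-negative indices
     * pass through untouched; negative indices count back from the end.
     */
    static int flipIndex(int[] array, int index) {
        if (index >= 0) {
            return index;            // no flip needed; mirrors the IFGE branch
        }
        return index + array.length; // e.g. -1 becomes array.length - 1 (the last slot)
    }

    public static void main(String[] args) {
        int[] x = {10, 20, 30, 40, 50};
        System.out.println(x[flipIndex(x, -1)]); // prints 50
        System.out.println(x[flipIndex(x, 2)]);  // prints 30
    }
}
```

Note that, just like the bytecode version, an index that is still negative after the flip (e.g. -6 on a 5-element array) falls through to the JVM's normal bounds check and throws.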
I believe I could adapt that to lists fairly easily. When you run the test using a negative array index on the rhs (x[0] += x[-5]) it takes much longer, 2.3s. So we may not want to implement it just because of how much slower that case is. Though I think it is fine because there doesn't seem to be a cost if you don't use it. And we're talking about a cost of about 1.7 nanoseconds per use, which isn't terrible, though I expect the performance to be an order of magnitude worse if you constantly flip between positive and negative accessors in ways that blow branch prediction.
It is worth thinking about the cost in code bloat around the array access. Every array read and write needs an extra 6 instructions. Because the JVM uses instruction counts to decide whether it is allowed to inline a method (35 bytecodes is the default, I think) or unroll a loop, the instruction count matters here some.
@nik9000 Nice analysis! Thanks again for looking into this. I need some time to process this, but I wonder what the cost of doing this in user-facing code would be, since the user would need to call array.length anyway. I think that would be the best way to see how much the branching here really costs. I do agree that if it's significantly slower than just having the user write it themselves it would be very trappy.
At worst maybe we could just add something like array.last and ensure support of list.last as special methods/variables if the negative indices end up being too slow.
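For reference, such a shortcut would presumably desugar to something like the following sketch (`last` is a hypothetical name here, not an existing Painless feature):

```java
import java.util.Arrays;
import java.util.List;

public class LastShortcut {
    // Hypothetical desugaring of a list.last shortcut.
    static <T> T last(List<T> list) {
        return list.get(list.size() - 1);
    }

    // Hypothetical desugaring of an array.last shortcut.
    static int last(int[] array) {
        return array[array.length - 1];
    }

    public static void main(String[] args) {
        System.out.println(last(Arrays.asList("a", "b", "c"))); // prints c
        System.out.println(last(new int[] {1, 2, 3}));          // prints 3
    }
}
```

The appeal of this fallback is that it needs no branch at all: the length lookup and subtraction are unconditional, so there is nothing for the branch predictor to win or lose.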
> I do agree that if it's significantly slower than just having the user write it themselves it would be very trappy.
Again, my simplistic tests show it isn't any slower. Eyeballing it, a billion iterations of x[0] += x[x.length - 5] also takes about 2.3 seconds on average. I think branch prediction is winning the day here.
I'm willing to say "supporting this is as fast as not supporting it" and move on to thinking about the complexity (not so bad in my estimation) and the extra byte codes.
Sorry misread that. That's awesome! Have you thought about how to support the dynamic case as well? Or is that already covered?
You didn't misread anything! I just hadn't run the tests for the explicit case until you asked about it.
I talked to @jasontedor about the cost of adding the extra code and we came to the conclusion that it should be fine. The JVM has a limit of 325 bytes for inlining "hot" methods. Certainly in a search context we hope that Painless methods are "hot". But an extra 6 bytes per array read or write is unlikely to be too big a deal. If you have more than a half dozen array reads and writes you are quite likely blowing the limit already on other stuff.
I haven't looked at the dynamic case yet but I think it'll be the same. I expect PSubListShortcut to be fairly similar but with a method call instead of the arraylength opcode.

I was thinking of adding a normalizeIndex method to AStoreable that writes nothing by default but writes the index-flipping opcodes for arrays and lists, and calling it in EAssignment. I'm not super sure that is the right way to go, but I think it is where I should start on getting the assignment part working.
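As a sketch of what the list variant of that flip would compute (the helper below is hypothetical, just mirroring the array logic with List.size() in place of the arraylength opcode):

```java
import java.util.Arrays;
import java.util.List;

public class ListIndexFlip {
    /**
     * Same flip as the array case; the only difference in bytecode terms is
     * that the length comes from an invokeinterface on List.size() rather
     * than the arraylength instruction.
     */
    static int flipIndex(List<?> list, int index) {
        return index < 0 ? index + list.size() : index;
    }

    public static void main(String[] args) {
        List<String> l = Arrays.asList("a", "b", "c");
        System.out.println(l.get(flipIndex(l, -1))); // prints c
        System.out.println(l.get(flipIndex(l, 0)));  // prints a
    }
}
```

One practical difference: the method call is harder for the JIT to fold into the existing bounds check than arraylength is, so the list case may not be quite as free as the array case.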
Yeah, I'm definitely not worried about the extra bytes. If they are doing a huge script in search, it's already problematic for other reasons anyway.
I would need to take a look at the code again (already forgetting so much :( ), but I would hope the PSubBrace can independently handle the negative indices without doing anything special to EAssignment during write since it should know the index expression?
> but I would hope the PSubBrace can independently handle the negative indices without doing anything special to EAssignment during write since it should know the index expression?
It could, but it builds the items on the stack in the wrong order. PSubBrace's assignment gets array, index, value on the stack, and the normalization is much easier to do before value is put on the stack. I'll poke at it soon and see if I can put together a proposal that isn't horrible.
Ordering is definitely a pain... looking at PSubBrace now, the ordering seems off even for just reads for writeIndexFlip? Since lists and arrays are fundamentally different byte code instructions would it make sense to write the flip code directly in the setup methods of the respective sub nodes? I'm just hesitant to add more complexity/specialization to the EAssignment since it's already pretty scary. I fully admit I might be missing something, though, so if you think this is the best way to do it, I'll roll with it.
> I fully admit I might be missing something, though, so if you think this is the best way to do it, I'll roll with it.
I'm the same way. I think it is plenty complex as is, but it looks like the right spot. But I could certainly be wrong. I've got a reindex failure I just started investigating that I'll try to work through tomorrow morning, and then I'll roll up my sleeves and see if I can find a sensible way to do this without too much complexity.
Sounds good, thanks @nik9000. I look forward to seeing your solution :) Scratch that last comment, I messed up the ordering in my head. I fully understand what you're saying. Definitely, a difficult problem.
Watcher scripts typically evaluate the last element of an aggregation bucket, e.g. for comparing a moving average to a threshold. Groovy allows you to do this using a -1 array reference, e.g.
return ctx.payload.aggregations.date_buckets.buckets[-1].doc_count > (ctx.payload.aggregations.percentiles.values['90.0']+(ctx.payload.aggregations.stats.std_deviation*3));
Currently Painless has no such shortcut, making the scripts a little longer, e.g.
return ctx.payload.aggregations.date_buckets.buckets[ctx.payload.aggregations.date_buckets.buckets.length-1].doc_count > (ctx.payload.aggregations.percentiles.values['90.0']+(ctx.payload.aggregations.stats.std_deviation*3));
Minor. Nice to have.