eclipse / omr

Eclipse OMR™ Cross platform components for building reliable, high performance language runtimes
http://www.eclipse.org/omr
Other
934 stars 392 forks source link

x86-64: Implement an enhancement for byte array and char array System.arraycopy #7332

Closed a7ehuo closed 1 month ago

a7ehuo commented 1 month ago

Implement an enhancement for byte array and char array System.arraycopy on x86-64.

Also add JIT options that disable array copy enhancement:

a7ehuo commented 1 month ago

@0xdaryl May I ask you to review this change? Thank you!

@vijaysun-omr @hzongaro fyi

a7ehuo commented 1 month ago

x86-64 macOS failure is a known issue #7181

The following tests FAILED:
      2 - TestJitBuilderAPIGenerator (Failed)
0xdaryl commented 1 month ago

Can you attach a JIT log for a method that shows the code generated for these functions? Just a tracecg of a single method should be fine. Thanks.

a7ehuo commented 1 month ago

Can you attach a JIT log for a method that shows the code generated for these functions?

Search arrayCopy8BitPrimitiveEnhancementImplRoot8 for byte array or arrayCopy16BitPrimitiveEnhancementImplRoot16 for char array to jump to the code gen section on arraycopy

jitlog.arrayCopy8BitPrimitiveEnhancementImplRoot8.txt

jitlog.arrayCopy16BitPrimitiveEnhancementImplRoot16.txt

0xdaryl commented 1 month ago

Generally, I think the instruction sequences are reasonable. Please consider Brad’s suggestions where appropriate for using wider instructions on hardware that supports it.

My main feedback concerns the naming of these optimizations. Calling them “enhancements”, while technically correct, is too generic for someone who wasn’t involved in its development up front. There aren’t even any code comments to describe what the enhancements are, leaving others to read and understand the code to learn about them. I suggest incorporating something like “inlineSmallArraycopiesWithoutSTOS” into the naming would make it better. For example, “inlineSmall16BitPrimitiveArraycopiesWithoutSTOS”

Longer term, I’m not convinced that inlining this much code at each arraycopy call (over 50 instructions) is worth the simplicity and savings from calling out to a helper to do the same thing. I’m willing to let the current inline design go through now, but I would like a helper approach evaluated for the future. This might be fine for a method with one or two arraycopies, but if there are several arraycopies in a method (and we do see methods like that) the code footprint will add up quickly.

a7ehuo commented 1 month ago

I'm currently running sanity test on the changes that address the above comments. I'll push another update once the sanity test looks good

a7ehuo commented 1 month ago

@0xdaryl Comments are addressed. Ready for another review. Thank you!

0xdaryl commented 1 month ago

@BradleyWood : please approve if your comments are addressed

a7ehuo commented 1 month ago

Can you confirm you tested this code internally on both 64-bit and 32-bit?

I tested only on 64-bit machines since I thought the enhancement was guarded with checking if target is 64-bit. I'll test on 32-bit as well after addressing the latest comments.

bool enable8BitPrimitiveArrayCopyInlineSmallSizeWithoutREPMOVS = ((elementSize == 1) &&
                                                                     !disableEnhancement &&
                                                                     cg->comp()->target().cpu.supportsAVX() &&
                                                                     cg->comp()->target().is64Bit()) ? true : false;
bool enable16BitPrimitiveArrayCopyInlineSmallSizeWithoutREPMOVS = (!disableEnhancement &&
                                                                 cg->comp()->target().cpu.supportsAVX() &&
                                                                 cg->comp()->target().is64Bit()) ? true : false;
a7ehuo commented 1 month ago

@0xdaryl Addressed all review comments. Ready for another review. Thank you!

0xdaryl commented 1 month ago

@vijaysun-omr : FYI

vijaysun-omr commented 1 month ago

Since we want to keep all codegens aware of key changes in one, I thought I'd mention @zl-wang @r30shah and @knn-k here so that they can consider whatever feels relevant to their platform

knn-k commented 1 month ago

AArch64 codegen inlines arraycopy only when the copy length is a constant at the compile time. Otherwise it generates instructions to call arraycopy helpers in https://github.com/eclipse/omr/blob/master/compiler/aarch64/runtime/ARM64arrayCopy.spp Maybe I can add special helpers for arrays of short/int/long types that skip the checks for lower bits of the copy length.