Open aviansie-ben opened 3 years ago
~Kindly also note that the Power codegen returns 1, 0, 2 (less, zero, greater) as the compare result, which does not match what `memcmp` would return. It is also highly likely that many platforms have optimized instructions to generate -1, 0, 1 from a compare result.~
See below
@mnalam-p Agreed that that's a bit weird, though orthogonal to the problem I'm describing here. I think there's some history as to why that happened, but I don't remember the reason. I was actually planning to open an issue about that separately later this week as a potential housekeeping issue, though feel free to open one yourself before that if you're so inclined.
@aviansie-ben : what you propose for zero-length arrays sounds reasonable to me.
If there is an IL opcode that could stand a "reboot" then this one might be it, and I support efforts to standardize its semantics and implementation.
@aviansie-ben agreed that it is out of scope for this discussion. I thought it might be useful to lay out the discrepancies with arraycmp.
Also agreed that @aviansie-ben's proposal is the sanest one, i.e. an array comparison of length 0 should yield equality, similar to what `std::memcmp` would return.
One of the opcodes provided by the OMR compiler is the `arraycmp` opcode, which allows one to effectively perform an inline `memcmp` on two buffers using specialized sequences of instructions. This opcode is currently only generated as a result of optimizations (`LoopReducer` in OMR and `IdiomRecognition` in OpenJ9 will generate it), but it could potentially be used as part of a recognized method by other downstream consumers of OMR.
One issue that I've found with this opcode at present, however, is that the semantics of this opcode when the length is passed in as 0 don't seem to be consistent. The x86 codegen will always return that the buffers are equal. The Power codegen on the other hand seems to assume that the length will never be 0 and will enter a residue comparison loop, continuing to load from memory until it inevitably segfaults or finds two differing bytes. Overall, we seem to be treating this case as some sort of undefined behaviour, although I can't really be certain of that since we don't have a formal specification for our IL.
This does not cause any issues with the current optimizations generating this opcode, since they only act on loops that unconditionally load before performing a bounds check. However, this is quite a subtle and surprising difference from the semantics of `memcmp`, which always returns that the buffers are equal when the length is 0. Considering that x86 always does the expected thing, this could lead to problems with porting downstream consumers of the compiler that use that opcode from x86 to Power.
Perhaps we should consider defining this opcode to always return as if the buffers were equal when the length is 0, in order to match the behaviour that most programmers would typically expect?
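As a rough reference model for the proposed semantics, something like the following could serve as a starting point. It is a sketch only: `arraycmpReference` is a hypothetical name, and the -1/0/1 result encoding is an assumption made for illustration, not what any OMR codegen is known to return.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical reference model, not OMR code. The -1/0/1 result encoding
// is an assumption made for this sketch only.
int arraycmpReference(const void *a, const void *b, std::size_t len) {
    if (len == 0) return 0;           // proposed: empty ranges compare equal
    int r = std::memcmp(a, b, len);   // only the sign of r is meaningful
    return (r > 0) - (r < 0);         // normalize to -1, 0 or 1
}
```

The explicit length check means the pointers are never touched, or even handed to `std::memcmp`, when the length is 0, so the result does not depend on what they point to.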