Open 0dvictor opened 7 years ago
So, we have documentation of previous thought that declined to go down this road; I'm curious about what has changed from the reasoning in that document (admittedly, it's relatively old at this point).
Can you provide a deeper justification for requiring these new IL opcodes? Addressing @mgaudet's point is important as well.
Also, even though you yourself may not be providing a complete implementation across all platforms supported by the Eclipse OMR compiler (i.e., x86 only) if the proposal does go ahead in its current form I would like to see some tracking of the work to be completed on ARM, P, and Z so that it isn't forgotten. An issue should suffice.
I apology for the confusion caused by my wording. I should have said to introduce common helpers to support atomic operations on 32-bit and 64-bit primitive types.
I am actually on the side not to introduce new IL codes. These atomic operations are hard to transform by optimizer and the optimizer probably should not modify them. Hence my plan is, as the document mentioned, to utilize “call” to common “runtime routines”; the “runtime routines” will go through a route same as J9’s Codegen inlining recognized methods. One existing example in OMR is SinglePrecisionSQRT.
I’m updating the issue to clarify the confusion.
The x86 PR will be merged soon pending minor changes. I suggest leaving this issue open until implementation is completed on all platforms supported by Eclipse OMR (Z, Power, ARM) so the lack of support there isn't forgotten. @fjeremic @ymanton
Is it possible to provide unit tests for these newly added inline codegen helpers? This would help to validate the exploitation of such helpers when they are ported to the platforms @0xdaryl listed above, in addition to validate the code being checked in for x86.
Here's an example of one particular use of atomicAddInt:
============================================================
; Live regs: GPR=1 FPR=0 {&GPR_0066}
------------------------------
n152n ( 0) treetop [0x0007133880] bci=[-1,0,-] rc=0 vc=666 vn=- li=42 udi=- nc=1
n148n ( 1) icall <atomicAddInt>[#585 helper Method] [flags 0x400 0x0 ] () [0x00071337a0] bci=[-1,0,-] rc=1 vc=666 vn=- li=42 udi=- nc=3 flg=0x20
n65n ( 2) ==>acall (in &GPR_0066) (X!=0 )
n136n ( 1) iconst 8 [0x0007133500] bci=[-1,0,-] rc=1 vc=666 vn=- li=42 udi=- nc=0
n142n ( 1) iconst 4 [0x0007133650] bci=[-1,0,-] rc=1 vc=666 vn=- li=42 udi=- nc=0
------------------------------
------------------------------
n152n ( 0) treetop [0x0007133880] bci=[-1,0,-] rc=0 vc=666 vn=- li=42 udi=- nc=1
n148n ( 0) icall <atomicAddInt>[#585 helper Method] [flags 0x400 0x0 ] (in GPR_0081) () [0x00071337a0] bci=[-1,0,-] rc=0 vc=666 vn=- li=42 udi=38064 nc=3 flg=0x20
n65n ( 1) ==>acall (in &GPR_0066) (X!=0 )
n136n ( 0) iconst 8 (in GPR_0080) [0x0007133500] bci=[-1,0,-] rc=0 vc=666 vn=- li=42 udi=37792 nc=0
n142n ( 0) iconst 4 (in GPR_0081) [0x0007133650] bci=[-1,0,-] rc=0 vc=666 vn=- li=42 udi=38064 nc=0
------------------------------
[0x0007339430] mov GPR_0080, 0x00000008 # MOV4RegImm4
[0x0007339580] mov GPR_0081, 0x00000004 # MOV4RegImm4
[0x0007339690] lock add dword ptr [&GPR_0066+1*GPR_0080], GPR_0081 # LADD4MemReg
Which ultimately turns into:
0x7f7411351070 0000003c [0x0007339430] b9 08 00 00 00 mov ecx, 0x00000008 # MOV4RegImm4
0x7f7411351075 00000041 [0x0007339580] b8 04 00 00 00 mov eax, 0x00000004 # MOV4RegImm4
0x7f741135107a 00000046 [0x0007339690] f0 41 01 04 0a lock add dword ptr [r10+1*rcx],eax # LADD4MemReg
@0dvictor Do you think we could do any better on the encoding in this case? It seems wasteful to burn two registers for the constants (especially given r10 is originally a call return value, which means it was originally in rax and needs to survive this sequence).
@mstoodle Thank you for pointing this out. Yes, we definitely can do better; however I'd like to keep the initial version as simple as possible. More optimized versions, including resolving this situation, are in-plan and will be committed once they are ready.
Multi-threading plays an important role in modern programs, and hence atomic operations are in-demand. I am proposing to introduce common helpers to support following three atomic operations on 32-bit and 64-bit primitive types:
Currently OMR does not provide support for atomic operations, J9 implements its own by recognizing atomic operation methods and then letting Codegen inline them. Initially demanded by JITBuilder, many languages' front ends very-likely also requires atomic operations on primitives; for instance, languages that utilize reference count for memory managements.
I believe these atomic operations should only be bounded by the back end and hence we should bring these helpers to OMR.
Similar to J9, I plan to introduce one helper per operation per type. JITBuilder and other language implementations can attach the helper to a “call” and OMR’s codegen will generate inlined assemblies accordingly.
The initial effort will include building common infrastructure and x86 codegen support. Once the common part is done ARM, P, and Z could easily add support as well.
** UPDATED ON 2016-11-04. -------- PREVIOUS VERSION -------- Multi-threading plays an important role in modern programs, and hence atomic operations are in-demand. I am proposing to introduce following three atomic operations on 32-bit and 64-bit primitive types:
The initial effort will include building common infrastructure and x86 codegen support.