eclipse / omr

Eclipse OMR™ Cross platform components for building reliable, high performance language runtimes
http://www.eclipse.org/omr
Other
943 stars 394 forks source link

Support for Atomic Operation on Primitive Types #426

Open 0dvictor opened 7 years ago

0dvictor commented 7 years ago

Multi-threading plays an important role in modern programs, and hence atomic operations are in-demand. I am proposing to introduce common helpers to support following three atomic operations on 32-bit and 64-bit primitive types:

  1. atomic add
  2. atomic swap
  3. atomic fetch and add

Currently OMR does not provide support for atomic operations, J9 implements its own by recognizing atomic operation methods and then letting Codegen inline them. Initially demanded by JITBuilder, many languages' front ends very-likely also requires atomic operations on primitives; for instance, languages that utilize reference count for memory managements.

I believe these atomic operations should only be bounded by the back end and hence we should bring these helpers to OMR.

Similar to J9, I plan to introduce one helper per operation per type. JITBuilder and other language implementations can attach the helper to a “call” and OMR’s codegen will generate inlined assemblies accordingly.

The initial effort will include building common infrastructure and x86 codegen support. Once the common part is done ARM, P, and Z could easily add support as well.

** UPDATED ON 2016-11-04. -------- PREVIOUS VERSION -------- Multi-threading plays an important role in modern programs, and hence atomic operations are in-demand. I am proposing to introduce following three atomic operations on 32-bit and 64-bit primitive types:

  1. atomic add
  2. atomic exchange
  3. atomic exchange and add

The initial effort will include building common infrastructure and x86 codegen support.

mgaudet commented 7 years ago

So, we have documentation of previous thought that declined to go down this road; I'm curious about what has changed from the reasoning in that document (admittedly, it's relatively old at this point).

0xdaryl commented 7 years ago

Can you provide a deeper justification for requiring these new IL opcodes? Addressing @mgaudet's point is important as well.

Also, even though you yourself may not be providing a complete implementation across all platforms supported by the Eclipse OMR compiler (i.e., x86 only) if the proposal does go ahead in its current form I would like to see some tracking of the work to be completed on ARM, P, and Z so that it isn't forgotten. An issue should suffice.

0dvictor commented 7 years ago

I apology for the confusion caused by my wording. I should have said to introduce common helpers to support atomic operations on 32-bit and 64-bit primitive types.

I am actually on the side not to introduce new IL codes. These atomic operations are hard to transform by optimizer and the optimizer probably should not modify them. Hence my plan is, as the document mentioned, to utilize “call” to common “runtime routines”; the “runtime routines” will go through a route same as J9’s Codegen inlining recognized methods. One existing example in OMR is SinglePrecisionSQRT.

I’m updating the issue to clarify the confusion.

0xdaryl commented 7 years ago

The x86 PR will be merged soon pending minor changes. I suggest leaving this issue open until implementation is completed on all platforms supported by Eclipse OMR (Z, Power, ARM) so the lack of support there isn't forgotten. @fjeremic @ymanton

fjeremic commented 7 years ago

Is it possible to provide unit tests for these newly added inline codegen helpers? This would help to validate the exploitation of such helpers when they are ported to the platforms @0xdaryl listed above, in addition to validate the code being checked in for x86.

mstoodle commented 7 years ago

Here's an example of one particular use of atomicAddInt:

============================================================
; Live regs: GPR=1 FPR=0 {&GPR_0066}
------------------------------
 n152n    (  0)  treetop                                                                              [0x0007133880] bci=[-1,0,-] rc=0 vc=666 vn=- li=42 udi=- nc=1
 n148n    (  1)    icall  <atomicAddInt>[#585  helper Method] [flags 0x400 0x0 ] ()                   [0x00071337a0] bci=[-1,0,-] rc=1 vc=666 vn=- li=42 udi=- nc=3 flg=0x20
 n65n     (  2)      ==>acall (in &GPR_0066) (X!=0 )
 n136n    (  1)      iconst 8                                                                         [0x0007133500] bci=[-1,0,-] rc=1 vc=666 vn=- li=42 udi=- nc=0
 n142n    (  1)      iconst 4                                                                         [0x0007133650] bci=[-1,0,-] rc=1 vc=666 vn=- li=42 udi=- nc=0
------------------------------
------------------------------
 n152n    (  0)  treetop                                                                              [0x0007133880] bci=[-1,0,-] rc=0 vc=666 vn=- li=42 udi=- nc=1
 n148n    (  0)    icall  <atomicAddInt>[#585  helper Method] [flags 0x400 0x0 ] (in GPR_0081) ()     [0x00071337a0] bci=[-1,0,-] rc=0 vc=666 vn=- li=42 udi=38064 nc=3 flg=0x20
 n65n     (  1)      ==>acall (in &GPR_0066) (X!=0 )
 n136n    (  0)      iconst 8 (in GPR_0080)                                                           [0x0007133500] bci=[-1,0,-] rc=0 vc=666 vn=- li=42 udi=37792 nc=0
 n142n    (  0)      iconst 4 (in GPR_0081)                                                           [0x0007133650] bci=[-1,0,-] rc=0 vc=666 vn=- li=42 udi=38064 nc=0
------------------------------

 [0x0007339430] mov GPR_0080, 0x00000008    # MOV4RegImm4
 [0x0007339580] mov GPR_0081, 0x00000004    # MOV4RegImm4
 [0x0007339690] lock add    dword ptr [&GPR_0066+1*GPR_0080], GPR_0081      # LADD4MemReg

Which ultimately turns into:

0x7f7411351070 0000003c [0x0007339430] b9 08 00 00 00                       mov ecx, 0x00000008 # MOV4RegImm4
0x7f7411351075 00000041 [0x0007339580] b8 04 00 00 00                       mov eax, 0x00000004 # MOV4RegImm4
0x7f741135107a 00000046 [0x0007339690] f0 41 01 04 0a                       lock add    dword ptr [r10+1*rcx],eax       # LADD4MemReg

@0dvictor Do you think we could do any better on the encoding in this case? It seems wasteful to burn two registers for the constants (especially given r10 is originally a call return value, which means it was originally in rax and needs to survive this sequence).

0dvictor commented 7 years ago

@mstoodle Thank you for pointing this out. Yes, we definitely can do better; however I'd like to keep the initial version as simple as possible. More optimized versions, including resolving this situation, are in-plan and will be committed once they are ready.