Compiling complex C++ code fails with "fatal error: error in backend: Cannot select: 0x555ec252e008": i8,ch = AtomicLoad

glaubitz commented 3 years ago

I just tried compiling a little more complex piece of C++ code which fails with an error code generated by the backend:

glaubitz@epyc:..openscad-2019.05/src> /local_scratch/glaubitz/M680x0-mono-repo/build/bin/clang -target m68k-linux-gnu localscope.cc -o localscope.o -I /usr/m68k-linux-gnu/include/c++/10/m68k-linux-gnu/ -I /usr/m68k-linux-gnu/include/ -I /usr/include/glib-2.0 -I /usr/lib/m68k-linux-gnu/glib-2.0/include/
fatal error: error in backend: Cannot select: 0x55f0fc5b1ea8: i8,ch = AtomicLoad<(dereferenceable load acquire 1 from `i8* bitcast (i64* @_ZGVZN5boost6system6detail15to_std_categoryERKNS0_14error_categoryEE4map_ to i8*)`, align 4)> 0x55f0fc31e988, 0x55f0fc5ae8c8
  0x55f0fc5ae8c8: i32 = M680x0ISD::WrapperPC TargetGlobalAddress:i32<i64* @_ZGVZN5boost6system6detail15to_std_categoryERKNS0_14error_categoryEE4map_> 0
    0x55f0fc5aec70: i32 = TargetGlobalAddress<i64* @_ZGVZN5boost6system6detail15to_std_categoryERKNS0_14error_categoryEE4map_> 0
In function: _ZN5boost6system6detail15to_std_categoryERKNS0_14error_categoryE
clang-12: error: clang frontend command failed with exit code 70 (use -v to see invocation)
clang version 12.0.0 (git@github.com:M680x0/M680x0-mono-repo.git c5834ffbda019df8c94c669c658d804cb9c19af3)
Target: m68k-unknown-linux-gnu
Thread model: posix
InstalledDir: /local_scratch/glaubitz/M680x0-mono-repo/build/bin
clang-12: note: diagnostic msg: 
********************

PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
Preprocessed source(s) and associated run script(s) are located at:
clang-12: note: diagnostic msg: /tmp/localscope-405c82.cpp
clang-12: note: diagnostic msg: /tmp/localscope-405c82.sh
clang-12: note: diagnostic msg: 

********************
glaubitz@epyc:..openscad-2019.05/src>

It looks like we are missing the pieces to implement C++11 atomics. Attaching the source code file as well as the pre-processed sources, see: localscope-sources.zip

glaubitz commented 3 years ago

It looks like we're missing support for AtomicLoad and AtomicStore. Since m68k has support for Compare-And-Swap (CAS) and (CAS2), it should be straight-forward to implement it.

jrtc27 commented 3 years ago

For 1, 2 and 4 byte atomic loads and stores the normal move.x instructions seem to be sufficient and no barriers needed regardless of the ordering requested. I assume being a very simple CPU the memory model for m68k is strongly ordered, that or it just never mattered because the chips were never SMP?

jrtc27 commented 3 years ago

root@epyc:~> echo 'void foo(int *x, int y) { __atomic_store_4(x, y, __ATOMIC_SEQ_CST); }' | m68k-linux-gnu-gcc -x c - -o - -S -O2 
#NO_APP
    .file   ""
    .text
    .align  2
    .globl  foo
    .type   foo, @function
foo:
    move.l 4(%sp),%a0
    move.l 8(%sp),(%a0)
    rts
    .size   foo, .-foo
    .ident  "GCC: (Debian 10.2.0-9) 10.2.0"
    .section    .note.GNU-stack,"",@progbits
root@epyc:~> echo 'int foo(int *x) { return __atomic_load_4(x, __ATOMIC_SEQ_CST); }' | m68k-linux-gnu-gcc -x c - -o - -S -O2 
#NO_APP
    .file   ""
    .text
    .align  2
    .globl  foo
    .type   foo, @function
foo:
    move.l 4(%sp),%a0
    move.l (%a0),%d0
    rts
    .size   foo, .-foo
    .ident  "GCC: (Debian 10.2.0-9) 10.2.0"
    .section    .note.GNU-stack,"",@progbits

glaubitz commented 3 years ago

Hmm, that's interesting. I would have expected the use of cas but it's probably required for AtomicStore only?

And we probably need to implement a 64-bit AtomicLoad as well, don't we?

I guess we can peek at GCC's libatomic.

glaubitz commented 3 years ago

Andreas pointed me to this code in the glibc:

https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/m68k/m680x0/m68020/atomic-machine.h

But I'm currently having a hard time to find the equivalent parts for atomic operations in LLVM.

glaubitz commented 3 years ago

I guess it would have to be something like this (from X86/X86InstrCompiler.td):

def : Pat<(atomic_store_8 addr:$dst, (i8 imm:$src)),
          (MOV8mi addr:$dst, imm:$src)>;
def : Pat<(atomic_store_16 addr:$dst, (i16 imm:$src)),
          (MOV16mi addr:$dst, imm:$src)>;
def : Pat<(atomic_store_32 addr:$dst, (i32 imm:$src)),
          (MOV32mi addr:$dst, imm:$src)>;
def : Pat<(atomic_store_64 addr:$dst, (i64immSExt32:$src)),
          (MOV64mi32 addr:$dst, i64immSExt32:$src)>;

def : Pat<(atomic_store_8 addr:$dst, GR8:$src),
          (MOV8mr addr:$dst, GR8:$src)>;
def : Pat<(atomic_store_16 addr:$dst, GR16:$src),
          (MOV16mr addr:$dst, GR16:$src)>;
def : Pat<(atomic_store_32 addr:$dst, GR32:$src),
          (MOV32mr addr:$dst, GR32:$src)>;
def : Pat<(atomic_store_64 addr:$dst, GR64:$src),
          (MOV64mr addr:$dst, GR64:$src)>;

def : Pat<(i8  (atomic_load_8 addr:$src)),  (MOV8rm addr:$src)>;
def : Pat<(i16 (atomic_load_16 addr:$src)), (MOV16rm addr:$src)>;
def : Pat<(i32 (atomic_load_32 addr:$src)), (MOV32rm addr:$src)>;
def : Pat<(i64 (atomic_load_64 addr:$src)), (MOV64rm addr:$src)>;

I don't fully understand however what the suffixes "imm" and "rm" refer to. I assume that's "immediate memory" and "register memory", but on M68k, we have `MOV8pd, for example:

// M <- R
def MOV8fd : MxMove_MR<MxType8.FOp, MxType8.FPat, MxType8d,
                       MxMoveEncoding<MxMoveSize8,
                            /*src*/   MxEncEAd_1, MxExtEmpty,
                            /*dst*/   MxEncEAf_0, MxExtBrief_0>>;

def MOV8pd : MxMove_MR<MxType8.POp, MxType8.PPat, MxType8d,
                       MxMoveEncoding<MxMoveSize8,
                            /*src*/   MxEncEAd_1, MxExtEmpty,
                            /*dst*/   MxEncEAp_0, MxExtI16_0>>;

def MOV8ed : MxMove_MR<MxType8.EOp, MxType8.EPat, MxType8d,
                       MxMoveEncoding<MxMoveSize8,
                            /*src*/   MxEncEAd_1, MxExtEmpty,
                            /*dst*/   MxEncEAe_0, MxExtEmpty>>;

def MOV8od : MxMove_MR<MxType8.OOp, MxType8.OPat, MxType8d,
                       MxMoveEncoding<MxMoveSize8,
                            /*src*/   MxEncEAd_1, MxExtEmpty,
                            /*dst*/   MxEncEAo_0, MxExtEmpty>>;

def MOV8bd : MxMove_MR<MxType8.BOp, MxType8.BPat, MxType8d,
                       MxMoveEncoding<MxMoveSize8,
                            /*src*/   MxEncEAd_1, MxExtEmpty,
                            /*dst*/   MxEncEAb,   MxExtI32_0>>;

def MOV8jd : MxMove_MR<MxType8.JOp, MxType8.JPat, MxType8d,
                       MxMoveEncoding<MxMoveSize8,
                            /*src*/   MxEncEAd_1, MxExtEmpty,
                            /*dst*/   MxEncEAj_0, MxExtEmpty>>;

I think it should be very trivial to implement when one knows what each MOV pattern exactly means. Although x86 has quite a few more atomic load and store operations, but that is probably the result of the architecture being more complex and powerful.

jrtc27 commented 3 years ago

Yeah i is immediate, m is memory, r is register in the X86 instruction names.

glaubitz commented 3 years ago

OK. Now we just need to find out what the pd, bd etc suffixes mean for M68k ;).

jrtc27 commented 3 years ago

https://github.com/M680x0/M680x0-mono-repo/blob/39f5403bf77de2b1682e876321b1fbf60c511c9b/llvm/lib/Target/M68k/M68kInstrFormats.td#L13-L40

glaubitz commented 3 years ago

Thanks. The picture becomes clearer now.

So it seems we need will need to define atomic_store_8/16/32 on one hand and also properly configure in M68kISelLowering.cpp which atomic operations and sizes are supported as done here for SPARC:

  // ATOMICs.                                                                                                                                                              
  // Atomics are supported on SparcV9. 32-bit atomics are also                                                                                                             
  // supported by some Leon SparcV8 variants. Otherwise, atomics                                                                                                           
  // are unsupported.                                                                                                                                                      
  if (Subtarget->isV9())
    setMaxAtomicSizeInBitsSupported(64);
  else if (Subtarget->hasLeonCasa())
    setMaxAtomicSizeInBitsSupported(32);
  else
    setMaxAtomicSizeInBitsSupported(0);

  setMinCmpXchgSizeInBits(32);

  setOperationAction(ISD::ATOMIC_SWAP, MVT::i32, Legal);

  setOperationAction(ISD::ATOMIC_FENCE, MVT::Other, Legal);

  // Custom Lower Atomic LOAD/STORE                                                                                                                                        
  setOperationAction(ISD::ATOMIC_LOAD, MVT::i32, Custom);
  setOperationAction(ISD::ATOMIC_STORE, MVT::i32, Custom);

  if (Subtarget->is64Bit()) {
    setOperationAction(ISD::ATOMIC_CMP_SWAP, MVT::i64, Legal);
    setOperationAction(ISD::ATOMIC_SWAP, MVT::i64, Legal);
    setOperationAction(ISD::ATOMIC_LOAD, MVT::i64, Custom);
    setOperationAction(ISD::ATOMIC_STORE, MVT::i64, Custom);
  }

plus the mapping of the generic ISD::ATOMIC_LOAD/STORE into the architecture-specific one:

  case ISD::ATOMIC_LOAD:
  case ISD::ATOMIC_STORE:       return LowerATOMIC_LOAD_STORE(Op, DAG);

M680x0 / M680x0-mono-repo

Compiling complex C++ code fails with "fatal error: error in backend: Cannot select: 0x555ec252e008": i8,ch = AtomicLoad #13