Open glaubitz opened 3 years ago
It looks like we're missing support for AtomicLoad
and AtomicStore
. Since m68k has support for Compare-And-Swap (CAS
) and (CAS2
), it should be straight-forward to implement it.
For 1, 2 and 4 byte atomic loads and stores the normal move.x
instructions seem to be sufficient and no barriers needed regardless of the ordering requested. I assume being a very simple CPU the memory model for m68k is strongly ordered, that or it just never mattered because the chips were never SMP?
root@epyc:~> echo 'void foo(int *x, int y) { __atomic_store_4(x, y, __ATOMIC_SEQ_CST); }' | m68k-linux-gnu-gcc -x c - -o - -S -O2
#NO_APP
.file ""
.text
.align 2
.globl foo
.type foo, @function
foo:
move.l 4(%sp),%a0
move.l 8(%sp),(%a0)
rts
.size foo, .-foo
.ident "GCC: (Debian 10.2.0-9) 10.2.0"
.section .note.GNU-stack,"",@progbits
root@epyc:~> echo 'int foo(int *x) { return __atomic_load_4(x, __ATOMIC_SEQ_CST); }' | m68k-linux-gnu-gcc -x c - -o - -S -O2
#NO_APP
.file ""
.text
.align 2
.globl foo
.type foo, @function
foo:
move.l 4(%sp),%a0
move.l (%a0),%d0
rts
.size foo, .-foo
.ident "GCC: (Debian 10.2.0-9) 10.2.0"
.section .note.GNU-stack,"",@progbits
Hmm, that's interesting. I would have expected the use of cas
but it's probably required for AtomicStore
only?
And we probably need to implement a 64-bit AtomicLoad as well, don't we?
I guess we can peek at GCC's libatomic.
Andreas pointed me to this code in the glibc:
https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/m68k/m680x0/m68020/atomic-machine.h
But I'm currently having a hard time to find the equivalent parts for atomic operations in LLVM.
I guess it would have to be something like this (from X86/X86InstrCompiler.td):
def : Pat<(atomic_store_8 addr:$dst, (i8 imm:$src)),
(MOV8mi addr:$dst, imm:$src)>;
def : Pat<(atomic_store_16 addr:$dst, (i16 imm:$src)),
(MOV16mi addr:$dst, imm:$src)>;
def : Pat<(atomic_store_32 addr:$dst, (i32 imm:$src)),
(MOV32mi addr:$dst, imm:$src)>;
def : Pat<(atomic_store_64 addr:$dst, (i64immSExt32:$src)),
(MOV64mi32 addr:$dst, i64immSExt32:$src)>;
def : Pat<(atomic_store_8 addr:$dst, GR8:$src),
(MOV8mr addr:$dst, GR8:$src)>;
def : Pat<(atomic_store_16 addr:$dst, GR16:$src),
(MOV16mr addr:$dst, GR16:$src)>;
def : Pat<(atomic_store_32 addr:$dst, GR32:$src),
(MOV32mr addr:$dst, GR32:$src)>;
def : Pat<(atomic_store_64 addr:$dst, GR64:$src),
(MOV64mr addr:$dst, GR64:$src)>;
def : Pat<(i8 (atomic_load_8 addr:$src)), (MOV8rm addr:$src)>;
def : Pat<(i16 (atomic_load_16 addr:$src)), (MOV16rm addr:$src)>;
def : Pat<(i32 (atomic_load_32 addr:$src)), (MOV32rm addr:$src)>;
def : Pat<(i64 (atomic_load_64 addr:$src)), (MOV64rm addr:$src)>;
I don't fully understand however what the suffixes "imm" and "rm" refer to. I assume that's "immediate memory" and "register memory", but on M68k, we have `MOV8pd
, for example:
// M <- R
def MOV8fd : MxMove_MR<MxType8.FOp, MxType8.FPat, MxType8d,
MxMoveEncoding<MxMoveSize8,
/*src*/ MxEncEAd_1, MxExtEmpty,
/*dst*/ MxEncEAf_0, MxExtBrief_0>>;
def MOV8pd : MxMove_MR<MxType8.POp, MxType8.PPat, MxType8d,
MxMoveEncoding<MxMoveSize8,
/*src*/ MxEncEAd_1, MxExtEmpty,
/*dst*/ MxEncEAp_0, MxExtI16_0>>;
def MOV8ed : MxMove_MR<MxType8.EOp, MxType8.EPat, MxType8d,
MxMoveEncoding<MxMoveSize8,
/*src*/ MxEncEAd_1, MxExtEmpty,
/*dst*/ MxEncEAe_0, MxExtEmpty>>;
def MOV8od : MxMove_MR<MxType8.OOp, MxType8.OPat, MxType8d,
MxMoveEncoding<MxMoveSize8,
/*src*/ MxEncEAd_1, MxExtEmpty,
/*dst*/ MxEncEAo_0, MxExtEmpty>>;
def MOV8bd : MxMove_MR<MxType8.BOp, MxType8.BPat, MxType8d,
MxMoveEncoding<MxMoveSize8,
/*src*/ MxEncEAd_1, MxExtEmpty,
/*dst*/ MxEncEAb, MxExtI32_0>>;
def MOV8jd : MxMove_MR<MxType8.JOp, MxType8.JPat, MxType8d,
MxMoveEncoding<MxMoveSize8,
/*src*/ MxEncEAd_1, MxExtEmpty,
/*dst*/ MxEncEAj_0, MxExtEmpty>>;
I think it should be very trivial to implement when one knows what each MOV
pattern exactly means. Although x86 has quite a few more atomic load and store operations, but that is probably the result of the architecture being more complex and powerful.
Yeah i is immediate, m is memory, r is register in the X86 instruction names.
OK. Now we just need to find out what the pd
, bd
etc suffixes mean for M68k ;).
Thanks. The picture becomes clearer now.
So it seems we need will need to define atomic_store_8/16/32
on one hand and also properly configure in M68kISelLowering.cpp
which atomic operations and sizes are supported as done here for SPARC:
// ATOMICs.
// Atomics are supported on SparcV9. 32-bit atomics are also
// supported by some Leon SparcV8 variants. Otherwise, atomics
// are unsupported.
if (Subtarget->isV9())
setMaxAtomicSizeInBitsSupported(64);
else if (Subtarget->hasLeonCasa())
setMaxAtomicSizeInBitsSupported(32);
else
setMaxAtomicSizeInBitsSupported(0);
setMinCmpXchgSizeInBits(32);
setOperationAction(ISD::ATOMIC_SWAP, MVT::i32, Legal);
setOperationAction(ISD::ATOMIC_FENCE, MVT::Other, Legal);
// Custom Lower Atomic LOAD/STORE
setOperationAction(ISD::ATOMIC_LOAD, MVT::i32, Custom);
setOperationAction(ISD::ATOMIC_STORE, MVT::i32, Custom);
if (Subtarget->is64Bit()) {
setOperationAction(ISD::ATOMIC_CMP_SWAP, MVT::i64, Legal);
setOperationAction(ISD::ATOMIC_SWAP, MVT::i64, Legal);
setOperationAction(ISD::ATOMIC_LOAD, MVT::i64, Custom);
setOperationAction(ISD::ATOMIC_STORE, MVT::i64, Custom);
}
plus the mapping of the generic ISD::ATOMIC_LOAD/STORE
into the architecture-specific one:
case ISD::ATOMIC_LOAD:
case ISD::ATOMIC_STORE: return LowerATOMIC_LOAD_STORE(Op, DAG);
I just tried compiling a little more complex piece of C++ code which fails with an error code generated by the backend:
It looks like we are missing the pieces to implement C++11 atomics. Attaching the source code file as well as the pre-processed sources, see: localscope-sources.zip