dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
14.53k stars 4.54k forks source link

[RyuJIT/arm32] CQ: enable InterlockedCmpXchg32 intrinsic #9982

Open BruceForstall opened 6 years ago

BruceForstall commented 6 years ago

Currently, ARM does not generate GT_CMPXCHG nodes; all other targets do.

From the importer:

// TODO-ARM-CQ: reenable treating InterlockedCmpXchg32 operation as intrinsic
case CORINFO_INTRINSIC_InterlockedCmpXchg32:

Also, in codegenarmarch.cpp, genCodeForTreeNode:

       case GT_CMPXCHG:
            NYI_ARM("GT_CMPXCHG");

category:cq theme:hardware-intrinsics skill-level:intermediate cost:small

BruceForstall commented 6 years ago

Related: dotnet/runtime#8626

mikedn commented 6 years ago

But does ARM has the necessary instruction(s) to implement this? AFAIR I once looked at this and couldn't find anything clear. And ARM codegenlegacy is also NYI.

mikedn commented 6 years ago

I see that ARM32 GCC generates a call to a library function for std::atomic_compare_exchange_strong/weak.

BruceForstall commented 6 years ago

The right answer very well might be that this is something we won't change. In which case, we can remove these TODOs and NYI. The VM helper must do something, though, so it's a matter of "inline" that, or not.

I'm just auditing existing NYI and splitting them into separate issues.

BruceForstall commented 6 years ago

Also CORINFO_INTRINSIC_InterlockedAdd32, CORINFO_INTRINSIC_InterlockedXAdd32, CORINFO_INTRINSIC_InterlockedXchg32 (GT_LOCKADD, GT_XADD, GT_XCHG). See impIntrinsic().

sdmaclea commented 6 years ago

does ARM have the necessary instruction(s) to implement this?

Arm32 would use a load exclusive, store exclusive sequence similar to Arm64 base implementation.

mikedn commented 6 years ago

Arm32 would use a load exclusive, store exclusive sequence similar to Arm64 base implementation.

Yes, but it seems that ARM32 has different instructions for this - LDREX/STREX instead of LDAXR/STLXR. Also, these instructions are only available starting with ARMv7 and only on some profiles: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dht0008a/CJHBGBBJ.html. The lack of any acquire/release semantics mention in the documentation probably means that DMB is needed as well.

sdmaclea commented 6 years ago

@mikedn dmb is definitely needed with ldrex/strex.

ARMv7 really introduced multiprocessor coherent ARM. So I would believe ldrex/strex were part of that. It would be reasonable to assume that if we are on a multiprocessor OS, then ldrex, strex and dmb must be present.

I would guess all realistic .NET Core targets have these instructions. We can certainly pick up capabilities from the OS like we do for arm64.

mikedn commented 6 years ago

So if I understand the ARM documentation correctly the generated code should look something like:


; CompareExchange(r0, r1, r2)
    dmb
LOOP:
    ldrex r3, [r0]
    cmp r3, r1
    bne DONE
    strex r4, r2, [r0]
    tst r4, r4
    bne LOOP
DONE:
    dmb
; result in r3