Error lifting LDAXR (load-acquire / store-release exclusive monitors)

l-kent commented 3 weeks ago

An error occurs with instruction ldaxr x8, [x11] with opcode 0xc85ffd68:

ASLi> :sem A64 0xc85ffd68
Decoding instruction A64 c85ffd68
  Error File "libASL/primops.ml", line 130, characters 4-10: Assertion failed

This is notably different to the error that this opcode causes when lifted via gtirb-semantics: https://github.com/UQ-PAC/aslp/issues/85

l-kent commented 2 weeks ago

A possibly related issue is that the related instruction stlxr evaluates to assert FALSE ;

ASLi> :sem A64 0xc80afd6c
Decoding instruction A64 c80afd6c
assert FALSE ;

katrinafyi commented 2 weeks ago

Possibly.

stlxr fails because it calls IsExclusiveVA() which is stubbed to assert false in memory.asl. I expect ldaxr would fail in a similar way, if it got to that point within SetExclusiveMonitors(). Instead, it gets stuck at translating the virtual address to a physical one to mark the physical addresses.

The assertion which is failing is a non-negativity assertion. It's evaluating to zeros_bits.0 {{ -20 }} ( )... This is obviously wrong. Anyway, we might be able stub out the address translation in a similar way to the non-atomic memory accesses. L-A/S-R semantics are not something I know, though.

The traces for ldaxr and stlxr are: x.txt, y.txt. These can be generated by passing -x 1 to aslp, and larger numbers will produce progressively more output.

l-kent commented 2 weeks ago

Since IsExclusiveVA() is apparently always safe to return true and is an optional implementation-dependent test, shouldn't it be stubbed to true instead of false?

https://github.com/UQ-PAC/aslp/blob/abd63cb88c81f34822dd3d0779517430d0aa9ba8/mra_tools/arch/arch.asl#L11387-L11395

l-kent commented 2 weeks ago

It seems like this is just something that will fall over elsewhere though, as most of the Exclusives monitor-related functions have no provided pseudocode?

katrinafyi commented 2 weeks ago

I don't know. I assume the "always safe to return TRUE" is reliant on a correct implementation of physical exclusivity.

Reading about this, it appears to be a matter of setting/unsetting the monitors, but it is complicated by the fact it must persist between instructions. It might be easier for us to implement the logic entirely in virtual addresses. However, I can't find a "ClearExclusiveVA" to match the MarkExclusiveVA().

The easy fix in ASLp would be to defer this to downstream projects, but I am not sure.

l-kent commented 2 weeks ago

Exclusive accesses for virtual addresses are an entirely optional part of the architecture, without any real spec for how those checks are supposed to behave, so I don't think it would really work for us to just implement that part of it. That seems to be why there isn't a ClearExclusiveVA.

The local monitor doesn't seem too difficult to implement on an abstract level (as long as we ignore address translation as we're already doing everywhere else). It seems to be mostly just a matter of storing an address in some global variable (representing the local monitor) that marks it as exclusive. The trickiest part seems to be that the size of the memory block that is marked as exclusive is implementation-defined and can be anywhere from 16 bytes to 2048 bytes (the least significant 4 to 11 bits of the physical address are ignored to get this).

Unfortunately, the global monitor seems much trickier and is going to be relevant for any real-world use of these instructions. The global monitor only has to be set or checked if a physical address is 'shareable' but that comes from the rather complex address translation process which we would really like to completely ignore. Each core has its own global monitor, and the global monitor for a core is cleared when another core writes to a physical address that overlaps with the address range in the core's global monitor. That part is done by ClearExclusiveByAddress() which is called by most (all?) memory stores.

katrinafyi commented 2 weeks ago

It is tricky. I was thinking that ignoring address translation for "physical" monitors is no more correct than (and functionally identical to) just using virtual monitors throughout. However, you're right that the physical monitors are better specified, so we can go that way.

I was reading this page and it implies that the "eagerness" of monitor clearing (e.g. due to interference by another core) is somewhat arbitrary and implementation-specific. In particular, the monitor may be cleared due to other circumstances and the program code is expected to handle that.

If this is the case, we might be able to pessimise all addresses into shareable global addresses. (This is further justified (weakly) by our interest in weak memory effects which only appear in memory shared across cores).

Re the size of the exclusive block, any correct portable code should behave correctly with the minimum size of exclusion (so the reasoning can soundly assume the minimum size). I don't think this is a big problem.

Taking a step back, this is all very complex. I feel like the ideal approach would be Aslp emitting the monitors as intrinsics, then some analysis to locate the regions delineated by load/store pairs. A naive approach explicitly recording the monitors at each program point for each core sounds very expensive...

l-kent commented 2 weeks ago

I gave it some thought and using intrinsics similar (or even the same as?) the AtomicStart/End ones makes sense to me. The local monitor is fairly easy to sensibly model, and the global monitor is only really supposed to cause the store to fail if another core interferes with it, so the specifics of that can be ignored by ASLp and BASIL can do something with rely/guarantee conditions to model that?

UQ-PAC / aslp

Error lifting LDAXR (load-acquire / store-release exclusive monitors) #84