Semantics of {vector, box}-cas! and ftype-lock!

I investigated the error related to {vector, box}-cas! and ftype-lock! in loongarch and riscv backend, and found they also exist in Racket's arm64 backend. I ran the following code:

(compile-interpret-simple #f)
;; Run the given forms n times,
;; stop if one of them returns #f
(define-syntax run-n
  (lambda (stx)
    (syntax-case stx ()
      [(_ n e* ...)
       #`(let loop ([count 0])
           (if (= n count)
               #t
               ;; run each form
               #,(let f ([e* #'(e* ...)])
                   (let ([e (car e*)] [rest (cdr e*)])
                     (if (null? rest)
                         #`(let ([res #,e])
                             (if res
                                 (loop (add1 count))
                                 (printf "error at ~s: ~s~n" count '#,e)))
                         #`(let ([res #,e])
                             (if res
                                 #,(f rest)
                                 (printf "error at ~s: ~s~n" count '#,e))))))))])))

(begin
  (define bx (box 1))
  (run-n 99999999
         (box-cas! bx 1 4)
         (box-cas! bx 4 1)))

The seesion on loongarch is something like:

> (begin
    (define bx (box 1))
    (run-n 99999999
           (box-cas! bx 1 4)
           (box-cas! bx 4 1)))
error at 38204: (box-cas! bx 4 1)
> (begin
      (define bx (box 1))
      (run-n 99999999
             (box-cas! bx 1 4)
             (box-cas! bx 4 1)))
error at 924785: (box-cas! bx 1 4)
> (begin
      (define bx (box 1))
      (run-n 99999999
             (box-cas! bx 1 4)
             (box-cas! bx 4 1)))
error at 206323: (box-cas! bx 1 4)
> (begin
      (define bx (box 1))
      (run-n 99999999
             (box-cas! bx 1 4)
             (box-cas! bx 4 1)))
error at 1700895: (box-cas! bx 1 4)
> (begin
      (define bx (box 1))
      (run-n 99999999
             (box-cas! bx 1 4)
             (box-cas! bx 4 1)))
error at 2369966: (box-cas! bx 4 1)

If we decrease the number of times, we may succeed:

> (begin
    (define bx (box 1))
    (run-n 999999
           (box-cas! bx 1 4)
           (box-cas! bx 4 1)))
#t
> (begin
    (define bx (box 1))
    (run-n 999999
           (box-cas! bx 1 4)
           (box-cas! bx 4 1)))
error at 955950: (box-cas! bx 4 1)
> (begin
    (define bx (box 1))
    (run-n 999999
           (box-cas! bx 1 4)
           (box-cas! bx 4 1)))
#t
> (begin
    (define bx (box 1))
    (run-n 999999
           (box-cas! bx 1 4)
           (box-cas! bx 4 1)))
error at 87584: (box-cas! bx 1 4)

The situation is similar on other two architectures.

The generated machine code under optimization level 3 for (box-cas! bx 1 4) on riscv is:

lr.d           %scratch0, %x28, ()   # load-reserved from the box
bne            %scratch0, %x25, 16   # compare with old value
sc.d           %scratch1, %x28, %x26 # store-conditional
sltiu          %cond, %scratch1, 1   # if %scratch1 == 0, then write 1 into %cond
jal            %real-zero, 8
xor            %cond, %cond, %cond

Offsets are in bytes and all instructions are 4 bytes wide.

The code is run is a single thread, how can it fail? I checked some sites (1 2 and 3), and found that the above phenomenon is called the "spurious failure" of weak compare-and-swap, which is how our cas! is implemented. So even we run the code in a single thread and the given value is equal to the boxed value, and no other cores have touched the addresses reserved by the paired load-reserved, store-conditional will still fail.

For weak CAS, if the store-conditional fails, it will just quit and the set the status flag. In the strong CAS, if the store-conditional fails, it will loop back to the load-reserved/linked, and retry, until the store-conditional succeeds. In x86, CAS is implemented using a single locked-cmpxchg, hence no spurious failure.

In the mats tests 5_6.ms line 1268 and 5_8.ms line 41, the two tests fail/succeed randomly. I suppose the tests assume that the cas! is strong, but the implementation is weak (also in arm32, powerpc and Racket's arm64).

In addition to {vector, box}-cas!, the test ftype.ms line 2790 and 2796 also assume that ftype-lock! acquires the lock "strongly", but in fact the implementation (using store-conditional without retry loop) is "weak". Yet if we change the asm-lock! implementation to contain a retry loop, we are inventing another ftype-spin-lock!.

So maybe we should 1) make the CAS assembler primitives strong by adding loops, and 2) assume that ftype-lock! is weak and modify the tests.

Any other nice ideas?

The original intent when adding cas! was to expose the underlying machine operation, which means allowing spurious failure on architectures that have them. But I didn't document that choice clearly, and as the broken tests illustrate, it's easy to forget about the possibility of spurious failure.

It might be a good idea to make the cas! operations retry on some failures. The result of a cas! operation currently doesn't indicate whether the failure was due to a different value in place than the given one or whether it's due to a (possibly spurious) failure of atomicity. The machine code can make that distinction and retry only in the latter case. Making that distinction within ftype-lock! would make it different than ftype-spin-lock!, still.

I've never tried this change, though, because I'm not sure it's a good idea. An atomicity failure isn't necessarily spurious, and automatically retrying might cause cas! to loop when it should just return failure. Looping might take some choice away from applications. Maybe it could even cause a program to get stuck when it doesn't have to; I'm not sure. C++ provides both string and weak variants, which suggests that the strong operation can be sensible, but also that the weak operation can be useful. Meanwhile, C compiler intrinsics tend to offer only the weak version of this operation; that's an issue, because there are places where those get used within the kernel or in the Chez Scheme C API. See "mkheader.ss" and _InterlockedExchange64 for one example. (I cannot tell whether _InterlockedExchange64 specifically is meant to allow spurious failure, though.)

Either the docs and tests need to be fixed, or the implementation of cas! andftype-lock!` operations needs to change, but I'm not sure which is the best path. I lean toward fixing the documentation and tests, because I think these operations are so low-level and rarely needed that it isn't worth the effort to provide easier-to-use variants.

cisco / ChezScheme

Semantics of {vector, box}-cas! and ftype-lock! #720