concurrencykit / ck

Concurrency primitives, safe memory reclamation mechanisms and non-blocking (including lock-free) data structures designed to aid in the research, design and implementation of high performance concurrent systems developed in C99+.
http://concurrencykit.org/
Other
2.35k stars 312 forks source link

ck_ring: ARM64 failure on ck_ring_mpmc #156

Closed sbahra closed 2 years ago

sbahra commented 4 years ago

https://cloud.drone.io/concurrencykit/ck/43/2/2

Good chance it is the regression test itself but concerning as I expected data dependent load on read-side. Requires deeper investigation.

sbahra commented 4 years ago
                            || o->value != o->tid
                            || o->magic != 0xdead
                            || (previous != 0 && previous >= o->value_long)) {
                                ck_error("[0x%p] (%x) (%d, %d) >< (0, %d)\n",
                                        (void *)o, o->magic, o->tid, o->value, size);
                        }
[0x0x1d68b340] (dead) (60368, 60368) >< (0, 2047)
| [0x0x1d682880] (dead) (60369, 60369) >< (0, 2047)
sbahra commented 3 years ago

----[ Testing ring....
--
3534 | make[2]: Entering directory '/drone/src/regressions/ck_ring/validate'
3535 | ./ck_ring_spsc 4 1 2048
3536 | ./ck_ring_spmc 4 1 2048
3537 | SPSC test: done
3538 | SPMC test:
3539 | [0] Observed 49128
3540 | [2] Observed 49128
3541 | [1] Observed 49128
3542 | ./ck_ring_spmc_template 4 1 2048
3543 | SPSC test: done
3544 | SPMC test:
3545 | [0x0xb2ef7a0] (dead) (4763, 4763) >< (0, 2047)
3546 | [0x0xb2ef7c0] (dead) (4764, 4764) >< (0, 2047)
3547 | make[2]: *** [Makefile:12: check] Error 1
3548 | make[2]: Leaving directory '/drone/src/regressions/ck_ring/validate'
michael-grunder commented 3 years ago

Hi @sbahra :wave:

I think I ran into this issue as when building some code on a Mac mini with the new M1 CPU. You requested some assembly in the linked issue so I'll post some here.

The test program:

#include <ck_ring.h>

struct testEntry {
    int value;
};

CK_RING_PROTOTYPE(testEntry, testEntry)

int main(int argc, const char **argv) {
    struct testEntry buffer[4], req;
    ck_ring_t ring;
    int i;

    ck_ring_init(&ring, 4);

    CK_RING_ENQUEUE_MPMC(testEntry, &ring, buffer, &req);
    CK_RING_ENQUEUE_MPMC(testEntry, &ring, buffer, &req);
}

Here's the assembly

For context, it's the second enqueue that blocks on the ARM box, and it specifically blocks on this loop

Let me know if I can help in a different way (e.g. building the assembly differently). This is a rented mac, so if you want I would be happy to add your public key to the box.

cognet commented 3 years ago

Hi @michael-grunder,

I'm really puzzled at this. I would have suspected a barrier issue, except your test isn't even multithreaded, so it can't be the issue. I can't seem to reproduce it on my arm64 box, so I'd be interested in having a shell on your mac.

Thanks !

michael-grunder commented 3 years ago

After stepping through the code on my x86_64 linux box and M1 machine I think the issue is in the CAS

-> 339              if (ck_pr_cas_uint_value(&ring->p_head,
   340                  producer, delta, &producer) == true) {
   341                  break;
   342              }

This doesn't seem to swap anything on the ARM machine.

I'm happy to let you shell into the M1. Can you provide me your public key? Feel free to post it here or send me an email (which is public on my GH account) and I'll send you the details.

michael-grunder commented 3 years ago

I did a bit more investigation and think it is something to do with the compare and swap.

The following program returns 1 on my X86_64 machine, and 0 on the Mac mini with an M1 processor.

#include <ck_pr.h>

static unsigned int head, producer, delta = 1;

int main(void) {
    ck_pr_cas_uint_value(&head, producer, delta, &producer);
    return head;
}

Here is the disassembly for the above program on the M1

Here is the output when I tell clang to generate assembly with -S

Let me know if I can provide anything else.

sbahra commented 3 years ago

@michael-grunder Thank you so much for putting together this data. I don't think this is related to the original regression (which I'm still thinking is more likely to be the test case itself).

Would it be possible to do "make check" in regressions/ck_pr? Does it fail? @cognet following up on this.

michael-grunder commented 3 years ago

@sbahra No worries, thanks for maintaining such a great project!

The tests do fail. I redirected it to a file and zipped up the output.

ck_pr_check.log.gz

cognet commented 3 years ago

Ok I got it, the issue, as stupid as it sounds, was that the assembler intepreted ";" as the beginning of comments, so any instruction after the first one was just ignored, so of course the CAS was not working as expected :) using \n instead seems to fix it, at least I can now run ck_pr regression tests on @michael-grunder's Mac.

michael-grunder commented 3 years ago

This is now working on my end. Thanks again for taking a look!

tillkruss commented 3 years ago

@cognet: This is working in master. Any change you could tag a new version?

sbahra commented 3 years ago

Will tag it this weekend. I think this is unrelated to the test failure I saw though, which is likely a barrier issue in the test itself.

tillkruss commented 3 years ago

@sbahra: Thanks! 🙌

sbahra commented 3 years ago

Sorry for delay, work things came up. I'll have it up tonight.

sbahra commented 3 years ago

0.7.1 has been tagged and released for your convenience. It is marked as a pre-release pending release notes as I free up this week.

@tillkruss @michael-grunder