Closed sbahra closed 1 day ago
struct ck_stack_entry *stack;
stack = ck_pr_load_ptr(&target->head);
entry->next = stack;
ck_pr_fence_store();
while (ck_pr_cas_ptr_value(&target->head, stack, entry, &stack) == false) {
entry->next = stack;
ck_pr_fence_store();
}
ck_pr_aba_t *stack;
stack = ck_pr_ll(&target->head);
entry->next = stack;
ck_pr_fence_store();
while (ck_pr_cas_sc(&target->head, stack, entry, &stack) == false) {
entry->next = stack;
ck_pr_fence_store();
}
For architectures not supporting LLSC:
ck_pr_ll = {
load(generation);
fence_load();
load(pointer);
}
cas_sc_value = {
cas_ptr_2(...);
}
For architectures supporting LLSC:
ck_pr_ll = ll
cas_sc = sc + load
If we have time, adding native ll/sc to ARM, PPC, RiscV, etc...
If no time, add LL/SC to the common ck_pr.h (#ifdef CK_F_PR_LLSC) and to the Cheri port.
Update data structures using cas_2 for ABA-protection to use the new ck_pr_whatever_t.```
There's a lot of restrictions on HW LL/SC, and it's not obvious how to make sure that compiled code obeys them on all platforms. I think it would make more sense to use HW LL/SC to implement our software LL/SC the same way that we do on CAS_2 platforms.
Thanks for the input. I shared that same concern. My opinion has changed.
For others following along: The motivation for this is to not rely on CAS-based speculation when we don't have to. LL/SC-based platforms lacking double-width historically did not benefit from having ABA-free operations otherwise. It is possible to implement ABA-safety using weak CAS with significant cost to fast path (a la https://pvk.ca/Blog/2015/01/13/lock-free-mutual-exclusion/) but we can't rely on the existing tagging system CAS2 uses. RDCSS seems reasonable but leaves a lot of performance on the floor for LL/SC platforms.
ISTM in practice for the relevant platforms today (looking at Power and ARM) that we can achieve the above interface reliably: 1) For the actual critical section, we can require ck_pr operations if necessary, which meets the requirements of LL/SC (the compiler barriers are needed). ck_pr may not even be necessary in practice. Branching should be fine. See https://github.com/Tr0py/lock-free-stack-arm-asm/blob/master/stack.s and https://developer.arm.com/documentation/ddi0487/latest/ 2) If there is a problem with the above approach, we can also expose instead a ck_pr family of functions for LL/SC-safe family of operations. Hopefully it will not come down to this (edit). 3) Good test coverage for this is reasonable.
@pkhuong Perhaps I'm missing something, but I don't see how we can implement the software LL/SC the same way as CAS2 if we're lacking double-width SC. Were you thinking of something along the lines of RDCSS and given the points above, is it worth it? And if all else fails, I wonder if RDCSS can just be one potential backend for ck_pr_aba_t.
My stance is that we can only reliably use HW LL/SC in (inline) assembly. As soon as regular compiler-generated code can go between the LL and related SC, things are liable to go wrong. If we can't find some software primitive that does what we want for LL/SC, then we should expose a couple data structure -specific helpers, implemented in assembly.
That was what I was hoping to avoid in the past (and situation gets worse with CHERI). Will explore the path above (architectures are a lot more lax these days) and see what limitations we hit. It may be reasonable to implement the software LL/SC only for CHERI with the hardware extensions it provides. Worse comes to worst, falling back to RDCSS would work for the use-case.
@qwattash It'll be useful if we can review the CHERI port before this work is started (we can disable the structures requiring CAS2 initially and merge the port into master). In particular, it may be possible to mitigate the need for the LL/SC-based ck_pr_aba_t using CHERI's extensions.
ARM has double-word LL/SC with ldxp/stxp.
Yes, implemented in https://github.com/concurrencykit/ck/blob/master/include/gcc/aarch64/ck_pr_llsc.h but in general interested in a solution that'll work nicely with CHERI's tagging, Power and Risc V.
I think the issues with CHERI will be incrementally tackled. Currently what I have done is strictly related to what is used by the FreeBSD kernel, so many features are left unused, and I have just some quick fix in the generic ck_pr.h
that replaces __sync_*
with __atomic_*
that our compiler for MIPS and RISC-V understands, this works around everything CheriBSD uses. That said it would be nice to have ck have full support for RISC-V (MIPS will soon be discontinued anyway).
There are two sets of changes that I found so far in arch-independent code:
*_ENTRY()
structures (otherwise we get a capability that can not be used to access the container structure).The RISC-V CHERI implementation has LR/SC and there are no extra restrictions that I'm aware of, although there are no load/store pair instructions (as in plain RISC-V). I expect it to differ from plain RISC-V only in minor assembly instructions and register naming. I started having a look at the ABA-safe pointer, I'll read more about the various things you have linked as I'm not entirely familiar with all the issues. I can probably try to isolate into a fork of CK the changes I have, so we can all have a look more easily.
Thanks for the update @qwattash
@qwattash Let me know if you would like to catch up on this, happy to help or discuss.
@sbahrafreebsd Sorry for the lack of progress, I have been busy in the past months working at other projects, a side-effect of this has been that I have a patch to support the experimental Morello extensions to aarch64. Will come back to this ASAP.
Ok, thanks!
In several places in CK we use a pattern like this:
We can instead encode this using something like:
Where ck_pr_aba_t is an ABA-safe counter. This vastly reduces boiler-plate for these data structures when it comes to portability and makes it easier to implement data structures for things like CHERI.
Data structures using this pattern:
If there is a better name, let's go for it.