Enqueue/dequeue operation ordering issue

alainkaegi commented 3 months ago

A C compiler can and will reorder non-volatile operations (even across sequence points). So I'm wondering if the enqueue/dequeue functions (e.g., in sddf/include/sddf/network/queue.h) need annotations to prevent such reorderings (I ran into such an issue in a previous project).

In other words, I'm wondering if a code sequence like

queue->free->buffers[queue->free->tail % queue->size] = buffer;
__asm__ __volatile("":::"memory");
queue->free->tail++;

might be necessary even in the uniprocessor case (the sequence above is gcc specific; a different sequence might be required for a different compiler).

In summary, the potentially necessary change prevents the compiler from reordering some operations; while THREAD_MEMORY_RELEASE() prevents the hardware from doing analogous things at run time.

Courtney3141 commented 3 months ago

We do have issues in the queue library to do with atomics and memory orderings. This is a current work in progress and we plan to increment and read the head and tail index using atomic operations with read aquire semantics. (rather than standalone fences).

However, I think in the example you gave the compiler/CPU would be unable to reorder those two memory operations due to a data dependency of using the tail in the first line.

alainkaegi commented 3 months ago

I'm not a compiler expert, but based on personal experience (I have seen gcc generate code that reordered stores) and on conversations with someone more knowledgeable, I believe prudence is in order.

While there is a dependence that the compiler must honor (use of the original value of tail and its update later on), there is no dependence that compels the compiler to update the content of tail and buffers in a particular order (at least as far as the code snippet we are considering).

Perhaps the following quote from the C specification is the most relevant: "an implementation might perform various optimizations within each translation unit, such that the actual semantics would agree with the abstract semantics only when making function calls across translation unit boundaries" (C11 specification, section 5.1.2.3 [I'm reading a draft of the standard close to actual publication; I expect the actual standard to be very close]).

In other words, as long as the “observable behavior” remains the same, then the compiler can perform various optimizations. It is as if the optimizer sees our code as equivalent to the following single line:

queue->free->buffers[queue->free->tail++ % queue->size] = buffer;

And in that line of code, an implied order is harder to discern.

Obviously things changes if the accesses are volatile: "at a sequence point all previous accesses to volatile objects have stabilized and no subsequent accesses have occurred" (from the GCC documentation). But that's not currently the case here.

If you change your code to use atomics, then you may introduce "volatile" statements (atomics are often defined as volatile inlined assembly statements) that will prevent the compiler from applying certain optimizations (e.g., store instruction reordering).

wom-bat commented 3 months ago

There needs to be a memory barrier here even on UP to prevent the reordering.

au-ts / sddf

Enqueue/dequeue operation ordering issue #129