Closed zRedShift closed 4 weeks ago
Likely related to a scheduling issue I've had a while back.
Basically, the MPMC queue used for the scheduler is implemented as a ring buffer using atomic operations, and that ring buffer has 8-bit indices (i.e. 256 distinct values). Now, presumably through some atomics bug, a wrapping arithmetic bug, or a bug in the implementation, the ring buffer never wraps around, leaving the queue in the full state indefinitely. No more tasks can be enqueued, and so the timer stops working.
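To make the wrap-around concern concrete, here is a minimal single-threaded sketch of the sequence-number bookkeeping in a Vyukov-style bounded queue with 8-bit positions. It is a model under stated assumptions, not the heapless implementation: `ModelQueue`, `successful_pairs`, and the fixed capacity of 8 are hypothetical names and choices, payloads are omitted, and plain `u8` fields stand in for the atomics.

```rust
/// Single-threaded model of a Vyukov-style bounded queue with 8-bit positions.
/// Hypothetical sketch: payloads are omitted and plain integers replace atomics.
pub const CAPACITY: usize = 8; // must be a power of two
const MASK: u8 = (CAPACITY - 1) as u8;

pub struct ModelQueue {
    sequence: [u8; CAPACITY], // per-cell sequence numbers
    enqueue_pos: u8,
    dequeue_pos: u8,
}

impl ModelQueue {
    pub fn new() -> Self {
        let mut sequence = [0u8; CAPACITY];
        // Cell i starts with sequence i, meaning "free for position i".
        for (i, seq) in sequence.iter_mut().enumerate() {
            *seq = i as u8;
        }
        ModelQueue { sequence, enqueue_pos: 0, dequeue_pos: 0 }
    }

    /// Returns false when the queue is full.
    pub fn enqueue(&mut self) -> bool {
        let pos = self.enqueue_pos;
        let cell = usize::from(pos & MASK);
        // Signed 8-bit difference, so the comparison survives wrap-around.
        let dif = self.sequence[cell].wrapping_sub(pos) as i8;
        if dif != 0 {
            // dif < 0: the cell is still occupied (queue full).
            // dif > 0 would mean another producer got ahead, which cannot
            // happen in this single-threaded model.
            return false;
        }
        self.enqueue_pos = pos.wrapping_add(1);
        // Mark the cell as "filled for position pos".
        self.sequence[cell] = pos.wrapping_add(1);
        true
    }

    /// Returns false when the queue is empty.
    pub fn dequeue(&mut self) -> bool {
        let pos = self.dequeue_pos;
        let cell = usize::from(pos & MASK);
        let dif = self.sequence[cell].wrapping_sub(pos.wrapping_add(1)) as i8;
        if dif != 0 {
            return false;
        }
        self.dequeue_pos = pos.wrapping_add(1);
        // Mark the cell as "free for position pos + CAPACITY".
        self.sequence[cell] = pos.wrapping_add(MASK).wrapping_add(1);
        true
    }
}

/// Runs enqueue/dequeue pairs and returns how many pairs succeeded
/// before the first failure (or `iterations` if none failed).
pub fn successful_pairs(iterations: usize) -> usize {
    let mut queue = ModelQueue::new();
    for i in 0..iterations {
        if !queue.enqueue() || !queue.dequeue() {
            return i;
        }
    }
    iterations
}
```

With correct wrapping arithmetic, `successful_pairs(1000)` runs all 1000 pairs, so positions passing 255 are not a problem in the algorithm as modeled here. A queue that gets stuck once the 8-bit positions wrap would therefore point at the arithmetic or the atomics underneath rather than at the scheme itself.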
Initially, @ivmarkov solved it by using the crossbeam_queue::ArrayQueue as the mpmc queue instead. But I think it is more likely that the issue just got delayed, since crossbeam's ArrayQueue uses 32 bit indices and its implementation is based on the same source (i.e. Dmitry Vyukov's bounded MPMC queue). With 32 bit indices the mpmc queue's index would need 2^24 times longer to reach u32::MAX, though I've not tested whether the same wrap-around issue also happens with bigger indices. (Note also that the issue "goes away" if heapless's feature mpmc_large is used.)
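The 2^24 factor above is just the ratio between the number of values a 32-bit index and an 8-bit index can take. A small sketch of that arithmetic (the function names are hypothetical, chosen only for illustration):

```rust
/// Number of distinct values an index of `bits` bits can take.
/// Only valid for bits < 64 in this sketch.
pub fn index_values(bits: u32) -> u64 {
    1u64 << bits
}

/// How many times more increments a `large_bits` index needs before it
/// wraps, compared to a `small_bits` index.
pub fn wrap_ratio(small_bits: u32, large_bits: u32) -> u64 {
    index_values(large_bits) / index_values(small_bits)
}
```

Here `wrap_ratio(8, 32)` evaluates to 2^32 / 2^8 = 2^24 = 16,777,216, so a bug that shows up after 256 iterations with 8-bit indices would need roughly 4.3 billion iterations to show up with 32-bit indices, if it exists there at all.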
Since heapless::mpmc::MpMcQueue is the mpmc queue used by default in edge-executor, as its crossbeam-queue feature is disabled by default with default-features = false in esp-idf-hal, I think this is indeed the same issue.
My issue occurred regardless of LTO mode, and someone (I can't remember who) tested it on the esp32c3, where it worked fine. Also note that this issue even occurred in Wokwi for me, so it is unlikely to be a hardware bug.
@N3xed Do you know where the original discussion for the MPMC issue exists? I can't seem to find it across the various esp-rs repos. Might be helpful in producing a smaller example, especially if it was happening without LTO.
Unfortunately, I've never created an issue for this, so this is the only place as far as I know. When I discovered this issue I worked around it, went on to other stuff, and never got around to it until @zRedShift brought it up in the matrix chat.
There is my discussion about it with @ivmarkov in the matrix chat though.
@MabezDev @N3xed @ivmarkov
I significantly reduced the minimal reproduction and moved it to no_std. I also updated the README.md and added the disassembly diff (sort of).
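#![no_std]
#![no_main]

use esp_backtrace as _;

#[xtensa_lx_rt::entry]
fn main() -> ! {
    // 8-slot queue of unit values; only the index bookkeeping is exercised.
    let queue = heapless::mpmc::MpMcQueue::<(), 8>::new();
    let mut counter = 0;
    loop {
        esp_println::println!("counter at {counter}");
        // One enqueue/dequeue pair per iteration, so the queue never holds
        // more than one element and neither unwrap should ever fail.
        queue.enqueue(()).unwrap();
        queue.dequeue().unwrap();
        counter += 1;
    }
}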
Please look for more at: https://github.com/zRedShift/esp-spinlock-repro
@MabezDev this is getting to the limit of what I can do, but after compiling the following branch: https://github.com/zRedShift/esp-spinlock-repro/blob/without-heapless/src/main.rs (without heapless) with thin and fat LTO and comparing the disassembly (the decompiled C code is seemingly exactly the same), there is only a minimal set of changes:
Besides labels, cross-references and debug info, the only difference is the register allocation, which after equalizing the labels comes down to this diff:
79c79
< 4200379f 62 0f 00 l8ui a6,a15,0x0
---
> 4200379f f2 0f 00 l8ui a15,a15,0x0
81c81
< 420037a5 87 96 44 bne a6,a8,LAB_420037ed
---
> 420037a5 87 9f 44 bne a15,a8,LAB_420037ed
89c89
< 420037bc b0 58 30 xor a5,a8,a11
---
> 420037bc b0 68 30 xor a6,a8,a11
91,93c91,93
< 420037c1 50 28 10 and a2,a8,a5
< 420037c4 00 4c a1 sll a4,a12
< 420037c7 00 66 a1 sll a6,a6
---
> 420037c1 60 28 10 and a2,a8,a6
> 420037c4 00 5c a1 sll a5,a12
> 420037c7 00 4e a1 sll a4,a14
96,97c96,97
< 420037cc 80 24 20 or a2,a4,a8
< 420037cf 80 36 20 or a3,a6,a8
---
> 420037cc 80 25 20 or a2,a5,a8
> 420037cf 80 34 20 or a3,a4,a8
100c100
< 420037d8 50 23 10 and a2,a3,a5
---
> 420037d8 60 23 10 and a2,a3,a6
108c108
< 420037ed e0 86 c0 sub a8,a6,a14
---
> 420037ed e0 8f c0 sub a8,a15,a14
Narrowed it down to 4200052f 00 4e a1 sll a4,a14 in the fat LTO file; it should be 4200052f 00 4f a1 sll a4,a15, right before the CAS loop.
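For context on why a single wrong shift operand right before the CAS loop could matter: the xor/and/sll/or pattern in the diff is consistent with an 8-bit compare-and-swap being emulated on top of a 32-bit one by masking and shifting, though that reading is an assumption and not something confirmed in this thread. A minimal sketch of that technique follows; `cas_u8_in_word` and its lane numbering are hypothetical and do not reproduce the generated code.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Compare-and-swap one byte lane of a 32-bit word using only a 32-bit CAS.
/// `lane` selects the byte (0..=3, counted from the least significant byte;
/// that numbering is an assumption of this sketch).
/// Returns Ok(previous byte) on success, Err(observed byte) on mismatch.
pub fn cas_u8_in_word(word: &AtomicU32, lane: u32, expected: u8, new: u8) -> Result<u8, u8> {
    let shift = lane * 8;
    let mask = 0xFFu32 << shift;
    // Both operands are shifted into the lane before the loop starts.
    // Shifting the wrong source value here would corrupt either the
    // comparison or the stored byte.
    let expected_shifted = u32::from(expected) << shift;
    let new_shifted = u32::from(new) << shift;

    let mut current = word.load(Ordering::Relaxed);
    loop {
        if current & mask != expected_shifted {
            // The byte in our lane does not hold the expected value.
            return Err(((current & mask) >> shift) as u8);
        }
        // Keep the other three bytes, replace only our lane.
        let desired = (current & !mask) | new_shifted;
        match word.compare_exchange_weak(current, desired, Ordering::AcqRel, Ordering::Relaxed) {
            Ok(_) => return Ok(expected),
            // Spurious failure or a neighbouring byte changed: retry.
            Err(actual) => current = actual,
        }
    }
}
```

Under this reading, if the value shifted into the lane comes from the wrong register, the loop either never sees a match or writes a wrong index back, and either outcome could plausibly leave an 8-bit queue position stuck.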
I ported your code to https://github.com/esp-rs/esp-hal/compare/main...MabezDev:esp-hal:esp-spinlock-repro and retried (all examples are compiled with lto='fat'). This bug seems to be resolved.
I tried this code (repo: esp-spinlock-repro):
When I'm running code I've compiled with "fat" LTO, on 1.64.0.0 or 1.65.0.0, with the xtensa-esp32s3-espidf target on my ESP32S3, regardless of the esp-idf branch, my code hangs indefinitely after 256 iterations:
If I disable LTO or move to "thin" LTO, the code keeps chugging along with no end in sight (what I expect to see).
Meta
rustc --version --verbose: