Questions regarding the problem with atomics

ch32-rs / qingke

Low level and runtime support for WCH's 32bit QingKe RISC-V MCUs

Apache License 2.0

12 stars 5 forks source link

Questions regarding the problem with atomics #9

Open Thom-de-Jong opened 1 week ago

Thom-de-Jong commented 1 week ago

Good day all,

Not an issue but a question: What exactly is the problem with atomics? I have read the discussion #59

Is this confirmed by WCH, or by the test mentioned in #59? Is this a hardware bug in all chips? ~~How does one know it is the atomic operation causing a problem?~~ Is it enough to put the atomic operation in a critical section by using the portable-atomics crate for example?

Just curious about the recent changes. Thank you in advance for the explanation!

Codetector1374 commented 1 week ago

Yeah, we never really had time to run the machine test suite. But we did run into an actual / real problem with embassy executor. Where the executor enter a bad state, the task is marked as woken up but not in the ready list.

@Dummyc0m and I have reviewed the code in the embassy executor and can not come up with reasonable possibility for us to be entering that state if load reserve store conditional actually work as expected. And by disabling atomic (it will fall back to use the critical section based executor data structures) we can no longer reproduce the problem as expected.

Regarding your question, using portable atomic and placing access into critical section would solve the problem. Because we are practically no long use lr/sc instructions anymore.

If anyone have the time / interest in running through the litmus test suite to actually confirm it would be extra nice.

Usioumeo commented 1 week ago

I tried to run the litmus test suite. The problem is that it is not meant to run on bare metal, and it requires threads and a real operating system underneath. The best I can think of is to parse the litmus tests with a new parser, generate some code (even rust based on this lib, with the test included as embedded assembly), and to simulate traps by threads we could use timer interrupts. The problem with this approach is that interrupts have a priority, so one "thread" will always be interrupted, while the other will never be. We could swap after a test run, but I don't know if that's a good idea for all tests.

Let me know what you think, and if it is worth spend time on it.

Codetector1374 commented 1 week ago

I was thinking adapting something like free rtos and slap it on there as a few threads with preemptive scheduling. I personally don’t really have a huge interest in confirming it is broken, since losing atomic on an embedded & single core platform like this is honestly not a huge deal. And we confirmed by using critical section we can get embassy to work as expected .

Thom-de-Jong commented 1 week ago

@Codetector1374, Thank you for the explanation.

For me, not using atomics solves all the instability I encountered in my firmware on the CH32V203. Atomics make things easy but it is not a dealbreaker as portable-atomics will fill the gap in my case. (And if you are open to depend on an extra crate) I'm not experienced enough to help you further investigate this issue and would have never guessed the cause myself. Huge thanks to you all!