esp-rs / esp-idf-hal

embedded-hal implementation for Rust on ESP32 and ESP-IDF
https://docs.esp-rs.org/esp-idf-hal/
Apache License 2.0
440 stars 169 forks source link

can call `gpio_install_isr_service` multiple times #179

Closed brianmay closed 1 year ago

brianmay commented 1 year ago

If I call PinDriver::subscribe from two different threads, somehow it manages to call gpio_install_isr_service twice:

E (3041) gpio: gpio_install_isr_service(449): GPIO isr service already installed

I think this might be a classic race condition between 2 threads.

It might be better to make enable_isr_service() a public function, that I am required to call at the beginning of the program.

ivmarkov commented 1 year ago

Nope. This code is supposed to be protected with a mutex. And, it might be a much more trivial bug. Let us see what is going on.

ivmarkov commented 1 year ago

OK, I might need your help in that a bit:

brianmay commented 1 year ago

1st question: Yes. Also this is the exact some code path, using the same version, that is being called twice - for two different gpio pins.

2nd question: It solves the problem. In fact, a sleep for 1 second is my current work around.

My previous reading I got confused, and thought it was the AtomicBool that is protecting the block. Looking at the code again, it appears it is should be protected by an CriticalSection, no idea why it isn't working for me.

#[cfg(all(not(feature = "riscv-ulp-hal"), feature = "alloc"))]
static ISR_SERVICE_ENABLED: core::sync::atomic::AtomicBool =
    core::sync::atomic::AtomicBool::new(false);

#[cfg(all(not(feature = "riscv-ulp-hal"), feature = "alloc"))]
static ISR_SERVICE_ENABLED_CS: crate::task::CriticalSection = crate::task::CriticalSection::new();

#[cfg(all(not(feature = "riscv-ulp-hal"), feature = "alloc"))]
fn enable_isr_service() -> Result<(), EspError> {
    use core::sync::atomic::Ordering;

    if !ISR_SERVICE_ENABLED.load(Ordering::SeqCst) {
        let _ = ISR_SERVICE_ENABLED_CS.enter();

        if !ISR_SERVICE_ENABLED.load(Ordering::SeqCst) {
            esp!(unsafe { gpio_install_isr_service(0) })?;

            ISR_SERVICE_ENABLED.store(true, Ordering::SeqCst);
        }
    }

    Ok(())
}

More details on that error:

I (3016) gpio: GPIO[16]| InputEn: 0| OutputEn: 0| OpenDrain: 0| Pullup: 1| Pulldown: 0| Intr:0
I (3016) gpio: GPIO[17]| InputEn: 0| OutputEn: 0| OpenDrain: 0| Pullup: 1| Pulldown: 0| Intr:0
E (3026) gpio: gpio_install_isr_service(449): GPIO isr service already installed
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: EspError(259)', src/input/esp32/gpio.rs:34:14
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
abort() was called at PC 0x401520da [_ZN11panic_abort18__rust_start_panic5abort17h6255e7652637169dE:/home/brian/.rustup/toolchains/esp/lib/rustlib/src/rust/library/panic_abort/src/lib.rs:42] on core 1
Backtrace: 0x40082e46 [panic_abort:/home/brian/tree/personal/robotica-remote-rust/.embuild/espressif/esp-idf/release-v4.4/components/esp_system/panic.c:408]:0x3ffd1290 0x40088b45 [esp_system_abort:/home/brian/tree/personal/robotica-remote-rust/.embuild/espressif/esp-idf/release-v4.4/components/esp_system/esp_system.c:128]:0x3ffd12b0 0x4008fa62 [abort:/home/brian/tree/personal/robotica-remote-rust/.embuild/espressif/esp-idf/release-v4.4/components/newlib/abort.c:46]:0x3ffd12d0 0x401520da [_ZN11panic_abort18__rust_start_panic5abort17h6255e7652637169dE:/home/brian/.rustup/toolchains/esp/lib/rustlib/src/rust/library/panic_abort/src/lib.rs:42]:0x3ffd1340 0x401520ce [__rust_start_panic:/home/brian/.rustup/toolchains/esp/lib/rustlib/src/rust/library/panic_abort/src/lib.rs:37]:0x3ffd1360 0x4013f0b5 [rust_panic:/home/brian/.rustup/toolchains/esp/lib/rustlib/src/rust/library/std/src/panicking.rs:746]:0x3ffd1380 0x4013f038 [_ZN3std9panicking20rust_panic_with_hook17h0159334415928baaE:/home/brian/.rustup/toolchains/esp/lib/rustlib/src/rust/library/std/src/panicking.rs:716]:0x3ffd1400 0x4014aff3 [_ZN3std9panicking19begin_panic_handler28_$u7b$$u7b$closure$u7d$$u7d$17h20920f977bf01145E:/home/brian/.rustup/toolchains/esp/lib/rustlib/src/rust/library/std/src/panicking.rs:??]:0x3ffd1470 0x4014ae25 [_ZN3std10sys_common9backtrace26__rust_end_short_backtrace17haefc94d18c22de91E:/home/brian/.rustup/toolchains/esp/lib/rustlib/src/rust/library/std/src/sys_common/backtrace.rs:138]:0x3ffd14a0 0x4013eed6 [rust_begin_unwind:/home/brian/.rustup/toolchains/esp/lib/rustlib/src/rust/library/std/src/panicking.rs:584]:0x3ffd14d0 0x40165468 [_ZN4core9panicking9panic_fmt17hb164a27941b51c6eE:/home/brian/.rustup/toolchains/esp/lib/rustlib/src/rust/library/core/src/panicking.rs:142]:0x3ffd1500 0x4016160f [_ZN4core6result13unwrap_failed17h67d207e18fe91769E:/home/brian/.rustup/toolchains/esp/lib/rustlib/src/rust/library/core/src/result.rs:1785]:0x3ffd1540 0x400edc92 [_ZN4core6result19Result$LT$T$C$E$GT$6unwrap17ha1673847afd65c06E:/home/brian/.rustup/toolchains/esp/lib/rustlib/src/rust/library/core/src/result.rs:1107]:0x3ffd15a0 0x400edf99 [_ZN15robotica_remote5input5esp324gpio133_$LT$impl$u20$robotica_remote..input..InputPinNotify$u20$for$u20$esp_idf_hal..gpio..PinDriver$LT$T$C$esp_idf_hal..gpio..Input$GT$$GT$14safe_subscribe17h405e484fd40a9e46E:/home/brian/tree/personal/robotica-remote-rust/src/input/esp32/gpio.rs:30]:0x3ffd15d0 0x400e2266 [_ZN15robotica_remote6button4gpio9Debouncer3new28_$u7b$$u7b$closure$u7d$$u7d$17h8bb1552bce3997e8E:/home/brian/tree/personal/robotica-remote-rust/src/button/gpio.rs:90]:0x3ffd1640 0x400ef5cb [_ZN3std6thread7Builder16spawn_unchecked_28_$u7b$$u7b$closure$u7d$$u7d$28_$u7b$$u7b$closure$u7d$$u7d$17h0315a16b9fd6a93bE:/home/brian/.rustup/toolchains/esp/lib/rustlib/src/rust/library/std/src/thread/mod.rs:514]:0x3ffd16e0 0x400e1f27 [_ZN3std9panicking3try7do_call17hbec3a8c411a4f26dE:/home/brian/.rustup/toolchains/esp/lib/rustlib/src/rust/library/std/src/panicking.rs:492]:0x3ffd1730 0x400d5b1b [_ZN3std5panic12catch_unwind17h041f2208b21c6039E:/home/brian/.rustup/toolchains/esp/lib/rustlib/src/rust/library/std/src/panic.rs:137]:0x3ffd1780 0x400de93d [_ZN3std6thread7Builder16spawn_unchecked_28_$u7b$$u7b$closure$u7d$$u7d$17h7443c4f4dba3898bE:/home/brian/.rustup/toolchains/esp/lib/rustlib/src/rust/library/std/src/thread/mod.rs:513]:0x3ffd17d0 0x40141ac7 [_ZN90_$LT$alloc..boxed..Box$LT$F$C$A$GT$$u20$as$u20$core..ops..function..FnOnce$LT$Args$GT$$GT$9call_once17h0e4dc5c025f5ed33E:/home/brian/.rustup/toolchains/esp/lib/rustlib/src/rust/library/alloc/src/boxed.rs:1940]:0x3ffd1820 0x40141b1e [_ZN90_$LT$alloc..boxed..Box$LT$F$C$A$GT$$u20$as$u20$core..ops..function..FnOnce$LT$Args$GT$$GT$9call_once17hf8d241186835a4edE:/home/brian/.rustup/toolchains/esp/lib/rustlib/src/rust/library/alloc/src/boxed.rs:1940]:0x3ffd1840 0x4014b128 [_ZN3std3sys4unix6thread6Thread3new12thread_start17h88fc280271ad7d4dE:/home/brian/.rustup/toolchains/esp/lib/rustlib/src/rust/library/std/src/sys/unix/thread.rs:108]:0x3ffd1860 0x4016def0 [pthread_task_func:/home/brian/tree/personal/robotica-remote-rust/.embuild/espressif/esp-idf/release-v4.4/components/pthread/pthread.c:195]:0x3ffd1880
ELF file SHA256: 0000000000000000
Rebooting...

It looks like the subscribe function is returning an error, and my code panics when it does so.

ivmarkov commented 1 year ago

I'm sure you have not done this, but to confirm anyway: you have not enabled LTO on your project, right?

ivmarkov commented 1 year ago

I cannot reproduce the case of critical sections not working.

brianmay commented 1 year ago

What is LTO? I suspect the fact I am asking means I haven't done so.

My not so minimal program is https://github.com/brianmay/robotica-remote-rust/ - needs the default feature set to be altered to ["native", "lca2021_badge"]

Will work on a minimal program as I get time.

esp-idf-hal version 0.39.3 I think, target is xtensa-esp32-espidf.

https://github.com/brianmay/robotica-remote-rust/blob/main/Cargo.lock#L856-L872

ivmarkov commented 1 year ago

Yeah, if you can trim it down that would be appreciated.

Unrelated: this is likely wrong.

What you want instead is:

    let value = 1_u32 << (esp_idf_sys::gpio_get_level(pin_number) as u32);

(and then inspecting each bit which corresponds to a pin separately, in the notification callback)

brianmay commented 1 year ago

Yeah, if you can trim it down that would be appreciated.

Done. https://github.com/brianmay/rust-esp32-std-demo/tree/gpio_interrupts_2

Unfortunately, I cannot get this to crash. It looks like the threads are running sequentially, not in parallel here. So not sure how much use it is. It is pretty much the same code path though, without the function calls.

You lost with your your gpio_get_level code. Perhaps you meant:

let value = pin_number << (esp_idf_sys::gpio_get_level(pin_number) as u32);

(pin_number needs fixing though as it is the wrong type)

ivmarkov commented 1 year ago

Yeah, if you can trim it down that would be appreciated.

Done. https://github.com/brianmay/rust-esp32-std-demo/tree/gpio_interrupts_2

Unfortunately, I cannot get this to crash. It looks like the threads are running sequentially, not in parallel here. So not sure how much use it is. It is pretty much the same code path though, without the function calls.

I'll look into it these days. Threads are always working sequentially on ESP IDF (no time slicing). One thread can preempt another though, if it has a higher priority.

You lost with your your gpio_get_level code. Perhaps you meant:

let value = pin_number << (esp_idf_sys::gpio_get_level(pin_number) as u32);

(pin_number needs fixing though as it is the wrong type)

I lost? I'm trying to tell you that you are probably misunderstanding the FreeRtos API you are using. Read on the FreeRtos Task Notification API. Our Rust wrappers implement specifically the u32 bitset API.

ivmarkov commented 1 year ago

https://www.freertos.org/RTOS_Task_Notification_As_Event_Group.html

brianmay commented 1 year ago

As far as I am aware, I am not using FreeRtos. This is a call to EspNotify::notify, which is AFAIK not connected to FreeRtos in anyway, but rather IDF. At least that is my understanding :-)

ivmarkov commented 1 year ago

I hope this is just a strange sense of humor and otherwise you do realize that I wrote EspNotify? :)

Dominaezzz commented 1 year ago

I just so happened to be scrolling through GitHub and found this issue.

I'm 99% sure that this line here is the issue.

let _ = ISR_SERVICE_ENABLED_CS.enter();

When you specify a variable name of _ the Rust compiler almost immediately drops it, so the service install isn't actually happening in the critical section.

Doing this should fix it.

let _lock = ISR_SERVICE_ENABLED_CS.enter();

This is another reason to just let the user create the isr service themselves in main() instead of keeping a static variable/state and a lock deep inside this library.

ivmarkov commented 1 year ago

While I absolutely applaud your eagle eye, and do hope that's indeed the issue, I am not sure how you are arriving at the conclusion that complex code should be left in the hands of the user. :)

ivmarkov commented 1 year ago

What you are basically saying is something different: that the Guard pattern - especially when you are not actually using the guard - as is the case with CriticalSection - has some "gotcha"s.

So maybe I was wrong with using the Guard pattern in the first place.

Dominaezzz commented 1 year ago

It wouldn't be complex code, it would just be lock-free. See my previous proposal.

I'm not sure about whether the guard pattern is necessarily wrong, it's probably just not well implemented but I haven't thought about it too much.

I ran into this issue (let _ =) with the SPI driver. In that case I was lucky to have esp-idf quickly tell me that I had failed to acquire a lock on the bus.

Dominaezzz commented 1 year ago

Actually, thinking about Rust's std::sync::MutexGuard, this would not have happened since you would need to keep a reference to the guard to get at the protected data. Perhaps that's what this code needs (I have something like this in my Async SPI PR), though I'm not sure how it would look here.

ivmarkov commented 1 year ago

Actually, thinking about Rust's std::sync::MutexGuard, this would not have happened since you would need to keep a reference to the guard to get at the protected data. Perhaps that's what this code needs (I have something like this in my Async SPI PR), though I'm not sure how it would look here.

Sure. That's what I was saying as well, I guess:

What you are basically saying is something different: that the Guard pattern - especially when you are not actually using the guard - as is the case with CriticalSection - has some "gotcha"s.

Now, if you have suggestions how to fix the Guard pattern - I'm all ears! (Aside from replacing it with the lock closure, that is, which might be our only chance of doing it bullet-proof, actually.)

ivmarkov commented 1 year ago

@brianmay I've pushed a fix replacing let _ = <CS>.lock() with let _guard = <CS>.lock(). You might want to try esp-idf-hal master branch on your actual code to see whether this is fixing the issue.

Dominaezzz commented 1 year ago

I do have a suggestion.

struct Service;

impl Service {
    pub fn new(intr_alloc_flags: i32) -> Result<Service, EspError> {
        let result = esp!(unsafe { gpio_install_isr_service(intr_alloc_flags) });
        result.map(|_| Service)
    }
}

impl Drop for Service {
    fn drop(&mut self) {
        unsafe { gpio_uninstall_isr_service() };
    }
}

let CS = CriticalSection::<Option<Service>>::new();

// get atomic bool
let service = CS.enter();
*service = Some(Service::new(0)?);
// set atomic bool
ivmarkov commented 1 year ago

It wouldn't be complex code, it would just be lock-free. See my previous proposal.

I remember it. While the spirit of the HAL - to some extent - is with your proposal (e.g. you can instantiate a driver only if you exclusively own the peripheral, or have a &mut ref to it, which is the same), pushing this idea to the extreme means the user has to do even more book-keeping - as in keeping the "Service" representing the ownership/instantiation of the GPIO ISR around.

We might return to this idea at some point. Back in time I did not embrace it because we wanted to (a) have a first version of the subscription code released, and more importantly because (b) you did not convince us that having such a Service will have additional benefits besides avoiding the danger that the maintainers might misuse the critical section Guard pattern.

As in would I want to initialize the gpio ISR service in a different way? Unlikely. Would I want to de-initialize the gpio ISR service? Also unlikely (and - it can still be done, even in the absence of an explicit "Service" btw).

ivmarkov commented 1 year ago

I do have a suggestion.

struct Service;

impl Service {
    pub fn new(intr_alloc_flags: i32) -> Result<Service, EspError> {
        let result = esp!(unsafe { gpio_install_isr_service(intr_alloc_flags) });
        result.map(|_| Service)
    }
}

impl Drop for Service {
    fn drop(&mut self) {
        unsafe { gpio_uninstall_isr_service() };
    }
}

let CS = CriticalSection::<Option<Service>>::new();

// get atomic bool
let service = CS.enter();
*service = Some(Service::new(0)?);
// set atomic bool

Did you just invent a mutex on top of a critical section?

ivmarkov commented 1 year ago

Maybe I need to elaborate. A mutex and a critical section are roughly the same thing. In Rust world, the only way I differentiate between those is that a critical section does not "protect" the data. This is in the hands of the user. A mutex is protecting the data though as it lives "inside" the mutex.

In a world like C there is no difference between the two.

(And then of course there are OS-specific differences, like in windows a critical section is within-process only, while a mutex is across-process, so the whole boundary between a mutex and a critical section is very blurred. Hence me mentioning languages instead.)

ivmarkov commented 1 year ago

Oh well. Maybe I should name our CriticalSection RawMutex as the embassy-sync crate did. And switch to the lock closure pattern.

Dominaezzz commented 1 year ago

you did not convince us that having such a Service will have additional benefits besides avoiding the danger that the maintainers might misuse the critical section Guard pattern.

Yeah sorry about that, I caught COVID at the time and the PR was merged before I recovered.

Also unlikely (and - it can still be done, even in the absence of an explicit "Service" btw).

It can? I don't see how it can be done safely without a type.

Maybe I need to elaborate....

Ah I see. Well, since you're not protecting data in this case then a closure will be necessary for bulletproof-ness.

Though one could argue that we're protecting the data being initialised inside gpio_install_isr_service :shrug: . What I'm saying is maybe a mutex is better construct for this code? Is it really sound to use a "critical section" in Rust? Feels like circumventing the borrow checker.

ivmarkov commented 1 year ago

As safe - I guess - as trying to lock a mutex twice. UB would follow, but not a use-after-free UB that Rust has strong guarantees for.

brianmay commented 1 year ago

I am somewhat kicking myself, the number of times I looked at the enable_isr_service() function and I didn't see this issue.

Have mixed feelings here. If I didn't know the rules, I would really not expect let _ = to be different from let _xxx =.

I cannot test against the master branch, it appears the wifi code is broken - peripherals.modem is missing a required trait.

So I kludged a fix instead, and so far it appears to be working.

ivmarkov commented 1 year ago

@Dominaezzz I'm closing this. Feel free to open a separate issue w.r.t. switching critical sections to the lock closure pattern, and possibly another one for the "Service" gpio pattern, if you still feel strongly about it.