embassy-rs / nrf-softdevice

Apache License 2.0
269 stars 80 forks source link

Example ble_bas_peripheral_notify panics during init #159

Closed nproux closed 1 year ago

nproux commented 1 year ago

The ble_bas_peripheral_notify example is currently broken. After cargo command "cargo run --bin ble_bas_peripheral_notify" with an nRF52840 development kit, the following panic is reported in the terminal:

0.000000 ERROR panicked at 'Softdevice memory access violation. Your program accessed registers for a peripheral reserved to the softdevice. PC=395b8 PREGION=1' └─ nrf_softdevice::softdevice::fault_handler @ D:\rust\embassy-softdevice\nrf-softdevice\src\fmt.rs:101 0.000000 ERROR panicked at 'explicit panic', C:\Users\nathan.cargo\registry\src\github.com-1ecc6299db9ec823\defmt-0.3.2\src\lib.rs:367:5 └─ panic_probe::print_defmt::print @ C:\Users\nathan.cargo\registry\src\github.com-1ecc6299db9ec823\panic-probe-0.3.0\src\lib.rs:91

The root cause appears to be that when grabbing a handle to the peripherals through the HAL via "embassy_nrf::init(Default::default)", the init function directly modifies a register of a peripheral that is reserved by the softdevice while the softdevice is enabled. Peripherals in the chip have three fundamental access states that change based on whether the softdevice is enabled or not: Either Open, Restricted, or Blocked. Open means free access, Blocked means no access, and Restricted means that access must go through the appropriate softdevice API function. The table for access modes for devices with the s140 softdevice loaded can be found at the following link on the Nordic infocenter. Any peripherals not listed on this table are always open. Note that each version of the softdevice has its own table, and therefore may be different compared to other softdevice versions (s112, s132, ect.). https://infocenter.nordicsemi.com/topic/sds_s140/SDS/s1xx/sd_resource_reqs/hw_block_interrupt_vector.html

My current best guess is that this particular panic occurs when the example goes to write the LF clock source in the init function, which is access type restricted while the softdevice is enabled. But the problem is somewhat more general than that, because none of the HF clock or LF clock config states are protected from this form of error. The GPIOTE and RTC time driver init should be fine though, as those are Open peripherals regardless of softdevice state.

Reversing the order of initialization (HAL first, then softdevice) results in a different error, because apparently the default configuration of HF clock from the HAL is incompatible with the softdevice, so the example would still be broken even if the HAL init was using the correct access functions.

I wouldn't mind working on a PR to fix some of this, but there should be a discussion of what the ideal solution should look like first. From my perspective, ideally the HAL would detect that a softdevice is used by the application through a feature flag, and use the function sd_softdevice_is_enabled to conditionally use the softdevice access functions for HF clock and LF clock access instead of direct register access while the softdevice is enabled. Additionally the HAL init function should also check that the passed config parameters have HF and LF clock settings that are compatible with softdevice operation if the softdevice is enabled, because some clock modes are also restricted.

Finally, it would also be good to double check whether the embassy executor is correctly calling sd_app_evt_wait instead of __WFE while the softdevice is in the enabled state, because calling the chip's sleep function directly can also cause misbehavior while the softdevice is enabled. Given that the HAL isn't currently protecting the clock functions, I'd appreciate if someone who knows about the async executor side of things could quickly check on that.

Most of the softdevice peripheral access functions for when the softdevice is enabled are documented here. https://infocenter.nordicsemi.com/topic/com.nordic.infocenter.s140.api.v7.0.1/group___n_r_f___s_o_c___f_u_n_c_t_i_o_n_s.html

Dirbaio commented 1 year ago

The fix is initializing embassy-nrf first, softdevice second.

Reversing the order of initialization (HAL first, then softdevice) results in a different error, because apparently the default configuration of HF clock from the HAL is incompatible with the softdevice, so the example would still be broken even if the HAL init was using the correct access functions.

Is it SdmIncorrectInterruptConfiguration? if so the cause is not setting the interrupt priorities.

Finally, it would also be good to double check whether the embassy executor is correctly calling sd_app_evt_wait instead of __WFE while the softdevice is in the enabled state, because calling the chip's sleep function directly can also cause misbehavior while the softdevice is enabled.

It calls WFE. As fas as I know WFE is supported when using the softdevice. Do you have any link on the "can cause misbehavior"?

The only thing I know about is sd_app_evt_wait contains one errata workaround regarding increased power consumption in nrf52832. discussion, devzone thread, another devzone thread.

nrf-softdevice could provide a custom executor that calls sd_app_evt_wait instead.

IMO neither embassy-nrf nor embassy-executor should contain softdevice-specific code.

nproux commented 1 year ago

The fix is initializing embassy-nrf first, softdevice second.

Is it SdmIncorrectInterruptConfiguration? if so the cause is not setting the interrupt priorities.

Good call, builds and runs after adjusting the interrupt priority from P0 to P2 for the GPIOTE and Time drivers and moving the HAL init to before the softdevice init. Mind if I open a pull request to fix that in the example?

It calls WFE. As fas as I know WFE is supported when using the softdevice. Do you have any link on the "can cause misbehavior"?

The only thing I know about is sd_app_evt_wait contains one errata workaround regarding increased power consumption in nrf52832. discussion, devzone thread, another devzone thread.

nrf-softdevice could provide a custom executor that calls sd_app_evt_wait instead.

IMO neither embassy-nrf nor embassy-executor should contain softdevice-specific code.

I got bit by the nRF52832 bug so the "use the softdevice wait function" became an "always do this" rule in my head, and I never rechecked that later on. The increased power consumption was indeed the misbehavior I was thinking of (I treat excess power consumption errata as misbehavior), so I'm clearly wrong about it when it comes to any of the other devices. Using the function can still be a small power win regardless when advertising or getting empty connection events, because it goes back to sleep instead of handing control back to the application in those cases. Though with an async executor with poll semantics I suppose that point is basically moot anyway, as you're effectively doing that yourself. power discussion

Good point on keeping softdevice code out of embassy-nrf and embassy-executor. Based on above I would now agree with you that leaving sleep as WFE should be fine. I suppose the side effect of that means "you must be careful not to violate softdevice requirements while using the HAL", instead of "HAL sanity checks that your inputs are safe given softdevice usage". I suppose if I wanted that type of safety I'd have to write a wrapper around the HAL to accomplish that, which is fine.

After example is fixed with moved HAL init and values, this issue should be closed.