Closed igiona closed 6 months ago
As soon as the BLE client writes the u64_value characteristic, the whole MCU stalls and nothing runs anymore.
What are you using for logging? If you're using something other than defmt-rttt + a debug probe, it might be a panic that you don't get to see.
As soon as the BLE client writes the u64_value characteristic, the whole MCU stalls and nothing runs anymore.
What are you using for logging? If you're using something other than defmt-rttt + a debug probe, it might be a panic that you don't get to see.
I'm using defmt-rtt = "0.4.0"
and probe-rs
with a non-JLINK probe (=> AK-CMSIS-DAP JTAG/SWD).
If I let the code panic!
I can see the panic message from the probe, in this case though there is no message pushed out and actually even some of the older log messages that should be there are not pushed out. I just added a couple of info!("test");
before the access of the incriminated variable, and non of them is arriving on the terminal.
I guess that these string are still in some cache of the RTT module...
Then it's not a panic indeed. Strange.
Can you run probe-rs with --always-print-stacktrace, then kill it with control+c when it hangs, and paste the stacktrace? ideally without --release so the stacktrace is better, hopefully it still reprodues in that case.
If the stacktrace points to WFE in the executor it's an async hang, I guess you'd need to add prints inside the nrf-softdevice code to figure out where the hang is...
Then it's not a panic indeed. Strange.
Can you run probe-rs with --always-print-stacktrace, then kill it with control+c when it hangs, and paste the stacktrace? ideally without --release so the stacktrace is better, hopefully it still reprodues in that case.
If the stacktrace points to WFE in the executor it's an async hang, I guess you'd need to add prints inside the nrf-softdevice code to figure out where the hang is...
Thanks for your suggestion, I'll give it a try and come back with the stack trace. Unfortunately, the issue happens only in release mode :'(
The implementation of FixedGattValue::from_gatt
for Primitive
looks sus to me. If data
is unaligned, I think you could have UB. It should probably be using core::ptr::read_unaligned
instead.
I added --always-print-stacktrace
as suggested.
The stack trace shows that @alexmoon you are on the right spot!
Frame 0: __cortex_m_rt_HardFault @ 0x0002b692 inline
src/main.rs:60:5
Frame 1: __cortex_m_rt_HardFault_trampoline @ 0x000000000002b692
src/main.rs:58:1
Frame 2: "HardFault handler. Cause: Escalated UsageFault (Unaligned access)." @ 0x0002b692
Frame 3: from_gatt<u64> @ 0x00025b44 inline
.cargo\registry\src\index.crates.io-6f17d22bba15001f\nrf-softdevice-0.1.0\src\ble/gatt_traits.rs:69:18
Frame 4: from_gatt<u64> @ 0x0000000000025b40
.cargo\registry\src\index.crates.io-6f17d22bba15001f\nrf-softdevice-0.1.0\src\ble/gatt_traits.rs:42:9
Frame 5: on_write @ 0x0001ce5a inline
So I compared the --release
build with a normal build with the following code in place:
fn from_gatt(data: &[u8]) -> Self {
if data.len() != Self::SIZE {
panic!("Bad len")
}
let data_raw : *const u8 = data.as_ptr();
if ((data_raw as usize) % 4) != 0 {
panic!("NOT ALIGNED");
}
unsafe { *(data.as_ptr() as *const Self) }
}
To my somewhat surprise, in both cases the code panic :/ Maybe my check doesn't do what I assumes it does after all!?
Never the less, following @alexmoon 's suggestion, I modified the to_gatt
implementation as follow:
fn from_gatt(data: &[u8]) -> Self {
if data.len() != Self::SIZE {
panic!("Bad len")
}
let ptr = data.as_ptr() as *const Self;
unsafe { ptr.read_unaligned() }
}
aaaand the code didn't stuck anymore 🥳 !
Many thanks for you support 👍
I have a pretty basic application (peripheral only) which has a GATT server containing a BAS, DeviceInfo and a custom service. The custom service is defined as follows:
My GATT server runs in a dedicated task as follow:
In debug-mode, the code performs as expected. When I build and run my code in release mode though, everything runs as expected until the point in which I access the GATT value in the OnWrite event:
info!("Got new u64 value {}", val);
As soon as the BLE client writes the u64_value characteristic, the whole MCU stalls and nothing runs anymore. A reset is required to recover from that state. If I do not access theval
at all (log message commented out) in the event (just a debugging step, not really a valuable workaround), the MCU doesn't stuck anymore.My build-release configuration is as follow:
I have been able to work around this issue with the usage of a
heapless::Vec<u8, 8>
instead of au64
, and a manual (de-)serialization of the byte array into/from a u64 value.Can someone explain this behaviour or has ideas on how further debug this?