embassy-rs / nrf-softdevice

Apache License 2.0
254 stars 74 forks source link

`InvalidState` after re-connecting #256

Closed HaoboGu closed 1 month ago

HaoboGu commented 1 month ago

The original issue is https://github.com/embassy-rs/nrf-softdevice/issues/237, since it's a different problem, I'm opening this new issue.

I just found I didn't really solve the reconnection problem. I called load_sys_attrs after connected:

match peripheral::advertise_pairable(sd, adv, &config, bonder).await {
    Ok(conn) => {
        info!("Connected to BLE");
        bonder.load_sys_attrs(&conn);
        let ble_fut = gatt_server::run(&conn, &ble_server, server_callback);
    }
    Err(e) => {
        error!("Advertise error: {}", e)
    }
}

where my load_sys_attrs is:

fn load_sys_attrs(&self, conn: &Connection) {
    let addr = conn.peer_address();
    info!("Loading system attributes for {}", addr);

    let bond_info = self.bond_info.borrow();

    let sys_attr = bond_info
        .iter()
        .filter(|(_, b)| b.sys_attr.length != 0 && b.removed == false)
        .find(|(_, b)| b.peer.peer_id.is_match(addr))
        .map(|(_, b)| &b.sys_attr.data[0..b.sys_attr.length]);

    info!("call set_sys_attrs in load_sys_attrs: {}, {}", sys_attr, sys_attr.unwrap().len());
    if let Err(err) = set_sys_attrs(conn, sys_attr) {
        warn!("SecurityHandler failed to set sys attrs: {:?}", err);
    }
}

The first connection and reconnection are both good, but once I tried to send HID report, the gatt_server::notify_value(self.conn, self.handle, report) returns a Raw(InvalidState) error. Here is the full log of the first and second connection:

first(all good):

INFO  start!
└─ rmk_nrf52840::____embassy_main_task::{async_fn#0} @ src\main.rs:11
INFO  Enabling ext hfosc...
└─ rmk_nrf52840::____embassy_main_task::{async_fn#0} @ src\main.rs:11
INFO  softdevice RAM: 29456 bytes
└─ nrf_softdevice::softdevice::{impl#0}::enable @ C:\Users\haobo\.cargo\git\checkouts\nrf-softdevice-03ef4aef10e777e4\cdee83c\nrf-softdevice\src\fmt.rs:138
WARN  You're giving more RAM to the softdevice than needed. You can change your app's RAM start address to 20007310
└─ nrf_softdevice::softdevice::{impl#0}::enable @ C:\Users\haobo\.cargo\git\checkouts\nrf-softdevice-03ef4aef10e777e4\cdee83c\nrf-softdevice\src\fmt.rs:151
INFO  Loaded saved bond info: 1
└─ rmk::ble::nrf::initialize_nrf_ble_keyboard_with_config_and_run::{async_fn#0} @ E:\rmk\rmk\src\ble\nrf.rs:199
INFO  USB suspended, BLE Advertising
└─ rmk::ble::nrf::initialize_nrf_ble_keyboard_with_config_and_run::{async_fn#0} @ E:\rmk\rmk\src\ble\nrf.rs:243
INFO  Connected to BLE
└─ rmk::ble::nrf::initialize_nrf_ble_keyboard_with_config_and_run::{async_fn#0} @ E:\rmk\rmk\src\ble\nrf.rs:249
INFO  Loading system attributes for Public:[ab, e2, d4, 7d, b2, 7c]
└─ rmk::ble::nrf::bonder::{impl#4}::load_sys_attrs @ E:\rmk\rmk\src\ble\nrf\bonder.rs:210
INFO  call set_sys_attrs in load_sys_attrs: Some([13, 0, 2, 0, 0, 0, 26, 0, 2, 0, 0, 0, 38, 0, 2, 0, 0, 0, 45, 0, 2, 0, 0, 0, 49, 0, 2, 0, 0, 0, 53, 0, 2, 0, 0, 0, 57, 0, 2, 0, 0, 0, 78, 188]), 44
└─ rmk::ble::nrf::bonder::{impl#4}::load_sys_attrs @ E:\rmk\rmk\src\ble\nrf\bonder.rs:220
INFO  Starting GATT server 200 ms later
└─ rmk::ble::nrf::run_ble_keyboard::{async_fn#0} @ E:\rmk\rmk\src\ble\nrf.rs:333
INFO  on_security_update, new security mode: JustWorks
└─ rmk::ble::nrf::bonder::{impl#4}::on_security_update @ E:\rmk\rmk\src\ble\nrf\bonder.rs:94
DEBUG On bonded: storing bond for MasterId { ediv: 25868, rand: [130, 88, 153, 22, 65, 23, 20, 21] }
└─ rmk::ble::nrf::bonder::{impl#4}::on_bonded @ E:\rmk\rmk\src\ble\nrf\bonder.rs:105
INFO  ON BOND: get sys attr: [13, 0, 2, 0, 0, 0, 26, 0, 2, 0, 0, 0, 38, 0, 2, 0, 0, 0, 45, 0, 2, 0, 0, 0, 49, 0, 2, 0, 0, 0, 53, 0, 2, 0, 0, 0, 57, 0, 2, 0, 0, 0, 78, 188, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], size:44
└─ rmk::ble::nrf::bonder::{impl#4}::on_bonded @ E:\rmk\rmk\src\ble\nrf\bonder.rs:130
DEBUG Sent bond info to flash channel
└─ rmk::ble::nrf::bonder::{impl#4}::on_bonded @ E:\rmk\rmk\src\ble\nrf\bonder.rs:152
INFO  Saving bond info: BondInfo(BondInfo { slot_num: 0, peer: Peer { master_id: MasterId { ediv: 25868, rand: [130, 88, 153, 22, 65, 23, 20, 21] }, key: EncryptionInfo { ltk: [2, 53, 199, 84, 1, 229, 28, 117, 196, 197, 66, 170, 76, 165, 44, 252], flags: 64 }, peer_id: IdentityKey { irk: IdentityResolutionKey { irk: [13, 235, 235, 74, 4, 237, 34, 11, 38, 15, 13, 77, 96, 179, 252, 154] }, addr: Public:[ab, e2, d4, 7d, b2, 7c] } }, sys_attr: SystemAttribute { length: 44, data: [13, 0, 2, 0, 0, 0, 26, 0, 2, 0, 0, 0, 38, 0, 2, 0, 0, 0, 45, 0, 2, 0, 0, 0, 49, 0, 2, 0, 0, 0, 53, 0, 2, 0, 0, 0, 57, 0, 2, 0, 0, 0, 78, 188, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] }, removed: false })
└─ rmk::storage::{impl#2}::run::{async_fn#0} @ E:\rmk\rmk\src\storage.rs:420
INFO  HID input keyboard cccd: [1, 0]
└─ rmk::ble::nrf::hid_service::{impl#0}::on_write @ E:\rmk\rmk\src\ble\nrf\hid_service.rs:201
INFO  HID input media keys: [1, 0]
└─ rmk::ble::nrf::hid_service::{impl#0}::on_write @ E:\rmk\rmk\src\ble\nrf\hid_service.rs:211
INFO  HID input via keys: [1, 0]
└─ rmk::ble::nrf::hid_service::{impl#0}::on_write @ E:\rmk\rmk\src\ble\nrf\hid_service.rs:209
INFO  HID output keyboard: [0]
└─ rmk::ble::nrf::hid_service::{impl#0}::on_write @ E:\rmk\rmk\src\ble\nrf\hid_service.rs:207
DEBUG Sending keyboard report: [0x4, 0x0, 0x0, 0x0, 0x0, 0x0]
└─ rmk::keyboard::{impl#0}::send_keyboard_report::{async_fn#0} @ E:\rmk\rmk\src\keyboard.rs:135
DEBUG Sending keyboard report: [0x0, 0x0, 0x0, 0x0, 0x0, 0x0]
└─ rmk::keyboard::{impl#0}::send_keyboard_report::{async_fn#0} @ E:\rmk\rmk\src\keyboard.rs:135

second(reconnection, notify value error):

INFO  start!
└─ rmk_nrf52840::____embassy_main_task::{async_fn#0} @ src\main.rs:11
INFO  Enabling ext hfosc...
└─ rmk_nrf52840::____embassy_main_task::{async_fn#0} @ src\main.rs:11
INFO  softdevice RAM: 29456 bytes
└─ nrf_softdevice::softdevice::{impl#0}::enable @ C:\Users\haobo\.cargo\git\checkouts\nrf-softdevice-03ef4aef10e777e4\cdee83c\nrf-softdevice\src\fmt.rs:138
WARN  You're giving more RAM to the softdevice than needed. You can change your app's RAM start address to 20007310
└─ nrf_softdevice::softdevice::{impl#0}::enable @ C:\Users\haobo\.cargo\git\checkouts\nrf-softdevice-03ef4aef10e777e4\cdee83c\nrf-softdevice\src\fmt.rs:151
INFO  Loaded saved bond info: 1
└─ rmk::ble::nrf::initialize_nrf_ble_keyboard_with_config_and_run::{async_fn#0} @ E:\rmk\rmk\src\ble\nrf.rs:199
INFO  USB suspended, BLE Advertising
└─ rmk::ble::nrf::initialize_nrf_ble_keyboard_with_config_and_run::{async_fn#0} @ E:\rmk\rmk\src\ble\nrf.rs:243
INFO  Connected to BLE
└─ rmk::ble::nrf::initialize_nrf_ble_keyboard_with_config_and_run::{async_fn#0} @ E:\rmk\rmk\src\ble\nrf.rs:249
INFO  Loading system attributes for Public:[ab, e2, d4, 7d, b2, 7c]
└─ rmk::ble::nrf::bonder::{impl#4}::load_sys_attrs @ E:\rmk\rmk\src\ble\nrf\bonder.rs:210
INFO  call set_sys_attrs in load_sys_attrs: Some([13, 0, 2, 0, 0, 0, 26, 0, 2, 0, 0, 0, 38, 0, 2, 0, 0, 0, 45, 0, 2, 0, 0, 0, 49, 0, 2, 0, 0, 0, 53, 0, 2, 0, 0, 0, 57, 0, 2, 0, 0, 0, 78, 188]), 44
└─ rmk::ble::nrf::bonder::{impl#4}::load_sys_attrs @ E:\rmk\rmk\src\ble\nrf\bonder.rs:220
INFO  Starting GATT server 200 ms later
└─ rmk::ble::nrf::run_ble_keyboard::{async_fn#0} @ E:\rmk\rmk\src\ble\nrf.rs:333
DEBUG Getting bond for MasterId { ediv: 25868, rand: [130, 88, 153, 22, 65, 23, 20, 21] }
└─ rmk::ble::nrf::bonder::{impl#4}::get_key @ E:\rmk\rmk\src\ble\nrf\bonder.rs:165
INFO  on_security_update, new security mode: JustWorks
└─ rmk::ble::nrf::bonder::{impl#4}::on_security_update @ E:\rmk\rmk\src\ble\nrf\bonder.rs:94
DEBUG Sending keyboard report: [0x4, 0x0, 0x0, 0x0, 0x0, 0x0]
└─ rmk::keyboard::{impl#0}::send_keyboard_report::{async_fn#0} @ E:\rmk\rmk\src\keyboard.rs:135
ERROR Send ble report error: Raw(InvalidState)
└─ rmk::ble::nrf::server::{impl#1}::write::{async_fn#0}::{closure#0} @ E:\rmk\rmk\src\ble\nrf\server.rs:44

And same as before, if I remove bonder.load_sys_attrs(&conn);, then I get Raw(BleGattsSysAttrMissing) error.

The sys_attr saved and loaded was right according to the log, but the InvalidState still happens when calling gatt_server::notify_value()

@alexmoon Do you have any ideas? I'll much appreciate if you can help!

Originally posted by @HaoboGu in https://github.com/embassy-rs/nrf-softdevice/issues/237#issuecomment-2131286708

HaoboGu commented 1 month ago

Just did a little more investigation. This issue only appears when the host is windows.

I also notice that for macOS and iOS, after the first connection and reconnection, the on_write of HidService is called, because I saw the logs like

INFO  HID input keyboard cccd: [1, 0]
└─ rmk::ble::nrf::hid_service::{impl#0}::on_write @ E:\rmk\rmk\src\ble\nrf\hid_service.rs:201
INFO  HID input media keys: [1, 0]
└─ rmk::ble::nrf::hid_service::{impl#0}::on_write @ E:\rmk\rmk\src\ble\nrf\hid_service.rs:211
INFO  HID input via keys: [1, 0]
└─ rmk::ble::nrf::hid_service::{impl#0}::on_write @ E:\rmk\rmk\src\ble\nrf\hid_service.rs:209

Then everything goes well.

When connecting to a windows host, the logs above only appear at the first connection. It seems calling on_write indicates that everything is good. But I still don't know why connecting to windows doesn't call on_write

alexmoon commented 1 month ago

The NRF softdevice documentation is the best source for detailed information. They have message sequence charts and descriptions of each possible error return from each API function. In the case of sd_ble_gatts_hvx an InvalidState return indicates:

One or more of the following is true:

  • Invalid Connection State
  • Notifications and/or indications not enabled in the CCCD
  • An ATT_MTU exchange is ongoing

Most likely the second option. One of the main things that's saved in the sys_attrs is the value of all CCCD descriptors. The GATT spec says that those values persist across connections for bonded pairs, which is why you're host only writes to the CCCD descriptor on the first connection. If your InvalidState error is due to notifications not being enabled in the CCCD then you're not saving or loading the sys attrs correctly. In your SecurityHandler::save_sys_attrs callback you need to call gatt_server::get_sys_attrs on the connection and save the blob for that peer. Then in SecurityHandler::load_sys_attrs you need to call gatt_server::set_sys_attrs with the exact blob that was previously saved for that peer.

HaoboGu commented 1 month ago

The NRF softdevice documentation is the best source for detailed information. They have message sequence charts and descriptions of each possible error return from each API function. In the case of sd_ble_gatts_hvx an InvalidState return indicates:

One or more of the following is true:

  • Invalid Connection State

  • Notifications and/or indications not enabled in the CCCD

  • An ATT_MTU exchange is ongoing

Most likely the second option. One of the main things that's saved in the sys_attrs is the value of all CCCD descriptors. The GATT spec says that those values persist across connections for bonded pairs, which is why you're host only writes to the CCCD descriptor on the first connection. If your InvalidState error is due to notifications not being enabled in the CCCD then you're not saving or loading the sys attrs correctly. In your SecurityHandler::save_sys_attrs callback you need to call gatt_server::get_sys_attrs on the connection and save the blob for that peer. Then in SecurityHandler::load_sys_attrs you need to call gatt_server::set_sys_attrs with the exact blob that was previously saved for that peer.

I believe that I did this correctly. As you can see in the log, I saved the sys_attr to flash when bonding, and at the second connection, same sys_attr was loaded and set. But I still got InvalidState when the host is Windows (it works correctly for macos/iOS).

save_sys_attr was not called because I was testing the power loss situation. I'm testing cases of save_sys_attr is called.

HaoboGu commented 1 month ago

Yeah, when save_sys_attr is called, the correct sys_attr is saved and everything goes well. But in the power loss situation, it's not correctly saved. I added an extra save_sys_attr call when bonding, but the value of not correct. How do i solve this?

alexmoon commented 1 month ago

You can call get_sys_attrs whenever you want and update your saved data. To be power safe you'd need to do that whenever the sys_attrs change. Unfortunately, Nordic doesn't document exactly what's included in the sys_attrs, and some of the changes may not generate events that are observable by the application. At the very least you'd want to do it whenever a CCCD value is written. Bluetooth as a spec was designed for battery powered devices so I don't think power loss safety was a prime concern.

HaoboGu commented 1 month ago

It is good to use get_sys_attrs whenever I set sys_attrs? For example, in load_sys_attrs, using get_sys_attrs to replace load saved sys_attrs, before calling set_sys_attrs,

alexmoon commented 1 month ago

No, that won't work. get_sys_attrs gets the current value of the system attributes for the connection. Calling set_sys_attrs with the result from get_sys_attrs would be a no-op. Please review the the Nordic documentation for the sys attrs and the message sequences around setting and getting them.

At this point I think everything in nrf_softdevice is working as intended, so I'm going to close the issue. If you want more help getting your application to work, Matrix chat is your best bet; though be aware that everyone is volunteering their time there and I don't think many people have done much work with dealing with bonded connections.

HaoboGu commented 1 month ago

thanks!