bluekitchen / btstack

Dual-mode Bluetooth stack, with small memory footprint.
http://bluekitchen-gmbh.com
Other
1.74k stars 618 forks source link

Crash after hids_client_disconnect #617

Closed AndyDevLat closed 4 months ago

AndyDevLat commented 5 months ago

I have tried to call hids_client_disconnect to free resources when connection is dropped, but it causes memory corruption as it seems hids_client also frees the memory internally. Or should we never call this API at all?

According to backtrace it crashes inside btstack_memory_hids_client_free inside hids_finalize_client, which is called by handle_gatt_client_event when some event arrives internally later.

mringwal commented 5 months ago

Upon which event did you call hids_client_disconnect?

In general, you don't need to call any 'disconnect' function to free resources as this happens automatically. You can call 'disconnect' functions to actively disconnect something. For HIDS Client, it would stop the client while the HCI Connection would persist.

In practive, there are not many use cases where you need to call a 'disconnect' function. You mostly call 'gap_disconnect', if you want to disconnect from a remote device, which will close all connections to it.

AndyDevLat commented 5 months ago

Thank you for information! I tried to call it after getting HCI_EVENT_DISCONNECTION_COMPLETE in my packet_handler. But looks like it was bad idea. Btw, it is on ESP32-S3 and the code is based on hog_host_demo.c example. I just tried to "enhance" it. The other place i tried to insert it was inside GATTSERVICE_SUBEVENT_HID_SERVICE_CONNECTED event in handle_gatt_client_event when HID service connection failed: i had scenarios when gattservice_subevent_hid_service_connected_get_status returns error 0x0f or 0x1f (this usually happens when peripheral device was "re-paired" to other host but we try to "acquire" it again. This error persists until we put into pairing mode again and is easy to reproduce. In such scenario i though it was reasonable to call hids_client_disconnect in order to unlink hids_client from current connection .

AndyDevLat commented 5 months ago

I am not using hids_client_disconnect anymore, but then another issue occured: after peripheral was disconnected i got HCI_EVENT_DISCONNECTION_COMPLETE event. But a bit later I turned it on and after pairing was complete hids_client_connect returned ERROR_CODE_COMMAND_DISALLOWED. Which means it did not recognize that previous connection was dropped ... So definitely something is missing here. The peripheral i am using for test is Google Stadia controller.

mringwal commented 5 months ago

Please provide HCI logs for any issue report including this here. It allows to check the Bluetooth state and makes it easier to analyze and reconstruct the issue.

AndyDevLat commented 5 months ago

Ok, i may do it later but need to cleanup my code first and remove some stuff to make HCI logs readable. But meanwhile, could you please explain how hids_client is supposed to cleanup resources when connection is dropped? I checked its code, but hids_finalize_client seems to be called only during response to some GATT error events immediately after hids_emit_connection_established (which in turn sends GATTSERVICE_SUBEVENT_HID_SERVICE_CONNECTED event to my application). On the other hand, if connection is dropped, i do not see GATTSERVICE_SUBEVENT_HID_SERVICE_CONNECTED events too.

You mostly call 'gap_disconnect', if you want to disconnect from a remote device, which will close all connections to it.

I understand this concept, but where does hids_client check that connection was dropped and calls hids_finalize_client?

AndyDevLat commented 5 months ago

There is also one issue in hids_client: it sends gatt_client_write_value_of_characteristic but ignores att_status. But it may happen to return GATT_CLIENT_BUSY or some other error and it will stuck in HIDS_CLIENT_W4_WRITE_REPORT_DONE forever ...

mringwal commented 4 months ago

Thanks for the additional comments.

hids_client uses a pool of hids_client_t structures. It's the responsibility of the hids_client implementation to alloc/free these as necessary.

We will need to review the life cycle of the connection struct and get back to you here.

mringwal commented 4 months ago

hids_client indeed did not handle hci disconnect. We have added the missing code and also added a new GATTSERVICE_SUBEVENT_HID_SERVICE_DISCONNECTED for the application.

--

Could you retest and verify if connect/disconnect works as expected when using the code on the develop branch?

--

Yes, gatt_client_write_value_of_characteristic in state HIDS_CLIENT_W2_SEND_WRITE_REPORT does not check that status. However, it should only enter this state if it was in state HIDS_CLIENT_STATE_CONNECTED in hids_client_send_write_report. If you can reproduce an error here, could you either share logs or let me know which status you get?

AndyDevLat commented 4 months ago

Thank you for looking into this problem. To address these issues i have just created my own version of hids_client.c and patched it. Here is what i did:

1) Removed all references to hids_finalize_client in original code and call it only from hids_client_disconnect. In other words i am in control of the life cycle of hids_client_t structure. I call hids_client_disconnect from my own HCI_EVENT_DISCONNECTION_COMPLETE packet handler so actually GATTSERVICE_SUBEVENT_HID_SERVICE_DISCONNECTED is of no big use for me. I also call it after any other error, eg. when GATTSERVICE_SUBEVENT_HID_SERVICE_CONNECTED reported some error. With such approach i am in full control of hids_client_t structure and do not depend on hids_client internal events.

2) I have modified hids_client_send_write_report and hids_client_send_get_report functions and instead of calling hids_run_for_client i do call gatt_client_write_value_of_characteristic or gatt_client_read_value_of_characteristic_using_value_handle directly and check the status. Inside hids_client_send_write_report i have added such code: client->state = HIDS_CLIENT_W4_WRITE_REPORT_DONE; // see GATT_EVENT_QUERY_COMPLETE for end of write uint8_t att_status = gatt_client_write_value_of_characteristic(&handle_gatt_client_event, client->con_handle, client->reports[client->report_index].value_handle, client->report_len, (uint8_t *)client->report); if (att_status == ERROR_CODE_SUCCESS) { return ERROR_CODE_SUCCESS; } else { client->state = HIDS_CLIENT_STATE_CONNECTED;

    if(att_status == GATT_CLIENT_BUSY ||
       att_status == GATT_CLIENT_IN_WRONG_STATE)
    {
        return ERROR_CODE_COMMAND_DISALLOWED;
    }
    else
    {
        return att_status;
    }
}

Similar code is in hids_client_send_get_report:

client->state = HIDS_CLIENT_W4_GET_REPORT_RESULT;

// result in GATT_EVENT_CHARACTERISTIC_VALUE_QUERY_RESULT
uint8_t att_status = gatt_client_read_value_of_characteristic_using_value_handle(&handle_report_event,
                                                                                 client->con_handle,
                                                                                 client->reports[client->report_index].value_handle);
if(att_status == ERROR_CODE_SUCCESS)
{
    return ERROR_CODE_SUCCESS;
}
else
{
    client->state = HIDS_CLIENT_STATE_CONNECTED;

    if (att_status == GATT_CLIENT_BUSY ||
       att_status == GATT_CLIENT_IN_WRONG_STATE)
    {
        return ERROR_CODE_COMMAND_DISALLOWED;
    }
    else
    {
        return att_status;
    }
}

It now works stable. As you can see these functions sometimes return GATT_CLIENT_BUSY or GATT_CLIENT_IN_WRONG_STATE errors. So i treat these status codes as ERROR_CODE_COMMAND_DISALLOWED and my application treats this error code specially as busy status and retries request later.

mringwal commented 4 months ago

Glad you got it working. The fix on develop is alignment with memory management in all other layers, e.g. L2CAP and RFCOMM.

As for the error in the gatt client queries: our code should use gatt_client_request_to_send_gatt_query function to get notified when it's safe to do a gatt query. Until then, you could call gatt_client_is_ready() to check if a gatt query could be sent right now, and only in case of success call hids_client_send_write_report().

As you've got a solution for now and we've identified the necessary steps, I'll close this issue for now. Feel free to open a new one if you run into other issues.