Open dr-gino opened 3 years ago
Hi, @dr-gino
Crash occurs when printf is called in an interrupt. You can replace printf with ets_printf in report_recv_called_from_isr.
Thanks.
Hi @xiongweichao,
You were right, there was a call to printf inside report_recv_called_from_isr. Commenting it out or replacing it with ets_printf prevents the crash, but as you can probably tell from the function name report_recv_called_from_isr, the developer of BTstack did not expect the HCI packet callback to be called in that context.
Is BTstack's author incorrect in assuming that the HCI packet handler callback registered using esp_vhci_host_register_callback is never called from within an ISR context?
Hi @xiongweichao
we've got more data on this. Please have a look at esp32-hw-error-on-acl-packet-without-connection.pklg.zip
In the log, the host stack sends an ACL packet after it receives the HCI Disconnect event. In this situation, the stack should not send the packet. We will fix it in the stack.
However, the same problem can also occur in other situations. E.g., assume the ESP32 is streaming audio packets to a remote device. When the remote device sends an HCI Disconnect (without first closing the AVDTP L2CAP connection), the host stack might queue an AVDTP media packet just before the HCI Disconnect is received. As HCI is asynchronous by design, this cannot be avoided. Sending a Hardware Error seems wrong to me; it would be better to just drop the packet for the (now) invalid HCI connection.
What do you think?
Cheers Matthias
Any news here? The issue about sending an ACL packet with a now invalid HCI Connection handle is independent from the Bluetooth stack and could happen with NimBLE or Bluedroid as well.
@mringwal Sorry for such a delayed reply. The scenario you describe is bound to happen. Thanks a lot for your suggestion. Let me share how we currently handle this situation. Receiving packets for an invalid HCI connection causes "OUT OF SYNC". In a certain controller state, the ESP32 will respond with "HW ERROR". According to the Bluetooth Core Spec, the "hw error event" is implementation-specific. We will evaluate your suggestion, but I cannot guarantee changes to the code. Thanks again!
@BetterJincheng Thanks for looking at this again. I don't think that emitting a HW Error in this situation because of a race condition is a good user experience :)
If the Bluetooth Controller lib is able to respond with an HW Error Event, it should even be easier to just drop the packet.
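To make the "just drop the packet" suggestion concrete, here is a minimal sketch of such a filter. It is not the ESP32 controller's code, which is not public; `acl_filter`, `acl_handle`, and the verdict enum are hypothetical names. The sketch assumes the buffer starts at the HCI ACL header (no H4 packet type byte in front), where the connection handle occupies the low 12 bits of the first little-endian 16-bit field and the second 16-bit field is the data length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical verdicts for an incoming host-to-controller ACL packet. */
typedef enum { ACL_FORWARD, ACL_DROP, ACL_MALFORMED } acl_verdict_t;

#define ACL_HEADER_LEN 4u

/* Connection handle: low 12 bits of the first 16-bit little-endian field. */
uint16_t acl_handle(const uint8_t *pkt)
{
    return (uint16_t)(((uint16_t)pkt[0] | ((uint16_t)pkt[1] << 8)) & 0x0FFFu);
}

/* Classify a packet against the list of currently active handles.
 * A well-formed packet for an unknown handle is dropped quietly; only a
 * packet whose framing is broken is treated as a real error. */
acl_verdict_t acl_filter(const uint8_t *pkt, size_t len,
                         const uint16_t *active, size_t active_count)
{
    if (len < ACL_HEADER_LEN) {
        return ACL_MALFORMED;
    }
    size_t data_len = (size_t)pkt[2] | ((size_t)pkt[3] << 8);
    if (len != ACL_HEADER_LEN + data_len) {
        return ACL_MALFORMED;   /* framing does not add up */
    }
    uint16_t handle = acl_handle(pkt);
    for (size_t i = 0; i < active_count; i++) {
        if (active[i] == handle) {
            return ACL_FORWARD;
        }
    }
    return ACL_DROP;            /* stale or unknown handle: no HW Error */
}
```

The point of the split is that a stale handle on an otherwise well-framed packet is distinguishable from a genuine loss of synchronization, so only the latter would need to be escalated.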
@mringwal We have some more information about this issue.
Due to a wrong packet sent from HOST (BTStack) to CTRL, CTRL entered the "out of sync" state and responded with a "HW Error" event in an ISR context.
In other words, host_recv_pkt_cb was called from the ISR to notify HOST that the downstream packet was wrong (HW Error).
So, the root cause of this issue is that the HOST sent an incorrect packet downward.
The following information would help us with this issue. Could you confirm it?
Was esp_vhci_host_send_packet called from only one task context?
Thanks!
@BetterJincheng Thanks for your reply.
A few different topics:
call to host_recv_pkt_cb from ISR: I am still convinced that host_recv_pkt_cb should never be called from an ISR. As Espressif has the source code for the Bluetooth Controller, it would be great if all calls to host_recv_pkt_cb could be checked to make sure it is never called from an ISR.
ACL packet with invalid con handle: Even if the host stack sends an ACL packet with an invalid connection handle, the Bluetooth Controller could just drop the packet.
I don't have a trace for this, but what about my example from May 2021: e.g. music streaming app continuously sends audio data. At one point, the remote side closes the connection and the Bluetooth Controller emits a HCI Disconnect event. In our implementation, this is queued in a ring buffer and processed the next time the Bluetooth Host thread gets executed. It's unclear to me if this can be avoided by any Bluetooth stack 100%. Here, this would be no issue if the Bluetooth Controller just silently drops the packet.
Thanks
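The "queued in a ring buffer and processed the next time the Bluetooth Host thread gets executed" approach described above can be sketched roughly as follows. This is not BTstack's actual implementation; `rx_queue_t`, `rx_queue_push`, `rx_queue_pop`, and the size constants are hypothetical, and the sketch assumes a single producer (the controller callback) and a single consumer (the host thread).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define RX_SLOTS   8u   /* hypothetical queue depth */
#define RX_MAX_LEN 64u  /* hypothetical maximum packet size for this sketch */

typedef struct {
    uint16_t len;
    uint8_t  data[RX_MAX_LEN];
} rx_slot_t;

typedef struct {
    rx_slot_t slots[RX_SLOTS];
    volatile uint32_t head;  /* advanced only by the producer (callback) */
    volatile uint32_t tail;  /* advanced only by the consumer (host thread) */
} rx_queue_t;

/* Producer: called from the receive callback. It only copies bytes and
 * never blocks or prints, so it stays cheap in any calling context.
 * Note: a real multi-core port would also need memory barriers here. */
bool rx_queue_push(rx_queue_t *q, const uint8_t *data, uint16_t len)
{
    if (len > RX_MAX_LEN) {
        return false;                      /* packet too large for a slot */
    }
    if (q->head - q->tail == RX_SLOTS) {
        return false;                      /* queue full, caller must handle */
    }
    rx_slot_t *slot = &q->slots[q->head % RX_SLOTS];
    memcpy(slot->data, data, len);
    slot->len = len;
    q->head++;                             /* publish the slot last */
    return true;
}

/* Consumer: called from the host thread's run loop. */
bool rx_queue_pop(rx_queue_t *q, uint8_t *out, uint16_t *len)
{
    if (q->head == q->tail) {
        return false;                      /* nothing queued */
    }
    const rx_slot_t *slot = &q->slots[q->tail % RX_SLOTS];
    memcpy(out, slot->data, slot->len);
    *len = slot->len;
    q->tail++;
    return true;
}
```

Because an HCI Disconnect event sits in this queue until the host thread runs, the host can still hand an ACL packet for that handle to the controller in the meantime, which is exactly the race discussed in this thread.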
@mringwal excerpted from Core Specification. FYI~
If the UART synchronization is lost in the communication from Host to Controller, then the Controller shall send an HCI_Hardware_Error event to tell the Host about the synchronization error.
@BetterJincheng That's correct. We can handle the HCI Hardware Error as expected (assume Controller has crashed, reset and start over). However, there's no reason that this event is delivered from ISR context.
To clarify, I think a callback should always get called back from the same type of context: either it's always called from a thread context, or, it's always called from ISR context. Otherwise, it's quite challenging to deal with. In any case, it would need to be properly documented.
So, any progress on the two issues:
Thanks!
@mringwal
Hi @BetterJincheng
Thanks for moving HCI Hardware Error into thread context.
I do agree that an invalid con handle might indicate a UART synchronization error.
However, please consider this scenario:
As this is a valid scenario that cannot be avoided, I think it's better to drop the packet instead of sending HW Error.
A compromise could be to remember which connection was active for e.g. 100 ms after it was disconnected. With that, you could distinguish a packet with an invalid / never-used connection handle from one that has just been disconnected.
In any case, it's good if each new connection gets a new connection handle (wraps at 4096). I don't know if that's the case; if not, that would be good to add.
What do you think?
Cheers, Matthias
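The "remember recently disconnected handles" compromise proposed above could look roughly like this. It is only a sketch: the table size, the type names, and `recent_note_disconnect` / `recent_classify` are hypothetical, and the caller is assumed to supply a millisecond tick value. No timer is needed, since each entry just stores the time of the disconnect and is compared lazily when a packet arrives.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RECENT_SLOTS     4u    /* hypothetical number of remembered handles */
#define RECENT_WINDOW_MS 100u  /* grace period suggested in the comment above */

typedef struct {
    uint16_t handle;
    uint32_t disconnected_at_ms;
    bool     used;
} recent_entry_t;

typedef struct {
    recent_entry_t entries[RECENT_SLOTS];
    uint32_t next;  /* next slot to overwrite, round-robin */
} recent_table_t;

typedef enum { ACL_DROP_SILENTLY, ACL_REPORT_ERROR } stale_verdict_t;

/* Record a handle at the moment its Disconnection Complete event is sent. */
void recent_note_disconnect(recent_table_t *t, uint16_t handle, uint32_t now_ms)
{
    recent_entry_t *e = &t->entries[t->next % RECENT_SLOTS];
    e->handle = handle;
    e->disconnected_at_ms = now_ms;
    e->used = true;
    t->next++;
}

/* Classify an ACL packet whose handle has no active connection. */
stale_verdict_t recent_classify(const recent_table_t *t, uint16_t handle,
                                uint32_t now_ms)
{
    for (uint32_t i = 0; i < RECENT_SLOTS; i++) {
        const recent_entry_t *e = &t->entries[i];
        /* Unsigned subtraction stays correct when the tick counter wraps. */
        if (e->used && e->handle == handle &&
            (now_ms - e->disconnected_at_ms) <= RECENT_WINDOW_MS) {
            return ACL_DROP_SILENTLY;  /* race with a fresh disconnect */
        }
    }
    return ACL_REPORT_ERROR;           /* never-used or long-gone handle */
}
```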
@mringwal
Hi again. I do agree, there should not be an additional timer.
What about this:
Do you agree that the Controller should not send an HW error in a situation where neither side sent incorrect data? If the Host is sending incorrect data, there will be other errors, too.
@mringwal ACL data on a disconnected handle will be ignored by the Controller; this is addressed by 2fa475bc.
Some information needs to be noted:
If the flow control from Host to Controller is enabled, the Host expects to receive an event of NUMBER_OF_COMPLETED_PACKETS when the controller has finished processing a packet.
When the Controller has already sent the DISCONNECTION_COMPLETE event to the Host, it will no longer report NUMBER_OF_COMPLETED_PACKETS upwards. This means that the Host is responsible for managing flow-control-related information, such as credits.
ACL data on a disconnected handle will be ignored by the Controller; this is addressed by 2fa475bc.
That's great!
Thanks for the reminder about the Host to Host Controller flow control. After receiving an HCI Disconnected Event, we free the HCI Connection structure, which includes the counter for the number of sent packets - so that's synchronized then as well.
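The credit bookkeeping discussed in the last few comments can be sketched as follows. This is an illustration only, not BTstack's code; `host_state_t`, `host_conn_t`, and the `host_*` functions are hypothetical names. The assumption is that the Host keeps one shared pool of controller ACL buffers plus a per-connection count of packets still in flight, and reclaims that count itself on disconnect because no further NUMBER_OF_COMPLETED_PACKETS event will arrive for the handle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CONNECTIONS 4u  /* hypothetical limit for this sketch */

typedef struct {
    uint16_t handle;
    uint16_t packets_in_flight;  /* sent, not yet reported as completed */
    bool     active;
} host_conn_t;

typedef struct {
    host_conn_t conns[MAX_CONNECTIONS];
    uint16_t    free_acl_buffers;  /* credits left in the controller */
} host_state_t;

static host_conn_t *find_conn(host_state_t *h, uint16_t handle)
{
    for (size_t i = 0; i < MAX_CONNECTIONS; i++) {
        if (h->conns[i].active && h->conns[i].handle == handle) {
            return &h->conns[i];
        }
    }
    return NULL;
}

bool host_on_connect(host_state_t *h, uint16_t handle)
{
    for (size_t i = 0; i < MAX_CONNECTIONS; i++) {
        if (!h->conns[i].active) {
            h->conns[i] = (host_conn_t){ handle, 0, true };
            return true;
        }
    }
    return false;  /* no free connection slot */
}

/* Consume one credit; refuses unknown handles and an empty credit pool. */
bool host_send_acl(host_state_t *h, uint16_t handle)
{
    host_conn_t *c = find_conn(h, handle);
    if (c == NULL || h->free_acl_buffers == 0) {
        return false;
    }
    h->free_acl_buffers--;
    c->packets_in_flight++;
    return true;
}

/* NUMBER_OF_COMPLETED_PACKETS: the controller hands credits back. */
void host_on_completed_packets(host_state_t *h, uint16_t handle, uint16_t count)
{
    host_conn_t *c = find_conn(h, handle);
    if (c == NULL) {
        return;
    }
    if (count > c->packets_in_flight) {
        count = c->packets_in_flight;  /* never return more than was sent */
    }
    c->packets_in_flight = (uint16_t)(c->packets_in_flight - count);
    h->free_acl_buffers = (uint16_t)(h->free_acl_buffers + count);
}

/* DISCONNECTION_COMPLETE: reclaim whatever was still in flight. */
void host_on_disconnect(host_state_t *h, uint16_t handle)
{
    host_conn_t *c = find_conn(h, handle);
    if (c == NULL) {
        return;
    }
    h->free_acl_buffers = (uint16_t)(h->free_acl_buffers + c->packets_in_flight);
    c->packets_in_flight = 0;
    c->active = false;
}
```

Freeing the connection record together with its in-flight counter, as described above, has the same effect as `host_on_disconnect` here: the credits return to the pool without waiting for an event that will never come.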
Environment
IDF version (git describe --tags): v4.2 (PlatformIO espressif32@3.0.0)
Compiler version (xtensa-esp32-elf-gcc --version): 1.22.0-80-g6c4433a
Problem Description
https://github.com/bluekitchen/btstack/issues/357 When using the Bluekitchen BTStack as the Bluetooth stack, host_recv_pkt_cb is called from interrupt context, causing a crash. According to the author of BTStack, this is an issue with ESP IDF, and they suggested raising an issue here.
Expected Behavior
No crash.
Actual Behavior
abort() was called at PC 0x40082a4a on core 0
Steps to reproduce
Install BTStack and run the example code below. Open Bluetooth settings on an iPhone and wait for the ESP32 to request bonding; once the request is accepted, the ESP32 crashes.
Code to reproduce this issue
Debug Logs
Serial output only shows:
Using EspStackTraceDecoder.jar I was able to recover the following trace: