Open danergo opened 3 years ago
Note: this is the root-cause of https://github.com/espressif/esp-idf/issues/7723.
Anybody? Please help me guys. This bug is in your closed software part!
Hi @danergo
Thank you very much for providing such detailed analysis. We will further analyze this scenario to confirm whether it is a problem and give you a reply as soon as possible.
Hi @xiewenxiang: could you check this one, by any chance yet?
Thank you!
Hi @danergo, a fix for this issue is done and waiting for approval.
Hi @BetterJincheng: do you see by any chance how much time will this take? Or how much time an approval takes usually?
Thank you
@danergo, it is hard to evaluate... If needed, I can provide you the fixed lib in advance. Let me know
@BetterJincheng: that sounds perfect, please do that and I can continue the testing. Thank you! You can also send it to danergo@gmail.com if that suits better.
libbtdm_app.zip @danergo
esp-idf/component/bt/controller/lib_esp32/esp32/libbtdm_app.a
with the attachment.idf.py fullclean
(or just rm -rf ./build
)Let me know if it works.
@BetterJincheng:
Sorry for delays, I just had time to validate your fix now: it seems working, I'll run my usual testing method for a couple of more days and report back.
Thanks a lot!
Bad news:
After several days of constant stress-test (HCI LE Advertisement scanning, and some GATT commands, and also some raw classic BT stuff) and ESP32 crashed:
[2021-11-19 16:19:05] Guru Meditation Error: Core 0 panic'ed (LoadProhibited). Exception was unhandled.
[2021-11-19 16:19:05]
[2021-11-19 16:19:05] Core 0 register dump:
[2021-11-19 16:19:05] PC : 0x400def1d PS : 0x00060130 A0 : 0x80049b9c A1 : 0x3ffcc750
[2021-11-19 16:19:05] A2 : 0x07ffffff A3 : 0x02fa9bd0 A4 : 0x0000000b A5 : 0x00000001
[2021-11-19 16:19:05] A6 : 0x3ffcc7b0 A7 : 0x00000000 A8 : 0x3ffc9324 A9 : 0x3ffcc730
[2021-11-19 16:19:05] A10 : 0x0030bb00 A11 : 0x00000001 A12 : 0x00000000 A13 : 0x3ffafd68
[2021-11-19 16:19:05] A14 : 0x00000000 A15 : 0x3ffb8360 SAR : 0x00000016 EXCCAUSE: 0x0000001c
[2021-11-19 16:19:05] EXCVADDR: 0x00000008 LBEG : 0x4000c2e0 LEND : 0x4000c2f6 LCOUNT : 0x00000000
[2021-11-19 16:19:05]
[2021-11-19 16:19:05]
[2021-11-19 16:19:05] Backtrace:0x400def1a:0x3ffcc750 0x40049b99:0x3ffcc7b0 0x400457cd:0x3ffcc7e0 0x4008483c:0x3ffcc820 0x40019d11:0x3ffcc850 0x40055b4d:0x3ffcc870 0x400d5f98:0x3ffcc890 0x400d65b8:0x3ffcc8b0 0x40092951:0x3ffcc8e0
I decoded the backtrace but it is not too verbose:
0x400def1a: r_lld_evt_update_create at ??:?
0x40049b99: ?? ??:0
0x400457cd: ?? ??:0
0x4008483c: ke_task_schedule at ke_task.c:?
0x40019d11: ?? ??:0
0x40055b4d: ?? ??:0
0x400d5f98: r_rw_schedule at ??:?
0x400d65b8: btdm_controller_task at ??:?
0x40092951: vPortTaskWrapper at /home/user/esp/esp-idf/components/freertos/port/xtensa/port.c:130
That line in port.c
: pxCode(pvParameters);
Update After a short while another crash arrived:
[2021-11-19 16:39:25] ASSERT_ERR(lm_env.local_name), in lm.c at line 79
[2021-11-19 16:39:25] Guru Meditation Error: Core 0 panic'ed (IllegalInstruction). Exception was unhandled.
[2021-11-19 16:39:25] Memory dump at 0x4008ee00: f01d020c 00004136 f01d0000
[2021-11-19 16:39:25] Core 0 register dump:
[2021-11-19 16:39:25] PC : 0x4008ee07 PS : 0x00060130 A0 : 0x80083725 A1 : 0x3ffcc760
[2021-11-19 16:39:25] A2 : 0x3f40491c A3 : 0x00000000 A4 : 0x00000000 A5 : 0x3f404831
[2021-11-19 16:39:25] A6 : 0x0000004f A7 : 0xfffffffd A8 : 0x8000814b A9 : 0x3ffcc6d0
[2021-11-19 16:39:25] A10 : 0x00000000 A11 : 0x3ffcc6f2 A12 : 0x3ffcc69f A13 : 0x00000039
[2021-11-19 16:39:25] A14 : 0x00000000 A15 : 0x3ffcc6a3 SAR : 0x00000004 EXCCAUSE: 0x00000000
[2021-11-19 16:39:25] EXCVADDR: 0x00000000 LBEG : 0x40083699 LEND : 0x400836a1 LCOUNT : 0x00000000
[2021-11-19 16:39:25]
[2021-11-19 16:39:25]
[2021-11-19 16:39:25] Backtrace:0x4008ee04:0x3ffcc760 0x40083722:0x3ffcc780 0x400e556e:0x3ffcc7a0 0x4002509e:0x3ffcc7c0 0x400da89a:0x3ffcc800 0x4008483c:0x3ffcc820 0x40019d11:0x3ffcc850 0x40055b4d:0x3ffcc870 0x400d5f98:0x3ffcc890 0x400d65b8:0x3ffcc8b0 0x40092951:0x3ffcc8e0
Decoded:
0x4008ee04: r_assert at /home/user/esp/esp-idf/components/bt/controller/esp32/bt.c:1991
0x40083722: r_assert_err at ??:?
0x400e556e: r_LM_GetLocalNameSeg at ??:?
0x4002509e: ?? ??:0
0x400da89a: lc_lmp_rx_handler at ??:?
0x4008483c: ke_task_schedule at ke_task.c:?
0x40019d11: ?? ??:0
0x40055b4d: ?? ??:0
0x400d5f98: r_rw_schedule at ??:?
0x400d65b8: btdm_controller_task at ??:?
0x40092951: vPortTaskWrapper at /home/user/esp/esp-idf/components/freertos/port/xtensa/port.c:130
Rest is similar to previous.
Questions: 1.) Why are almost all lines not having line numbers? 2.) Can I change anything to have more info from this backtrace? (Or I can share my binary for you in case it helps).
Thank you!
Can you have a look at it? It's very annoying and I have to do hciconfig hci0 reset
once every 2-3 days constantly.
Thank you.
Now ESP32 printed "scan_evt timeout", and I also did HAVE TO do hciconfig hci0 reset
in order to use it. It's very unstable.
Bad news:
After several days of constant stress-test (HCI LE Advertisement scanning, and some GATT commands, and also some raw classic BT stuff) and ESP32 crashed:
[2021-11-19 16:19:05] Guru Meditation Error: Core 0 panic'ed (LoadProhibited). Exception was unhandled. [2021-11-19 16:19:05] [2021-11-19 16:19:05] Core 0 register dump: [2021-11-19 16:19:05] PC : 0x400def1d PS : 0x00060130 A0 : 0x80049b9c A1 : 0x3ffcc750 [2021-11-19 16:19:05] A2 : 0x07ffffff A3 : 0x02fa9bd0 A4 : 0x0000000b A5 : 0x00000001 [2021-11-19 16:19:05] A6 : 0x3ffcc7b0 A7 : 0x00000000 A8 : 0x3ffc9324 A9 : 0x3ffcc730 [2021-11-19 16:19:05] A10 : 0x0030bb00 A11 : 0x00000001 A12 : 0x00000000 A13 : 0x3ffafd68 [2021-11-19 16:19:05] A14 : 0x00000000 A15 : 0x3ffb8360 SAR : 0x00000016 EXCCAUSE: 0x0000001c [2021-11-19 16:19:05] EXCVADDR: 0x00000008 LBEG : 0x4000c2e0 LEND : 0x4000c2f6 LCOUNT : 0x00000000 [2021-11-19 16:19:05] [2021-11-19 16:19:05] [2021-11-19 16:19:05] Backtrace:0x400def1a:0x3ffcc750 0x40049b99:0x3ffcc7b0 0x400457cd:0x3ffcc7e0 0x4008483c:0x3ffcc820 0x40019d11:0x3ffcc850 0x40055b4d:0x3ffcc870 0x400d5f98:0x3ffcc890 0x400d65b8:0x3ffcc8b0 0x40092951:0x3ffcc8e0
I decoded the backtrace but it is not too verbose:
0x400def1a: r_lld_evt_update_create at ??:? 0x40049b99: ?? ??:0 0x400457cd: ?? ??:0 0x4008483c: ke_task_schedule at ke_task.c:? 0x40019d11: ?? ??:0 0x40055b4d: ?? ??:0 0x400d5f98: r_rw_schedule at ??:? 0x400d65b8: btdm_controller_task at ??:? 0x40092951: vPortTaskWrapper at /home/user/esp/esp-idf/components/freertos/port/xtensa/port.c:130
That line in
port.c
:pxCode(pvParameters);
Update After a short while another crash arrived:
[2021-11-19 16:39:25] ASSERT_ERR(lm_env.local_name), in lm.c at line 79 [2021-11-19 16:39:25] Guru Meditation Error: Core 0 panic'ed (IllegalInstruction). Exception was unhandled. [2021-11-19 16:39:25] Memory dump at 0x4008ee00: f01d020c 00004136 f01d0000 [2021-11-19 16:39:25] Core 0 register dump: [2021-11-19 16:39:25] PC : 0x4008ee07 PS : 0x00060130 A0 : 0x80083725 A1 : 0x3ffcc760 [2021-11-19 16:39:25] A2 : 0x3f40491c A3 : 0x00000000 A4 : 0x00000000 A5 : 0x3f404831 [2021-11-19 16:39:25] A6 : 0x0000004f A7 : 0xfffffffd A8 : 0x8000814b A9 : 0x3ffcc6d0 [2021-11-19 16:39:25] A10 : 0x00000000 A11 : 0x3ffcc6f2 A12 : 0x3ffcc69f A13 : 0x00000039 [2021-11-19 16:39:25] A14 : 0x00000000 A15 : 0x3ffcc6a3 SAR : 0x00000004 EXCCAUSE: 0x00000000 [2021-11-19 16:39:25] EXCVADDR: 0x00000000 LBEG : 0x40083699 LEND : 0x400836a1 LCOUNT : 0x00000000 [2021-11-19 16:39:25] [2021-11-19 16:39:25] [2021-11-19 16:39:25] Backtrace:0x4008ee04:0x3ffcc760 0x40083722:0x3ffcc780 0x400e556e:0x3ffcc7a0 0x4002509e:0x3ffcc7c0 0x400da89a:0x3ffcc800 0x4008483c:0x3ffcc820 0x40019d11:0x3ffcc850 0x40055b4d:0x3ffcc870 0x400d5f98:0x3ffcc890 0x400d65b8:0x3ffcc8b0 0x40092951:0x3ffcc8e0
Decoded:
0x4008ee04: r_assert at /home/user/esp/esp-idf/components/bt/controller/esp32/bt.c:1991 0x40083722: r_assert_err at ??:? 0x400e556e: r_LM_GetLocalNameSeg at ??:? 0x4002509e: ?? ??:0 0x400da89a: lc_lmp_rx_handler at ??:? 0x4008483c: ke_task_schedule at ke_task.c:? 0x40019d11: ?? ??:0 0x40055b4d: ?? ??:0 0x400d5f98: r_rw_schedule at ??:? 0x400d65b8: btdm_controller_task at ??:? 0x40092951: vPortTaskWrapper at /home/user/esp/esp-idf/components/freertos/port/xtensa/port.c:130
Rest is similar to previous.
Questions: 1.) Why are almost all lines not having line numbers? 2.) Can I change anything to have more info from this backtrace? (Or I can share my binary for you in case it helps).
Thank you!
The elf file will do help. please attach here. Thanks
libbtdm_app.zip @danergo Please try this lib to verify whether it fixed the second problem. Thanks
@BetterJincheng: just to be on the same page, which is the second for you?
The one with "IllegalInstruction"?
Thank you!
@BetterJincheng: I have not yet updated the libbtdm_app, but I get another crash, now with backtrace corrupted.
This is the only path which give some lights:
ASSERT_PARAM(2048 0), in rwbt.c at line 381
I hope you can take a look into this also.
@BetterJincheng:
With your latest version, I'm getting a lot of error messages in dmesg:
[ 9392.795165] Bluetooth: hci0: command 0x200e tx timeout
[ 9414.395439] Bluetooth: hci0: command 0x200d tx timeout
[ 9420.251516] Bluetooth: hci0: command 0x1003 tx timeout
[ 9422.267543] Bluetooth: hci0: command 0x1001 tx timeout
[ 9422.386796] Bluetooth: hci0: unexpected event for opcode 0x200d
[ 9422.387385] Bluetooth: hci0: unexpected event for opcode 0x0c03
[ 9422.430117] Bluetooth: hci0: hardware error 0x00
[ 9422.432685] Bluetooth: hci0: unexpected event for opcode 0x0c0d
[ 9429.435631] Bluetooth: hci0: command 0x1003 tx timeout
[ 9431.451666] Bluetooth: hci0: command 0x1001 tx timeout
[ 9433.467680] Bluetooth: hci0: command 0x1009 tx timeout
[ 9442.555808] Bluetooth: hci0: command 0x1003 tx timeout
[ 9444.571824] Bluetooth: hci0: command 0x1001 tx timeout
[ 9446.587851] Bluetooth: hci0: command 0x1009 tx timeout
[ 9455.612041] Bluetooth: hci0: command 0x1003 tx timeout
[ 9457.627993] Bluetooth: hci0: command 0x1001 tx timeout
[ 9459.644091] Bluetooth: hci0: command 0x1009 tx timeout
[ 9468.668156] Bluetooth: hci0: command 0x1003 tx timeout
[ 9470.684187] Bluetooth: hci0: command 0x1001 tx timeout
[ 9472.700199] Bluetooth: hci0: command 0x1009 tx timeout
[ 9481.724306] Bluetooth: hci0: command 0x1003 tx timeout
[ 9483.740321] Bluetooth: hci0: command 0x1001 tx timeout
[ 9485.756347] Bluetooth: hci0: command 0x1009 tx timeout
[ 9502.108567] Bluetooth: hci0: command 0x1003 tx timeout
[ 9504.124582] Bluetooth: hci0: command 0x1001 tx timeout
[ 9506.140609] Bluetooth: hci0: command 0x1009 tx timeout
[ 9517.116769] Bluetooth: hci0: command 0x1003 tx timeout
[ 9519.132773] Bluetooth: hci0: command 0x1001 tx timeout
[ 9521.148808] Bluetooth: hci0: command 0x1009 tx timeout
Can you please look into it soon? I'm waiting for this fix since last year. Thank you.
Ping :)
I have more logs now:
With your latest version of library you sent, I got this:
< HCI Command: Create Connection (0x01|0x0005) plen 13 #4450 [hci0] 11:21:47.231569
Address: AA:BB:CC:DD:EE:FF (Liteon Technology Corporation)
Packet type: 0xcc18
DM1 may be used
DH1 may be used
DM3 may be used
DH3 may be used
DM5 may be used
DH5 may be used
Page scan repetition mode: R2 (0x02)
Page scan mode: Mandatory (0x00)
Clock offset: 0x0000
Role switch: Allow slave (0x01)
> HCI Event: Command Status (0x0f) plen 4 #4451 [hci0] 11:21:47.234813
Create Connection (0x01|0x0005) ncmd 5
Status: Success (0x00)
< HCI Command: Create Connection Cancel (0x01|0x0008) plen 6 #4452 [hci0] 11:21:50.236441
Address: AA:BB:CC:DD:EE:FF (Liteon Technology Corporation)
> HCI Event: Command Complete (0x0e) plen 10 #4453 [hci0] 11:21:50.238166
Create Connection Cancel (0x01|0x0008) ncmd 5
Status: ACL Connection Already Exists (0x0b)
Address: AA:BB:CC:DD:EE:FF (Liteon Technology Corporation)
Now, the problem with this is when I want to cancel the connection in step #4452 (Create Connection Cancel
), I got the reply in #4453 (Command Complete
), that ACL Connection Already Exists
.
This breaks Bluetooth's specification, as if "ACL Connection" is already existing, why haven't you sent any Connect Complete
HCI event?
It would be great if I can get any feedback or reply to this thread.
ping
< HCI Command: LE Create Connection (0x08|0x000d) plen 25 #202072 [hci0] 13:57:10.737769
Scan interval: 60.000 msec (0x0060)
Scan window: 60.000 msec (0x0060)
Filter policy: White list is not used (0x00)
Peer address type: Public (0x00)
Peer address: AA:BB:CC:DD:EE:FF (OUI AA-BB-CC)
Own address type: Public (0x00)
Min connection interval: 25.00 msec (0x0014)
Max connection interval: 25.00 msec (0x0014)
Connection latency: 0 (0x0000)
Supervision timeout: 5000 msec (0x01f4)
Min connection length: 0.000 msec (0x0000)
Max connection length: 0.000 msec (0x0000)
< HCI Command: LE Create Connection Cancel (0x08|0x000e) plen 0 #202073 [hci0] 13:57:14.604338
@ MGMT Event: Connect Failed (0x000d) plen 8 {0x0002} [hci0] 13:57:20.738171
LE Address: AA:BB:CC:DD:EE:FF (OUI AA-BB-CC)
Status: Not Connected (0x02)
Another violation of Bluetooth standard by your closed source stuff. No response to my "LE Create Connection Cancel".
And only after exactly 10 seconds comes the management event that connection is failed.
1. How can I change this 10 second time?
2. Please attach me a new build with debug log.
@BetterJincheng: can you please look into this?
Thank you!
< HCI Command: LE Create Connection (0x08|0x000d) plen 25 #42191 [hci0] 06:35:04.599612
Scan interval: 60.000 msec (0x0060)
Scan window: 60.000 msec (0x0060)
Filter policy: White list is not used (0x00)
Peer address type: Public (0x00)
Peer address: AA:BB:CC:DD:EE:FF (OUI AA-BB-CC)
Own address type: Public (0x00)
Min connection interval: 25.00 msec (0x0014)
Max connection interval: 25.00 msec (0x0014)
Connection latency: 0 (0x0000)
Supervision timeout: 5000 msec (0x01f4)
Min connection length: 0.000 msec (0x0000)
Max connection length: 0.000 msec (0x0000)
< HCI Command: LE Create Connection Cancel (0x08|0x000e) plen 0 #42192 [hci0] 06:35:07.272523
< HCI Command: LE Create Connection Cancel (0x08|0x000e) plen 0 #42193 [hci0] 06:35:12.277259
> HCI Event: Command Status (0x0f) plen 4 #42194 [hci0] 06:35:14.598817
LE Create Connection (0x08|0x000d) ncmd 3
Status: Success (0x00)
> HCI Event: Command Complete (0x0e) plen 4 #42195 [hci0] 06:35:14.599115
LE Create Connection Cancel (0x08|0x000e) ncmd 4
Status: Success (0x00)
How can I cancel a(n) (LE) Create Connection before receiving its "Command Complete" event, 10 seconds later? @BetterJincheng: could you please help me?
@xiewenxiang: I have provided many more logs for your modified closed-source library.
I haven't received any feedback or support from your side since last November which is now 3 months ago.
This is really annoying and unfair.
ping
ping
ping
I have more logs now:
With your latest version of library you sent, I got this:
< HCI Command: Create Connection (0x01|0x0005) plen 13 #4450 [hci0] 11:21:47.231569 Address: AA:BB:CC:DD:EE:FF (Liteon Technology Corporation) Packet type: 0xcc18 DM1 may be used DH1 may be used DM3 may be used DH3 may be used DM5 may be used DH5 may be used Page scan repetition mode: R2 (0x02) Page scan mode: Mandatory (0x00) Clock offset: 0x0000 Role switch: Allow slave (0x01) > HCI Event: Command Status (0x0f) plen 4 #4451 [hci0] 11:21:47.234813 Create Connection (0x01|0x0005) ncmd 5 Status: Success (0x00) < HCI Command: Create Connection Cancel (0x01|0x0008) plen 6 #4452 [hci0] 11:21:50.236441 Address: AA:BB:CC:DD:EE:FF (Liteon Technology Corporation) > HCI Event: Command Complete (0x0e) plen 10 #4453 [hci0] 11:21:50.238166 Create Connection Cancel (0x01|0x0008) ncmd 5 Status: ACL Connection Already Exists (0x0b) Address: AA:BB:CC:DD:EE:FF (Liteon Technology Corporation)
Now, the problem with this is when I want to cancel the connection in step #4452 (
Create Connection Cancel
), I got the reply in #4453 (Command Complete
), thatACL Connection Already Exists
.This breaks Bluetooth's specification, as if "ACL Connection" is already existing, why haven't you sent any
Connect Complete
HCI event?
Sorry for delay. And I will now try my best to help you solve the problem.
A question -> From 11:21:47.234813
to 11:21:50.236441
, why hasn't the connection been established for a full three seconds? I do not know if there is something wrong with your application.
Hi, @BetterJincheng:
Connection establishment event should be coming from your library. Thats exactly the problem that this event is not coming out from your closed source library.
My application can do nothing with this. I'm still using your hci_uart_sample: it does only forwarding commands from my Linux box to your closed source library.
So if connection established event is no coming back, it means your closed source library is not handling it properly, but I can't say any more as I have no access to that library.
This way, it seems it breaks the Bluetooth Core Specification.
What we can try is to use some true verbose logging, which can help detection of this problem.
Hi @danergo,
But I really can not reproduce your issue. Here is my log. Wait for your reply.
< HCI Command: Create Connection (0x01|0x0005) plen 13 #2663 [hci0] 1670.775888
Address: AA:BB:CC:DD:EE:FF (OUI AA-BB-CC)
Packet type: 0xcc18
DM1 may be used
DH1 may be used
DM3 may be used
DH3 may be used
DM5 may be used
DH5 may be used
Page scan repetition mode: R2 (0x02)
Page scan mode: Mandatory (0x00)
Clock offset: 0x0000
Role switch: Allow slave (0x01)
> HCI Event: Command Status (0x0f) plen 4 #2664 [hci0] 1670.779305
Create Connection (0x01|0x0005) ncmd 5
Status: Success (0x00)
< HCI Command: Create Connection Cancel (0x01|0x0008) plen 6 #2665 [hci0] 1673.204458
Address: AA:BB:CC:DD:EE:FF (OUI AA-BB-CC)
> HCI Event: Command Complete (0x0e) plen 10 #2666 [hci0] 1673.207398
Create Connection Cancel (0x01|0x0008) ncmd 5
Status: Success (0x00)
Address: AA:BB:CC:DD:EE:FF (OUI AA-BB-CC)
> HCI Event: Connect Complete (0x03) plen 11 #2667 [hci0] 1673.207400
Status: Unknown Connection Identifier (0x02)
Handle: 65535
Address: AA:BB:CC:DD:EE:FF (OUI AA-BB-CC)
Link type: ACL (0x01)
Encryption: Disabled (0x00)
Yes, this is the normal case, how it should work. I also have this 9/10 times (or 95/100 times).
However in some cases I got the Status: ACL Connection Already Exists (0x0b)
for my Create Connection Cancel
command.
Anyway, I'm trying right now with a different firmware (from Zephyr).
With the new libbtdm_app
library, I got bunch of undefined symbols, can you help me resolving these please:
These are all undefined references, during link time:
coex_ble_adv_priority_high_set
ld_acl_evt_canceled_cbk
ld_acl_evt_start_cbk
ld_acl_evt_stop_cbk
lc_pwr_max_ind_handler
lc_pwr_incr_ind_handler
lc_pwr_decr_ind_handler
lc_acl_disc_ind_handler
Thank you
Please do not use the lib I provided on Nov 29, 2021 as it was just a tester. Use the original lib on branch v4.4-dev-2533-g83956ebbae
.
I used a simple script to test these two command and did not get Status: ACL Connection Already Exists (0x0b)
. Any suggestions?
#!/bin/bash
for i in $(seq 1 100)
do
# Create Connection
sudo hcitool -i hci0 cmd 0x01 0x0005 0xff 0xee 0xdd 0xcc 0xbb 0xaa 0x18 0xcc 0x02 0x00 0x00 0x00 0x01;
sleep 2s;
# Create Connection Cancel
sudo hcitool -i hci0 cmd 0x01 0x0008 0xff 0xee 0xdd 0xcc 0xbb 0xaa;
sleep 1s;
done
Any suggestions?
You can try running this script parallelly, 2-3 instances, and let's see. It's kind of a "load-test", but still, if connection cancel reports that ACL connection exists, we should get notified about that. I'll also try reproducing it.
Use the original lib on branch
v4.4-dev-2533-g83956ebbae
Ah, okay! And does this one contain your patch too?
Thank you!
Ping
@BetterJincheng: can you tell me now please which branch contains these or this patch(es)? Release is already at version 4.4, so it is containing this, or no? Thank you.
@danergo patch for what?
@BetterJincheng: you have sent me a "test" library on this post:
libbtdm_app.zip @danergo Please try this lib to verify whether it fixed the second problem. Thanks
The question is: which release branch is this merged to?
Thank you!
@danergo That lib is just for test. You had reported many more issues with the lib. We have not merged it into any release branch.
Due to the covid-19 pneumonia, we are working in our home now. We can not access the lib code until we get back to office.
@BetterJincheng: ah I see. Thanks for confirming.
So currently, the best to use this library is to clone the release version4.4 and replace the library inside?
@BetterJincheng ping
Which is the EARLIEST branch contains these fixes? Please let me know asap. Thank you. It doesnt have to be release branch.
Do you have some better support channel? Like discord? Im getting sick on this issue since last October. Thank you.
As I said, that lib was just a test. Since you reported more problems based on that lib, we will not merge that lib. Now, all release branches have fixed the original problem of 'Create Connection Cancel'. Since the lib I provided you for testing purposes has caused more problems, I suggest you never use that lib again. As for the remaining problems (0x0b), we will solve them as soon as we return to the office. Now, what we can do is very limited.
Ah, okay, I got your point now. Indeed checking out a 4.4.1 release branch checks out a submodule of libesp32, which seems to have your fix for the initial problem of 'Create Connection Cancel'.
I will retest this now, as there was a major issue on the serial port which ESP32 used - to be fair, it could cause package loss. Now, the serial port issue has been fixed, and release branch has the fix for the 'Create Connection Cancel' problem.
Thank you very much, I'll feedback here once I have some news, I'm having high hopes now on closing this issue soon.
Best wishes for you!
Environment
Development Kit: NodeMCU-ESP-32S Module or chip used: ESP-WROOM-32 IDF version (run git describe --tags to find it): v4.4-dev-2533-g83956ebbae Build System: idf.py Compiler version (run xtensa-esp32-elf-gcc --version to find it): xtensa-esp32-elf-gcc (crosstool-NG esp-2021r1) 8.4.0 Operating System: Linux Using an IDE?: No Power Supply: external 5V
Problem Description
I'm using ESP32 as a Bluetooth controller with the ESP-IDF provided hciuart sample: https://github.com/espressif/esp-idf/tree/master/examples/bluetooth/hci/controller_hci_uart_esp32, and BlueZ as host.
Whenever my application tries to establish a connection by sending a Create Connection, and I abort it early by sending a Create Connection Cancel HCI request, ESP32 won't respond to the Create Connection, which blocks BlueZ from sending out any other new Create Connection (to the same peer).
Only way to escape from this is to completely reset the HCI device.
Expected Behavior && Actual Behavior && Steps to reproduce
I have collected logs from a correctly working (Cypress) device to make it easier for you to fix this. I'm using here a fake Bluetooth peer address to make 100% sure it is not pingable and for also making sure this problem doesn't depend at all on any particular peer device.
1st image demonstrates the Page Timeout in case I don't early-abort the connection trial. Left side is from Cypress, right side is ESP32:
2nd image demonstrates the early-aborted connection trial after the 1st Page Timeout, without resetting the HCI device. Still left side is Cypress, right is ESP32. Problem starts manifesting here: while Cypress responds to the Create Connection Cancel request with also sending "Connect Complete", ESP32 does not completes the connection which confuses BlueZ (will think ESP is still trying to connect):
3rd image demonstrates the complete mess, because you haven't closed the earlier connection by responding Connect Complete, BlueZ doesn't send the new Create Connection request at all (It still "thinks" ESP is trying to fulfill the first request):
Due to this, whenever you try connecting to a device which accidentally or intentionally having its Bluetooth disabled, you will not be able to connect to it even after its Bluetooth gets re-enabled.