espressif / esp-idf

Espressif IoT Development Framework. Official development framework for Espressif SoCs.
Apache License 2.0
13.29k stars 7.2k forks source link

Bluetooth crash after +-1hour (IDFGH-12209) #13262

Open SimonPVS opened 6 months ago

SimonPVS commented 6 months ago

Answers checklist.

IDF version.

idf 4.3.1

Espressif SoC revision.

ESP32

Operating System used.

Windows

How did you build your project?

Eclipse IDE

If you are using Windows, please specify command line type.

CMD

Development Kit.

Custom board with esp32

Power Supply used.

External 3.3V

What is the expected behavior?

Bluetooth should always work with all sorts of clients

What is the actual behavior?

After 1 hour bluetooth crashes with recent android or IOS models.

Steps to reproduce.

  1. Start from the a2dp sink example
  2. Connect a samsung tablet Tab S6 with android 12
  3. Play music from spotify for about one hour

Debug Logs.

Guru Meditation Error: Core  0 panic'ed (StoreProhibited). Exception was unhandled.

Core  0 register dump:
PC      : 0x40101b13  PS      : 0x00060f30  A0      : 0x80100e98  A1      : 0x3ffe67f0
0x40101b13: reassemble_and_dispatch at C:/Users/simon/esp/esp-idf-v4.3.1/components/bt/host/bluedroid/hci/packet_fragmenter.c:175
 (inlined by) reassemble_and_dispatch at C:/Users/simon/esp/esp-idf-v4.3.1/components/bt/host/bluedroid/hci/packet_fragmenter.c:136

A2      : 0x3fffdf90  A3      : 0x000002ab  A4      : 0x000003cd  A5      : 0x00000080
A6      : 0x00001100  A7      : 0x00000000  A8      : 0x80101b10  A9      : 0x3ffe67a0
A10     : 0x00000000  A11     : 0x000003d5  A12     : 0x00000000  A13     : 0x00000018
A14     : 0x00000020  A15     : 0x00000024  SAR     : 0x0000001c  EXCCAUSE: 0x0000001d
EXCVADDR: 0x00000002  LBEG    : 0x4000c46c  LEND    : 0x4000c477  LCOUNT  : 0x00000000

Backtrace:0x40101b10:0x3ffe67f0 0x40100e95:0x3ffe6820 0x4011974a:0x3ffe6840 0x40112de7:0x3ffe6870 0x40119658:0x3ffe6890 0x401136cf:0x3ffe68b0
0x40101b10: reassemble_and_dispatch at C:/Users/simon/esp/esp-idf-v4.3.1/components/bt/host/bluedroid/hci/packet_fragmenter.c:174
 (inlined by) reassemble_and_dispatch at C:/Users/simon/esp/esp-idf-v4.3.1/components/bt/host/bluedroid/hci/packet_fragmenter.c:136

0x40100e95: hal_says_packet_ready at C:/Users/simon/esp/esp-idf-v4.3.1/components/bt/host/bluedroid/hci/hci_layer.c:413

0x4011974a: hci_hal_h4_hdl_rx_packet at C:/Users/simon/esp/esp-idf-v4.3.1/components/bt/host/bluedroid/hci/hci_hal_h4.c:296
 (inlined by) event_uart_has_bytes at C:/Users/simon/esp/esp-idf-v4.3.1/components/bt/host/bluedroid/hci/hci_hal_h4.c:304

0x40112de7: fixed_queue_process at C:/Users/simon/esp/esp-idf-v4.3.1/components/bt/common/osi/fixed_queue.c:254

0x40119658: hci_hal_h4_rx_handler at C:/Users/simon/esp/esp-idf-v4.3.1/components/bt/host/bluedroid/hci/hci_hal_h4.c:170

0x401136cf: osi_thread_run at C:/Users/simon/esp/esp-idf-v4.3.1/components/bt/common/osi/thread.c:68

More Information.

With an older samsung tablet or a windows PC, the problem did not seem to occure

shangke1112 commented 6 months ago

@SimonPVS Please upload the elf file with this issue and show me the git commit id of IDF.

SimonPVS commented 6 months ago

NWPxxx.zip

git commit id: SHA-1: 2e74914051d14ec2290fc751a8486fb51d73a31e

shangke1112 commented 6 months ago

Are you sure this elf was used when an error occurred, after disassembling, it was found that the code address does not match .

With this log ,I think there may be a error on packet_fragmenter.c:174 " partial_packet == NULL"

The code does not handle malloc failures.

SimonPVS commented 6 months ago

Yess i copied the file from the build folder. Maybe something went wrong with zipping it, so I renamed the extension to txt so if you can rename it back to .elf, it should'nt be modified. (github does not allow me to send a .elf file) NWPxxx.txt

Indeed I noticed there is a osi_calloc function which i think is failing. The partial packet is NULL with a requested length of 973. (I don't know if this is any usefull info for you)

image

What can be the reason of failing there?

I also noticed this "reassemble_and_dispatch" function is called depending on the client device which is connected and playing music. An older android version does not call this function while the android 12 tablet does calls this function a lot

shangke1112 commented 6 months ago

I downloaded "NWPxxx.tx" , but it is still not match. After you modify any line of code, rebuilding will change the elf file, what I need is the elf file when running the error.

StoreProhibited means store a prohibited address,

0x40101b10: reassemble_and_dispatch at C:/Users/simon/esp/esp-idf-v4.3.1/components/bt/host/bluedroid/hci/packet_fragmenter.c:174

Only when partial_packet == "a prohibited address " will this error be reported on the line 174. But I don't have the elf file, so I can't confirm how it went wrong.

SimonPVS commented 6 months ago

Just to be sure, you mean bij elf file the one in the build folder, right? Because that is the one I have send you. This is the build used for producing the error.

What do you mean by it is not a match?

And further i have found that the osi_calloc failed, or the hash_map_get? Is all freed as it should be? Do I need to do something special for fragmenting?

shangke1112 commented 6 months ago

image

xtensa-esp32-elf-objdump -S NWPxxx.elf > NWPxxx.S You will find that 0x40101b10 is not a right address of an instruction. The elf file is right ,but do not match the log .

Rebuild the code ,download the bin file to your device and save the elf file at the same time ;then run your device, this log and elf file will be match.

If the osi_calloc failed ,I think we need to check the allocation of the memory .For example ,it there a memory leak or any momory that is not needed?

SimonPVS commented 6 months ago

Ok sorry for the misunderstanding.

NWPxxx.zip

`I (2213203) BT_APP_CORE: bt_app_work_dispatch event 0x4, param len 12 I (2213203) BT_AV: Play position changed: 50048-ms partial_packet == NULL, full length: 973 Guru Meditation Error: Core 0 panic'ed (StoreProhibited). Exception was unhandled.

Core 0 register dump: PC : 0x40101b34 PS : 0x00060d30 A0 : 0x80100ea4 A1 : 0x3ffe66f0 0x40101b34: reassemble_and_dispatch at C:/Users/simon/esp/esp-idf-v4.3.1/components/bt/host/bluedroid/hci/packet_fragmenter.c:185 (inlined by) reassemble_and_dispatch at C:/Users/simon/esp/esp-idf-v4.3.1/components/bt/host/bluedroid/hci/packet_fragmenter.c:136 A2 : 0x3ffffcdc A3 : 0x00000000 A4 : 0x000003cd A5 : 0x00000080 A6 : 0x00001100 A7 : 0x00000000 A8 : 0x80101b2e A9 : 0x3ffffce4 A10 : 0x0000002a A11 : 0x3ffe689c A12 : 0x00000000 A13 : 0x00000018 A14 : 0x00000020 A15 : 0x00000024 SAR : 0x00000004 EXCCAUSE: 0x0000001d EXCVADDR: 0x00000002 LBEG : 0x4000c46c LEND : 0x4000c477 LCOUNT : 0x00000000

Backtrace:0x40101b31:0x3ffe66f0 0x40100ea1:0x3ffe6720 0x4011976a:0x3ffe6740 0x40112e07:0x3ffe6770 0x40119678:0x3ffe6790 0x401136ef:0x3ffe67b0 0x40101b31: reassemble_and_dispatch at C:/Users/simon/esp/esp-idf-v4.3.1/components/bt/host/bluedroid/hci/packet_fragmenter.c:184 (inlined by) reassemble_and_dispatch at C:/Users/simon/esp/esp-idf-v4.3.1/components/bt/host/bluedroid/hci/packet_fragmenter.c:136 0x40100ea1: hal_says_packet_ready at C:/Users/simon/esp/esp-idf-v4.3.1/components/bt/host/bluedroid/hci/hci_layer.c:413

0x4011976a: hci_hal_h4_hdl_rx_packet at C:/Users/simon/esp/esp-idf-v4.3.1/components/bt/host/bluedroid/hci/hci_hal_h4.c:296 (inlined by) event_uart_has_bytes at C:/Users/simon/esp/esp-idf-v4.3.1/components/bt/host/bluedroid/hci/hci_hal_h4.c:304

0x40112e07: fixed_queue_process at C:/Users/simon/esp/esp-idf-v4.3.1/components/bt/common/osi/fixed_queue.c:254

0x40119678: hci_hal_h4_rx_handler at C:/Users/simon/esp/esp-idf-v4.3.1/components/bt/host/bluedroid/hci/hci_hal_h4.c:170

0x401136ef: osi_thread_run at C:/Users/simon/esp/esp-idf-v4.3.1/components/bt/common/osi/thread.c:68`

SimonPVS commented 6 months ago

Is it normal that i recieve this debug message at a very high rate when streaming music from an android 12 tablet? image

shangke1112 commented 6 months ago

This is normal. When receiving data from CTRL, it is subcontracted and packaged , all applications will use it here.

shangke1112 commented 6 months ago

image

partial_packet == NULL, full length: 973

I think you can check for any memory leak .

Also, I suggest you update the IDF version to see if there is still the issue.

image

SimonPVS commented 6 months ago

Thanks for the support!

I have updated to 4.4 (not to 5.x because of the breaking changes) The music is playing stable now for more than 2 hours and still going. Now the heap decreases much slower, but unfortunatly it keeps decreasing. Is this normal?

Directly after connecting the heap goes to 42k. After 2 hours the heap is at 37k. (and I think it keeps decreasing further)

note: when i pause a song for a couple of seconds, the heap jumps back to 42k

shangke1112 commented 6 months ago

It seems that there is a memory leak. Please refer to : https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-reference/system/heap_debug.html#how-to-diagnose-memory-leaks