Closed jimmo closed 4 years ago
I'm not 100% sure if this is related, but it occurs in bluedroid too, if you have many results, the scan stuck after a while, so a workaround is to scans a short period of time (1 minute), stop some time (10 secs) and back to scan again instead try scan it forever
Hi @jimmo I am trying to reproduce this issue with the above mentioned settings using blecent
example from $IDF_PATH/examples/bluetooth/nimble/blecent
, unfortunately even after stressing it for >48Hrs I have not yet been able to see the crash.
Using NimBLE, running a ble_gap_disc, the stack eventually stops generating new scan results (i.e. the ble_gap_event_fn passed to ble_gap_disc no longer gets called).
Can you please confirm if you have not set disc_params.filter_duplicates = 1
?
If I stop scanning (i.e. ble_gap_disc_cancel), then start again, it fails with:
Can you please tell me where exactly you are calling ble_gap_dic_cancel
?
The main MicroPython task is blocked on the UART (for the REPL) via ulTaskNotifyTake.
I guess this might be differentiating factor between our setup, I will try to replicate this at my end and update you.
@prasad-alatkar Thanks for looking into this!
Can you please confirm if you have not set
disc_params.filter_duplicates = 1
?
We set filter_duplicates
to 0
.
Can you please tell me where exactly you are calling ble_gap_dic_cancel?
We call cancel from the main MicroPython task.
To elaborate -- the MicroPython firmware gives me a REPL prompt on the device, from which I call a Python function which internally calls ble_gap_disc
, with interval and window both set to 30ms.
When the disc callback is invoked, it copies the data into a ringbuffer and notifies the MicroPython task, which then schedules a Python callback that prints out the scan data. (When I referred to making the callback a no-op, this is the code I'm talking about)
After some time interval (usually 10-15 mins in my testing at ~50 results/s, but I've had reports from users that this can take many hours at a slower rate), I stop seeing the print output from my Python callback.
So that's where I interactively stop the scan from the REPL, then re-start it, at which point I see the crash described in rwble.c.
The main MicroPython task is created from app_main().
xTaskCreatePinnedToCore(mp_task, "mp_task", MP_TASK_STACK_LEN, NULL, MP_TASK_PRIORITY, &mp_main_task_handle, MP_TASK_COREID);
MP_TASK_COREID
is 1
. We also pin NimBLE to core 1.
Hi @jimmo, apologies for late reply. FYI, I was able to reproduce the issue with high advertisement rate. Though I am still looking into the root-cause, can you try disabling Bluetooth Modem Sleep
? From menuconfig just follow this path : Component config > Bluetooth > Bluetooth controller > MODEM SLEEP Options
. Alternatively you can comment out CONFIG_BTDM_MODEM_SLEEP
and CONFIG_BTDM_MODEM_SLEEP_MODE_ORIG
from your sdkconfig.
Please share your valuable observations.
@prasad-alatkar Great that you were able to repro the issue, and thanks again for looking into it!
As you can probably see in our sdkconfig.h, we're using the default value of the sleep params (FYI MicroPython still uses tools/kconfig_new/confgen.py, so I just added CONFIG_BTDM_MODEM_SLEEP=n
to our input config https://github.com/micropython/micropython/blob/master/ports/esp32/boards/sdkconfig.ble). Confirmed that this removed those two lines from the generated sdkconfig.h
Unfortunately I was still able to repro the issue. I actually got it to happen after a very short amount of time (less than a minute), and then again after about 15 minutes.
hi @prasad-alatkar, as @jimmo mentioned, this seems very similar to my issue #4001 which includes a coredump of the esp_32 when the error occurs which may give you greater insight into the operation of the BT system when it fails. If you need any further information please let me know.
Hi @jimmo @0neblock apologies for delayed response. @csushantk is debugging this issue internally. I will update you once the issue is fixed.
Hi @prasad-alatkar I noticed that a few commits were pushed to the Github repo recently, can you please confirm if your fix for this issue was included?
I am tracking release/v3.3 for my IDF toolchain. I am testing this latest release now. Thanks.
Hi @0neblock the issue is not yet fixed, I will update you as soon as it is fixed.
@prasad-alatkar Apologies for pushing again, but is there any news here? Any workaround is highly appreciated.
Hi @pschlang , @csushantk is working on fixing the issue in controller. I am also working on workarounds in host, however it may take some time to fix this issue. I will update you as soon as it is fixed.
Hi @prasad-alatkar, we also observe this issue in 4.0-rc1. We urgently need a workaround for this. Can you give some more details on how to avoid this issue or how we could detect and recover from it?
+1 if a workaround exists, that would be great until we have a solid fix.
@prasad-alatkar Any news regarding the fix?
Hi @dmartauz Can you please try and test on latest master ? commits 48bd2d74b4209bbad7c29d2d856810ee43a1a92c
and 74c7db3f49e8605d7ec3736a3ba508a9f8d3fd9a
are expected to resolve the crash issue. Please let me know if you still face the issue .
@prasad-alatkar Great, thanks. I will test in a week or two.
@prasad-alatkar is this excepted to fix bluedroid as well?
@prasad-alatkar thanks for this. When you say "are expected to resolve the crash issue." does that mean that your previously-failing test case now works?
Any chance you'd be able to back-port to v4.0? I don't have the time right now to move MicroPython to v4.1-beta1 (and we'd rather stick to the release version anyway). A few people have contacted us asking about this issue.
Hi @jimmo , the changes are backported to v4.0 as well.
When you say "are expected to resolve the crash issue." does that mean that your previously-failing test case now works?
These fixes address the bugs that were found while debugging this issue. Please let me know if it resolves your issue.
@0neblock sorry for delayed response. As the fix is in controller, I believe it will be applicable to bluedroid as well (CC: @csushantk )
Thanks @prasad-alatkar, Do you know when/if these changes would be backported to v3.3?
@prasad-alatkar ah I was looking out for a new tag, but it seems that you have just moved the existing v4.0 tag? That's going to create some confusion. Can you do point releases in the future?
Thanks for the confirmation, I will try and get some people to test it ASAP. I can confirm that MicroPython builds cleanly against the new v4.0 revision tag, I will update here when I hear back from our users.
@0neblock actually the changes are backported to release/v3.3
as well. @jimmo by v4.0
I was referring to release/v4.0
, sorry if there was any confusions because of this.
Hi @prasad-alatkar, the esp32-bt-lib On release/v3.3 hasn’t been updated for a month, is the fix definitely in there ?
Hi @0neblock I think commit 72fb22ed
has the corresponding fix (cc: @csushantk @chinmaychhajed )
by
v4.0
I was referring torelease/v4.0
, sorry if there was any confusions because of this.
The issue is that it's now not clear what IDF 4.0 is?
@Alvin1Zhang Can you please comment on @jimmo 's concerns ? @jimmo as per my understanding, "Manually checking out the release/v4.0 branch" should be ok, however Alvin can better guide you first 2 points.
Hi @jimmo,
That's right, we don't move the v4.0
release tag. When @prasad-alatkar talks about "backported to v4.0", he means that the fix was merged into the release/v4.0
branch and will be part of the v4.0.1
bugfix release once that's available.
We use these "release branches" to maintain bugfix updates for each major & minor version, so in this case release/v4.0
is the branch that contains the v4.0 tag, and future commits on this branch will be tagged for v4.0.1, v4.0.2, etc.
This page in the ESP-IDF docs has more details about our versioning scheme: https://docs.espressif.com/projects/esp-idf/en/stable/versions.html
At this stage, if you plan to stick with stable releases then the only option is to wait for the v4.0.1 release. However if you want to check out the latest commit on the release/v4.0 branch then you can test this fix before it's released.
Thanks for the clarification @projectgus -- makes sense!
Hi @prasad-alatkar, I'm not sure how everyone else has gone with the testing, but in the limited testing i've been able to do with the current tip of release/v3.3 I am experiencing the same issue. I am not able to check with other branches of the idf for now. Do you have any further confirmation that this should be fixed in v3.3?
Hi @0neblock as per as NimBLE host is concerned, issue ASSERT_PARAM(64 0), in rwble.c
is fixed. The controller has fix for ASSERT_PARAM(131072 0)
(CC: @Sushant-Espressif ).
I believe you are getting this issue with Bluedroid stack and is related to https://github.com/espressif/esp-idf/issues/4001.
I will try to reproduce issue(#4001) and revert back if I find anything. Just to be sure, Can you confirm that your application just does scanning and not advertising? Can you please also try to disable CONFIG_BLE_ADV_REPORT_FLOW_CONTROL_SUPPORTED
and see if you still face the issue ?
Hi @0neblock commit: 55877684 fixes ASSERT_PARAM(64 0)
issue. Could you please check if the issue gets resolved with latest master branch ?
Edit: Sorry, the fix is not yet available on github. However Issue has been fixed and merged into internal repo, will update you once it is available on github.
I will try to reproduce issue(#4001) and revert back if I find anything. Just to be sure, Can you confirm that your application just does scanning and not advertising? Can you please also try to disable
CONFIG_BLE_ADV_REPORT_FLOW_CONTROL_SUPPORTED
and see if you still face the issue ?
Sorry @prasad-alatkar for my delay, I can confirm that disabling 'CONFIG_BLE_ADV_REPORT_FLOW_CONTROL_SUPPORTED' does not fix the issue in #4001, anything else i can try?
Thanks for reporting and sorry for slow turnaround, the fix is available at https://github.com/espressif/esp-idf/commit/2f6f842d0f2a0e57541ab8a71a1ad5ba382a956c. Feel free to reopen if the issue still happens. Thanks.
Hello @prasad-alatkar ,
I'm having the same problem of @0neblock (issue #4001 ) using ESP-IDF v4.3-dev-907-g6c17e3a64. There is any workaround or fix available for this esp-idf version?
Regards,
Gianluca.
Environment
git describe --tags
to find it): v4.0-beta1, v4.1-devxtensa-esp32-elf-gcc --version
to find it): (crosstool-NG crosstool-ng-1.22.0-80-g6c4433a) 5.2.0Problem Description
Using NimBLE, running a ble_gap_disc, the stack eventually stops generating new scan results (i.e. the ble_gap_event_fn passed to ble_gap_disc no longer gets called).
If I stop scanning (i.e. ble_gap_disc_cancel), then start again, it fails with:
I have been investigating this issue that was reported by MicroPython users. The typical time to failure was several hours, but I was able to repro it much faster with some devices sending advertisements at a high rate (50 per second). Usually takes about 10-15 mins for the issue to occur.
The most important detail about my configuration is that it's running inside MicroPython, but I can repro this if I turn gap_scan_cb into a no-op, so that should hopefully exclude any issues with what we're doing in the callback. No other BLE operations were active. It's a passive scan, with interval=30ms, window=30ms, duration=forever. Another detail is that NimBLE is pinned to core 1 (same core as the MicroPython task), but I saw this problem when it was on the default core (0). The main MicroPython task is blocked on the UART (for the REPL) via ulTaskNotifyTake.
MicroPython doesn't use IDF's makefiles or cmake, but it does use sdkconfig. I've attached the input and generated sdkconfig.h below.
I wonder if this might be related to https://github.com/espressif/esp-idf/issues/4001 (but that was on Bluedroid, not NimBLE). I can repro with both v4.0-beta1 and v4.1-dev.
Here's a disassembly of
r_rwble_isr
which seems to be raising the assert above https://gist.github.com/jimmo/43ba1da440fcbebfab89c70e345f368b (I've marked the relevant line with ****) I've attached the elf below. Built from https://github.com/jimmo/micropython/tree/ble-fixesOther items if possible
application.elf.zip sdkconfig.zip