Open ghost opened 4 years ago
Thanks for reporting, we will look into.
Hi @pbruynbroeckQuimesis please help to provide following info:
Hello @liuzfesp,
In this project, I have the periodic_task
. This task does different things including:
esp_ble_gap_start_scanning
every 30sec (if there is no device connected),esp_wifi_scan_start
,esp_wifi_scan_get_ap_records
, esp_wifi_set_config
,esp_wifi_connect
every 10sec (if there is no connection),esp_ble_gap_disconnect
if no datas are exchange on the BLE for a while (see in the log: W (4558923) HUB_MANAGER: The bracelet 0 no longer send data for 13520 ms. Disconnect..
).All theses features works fine in a normal case.
(We see in the log that the perdioc_task
is also freeze, that's why we have no more attempt to reconnect to the wifi).
However, I thinks the root cause is linked to the queues used by btc_thread and btu_thread.
What can block the processing of theses queues? What start by fill up the btc queue, and only when the btc queue is full, start to use the btu queue?
Unfortunately, it's a big project and it will take a lot of time to simplify this project and create additional scripts to simulate interactions with other devices. And also, I have the feeling that the issue occurs more often when all functionalities are working in same times (play music, wifi, BLE, mqtt and uart communication).
Hi, @pbruynbroeckQuimesis
Maybe you can increase the queue of btc and btu according to your actual needs.
If there are a large number of adv packets in the environment during ble scan, it may cause btc queue to block. You can speed up the execution speed of ble_gap_cb and reduce the time of ble scan.
If you still can't solve the problem, I will provide a patch based on your commit, When the problem occurs, the content of btc queue and btu queue will be typed out to see what data causes a sudden increase
Looking at the log, this looks very similar to the problem I'm reporting in https://github.com/espressif/esp-idf/issues/5351#issuecomment-657337155. Everything is fine until randomly a wifi:bcn_timout,ap_probe_send_start
with 200 beacon timeout, and the device never recovers until a reboot.
Hi, @pbruynbroeckQuimesis
- Maybe you can increase the queue of btc and btu according to your actual needs.
- If there are a large number of adv packets in the environment during ble scan, it may cause btc queue to block. You can speed up the execution speed of ble_gap_cb and reduce the time of ble scan.
- If you still can't solve the problem, I will provide a patch based on your commit, When the problem occurs, the content of btc queue and btu queue will be typed out to see what data causes a sudden increase
Hi @WCCWCC , When the queue btc/btu started to fill up, the ESP32 was working normally. It was connected to my other device via BLE. (The ESP32 wasn't in 'scanning mode').
Thanks @gadget-man, Do you have also the btc and btu queue size increasing just before disconnection?
(To display the size of theses queue, I use osi_thread_queue_wait_size
inside btc_transfer_context
and btu_task_post
bt_status_t btc_transfer_context(btc_msg_t *msg, void *arg, int arg_len, btc_arg_deep_copy_t copy_func)
{
static int queue_size = 0;
static int prev_queue_size = 0;
queue_size = osi_thread_queue_wait_size(btc_thread, 0);
if(queue_size > prev_queue_size)
{
ESP_LOGI("BTC", "btc_thread waterlevel %d", queue_size);
prev_queue_size = queue_size;
}
...
)
Thanks @gadget-man, Do you have also the btc and btu queue size increasing just before disconnection?
(To display the size of theses queue, I use
osi_thread_queue_wait_size
insidebtc_transfer_context
andbtu_task_post
bt_status_t btc_transfer_context(btc_msg_t *msg, void *arg, int arg_len, btc_arg_deep_copy_t copy_func) { static int queue_size = 0; static int prev_queue_size = 0; queue_size = osi_thread_queue_wait_size(btc_thread, 0); if(queue_size > prev_queue_size) { ESP_LOGI("BTC", "btc_thread waterlevel %d", queue_size); prev_queue_size = queue_size; } ...
)
I have Bluetooth disabled, is this relevant in my case?
I thought that was because btu / btc queues was full that I have wifi: bcn_timout, ap_probe_send_start
but in fact not..
I see in my log:
I (4528853) BTC: btc_thread waterlevel 118
I (4528853) BTC: btc_thread waterlevel 119
I (4528873) wifi:bcn_timout,ap_probe_send_start
I (4528893) BTC: btc_thread waterlevel 120
I (4528933) BTC: btc_thread waterlevel 121
[...]
I (4541963) BTU: btu_thread waterlevel 249
I (4541993) BTU: btu_thread waterlevel 250
I (4542713) wifi:bcn_timout,ap_probe_send_start
I (4545233) wifi:ap_probe_send over, resett wifi status to disassoc
I (4545233) wifi:state: run -> init (c800)
I (4545243) wifi:pm stop, total sleep time: 2863614744 us / 4524970362 us
I (4545243) wifi:new:<1,0>, old:<1,0>, ap:<255,255>, sta:<1,0>, prof:1
I (4545253) WIFI-Manager: SYSTEM_EVENT_STA_DISCONNECTED: 200 (WIFI_REASON_BEACON_TIMEOUT)
@pbruynbroeckQuimesis BT queue full is not expected to happen. Is this issue only happens when BT queue size full? Can you post the code that lead to BT queue full?
There're some ways to do flow control in application side (you can check the BLE throughput examples) to prevent queue full.
As I said, the ESP works fine: it receives data from BLE and send it over MQTT. (In the same time, it reads data from memory to play music with i2s, it communicates with another device with UART,... so the load is heavy but all works fine.)
But, after a while (here: 4528853 ticks => 75 min), btu/btc queues starts to fill up but without any reason... so I'm not able to find the code that lead to BT queue full.
This is why I asked in https://github.com/espressif/esp-idf/issues/5551#issuecomment-656529024:
What can block the processing of theses queues? What start by fill up the btc queue, and only when the btc queue is full, start to use the btu queue?
Was this ever resolved? I am seeing the exact same behaviour with IDF v4.3.2. Bluetooth off. Nothing intensive going on. Every few weeks, it just disconnects on certain networks with error 200 (WIFI_REASON_BEACON_TIMEOUT
) ! These are networks with multiple APs if that's relevant. Also potentially relevant, esp_wifi_set_ps(WIFI_PS_MAX_MODEM)
... I noticed that the network was falling out during OTA until I set esp_wifi_set_ps(WIFI_PS_NONE);
just before the intensive flash erase etc begun. With this setting it runs fine... So, WIFI_PS_MAX_MODEM is clearly not reliable.
hi @monkeytronics , esp_wifi_set_ps is used to configure whether to keep the RF on during the interval between two beacon intervals, so if use WIFI_PS_NONE it can keep the RF on all the time and if the wifi environment is bad it can enhance the wifi performance.
hi @monkeytronics , esp_wifi_set_ps is used to configure whether to keep the RF on during the interval between two beacon intervals, so if use WIFI_PS_NONE it can keep the RF on all the time and if the wifi environment is bad it can enhance the wifi performance.
@Xiehanxin I don't get your point. The bug report here is not about performance. Since esp-idf supports setting WIFI_PS_MIN_MODEM and WIFI_PS_MAX_MODEM, they should work as well.
Environment
Problem Description
On the same project as that described here: #5122.
After a while, the queue of the
btc_thread
start to fill up. When the queue is full, the queue of thebtu_thread
start to fill up...Then, I have a wifi timeout disconnection (
WIFI_REASON_BEACON_TIMEOUT
) and it's impossible to reconnect to the wifi (WIFI_REASON_AUTH_EXPIRE
).Some tasks seems to freeze: hciT, btuT, BTC_TASK, sys_evt, esp_timer, ipc0 and ipc1.
log_work_queue_250.log
Note:
osi_thread_queue_wait_size(btc_thread, 0);
andosi_thread_queue_wait_size(btu_thread, 0);
.DEFAULT_WORK_QUEUE_CAPACITY
has been increase to 250 (in stead of 100). (https://github.com/espressif/esp-idf/issues/5122#issuecomment-620978546)