Closed rojer closed 1 week ago
here is a typical "good" boot log: https://gist.github.com/rojer/85c3abd0157a5b7935ca7249ff04ea1c and a bad one (this is one, in total i have 3 occurences out of approx 8000 samples so far): https://gist.github.com/rojer/f2adba071ad424167bce5930d44f2f11 things to note:
this seems to suggest that this has been fixed already. can i get a confirmation on whether that is the case?
No, that "duplicate" label is just a bug of our issue tracking system. Somehow two internal issues got created for this GitHub issue; the second one was marked as "duplicate", and the resolution got synced back to this issue. I'll clear the label.
No, that "duplicate" label is just a bug of our issue tracking system. Somehow two internal issues got created for this GitHub issue; the second one was marked as "duplicate", and the resolution got synced back to this issue. I'll clear the label.
The question is this issue is duplicated with which issue? I don't find the link.
@AxelLin it's not duplicated with anything, labeling was a mistake. it's still waiting for investigation.
I can confirm the same issue but on 4.2.4
Exactly same symptoms related to frequent network connection and disconnection event. Happens more frequently when WiFi signal is weak compared to sites with better network coverage.
W (21654) wifi:TX Q not empty: 500, TXQ_BLOCK=0 W (21654) wifi:force witi stop I (21654) wifi:flush txq I (21654) wifi:stop sw txq I (21654) wifi:lmac stop hw txq W (21654) wifi:sw tx 0 state not idle, potential error! W (21654) wifi:force sw tx 0 state to idle, ebuf flag=0
esp_restart() does not help - only power cycle.
I can confirm the same issue but on 4.2.4
v4.2 is already EOL, maybe you can try with newer esp-idf version?
v4.4 has been tried as well. Same issue. The biggest problem is that esp_restart does not help and the only solution is to power cycle device manually - very problematic for our customers.
No, that "duplicate" label is just a bug of our issue tracking system. Somehow two internal issues got created for this GitHub issue; the second one was marked as "duplicate", and the resolution got synced back to this issue. I'll clear the label.
This issue has been reported for several months, but there is no response from other espressif developers?
hi @rojer , I have seen your log and I would like to confirm that if you have used the BLE and Wi-Fi coex? and could you share the minimum reproduce code to us
hi @lszelejewski could you describe your application scenario in details
On behalf of @lszelejewski I will describe our problem. Majority of our devices uses v4.2.4. We found this issue because due to our bug the device frees already freed pointer and calls esp_restart just after dropped wifi and reconnecting. We see
CORRUPT HEAP: multi_heap.c:540 detected at 0x3ffbb094
abort() was called at PC 0x40094971 on core 0
before the reboot.
But the problem is that WiFi can't connect anymore after startup. Software watchdog resets the device with esp_restart but issue persists until power cycle. We can recreate this error only with weak wifi signal when connection is dropped from time to time so our bug corrupts the heap.
So it looks like this:
The log:
E (1894) UDP Server: Socket created
I (1894) UDP Server: Socket bound, port 1234
I (1894) UDP Server: Waiting for data
I (1984) wifi:new:<1,0>, old:<1,0>, ap:<255,255>, sta:<1,0>, prof:1
I (1984) wifi:state: init -> auth (b0)
I (2384) WATCHDOG: Subscribe watchdog to task esp_timer
I (2984) wifi:state: auth -> init (200)
I (2984) wifi:new:<1,0>, old:<1,0>, ap:<255,255>, sta:<1,0>, prof:1
W (2984) WiFi: wifi event handler 5
I (2984) WiFi: Retry to connect to the AP
I (2994) EventHandlers: NETWORKING event ID 1
I (4294) wifi:new:<6,0>, old:<1,0>, ap:<255,255>, sta:<6,0>, prof:1
I (4294) wifi:state: init -> auth (b0)
01:44:06.678 I (5294) wifi:state: auth -> init (200)
I (5294) wifi:new:<6,0>, old:<6,0>, ap:<255,255>, sta:<6,0>, prof:1
W (5294) WiFi: wifi event handler 5
I (5294) WiFi: Retry to connect to the AP
I (5294) EventHandlers: NETWORKING event ID 1
01:44:08.036 I (7704) wifi:new:<11,0>, old:<6,0>, ap:<255,255>, sta:<11,0>, prof:1
I (7704) wifi:state: init -> auth (b0)
01:44:09.395 I (8704) wifi:state: auth -> init (200)
I (8704) wifi:new:<11,0>, old:<11,0>, ap:<255,255>, sta:<11,0>, prof:1
W (8704) WiFi: wifi event handler 5
I (8714) WiFi: Retry to connect to the AP
I (8714) EventHandlers: NETWORKING event ID 1
01:44:12.419 W (12074) WiFi: wifi event handler 5
I (12074) WiFi: Retry to connect to the AP
I (12074) EventHandlers: NETWORKING event ID 1
01:44:13.783 I (12124) wifi:new:<1,0>, old:<11,0>, ap:<255,255>, sta:<1,0>, prof:1
I (12124) wifi:state: init -> auth (b0)
I (13124) wifi:state: auth -> init (200)
I (13124) wifi:new:<1,0>, old:<1,0>, ap:<255,255>, sta:<1,0>, prof:1
W (13124) WiFi: wifi event handler 5
I (13124) WiFi: Retry to connect to the AP
I (13124) EventHandlers: NETWORKING event ID 1
01:44:15.135 I (14424) wifi:new:<6,0>, old:<1,0>, ap:<255,255>, sta:<6,0>, prof:1
I (14434) wifi:state: init -> auth (b0)
01:44:16.494 I (15434) wifi:state: auth -> init (200)
I (15434) wifi:new:<6,0>, old:<6,0>, ap:<255,255>, sta:<6,0>, prof:1
W (15434) WiFi: wifi event handler 5
I (15434) WiFi: Retry to connect to the AP
I (15434) EventHandlers: NETWORKING event ID 1
W (15674) wifi:m f probe req l=0
W (15914) wifi:m f probe req l=0
W (16154) wifi:m f probe req l=0
01:44:17.846 W (16394) wifi:m f probe req l=0
W (16634) wifi:m f probe req l=0
W (16874) wifi:m f probe req l=0
W (17114) wifi:m f probe req l=0
W (17354) wifi:m f probe req l=0
01:44:19.192 W (17604) wifi:m f probe req l=0
W (17844) wifi:m f probe req l=0
I (17944) wifi:new:<11,0>, old:<6,0>, ap:<255,255>, sta:<11,0>, prof:1
I (17944) wifi:state: init -> auth (b0)
01:44:20.545 I (18944) wifi:state: auth -> init (200)
I (18944) wifi:new:<11,0>, old:<11,0>, ap:<255,255>, sta:<11,0>, prof:1
W (18944) WiFi: wifi event handler 5
W (18944) WiFi: Using fallback network
01:44:24.286 W (23954) wifi:TX Q not empty: 500, TXQ_BLOCK=0
W (23954) wifi:force witi stop
I (23954) wifi:flush txq
I (23954) wifi:stop sw txq
I (23954) wifi:lmac stop hw txq
W (23954) wifi:sw tx 0 state not idle, potential error!
W (23954) wifi:force sw tx 0 state to idle, ebuf flag=412
It looks like inconsistent state is kept during reset with esp_restart.
Is there a change that triggering hardware watchdog would reset WiFi better than esp_restart()?
hi @tomasznowik from your log, there might be somerhing wrong with phy or the Phy uses the wrong calibration data , could you tell me 1.how much device have this problem 2.how often the problem recurs and tell me the reproduce steps if it is convient. 3.provide your sdkconfig.
@Xiehanxin We still observe the issue. After fixing our own bug the device doesn't reset so often so it looked like the issue is solved but it's not. After writing a configuration to flash we call esp_system_abort to reboot. And rarely the devices can't connect to wifi anymore because of the error I described previously.
W 19585 | wifi force witi stop
I 19585 | wifi flush txq
I 19585 | wifi stop sw txq
I 19585 | wifi lmac stop hw txq
W 19585 | wifi sw tx 0 state not idle, potential error!
W 19585 | wifi force sw tx 0 state to idle, ebuf flag=412
You wrote "there might be somerhing wrong with phy or the Phy uses the wrong calibration data". Is there anything we could try to fix this? We use:
CONFIG_ESP32_PHY_CALIBRATION_AND_DATA_STORAGE=y
# CONFIG_ESP32_PHY_INIT_DATA_IN_PARTITION is not set
CONFIG_ESP32_PHY_MAX_WIFI_TX_POWER=20
CONFIG_ESP32_PHY_MAX_TX_POWER=20
@Xiehanxin Any further feedback about https://github.com/espressif/esp-idf/issues/11060#issuecomment-1669941226 and https://github.com/espressif/esp-idf/issues/11060#issuecomment-1807573122 ?
Hi @rojer , we find the cause of Wi-Fi connect issue after reset when coexisting with BLE scan. https://github.com/espressif/esp-idf/issues/13598#issuecomment-2188088182
If its similiar in your case, you can try to verify it on v5.2 as well.
@Espressif-liuuuu great news, thanks! we've been carrying a really hacky workarond for this for a while. i'll try to repro it with our current code (w/o workaround) and then try the fix.
i'll try to repro it with our current code (w/o workaround) and then try the fix.
@rojer Does the fix work for you?
Thanks for reporting, will close due to short of feedback, feel free to reopen with more updates. Thanks for using our Espressif product!
yeah, i'm sorry - i have actually tried to reproduce it, but wasn't able to. something has changed somewhere, but the bottom line is we don't see these crashes anymore, and that's good.
Answers checklist.
IDF version.
4.4 release, tried latest 4.4 latest as well
Operating System used.
Linux
How did you build your project?
Command line with Make
If you are using Windows, please specify command line type.
None
Development Kit.
ESP32-PICO-KIT
Power Supply used.
USB
What is the expected behavior?
Wifi connects every time
What is the actual behavior?
Sometimes (approximately 1 in hundreds to a thousand) the wifi stack gets into an inconsistent state and is unable to connect or scan properly. The tell-tale sign is appearance of
W (67573) wifi:m f probe req l=0
messages on the console, and additionally, wifi can no longer be properly stopped, after 5 seconds of blocking it prints the following:restarting does not solve the issue, and deinit causes a core dump due to what looks like a double-free on some internal buffer:
Steps to reproduce.
I have firmware that reboots 5 seconds after wifi connects and it will get into this state within an hour or two.
Debug Logs.
No response
More Information.
No response