espressif / esp-idf

Espressif IoT Development Framework. Official development framework for Espressif SoCs.
Apache License 2.0

ESP in automatic light sleep has high power consumption (IDFGH-12637) #13634

Open patrickwilliamson1 opened 3 months ago

patrickwilliamson1 commented 3 months ago

Answers checklist.

IDF version.

v4.4.5

Espressif SoC revision.

ESP32-WROOM-32E-N16

Operating System used.

Windows

How did you build your project?

Command line with Make

If you are using Windows, please specify command line type.

None

Development Kit.

Custom PCB

Power Supply used.

USB

What is the expected behavior?

I have many ESP32s deployed in the field. Every so often we get a customer whose ESP32s are deployed on a very standard Wi-Fi network (2.4 GHz only; Ubiquiti configs below). I expect the power consumption to be fairly uniform over long periods of time.

I haven't moved to ESP-IDF v5.0 yet because I don't see anything suggesting this is addressed there. If there is a specific fix in IDF v5, I'm happy to update; otherwise I'd rather not.

What is the actual behavior?

Every once in a while (maybe once per day, maybe once per hour; it seems random) the ESP32 goes into a state where it consumes a lot of power. We noticed it through a temperature rise on the board. Since the devices are remote, I added some debug output to monitor the tasks using the FreeRTOS function uxTaskGetSystemState (pasted below), and I notice that the activity of the Wi-Fi task drops sharply.

[Screenshot 2024-04-17 at 10:10:08 AM]
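A minimal sketch of how such per-task activity could be compared between two uxTaskGetSystemState() snapshots taken a fixed interval apart. It assumes run-time stats are enabled (CONFIG_FREERTOS_USE_TRACE_FACILITY and CONFIG_FREERTOS_GENERATE_RUN_TIME_STATS) and that ulRunTimeCounter is a 32-bit counter that may wrap, as in IDF v4.4. task_sample_t and the helper functions are hypothetical, host-testable stand-ins for the pcTaskName and ulRunTimeCounter fields of TaskStatus_t, not an ESP-IDF API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical host-side mirror of the two TaskStatus_t fields used here. */
typedef struct {
    const char *name;  /* like TaskStatus_t.pcTaskName */
    uint32_t run_time; /* like TaskStatus_t.ulRunTimeCounter */
} task_sample_t;

/* Run time between two counter readings. Unsigned subtraction stays
 * correct across a single wrap of the 32-bit counter. */
uint32_t run_time_delta(uint32_t before, uint32_t after)
{
    return after - before;
}

/* Look up a task by name in a snapshot; returns NULL if absent. */
static const task_sample_t *find_task(const task_sample_t *s, size_t n,
                                      const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(s[i].name, name) == 0)
            return &s[i];
    return NULL;
}

/* Run time a task accumulated between two snapshots, or 0 if it is missing
 * from either one. Task names are assumed to be unique. */
uint32_t task_delta(const task_sample_t *before, size_t n_before,
                    const task_sample_t *after, size_t n_after,
                    const char *name)
{
    const task_sample_t *b = find_task(before, n_before, name);
    const task_sample_t *a = find_task(after, n_after, name);
    return (b && a) ? run_time_delta(b->run_time, a->run_time) : 0;
}

/* A task's share (percent) of the total run time elapsed between snapshots;
 * total_before/total_after correspond to uxTaskGetSystemState's
 * pulTotalRunTime output. */
double task_share_percent(uint32_t task_dt, uint32_t total_before,
                          uint32_t total_after)
{
    uint32_t total_dt = run_time_delta(total_before, total_after);
    return total_dt ? 100.0 * task_dt / total_dt : 0.0;
}
```

Matching by name keeps the sketch short; xTaskNumber would be a more robust key if two tasks could share a name.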

Also note: I'm using automatic light sleep with the Wi-Fi power-save mode WIFI_PS_MIN_MODEM to minimize power consumption while keeping an active MQTT connection (MQTT over WSS, one publish per minute).

esp_mqtt_client_config_t mqtt_cfg = {
    .uri = CONFIG_BROKER_URL,
    .event_handle = mqtt_event_handler,
    .user_context = NULL,
    .username = get_serno(),
    .password = read_password(),
    .use_global_ca_store = false,
    .crt_bundle_attach = esp_crt_bundle_attach,
    .client_id = get_serno(),
    .keepalive = 60,
    .disable_auto_reconnect = true
};

Steps to reproduce.

I'm unsure how to reproduce it locally; it only happens on a few Wi-Fi networks.

I could forward printf output over MQTT to debug Wi-Fi/IP, but I don't think turning on all the debug prints is a good idea: it would massively increase network traffic, and that alone would cause a power issue. Are there specific prints I could enable to get the key information? Or is there a config option that could cause this?
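One way to keep forwarded logs from inflating traffic is to give them a fixed byte budget. Below is a minimal token-bucket sketch; log_budget_t and its functions are hypothetical, not an ESP-IDF API, and on the device now_us could come from esp_timer_get_time(). Separately, ESP-IDF's esp_log_level_set() can keep the global level low, e.g. esp_log_level_set("*", ESP_LOG_WARN), and raise it only for selected tags such as "wifi"; messages compiled out by the build-time log level settings cannot be re-enabled at run time.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical byte budget for log lines forwarded over the network.
 * Credit is kept in byte-microseconds (bytes * 1e6) so that refills at
 * any rate accumulate without rounding loss. */
typedef struct {
    uint64_t rate_bps;    /* refill rate, bytes per second */
    uint64_t burst_bytes; /* bucket capacity in bytes */
    uint64_t credit;      /* stored budget, bytes * 1e6 */
    uint64_t last_us;     /* timestamp of the previous update */
} log_budget_t;

void log_budget_init(log_budget_t *b, uint64_t rate_bps, uint64_t burst_bytes,
                     uint64_t now_us)
{
    b->rate_bps = rate_bps;
    b->burst_bytes = burst_bytes;
    b->credit = burst_bytes * 1000000u; /* start with a full bucket */
    b->last_us = now_us;
}

/* Returns true, and spends the credit, if a message of len bytes fits the
 * budget at time now_us; otherwise returns false and the caller drops it. */
bool log_budget_allow(log_budget_t *b, uint64_t len, uint64_t now_us)
{
    uint64_t cap = b->burst_bytes * 1000000u;
    uint64_t elapsed = now_us - b->last_us;
    b->last_us = now_us;

    /* Refill, saturating at the bucket capacity without overflowing. */
    if (elapsed > 0 && b->rate_bps > 0) {
        uint64_t add = (elapsed > cap / b->rate_bps) ? cap : elapsed * b->rate_bps;
        b->credit = (add >= cap - b->credit) ? cap : b->credit + add;
    }

    uint64_t need = len * 1000000u;
    if (need > b->credit)
        return false;
    b->credit -= need;
    return true;
}
```

With a rate of 100 B/s and a 200-byte burst, for example, a quiet device can send a short burst of log lines at once, but sustained logging is capped at roughly 100 bytes per second no matter how chatty the enabled tags are.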

Debug Logs.

No response

More Information.

sdkconfig.txt

Dazza0 commented 3 months ago

@patrickwilliamson1 The power surge probably means that the ESP isn't sleeping anymore. This typically indicates that the ESP has some processing to do, preventing the power management library from putting it to sleep. Any chance you could find out which tasks are in the ready/running state during the power surge?
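The ready/running question can be answered from the same uxTaskGetSystemState() data via each TaskStatus_t's eCurrentState field. A host-side sketch, assuming a hypothetical enum that mirrors the FreeRTOS eRunning/eReady/eBlocked/eSuspended/eDeleted states: it counts tasks that are running or ready, skipping the idle tasks (which are always ready) and, optionally, the task doing the sampling.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical host-side mirror of FreeRTOS eTaskState. */
typedef enum {
    ST_RUNNING,
    ST_READY,
    ST_BLOCKED,
    ST_SUSPENDED,
    ST_DELETED
} task_state_t;

typedef struct {
    const char *name;   /* like TaskStatus_t.pcTaskName */
    task_state_t state; /* like TaskStatus_t.eCurrentState */
} task_state_sample_t;

/* Count tasks that want CPU time (running or ready). Idle tasks are skipped
 * because they are always ready, and `self` (may be NULL) lets the caller
 * exclude the task that took the snapshot. A count that stays above zero
 * would mean the CPU never goes idle, so light sleep cannot start. */
size_t count_busy_tasks(const task_state_sample_t *tasks, size_t n,
                        const char *self)
{
    size_t busy = 0;
    for (size_t i = 0; i < n; i++) {
        if (tasks[i].state != ST_RUNNING && tasks[i].state != ST_READY)
            continue;
        if (strncmp(tasks[i].name, "IDLE", 4) == 0)
            continue; /* IDLE0 / IDLE1 */
        if (self != NULL && strcmp(tasks[i].name, self) == 0)
            continue; /* the sampling task itself */
        busy++;
    }
    return busy;
}
```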

igrr commented 3 months ago

Just a tip: enabling CONFIG_PM_PROFILING and using the esp_pm_dump_locks function may help find the specific lock that prevents the system from going back to sleep. It may be the OS (e.g. some tasks are running) or one of the drivers (e.g. Wi-Fi has disconnected and is scanning for the network to reconnect, keeping the radio on for a long time).
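A minimal sketch of that suggestion, under these assumptions: CONFIG_PM_ENABLE and CONFIG_PM_PROFILING are set, the device-only part is compiled only when ESP_PLATFORM is defined (as it is in ESP-IDF builds), and pm_dump_task and lock_held_fraction are illustrative names. The helper assumes you have extracted one lock's cumulative held time in microseconds from two dumps taken a known interval apart, and converts the difference into the fraction of that interval the lock was held.

```c
#include <stdint.h>

/* Fraction (0.0 to 1.0) of an interval during which a PM lock was held,
 * from two cumulative held-time readings in microseconds taken interval_us
 * apart. Clamped to 1.0 to absorb sampling jitter. */
double lock_held_fraction(uint64_t held_before_us, uint64_t held_after_us,
                          uint64_t interval_us)
{
    if (interval_us == 0 || held_after_us < held_before_us)
        return 0.0; /* no interval, or the counters were reset */
    double f = (double)(held_after_us - held_before_us) / (double)interval_us;
    return f > 1.0 ? 1.0 : f;
}

#ifdef ESP_PLATFORM
#include <stdio.h>
#include "esp_pm.h"
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"

/* Device side: print the PM lock statistics once per minute. */
void pm_dump_task(void *arg)
{
    (void)arg;
    for (;;) {
        esp_pm_dump_locks(stdout);
        vTaskDelay(pdMS_TO_TICKS(60 * 1000));
    }
}
#endif
```

On the device the task could be started with xTaskCreate(pm_dump_task, "pm_dump", 4096, NULL, 1, NULL). Because esp_pm_dump_locks() writes to whatever FILE* it is given, stdout can be replaced by another stream if the dump should not go to the console.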

patrickwilliamson1 commented 3 months ago

Thanks guys, I will run a log today with these and report back.

patrickwilliamson1 commented 3 months ago


I ran this and was able to catch another occurrence. I made these two graphs; the other lock counters had a consistent slope, so I didn't show them. Below I'm showing the time (µs) graphed: you can see that when the power consumption went up, the Wi-Fi lock was held for longer. At the same time, the Wi-Fi task's activity in the FreeRTOS task stats dropped, as in my first post.

Wi-Fi didn't disconnect, and neither did MQTT, during this time (I have counters on the disconnect/connect events).

It seems the Wi-Fi lock randomly holds APB_MAX. Is there anything I can use to figure out why, or which debug output could I enable to see what causes it? Maintaining low power is critical for me.

[Two images: graphs of PM lock time (µs)]
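Because the episodes are intermittent, one option for capturing them without constant debug traffic is to watch the lock duty cycle and raise log verbosity only while it stays high. The detector below is a hypothetical sketch with hysteresis; episode_detector_t, its thresholds, and the number of confirming samples are placeholders, not values from this thread.

```c
#include <stdbool.h>

/* Hypothetical detector: flags a high-power episode once the sampled lock
 * duty (0.0 to 1.0) has stayed above enter_at for `confirm` consecutive
 * samples, and clears it when the duty drops below exit_at. */
typedef struct {
    double enter_at;  /* duty that counts toward entering an episode */
    double exit_at;   /* duty below which an active episode ends */
    unsigned confirm; /* consecutive high samples required to enter */
    unsigned streak;  /* current run of high samples */
    bool active;      /* true while an episode is in progress */
} episode_detector_t;

/* Feed one duty sample; returns true if the episode state changed. */
bool episode_update(episode_detector_t *d, double duty)
{
    bool was_active = d->active;
    if (!d->active) {
        d->streak = (duty > d->enter_at) ? d->streak + 1 : 0;
        if (d->streak >= d->confirm) {
            d->active = true;
            d->streak = 0;
        }
    } else if (duty < d->exit_at) {
        d->active = false;
    }
    return d->active != was_active;
}
```

When episode_update() returns true and d.active is set, the caller could, for example, call esp_log_level_set("wifi", ESP_LOG_DEBUG), and drop back to a lower level when the episode clears. The gap between enter_at and exit_at keeps a duty cycle hovering near one threshold from toggling the log level on every sample.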
esp-lis commented 3 months ago

[image] @patrickwilliamson1 Can you briefly explain what this chart means?

Another question: do you know what the AP's DTIM parameter is?

patrickwilliamson1 commented 3 months ago



This is the difference between successive uxTaskGetSystemState outputs; it measures how many FreeRTOS ticks each task is active. The DTIM is 1 (I've also already tried 3, with the same result).
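For context on the DTIM question: with WIFI_PS_MIN_MODEM the station wakes to receive every DTIM beacon, so the DTIM period sets the baseline wake-up rate. The arithmetic is sketched below, assuming the common 100 TU beacon interval (the actual value on these APs is not stated in the thread); 1 TU is 1024 µs. The function names are illustrative.

```c
#include <stdint.h>

/* 1 TU (802.11 time unit) = 1024 microseconds. */
#define TU_US 1024u

/* Interval between DTIM beacons in microseconds. */
uint32_t dtim_interval_us(uint32_t dtim_period, uint32_t beacon_interval_tu)
{
    return dtim_period * beacon_interval_tu * TU_US;
}

/* Whole DTIM wake-ups per minute; 0 if the interval is zero. */
uint32_t dtim_wakeups_per_minute(uint32_t dtim_period,
                                 uint32_t beacon_interval_tu)
{
    uint32_t us = dtim_interval_us(dtim_period, beacon_interval_tu);
    return us ? 60000000u / us : 0;
}
```

At a 100 TU beacon interval, DTIM 1 gives a 102.4 ms interval (585 full wake-ups per minute) and DTIM 3 gives 307.2 ms (195 per minute), which is the normal-case baseline to compare the lock-time graphs against.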

patrickwilliamson1 commented 3 months ago

@igrr Any thoughts on this?

patrickwilliamson1 commented 2 months ago

I redirected ESP_LOGx output to RAM so I could retrieve it remotely, and enabled lwIP and Wi-Fi debug output. Do you see anything in the high-power-usage log that could cause this? Attachments: normal power usage.txt, high power usage.txt
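A minimal sketch of one way to capture ESP_LOGx output in RAM: a vprintf-compatible sink that appends formatted messages to a ring buffer, overwriting the oldest bytes when full. The buffer size and the ram_log_* names are hypothetical; only the registration call, esp_log_set_vprintf(), is ESP-IDF API. The critical section is compiled only under ESP_PLATFORM, because several tasks can log at once on the device.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#ifdef ESP_PLATFORM
#include "freertos/FreeRTOS.h"
/* Logging can happen from several tasks at once, so guard the ring. */
static portMUX_TYPE log_ring_mux = portMUX_INITIALIZER_UNLOCKED;
#define RING_LOCK()   portENTER_CRITICAL(&log_ring_mux)
#define RING_UNLOCK() portEXIT_CRITICAL(&log_ring_mux)
#else
#define RING_LOCK()   ((void)0)
#define RING_UNLOCK() ((void)0)
#endif

#define LOG_RING_SIZE 4096 /* arbitrary example size */

static char log_ring[LOG_RING_SIZE];
static size_t log_head; /* next write position */
static size_t log_used; /* bytes stored, at most LOG_RING_SIZE */

/* vprintf-compatible sink: format one message outside the lock, then append
 * it to the ring. Messages longer than 255 bytes are truncated. */
int ram_log_vprintf(const char *fmt, va_list args)
{
    char line[256];
    int n = vsnprintf(line, sizeof line, fmt, args);
    if (n < 0)
        return n;
    size_t len = (size_t)n < sizeof line ? (size_t)n : sizeof line - 1;
    RING_LOCK();
    for (size_t i = 0; i < len; i++) {
        log_ring[log_head] = line[i];
        log_head = (log_head + 1) % LOG_RING_SIZE;
        if (log_used < LOG_RING_SIZE)
            log_used++;
    }
    RING_UNLOCK();
    return n;
}

/* printf-style wrapper, handy for application messages and host testing. */
int ram_log_printf(const char *fmt, ...)
{
    va_list args;
    va_start(args, fmt);
    int n = ram_log_vprintf(fmt, args);
    va_end(args);
    return n;
}

/* Copy stored text, oldest first, into out (always NUL-terminated).
 * Returns the number of bytes copied. */
size_t ram_log_read(char *out, size_t out_size)
{
    if (out_size == 0)
        return 0;
    RING_LOCK();
    size_t start = (log_head + LOG_RING_SIZE - log_used) % LOG_RING_SIZE;
    size_t count = log_used < out_size - 1 ? log_used : out_size - 1;
    for (size_t i = 0; i < count; i++)
        out[i] = log_ring[(start + i) % LOG_RING_SIZE];
    RING_UNLOCK();
    out[count] = '\0';
    return count;
}
```

On the device the sink would be registered once at startup with esp_log_set_vprintf(ram_log_vprintf); that call returns the previous handler, which could be kept and called as well if UART output is still wanted. Keeping the buffer small also keeps the time spent inside the critical section in ram_log_read() short.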

patrickwilliamson1 commented 2 months ago

@igrr @esp-lis What do you think could cause the ESP32 to get stuck holding the APB_MAX lock? Is there any other data I could log to figure it out?

yardimli commented 2 months ago

It's possible that there is interference and the ESP needs more power for Wi-Fi. That should be easy to test: block the signal or move the board so that it's harder to connect.

patrickwilliamson1 commented 2 months ago


I've tried this; it does not recreate the issue. The devices in the field that experience this have signal strengths between -40 and -60 dBm.

esp-lis commented 3 weeks ago


[image] @patrickwilliamson1 Referring to the figure above, I guess the moment when the power consumption increases is in the red dotted box. I think there may be upstream data transmission or downstream data reception, possibly triggered by an upper-layer protocol such as lwIP or MQTT; this would need to be analyzed by capturing packets.