bdring / FluidNC

The next generation of motion control firmware

Problem: ESP32, WiFi on causes consumption to jump from 0.07A to 0.13A and overheats #1262

Open aguaviva opened 1 month ago

aguaviva commented 1 month ago

Wiki Search Terms

heat, power consumption, esp32

Controller Board

none, just the ESP32

Machine Description

nothing

Input Circuits

No response

Configuration file

default

Startup Messages

[MSG:INFO: FluidNC v3.7.18 https://github.com/bdring/FluidNC]
[MSG:INFO: Compiled with ESP32 SDK:v4.4.7-dirty]
[MSG:INFO: Local filesystem type is spiffs]
[MSG:ERR: Cannot open configuration file:config.yaml]
[MSG:INFO: Using default configuration]
[MSG:INFO: Axes: using defaults]
[MSG:INFO: Machine Default (Test Drive)]
[MSG:INFO: Board None]
[MSG:INFO: Stepping:RMT Pulse:4us Dsbl Delay:0us Dir Delay:0us Idle Delay:255ms]
[MSG:INFO: Axis count 3]
[MSG:INFO: Axis X (-1000.000,0.000)]
[MSG:INFO:   Motor0]
[MSG:INFO: Axis Y (-1000.000,0.000)]
[MSG:INFO:   Motor0]
[MSG:INFO: Axis Z (-1000.000,0.000)]
[MSG:INFO:   Motor0]
[MSG:INFO: Kinematic system: Cartesian]
[MSG:INFO: STA SSID is not set]
[MSG:INFO: AP SSID FluidNC IP 192.168.0.1 mask 255.255.255.0 channel 1]
[MSG:INFO: AP started]
[MSG:INFO: WiFi on]
[MSG:INFO: Captive Portal Started]
[MSG:INFO: HTTP started on port 80]
[MSG:INFO: Telnet started on port 23]

Grbl 3.7 [FluidNC v3.7.18 (wifi) '$' for help]
[MSG:ERR: Configuration is invalid. Check boot messages for ERR's.]

User Interface Software

serial port

What happened?

Esp32 consumes

Note that HTTP and Telnet are turned off all the time.

Is this expected?

GCode File

No response

Other Information

No response

MitchBradley commented 1 month ago

Ask Espressif to make a radio that can transmit without using power.

aguaviva commented 1 month ago

> Ask Espressif to make a radio that can transmit without using power.

It shouldn't be transmitting anything right after flashing it and after turning HTTP & Telnet off.

Moreover, when I flashed CircuitPython (WiFi on + a webserver running) it consumed 50mA, so there might be something in the firmware that is making consumption skyrocket.

I can repro the 130mA with circuitpython if I transmit 1000 packets. So definitely there must be something in the firmware that is attempting to transmit something somewhere?

MitchBradley commented 1 month ago

It could be scanning. I have no idea what mode CircuitPython is using. See https://deepbluembedded.com/esp32-sleep-modes-power-consumption/ . If you want to figure out what exactly is happening in your situation, you can modify WifiConfig.cpp to test various modes and setup steps.

MitchBradley commented 1 month ago

Since it is in AP mode it is probably sending beacons.

aguaviva commented 1 month ago

Circuitpython is also in AP mode and sending beacons, plus it has a webserver with websockets server running in the background. And it is not in any sleep mode as it is responsive via WIFI and serial port.

BTW my situation is just a bare ESP32 (I tried two units and both show the same heating/consumption issue with HTTP/Telnet off), so potentially more people might be seeing this same issue.

I don't know the code well enough but it would help to have some debugging commands that turn off certain parts of the code in a divide and conquer fashion, that should help corner down where the issue is.

BTW do you have a quick way to check the consumption of your ESP32 modules?

V1EngineeringInc commented 1 month ago

> Ask Espressif to make a radio that can transmit without using power.

I have actually had two users both using genuine ESP32 modules in the last week ask why their esp's are now showing 80C temps. Might be worth taking a second look at. One of them has two esp's and only one shows the issue so I was not sure what to think about it but after reading this post I am wondering if it is getting hit a little harder than before?

aguaviva commented 1 month ago

I'll be happy to help debug the issue and try instrumented builds that help narrow it down (but I can't instrument the code myself, as I am not familiar with it and would need some time to learn it).

MitchBradley commented 1 month ago

Mine is running at 88F not C in AP mode with the Test Drive config file - or with a valid one. Barely above ambient. http/enable and telnet/enable off or on, doesn't matter. Same with 3.7.18 and 3.8.0

V1EngineeringInc commented 1 month ago

Thanks for double-checking.

I will keep at it on my end then.

MitchBradley commented 1 month ago

I just did a full install on a bare ESP32 with an attached external antenna. 84F according to an infrared thermometer. Unplugged the antenna - still cold.

The wifi startup code is in WebUI/WifiConfig.cpp in the StartAP() method. If someone wants to comment out parts of it to see what happens, go wild. I don't have time to add a bunch of debugging stuff and document it and build a test release and ... Setting up a compilation environment is pretty easy. Install VsCode and the PlatformIO extension, start a project by git cloning the FluidNC repo, and hit the upload icon at the bottom of the vscode window.

V1EngineeringInc commented 1 month ago

I think your tests, and the test I have seen from my user that have two different ESP's acting differently, are telling me it is most likely some sort of hardware issue. Mine do not seem to get that hot and the ones I flash and ship have all seemed good for the short time they are on for testing.

aguaviva commented 1 month ago

> Setting up a compilation environment is pretty easy. Install VsCode and the PlatformIO extension, start a project by git cloning the FluidNC repo, and hit the upload icon at the bottom of the vscode window.

That is the easy part; the hard part is knowing which line to comment out. It'd be great to have some sort of debugging register that helps disable parts of the code so issues can be narrowed down easily. For example, 0 might mean all enabled, and as you increase the value, features would start to get disabled. This might require 10 lines of code or so; the tricky part is where to put them.
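The "debugging register" idea above could look roughly like the following. This is only a sketch of the proposal, not FluidNC code: the `Feature` list, its ordering, and `feature_enabled` are all hypothetical names chosen for illustration.

```cpp
#include <cstdint>

// Hypothetical feature list, ordered by which one gets switched off first.
// None of these names come from FluidNC.
enum class Feature : uint8_t { Telnet = 0, Http = 1, CaptivePortal = 2, WifiRadio = 3 };

// level 0 -> everything enabled; each increment disables one more feature,
// starting from the lowest-numbered one.
constexpr bool feature_enabled(Feature f, uint8_t level) {
    return static_cast<uint8_t>(f) >= level;
}
```

Startup code would then guard each subsystem with a check such as `if (feature_enabled(Feature::Http, level)) { ... }`, where `level` is read from a setting. As the comment says, the hard part is deciding where those guards belong.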

Are your Esp32 consuming 50mA or 130mA?

MitchBradley commented 1 month ago

130mA would be over 400 mW at 3.3V which would at least be warm to the touch.
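For reference, that estimate is just P = I × V. A minimal sketch, assuming the 3.3 V supply figure from the comment; `power_mw` is a name made up for this illustration:

```cpp
// Electrical power in milliwatts from current (mA) and supply voltage (V).
// P = I * V; milliamps times volts gives milliwatts directly.
constexpr double power_mw(double current_ma, double supply_v) {
    return current_ma * supply_v;
}
```

At 130 mA this gives about 429 mW, and at the 50 mA reported for CircuitPython about 165 mW.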

The lines of code starting with WiFi. are the interesting ones.

aguaviva commented 1 month ago

> I think your tests, and the test I have seen from my user that have two different ESP's acting differently, are telling me it is most likely some sort of hardware issue.

I am not sure it is a HW issue as Circuitpython works well when flashed.

MitchBradley commented 1 month ago

The ESP32 hides some radio-related stuff in a section of FLASH. It controls what the radio does right after startup. Maybe there is something in there that is causing a problem. You could try doing an erase and full install. Perhaps CircuitPython left something lying around that plays badly with FluidNC, but works well with CircuitPython.

MitchBradley commented 1 month ago

It might not be a "hardware" issue exactly, but it could be in some of the semi-magic Espressif bootloader and FLASH layout stuff. That would fall into a gray area where it is not strictly hardware, but neither is it a problem in the FluidNC code per-se. There is a lot going on behind the scenes with the Espressif startup code. When you do a fresh install on a virgin ESP32, it tends to work right, to the extent that it works for hundreds or thousands of users without complaint. One thing that is unusual about your chips is that they have experienced a CircuitPython installation. I am not sure that is actually what is causing the problem, but it is certainly something that is not the norm.

aguaviva commented 1 month ago

I did a fresh install of your firmware in a new Esp32, later is when I installed Circuitpython to determine whether this was a HW or SW issue.

I know I keep asking the same thing, but if you could list a few lines to comment out, that would help.

MitchBradley commented 1 month ago

The lines that start with WiFi., as I said before.

aguaviva commented 1 month ago

I tried disabling almost everything and can't get rid of the 130mA.

I was about to start thinking this is a HW issue when I decided to run https://github.com/espressif/arduino-esp32/blob/master/tests/validation/wifi/wifi.ino and got 2~4mA again.

I used the following platformio.ini, same board & platform as yours:

[env:esp_wroom_02]
platform = https://github.com/platformio/platform-espressif32.git
board = esp32dev
framework = arduino

MitchBradley commented 1 month ago

Try a noradio build

MitchBradley commented 1 month ago

What kind of ESP32 module is it?

aguaviva commented 1 month ago

Sorry, my bad, I meant that as soon as I call WiFi.mode(WIFI_STA); I get the 130mA, even if I commented out most of the stuff. When I comment out that line, it goes down to 5mA.

MitchBradley commented 1 month ago

If you set CORE_DEBUG_LEVEL to 5 in platformio.ini, you will get a lot of messages from the underlying stack. Sometimes that provides a clue.

aguaviva commented 1 month ago

I found the issue: it is this line, https://github.com/MitchBradley/FluidNC/blob/7deec3327ff4339dae7e989729a34632de228b1d/FluidNC/src/WebUI/WifiConfig.cpp#L785, that is causing the ESP32 to draw so much current.

MitchBradley commented 1 month ago

You should probably be looking at the code in the bdring FluidNC repo. My fork is not maintained; it is only present for some old experiments. Regardless, here are the possible values for that function:

typedef enum {
    WIFI_PS_NONE,        /**< No power save */
    WIFI_PS_MIN_MODEM,   /**< Minimum modem power saving. In this mode, station wakes up to receive beacon every DTIM period */
    WIFI_PS_MAX_MODEM,   /**< Maximum modem power saving. In this mode, interval to receive beacons is determined by the listen_interval parameter in wifi_sta_config_t */
} wifi_ps_type_t;

I don't know the performance impact of the various values.

I do know why that value WIFI_PS_NONE was chosen. It was because of this ESP32 issue with this effect on FluidNC

If you search the web for "ESP32 GPIO36" you will find much discussion of problems surrounding the use of GPIOs 36 and 39 when WiFi is enabled and in a power save mode. It is possible that the problems occur mainly when interrupts are enabled for those pins. We no longer use interrupts for limit pins, instead using a fast polling routine - since ESP32 GPIO interrupts are prone to strange problems especially related to bus conflicts between the two CPU cores. It could be that the new polling strategy suppresses the problem - but it is also possible that it does not, because if the WiFi system causes those pins to pulse active at random times, it is hard to guarantee that they won't read active at the time the poller happens to look at them.

aguaviva commented 1 month ago

Thanks, I was using bdring's repo; I just happened to send your repo's link.

It would help if that particular line had a comment documenting why that value was chosen.

bdring commented 1 month ago

The line was probably set that way because we generally don't need power savings as the ESP32 is not run off batteries.

I have never seen overheating. We could possibly add a new NVS setting like $Wifi/Powermode

V1EngineeringInc commented 1 month ago

Not all of us understand programming like you do. That seems like a very odd response from someone who understands every single line in great detail and knows why he chose that exact option and didn't break it out.

I privately asked someone I know about this very thing, "should we have control over this?" His response was "I am not sure, it might help it might not"

MitchBradley commented 1 month ago

The point is that programming is very time consuming and documentation is even more so. You have to draw the line somewhere. Explaining things in excruciating detail quickly becomes counterproductive, especially in light of the fact that few people read the code, and commentary often becomes stale, incorrect, and thus of negative value. Git makes it possible to trace the history of lines of code and often to correlate them to specific issues and pull requests, which serves the purpose better than trying to summarize the situation in few enough lines of commentary so as not to obscure the flow of the code with chattiness.

bdring commented 1 month ago

I tried a new setting, $Wifi/PsMode={NONE,MIN,MAX}, to test the power consumption. I used an easy but low-quality meter.
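A setting like this ultimately has to map its text value onto one of the three `wifi_ps_type_t` values quoted earlier. The sketch below is not the FluidNC implementation: `PsMode` is a local stand-in for the ESP-IDF enum so the example builds on a host, and `parse_ps_mode` is a hypothetical helper.

```cpp
#include <string>

// Local stand-in for ESP-IDF's wifi_ps_type_t
// (WIFI_PS_NONE, WIFI_PS_MIN_MODEM, WIFI_PS_MAX_MODEM).
enum class PsMode { None, MinModem, MaxModem };

// Map the setting text to a power-save mode. Returns false and leaves `out`
// untouched for unknown text. Matching is case-sensitive in this sketch.
inline bool parse_ps_mode(const std::string& text, PsMode& out) {
    if (text == "NONE") { out = PsMode::None;     return true; }
    if (text == "MIN")  { out = PsMode::MinModem; return true; }
    if (text == "MAX")  { out = PsMode::MaxModem; return true; }
    return false;
}
```

On the device, the parsed value would be translated to the matching `wifi_ps_type_t` constant and passed to ESP-IDF's `esp_wifi_set_ps()`.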


Here are the values I found on a raw chip with a default config.

$Wifi/Mode=AP
  Max:  0.11 to 0.14 (toggles slowly)
  Min:  0.11 to 0.14 (toggles quickly)
  None: 0.14

$Wifi/Mode=STA
  Max:  0.04
  Min:  0.04
  None: 0.11
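To put the STA readings in relative terms, a small helper can express one reading as a percentage reduction from another. This is plain arithmetic on the numbers reported above; `percent_reduction` is a name made up for the illustration.

```cpp
// Percentage by which `after` is lower than `before` (same units for both).
constexpr double percent_reduction(double before, double after) {
    return (before - after) / before * 100.0;
}
```

For the STA readings, `percent_reduction(0.11, 0.04)` comes out at roughly 64%, so either power-save mode cuts the reading by nearly two thirds compared with None. The meter was described as low quality, so treat the figure as approximate.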