espressif / arduino-esp32

Arduino core for the ESP32
GNU Lesser General Public License v2.1

ESP32 BLE/WiFi throws an error [E][WiFiClient.cpp:395] write(): fail on fd 54, errno: 11, "No more processes" - still not resolved #6129

Closed sudheeshsmadhav closed 1 year ago

sudheeshsmadhav commented 2 years ago

Board

ESP32 devkit v4

Device Description

I am using ESP32 devkit v4

Hardware Configuration

No GPIO connections. The ESP32 scans BLE tags and sends their IDs and RSSI values over MQTT.

Version

latest master

IDE Name

PlatformIO; also Arduino IDE 1.8.57.0

Operating System

Windows10

Flash frequency

80MHz

PSRAM enabled

no

Upload speed

115200

Description

The ESP32 scans the available BLE tags and publishes the collected information over MQTT (a Raspberry Pi 4 B is the broker). It was working fine last month. Now it has started printing an error in the console, such as [E][WiFiClient.cpp:395] write(): fail on fd 54, errno: 11, "No more processes", and after some time it restarts. The BLE scan runs in a FreeRTOS task. MQTT publishing is done in loop().

It works fine for around 5 minutes (the interval is random), then fails and reboots, as shown in the log.

[E][WiFiClient.cpp:395] write(): fail on fd 54, errno: 11, "No more processes"
[E][WiFiClient.cpp:395] write(): fail on fd 54, errno: 113, "Software caused connection abort"

Sketch

Will provide if requested.

Debug Message

[E][WiFiClient.cpp:395] write(): fail on fd 54, errno: 11, "No more processes"
[E][WiFiClient.cpp:395] write(): fail on fd 54, errno: 11, "No more processes"
[E][WiFiClient.cpp:395] write(): fail on fd 54, errno: 11, "No more processes"
[E][WiFiClient.cpp:395] write(): fail on fd 54, errno: 11, "No more processes"
[E][WiFiClient.cpp:395] write(): fail on fd 54, errno: 11, "No more processes"
[E][WiFiClient.cpp:395] write(): fail on fd 54, errno: 11, "No more processes"
[E][WiFiClient.cpp:395] write(): fail on fd 54, errno: 11, "No more processes"
[E][WiFiClient.cpp:395] write(): fail on fd 54, errno: 11, "No more processes"
[E][WiFiClient.cpp:395] write(): fail on fd 54, errno: 11, "No more processes"
[E][WiFiClient.cpp:395] write(): fail on fd 54, errno: 113, "Software caused connection abort"
PUB Result: 0
number of uploads: 153

ets Jun  8 2016 00:22:57

rst:0xc (SW_CPU_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:2
load:0x3fff0018,len:4
load:0x3fff001c,len:1044
load:0x40078000,len:10124
load:0x40080400,len:5828
entry 0x400806a8

Other Steps to Reproduce

No response

I have checked existing issues, online documentation and the Troubleshooting Guide

MomePP commented 2 years ago

I also got this error, but in my case it's a Wi-Fi AP acting as a web server via WiFiManager. After the web server starts and the ESP writes a response to the client, this error shows up and then it crashes.

This only happened with the latest release (2.0.2). After downgrading to 2.0.1, it works fine.

BeaverUI commented 2 years ago

I have exactly the same issue with my code. I use MQTT, BLE, and WiFi together on an ESP32. It runs for a while, and then resets at a quasi-random time.

@MomePP: I didn't get your comment. I believe the WiFiClient is in the board definitions file, and there is no version 2.0.2 there. Or am I missing something?

sudheeshsmadhav commented 2 years ago

I didn't use WiFiManager.h

include

include

include

include

include

include

include

include

include

I use only these files in my code

BeaverUI commented 2 years ago

I have these:

include

include "BLEDevice.h"

include

include

include

include

Different MQTT library, by the way.

MomePP commented 2 years ago

@BeaverUI

Sorry for the confusion. What I meant is the version of the ESP32 Arduino Core that I used; WiFiClient is part of the WiFi library in that core.

In my case I used the WiFiManager library, which is based on the WiFi library, so I got the same issue with WiFiClient. However, I didn't use BLE.

BeaverUI commented 2 years ago

Thanks, I changed to 2.0.1 now, let's see what happens. It usually takes less than 24 hours before it throws the error.

BeaverUI commented 2 years ago

Oops, it did it again. Changing the Arduino Core made no difference. What I notice is that some processes seem to struggle:

sudheeshsmadhav commented 2 years ago

Hi, I changed some library files and their versions. The issue is still there. Has anybody identified a possible solution? I reinstalled the Arduino IDE and tried different versions of the library files. No use... It is still driving me mad.

sudheeshsmadhav commented 2 years ago

Not a permanent solution, but somehow I am able to handle it. I am repeatedly sending data from the ESP to a server, so this problem caused gaps in the communication. From the printed error, [E][WiFiClient.cpp:395] write(): fail on fd 54, errno: 11, "No more processes", I was able to locate the error code in the WiFiClient.cpp file. I made the ESP restart when this log is printed. That way the ESP gets back into the active loop within a few seconds instead of printing the error log to the console for more than 10 seconds. I am still looking for a permanent solution.
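The restart-on-error workaround can also be approximated at application level, without editing WiFiClient.cpp. The following is only a sketch under assumptions: `FailureGuard` and its threshold are hypothetical names, and the recovery action is passed in as a callback so the counting logic can be tried on a desktop compiler.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical helper: counts consecutive failed operations and runs a
// recovery callback once the count reaches the configured threshold.
class FailureGuard {
public:
    FailureGuard(unsigned threshold, std::function<void()> onTrip)
        : threshold_(threshold), onTrip_(std::move(onTrip)) {
        assert(threshold_ > 0);  // a threshold of 0 would trip on every call
    }

    // Report the outcome of one operation (for example one publish attempt).
    void report(bool ok) {
        if (ok) {
            failures_ = 0;       // any success resets the streak
            return;
        }
        if (++failures_ >= threshold_) {
            failures_ = 0;       // start a fresh streak after recovery
            onTrip_();
        }
    }

private:
    unsigned threshold_;
    unsigned failures_ = 0;
    std::function<void()> onTrip_;
};
```

On the ESP32 the callback could call ESP.restart() or trigger a reconnect, and `report()` could be fed with the boolean result of the MQTT publish call. Whether a restart or a reconnect is the better recovery is a judgment call that this thread does not settle.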

sudheeshsmadhav commented 2 years ago

Any solution for this?

VojtechBartoska commented 2 years ago

Hello all contributors, can you please retest this on v2.0.3-rc1?

VojtechBartoska commented 2 years ago

I'm closing the issue as expired due to no answer.

If needed, please reopen it.

Thanks for understanding.

maxdd commented 2 years ago

I have a similar issue with chunked data (nothing to do with BLE, and I don't know why BLE is mentioned in this issue).

[  4906][D][HTTPClient.cpp:598] sendRequest(): request type: 'GET' redirCount: 0

[  4957][D][HTTPClient.cpp:1156] connect():  connected to www.meteoproject.it:80
[  5047][D][HTTPClient.cpp:1307] handleHeaderResponse(): code: 200
[  5047][D][HTTPClient.cpp:1314] handleHeaderResponse(): Transfer-Encoding: chunked
[  5049][D][HTTPClient.cpp:628] sendRequest(): sendRequest code=200

[  5056][D][HTTPClient.cpp:922] writeToStream():  read chunk len: 16780
[  5159][D][HTTPClient.cpp:1446] writeToStreamDataBlock(): connection closed or file end (written: 16780).
[  5160][D][HTTPClient.cpp:922] writeToStream():  read chunk len: 0
[  5164][D][HTTPClient.cpp:388] disconnect(): still data in buffer (2), clean up.

[  5171][E][WiFiClient.cpp:516] flush(): fail on fd 50, errno: 11, "No more processes"
.[  5181][D][HTTPClient.cpp:393] disconnect(): tcp keep open for reuse

Seems like after the trailing sequence some data is still present in the buffer

char b[2] = {0,0};
_client->readBytes((uint8_t*)b, 2);
Serial.printf("char 1 - %x - ", b[0]);
Serial.printf("char 2 - %x -\n", b[1]);
[  3686][D][HTTPClient.cpp:922] writeToStream():  read chunk len: 16783
[  3773][D][HTTPClient.cpp:1450] writeToStreamDataBlock(): connection closed or file end (written: 16783).
char 1 - 30 - char 2 - d -
[  3784][W][HTTPClient.cpp:1473] returnError(): error(-11): read Timeout
[  3784][D][HTTPClient.cpp:1475] returnError(): tcp stop

Is it a server-side issue?
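One possible reading of the logs above, not a confirmed diagnosis: an HTTP/1.1 chunked body ends with a zero-length chunk line ("0\r\n") followed by one more CRLF, so 2 bytes left in the buffer after the "chunk len: 0" line would match that final CRLF. The sketch below is a minimal host-side decoder written for illustration (`decodeChunked` is a hypothetical name, not part of HTTPClient); it shows where those last 2 bytes are consumed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Decodes a complete HTTP/1.1 chunked body held in `raw`.
// On success, `out` holds the payload and `consumed` the number of bytes used.
// Chunk extensions are ignored and trailer lines are skipped, not parsed.
bool decodeChunked(const std::string& raw, std::string& out, std::size_t& consumed) {
    std::size_t pos = 0;
    out.clear();
    for (;;) {
        assert(pos <= raw.size());
        std::size_t eol = raw.find("\r\n", pos);
        if (eol == std::string::npos) return false;

        // Chunk size is hexadecimal; anything after ';' is an extension.
        std::string sizeField = raw.substr(pos, eol - pos);
        std::size_t semi = sizeField.find(';');
        if (semi != std::string::npos) sizeField.erase(semi);
        if (sizeField.empty()) return false;

        std::size_t len = 0;
        for (char c : sizeField) {
            int digit;
            if (c >= '0' && c <= '9') digit = c - '0';
            else if (c >= 'a' && c <= 'f') digit = c - 'a' + 10;
            else if (c >= 'A' && c <= 'F') digit = c - 'A' + 10;
            else return false;
            len = len * 16 + static_cast<std::size_t>(digit);
        }
        pos = eol + 2;

        if (len == 0) {
            // Last chunk: skip optional trailer lines up to the empty line.
            // With no trailers this consumes exactly the final "\r\n".
            for (;;) {
                std::size_t end = raw.find("\r\n", pos);
                if (end == std::string::npos) return false;
                bool emptyLine = (end == pos);
                pos = end + 2;
                if (emptyLine) break;
            }
            consumed = pos;
            return true;
        }

        if (raw.size() < pos + len + 2) return false;    // truncated chunk
        out.append(raw, pos, len);
        pos += len;
        if (raw.compare(pos, 2, "\r\n") != 0) return false;
        pos += 2;
    }
}
```

A reader that stops right after the "0\r\n" line leaves the final CRLF unread, which is the situation the "still data in buffer (2)" message appears to describe.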

ln-12 commented 2 years ago

Hey @VojtechBartoska, I am using 2.0.3 (stable) and experiencing the same issue.

VojtechBartoska commented 2 years ago

I'm reopening the issue and we will take a look on it.

sudheeshsmadhav commented 2 years ago

The issue is still existing and I am restarting the esp to overcome this quickly. Looking for a permanent solution.

I am using a BLE scan to collect the Bluetooth tag IDs and their RSSI values, and publishing them using MQTT. The BLE scan runs in a FreeRTOS task.

sudheeshsmadhav commented 2 years ago

The issue still exists. Currently I just restart the ESP32 when I get this error. I am looking for a permanent solution, because some data is lost during the ESP restart.

I scan the BLE beacons in a FreeRTOS task and publish to the server using MQTT, with the PubSubClient library.

The ESP restarts (the problem occurs) at random, say after 5 minutes.

VojtechBartoska commented 2 years ago

Adding it into "To-DO" on our Roadmap.

ln-12 commented 2 years ago

I don't know if it helps, but for me the message is showing up right before the connection drops. After the connection is lost, I reconnect and it works fine.

MartinVerges commented 2 years ago

I had to downgrade to 2.0.1 to get my Web OTA firmware update running again. Please fix this, or simply roll back to the 2.0.1 code, which does not have the problem.

VojtechBartoska commented 2 years ago

@PilnyTomas can you please test this? Thanks!

lijianyu1985 commented 2 years ago

[image] Same issue.

PilnyTomas commented 2 years ago

image

ln-12 commented 2 years ago

image

I know that you can't really debug this issue without a sample to reproduce, but (at least for me) the issue does not occur consistently. Sometimes my connection is fine for days, sometimes this error appears after hours. My project is quite large, so I reduced it to the main parts which handle the wifi connection. I am using FreeRTOS and start multiple tasks of which one is defined like this:

void TaskCommunication(void *pvParameters) {   
    // some other initialization...

    for (;;) {
        if(enableWifi) {
            if (WiFi.status() != WL_CONNECTED) {
                // disconnect and shutdown wifi to start fresh
                WiFi.disconnect(true, true);

                vTaskDelay(100);

                // initialize wifi (but don't connect yet)
                WiFi.begin(config::ssid, config::password, 0, nullptr, false);

                // some more debug output and initialization...

                // no wifi connection, try to connect
                WiFi.begin(config::ssid, config::password);

                // Wait up to 5 s for the wifi connection (the timeout is given in ms)
                waitForConnectResult(5000);

                if (WiFi.status() != WL_CONNECTED) {
                    WiFi.disconnect(true, true);

                    // wait 5s before we try again to save some energy
                    vTaskDelay(5000);

                    continue;
                }

                // start websocket connection using <ArduinoWebsockets.h> and <ArduinoJson.h>...
            } else {
                // poll websocket data...
            }
        } else {
            // wifi scanning should be turned off
            if(WiFi.getMode() != WIFI_OFF) {
                // disconnect from AP
                WiFi.disconnect(true, true);  

                // give the ESP some time to finish disconnecting
                vTaskDelay(100); 

                // turn the Wifi module off
                WiFi.mode(WIFI_OFF);
            }

            // wifi is off, so we just wait
            vTaskDelay(500);
        } 
    }
}

There is really nothing special going on. One thing I noticed was that the ESP seems to have trouble with a lot of interference from many wifi networks around. When I am at my office (< 5 wifi networks around), it connects immediately, but when I am at our lab (> 10 wifi networks with strong RSSIs), the connection needs more time to establish. Maybe this helps you a bit...

garageeks commented 2 years ago

Hi, I'm on 2.0.4 and I just noticed this error as well. However, I get the HTTP 200 response, so the transmission is successful after all. I have plenty of free heap memory, around 213KB.

Sending command 0 to IP 192.168.1.6...[271294][E][WiFiClient.cpp:516] flush(): fail on fd 49, errno: 11, "No more processes"
OK
Sending command 0 to IP 192.168.1.7...[273039][E][WiFiClient.cpp:516] flush(): fail on fd 49, errno: 11, "No more processes"
OK
Sending command 0 to IP 192.168.1.8...[274557][E][WiFiClient.cpp:516] flush(): fail on fd 49, errno: 11, "No more processes"
OK
Sending command 0 to IP 192.168.1.9...[276194][E][WiFiClient.cpp:516] flush(): fail on fd 49, errno: 11, "No more processes"
OK

I'm using a standard HTTPClient workflow:

HTTPClient http;
char hostName[80] = {0};
strcat(hostName, "http://192.168.1.");
appendIntValue(hostName,lastIPvalue);
strcat(hostName, "/cmd?user=user");

// Your Domain name with URL path or IP address with path
http.begin(hostName);
http.setConnectTimeout(5000);
// Send HTTP GET request
int httpResponseCode = http.GET();
if (httpResponseCode > 0) {
    if(httpResponseCode == HTTP_CODE_OK) {
        out.println("OK");
    }
}

I also noticed the same error in this other issue here

garageeks commented 2 years ago

Here are some more details with -DCORE_DEBUG_LEVEL=5

[121029][V][HTTPClient.cpp:252] beginInternal(): url: http://192.168.1.6/cmd?user=user
[121030][D][HTTPClient.cpp:303] beginInternal(): protocol: http, host: 192.168.1.6 port: 80 url: /cmd?user=user
[121042][D][HTTPClient.cpp:598] sendRequest(): request type: 'GET' redirCount: 0

[121090][D][HTTPClient.cpp:1156] connect():  connected to 192.168.1.6:80
[121125][V][HTTPClient.cpp:1250] handleHeaderResponse(): RX: 'HTTP/1.1 200 OK'
[121126][V][HTTPClient.cpp:1250] handleHeaderResponse(): RX: 'Content-Type: application/json'
[121130][V][HTTPClient.cpp:1250] handleHeaderResponse(): RX: 'Server: ESP (ESP8266EX)'
[121138][V][HTTPClient.cpp:1250] handleHeaderResponse(): RX: 'Cache-Control: no-cache, no-store, must-revalidate'
[121148][V][HTTPClient.cpp:1250] handleHeaderResponse(): RX: 'Pragma: no-cache'
[121155][V][HTTPClient.cpp:1250] handleHeaderResponse(): RX: 'Expires: -1'
[121162][V][HTTPClient.cpp:1250] handleHeaderResponse(): RX: 'Accept-Ranges: none'
[121169][V][HTTPClient.cpp:1250] handleHeaderResponse(): RX: 'Transfer-Encoding: chunked'
[121177][V][HTTPClient.cpp:1250] handleHeaderResponse(): RX: 'Connection: close'
[121184][V][HTTPClient.cpp:1250] handleHeaderResponse(): RX: ''
[121190][D][HTTPClient.cpp:1307] handleHeaderResponse(): code: 200
[121195][D][HTTPClient.cpp:1314] handleHeaderResponse(): Transfer-Encoding: chunked
[121203][D][HTTPClient.cpp:628] sendRequest(): sendRequest code=200

[121222][D][HTTPClient.cpp:922] writeToStream():  read chunk len: 233
[121223][D][HTTPClient.cpp:1446] writeToStreamDataBlock(): connection closed or file end (written: 233).
[121227][D][HTTPClient.cpp:922] writeToStream():  read chunk len: 0
[121233][D][HTTPClient.cpp:388] disconnect(): still data in buffer (2), clean up.

[121240][E][WiFiClient.cpp:516] flush(): fail on fd 49, errno: 11, "No more processes"
[121250][D][HTTPClient.cpp:395] disconnect(): tcp stop
OK
[121255][D][HTTPClient.cpp:408] disconnect(): tcp is closed

ccrawford commented 2 years ago

This one has been causing me headaches so looked into it more deeply.

errno 11 (EAGAIN) happens when you go to read (via BSD recv function) bytes from a stream with MSG_DONTWAIT flag and there are no bytes to be read. The ESP error string "No more processes" is misleading. It's really "No data available, try again later." or maybe "No data to process".

The problem as I see it is WiFiClient::flush() determines how many bytes to flush by calling available(), but that returns the number of bytes in the buffer plus the number of bytes in the stream. If the stream is empty (in my test case), but the buffer is not, the subsequent call in flush() to "recv(...)" returns the EAGAIN result (because no bytes in the stream to fetch.)

Multiple ways to fix this:

  1. Ignore the EAGAIN error result and keep going. Add: if(errno == EAGAIN) res = a; after the recv call. This feels hacky.
  2. Figure out the bytes to flush based on the number of bytes in the receive stream, not the buffered stream. I like this, but I don't see an obvious way to get that value short of making WiFiClientRxBuffer::r_available() public (inversion of control?) so no.
  3. Rather than using the recv function to clear the buffer/stream, use the read function instead. Has the potential to be slower if we have to read a bunch of data into a buffer we're never going to use, but feels like the right solution to me.

So, in WiFiClient.cpp line 515 I would replace res = recv(fd(), buf, toRead, MSG_DONTWAIT);
with: res = read(buf, a);

But, I'm very new to this code, so have not put in a PR for this as yet.
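To illustrate the EAGAIN behaviour described above on a desktop POSIX system, here is a small sketch. It is not the WiFiClient code and not a proposed patch: `errnoOnEmptyRecv` and `drainSocket` are hypothetical names, and `drainSocket` follows the spirit of option 1 by treating EAGAIN as "nothing left to discard" rather than as a failure.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Returns the errno produced by a non-blocking recv() on a connected
// socket that has no pending data (or -1 if the socket pair can't be made).
int errnoOnEmptyRecv() {
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) return -1;
    char byte;
    errno = 0;
    ssize_t res = recv(sv[0], &byte, 1, MSG_DONTWAIT);
    int err = (res < 0) ? errno : 0;
    close(sv[0]);
    close(sv[1]);
    return err;
}

// Discards up to `wanted` bytes from `fd` without blocking.
// Returns the number of bytes discarded, or -1 on a real socket error.
long drainSocket(int fd, std::size_t wanted) {
    assert(fd >= 0);
    char buf[64];
    std::size_t drained = 0;
    while (drained < wanted) {
        std::size_t chunk = wanted - drained;
        if (chunk > sizeof(buf)) chunk = sizeof(buf);
        ssize_t res = recv(fd, buf, chunk, MSG_DONTWAIT);
        if (res > 0) {
            drained += static_cast<std::size_t>(res);
            continue;
        }
        if (res == 0) break;                               // peer closed
        if (errno == EINTR) continue;                      // interrupted, retry
        if (errno == EAGAIN || errno == EWOULDBLOCK) break; // stream is empty
        return -1;                                         // genuine error
    }
    return static_cast<long>(drained);
}
```

Asking `drainSocket` for more bytes than the stream holds simply returns the smaller count, which is the case where flush() currently logs the error.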

vlastahajek commented 2 years ago

I have a similar issue with classic ESP32 (ESP32-D0WDQ6). When POSTing over HTTP a large amount of data (~143,000 bytes) using the streaming method, it fails:

[  4963][D][HTTPClient.cpp:303] beginInternal(): protocol: http, host: 192.168.1.103 port: 999 url: <url>
[  4978][D][HTTPClient.cpp:1156] connect():  connected to 192.168.1.103:999
[  4979][D][WiFiClient.cpp:393] write(): should write 293 <-- header
[  4982][D][WiFiClient.cpp:393] write(): should write 1460 <-- streaming read from buffer
[  4986][D][WiFiClient.cpp:393] write(): should write 1460 <-- streaming read from buffer
[  4990][E][WiFiClient.cpp:423] write(): fail on fd 48, errno: 11, "No more processes"
[  5832][E][WiFiClient.cpp:423] write(): fail on fd 48, errno: 11, "No more processes"
[  6832][E][WiFiClient.cpp:423] write(): fail on fd 48, errno: 11, "No more processes"
[  7832][E][WiFiClient.cpp:423] write(): fail on fd 48, errno: 11, "No more processes"
[  8832][E][WiFiClient.cpp:423] write(): fail on fd 48, errno: 11, "No more processes"
[  9832][E][WiFiClient.cpp:423] write(): fail on fd 48, errno: 11, "No more processes"
[ 10832][E][WiFiClient.cpp:423] write(): fail on fd 48, errno: 11, "No more processes"
[ 11832][E][WiFiClient.cpp:423] write(): fail on fd 48, errno: 11, "No more processes"
[ 12832][E][WiFiClient.cpp:423] write(): fail on fd 48, errno: 11, "No more processes"
[ 13487][D][WiFiGeneric.cpp:929] _eventCallback(): Arduino Event: 5 - STA_DISCONNECTED
[ 13487][W][WiFiGeneric.cpp:950] _eventCallback(): Reason: 200 - BEACON_TIMEOUT
[ 13490][D][WiFiGeneric.cpp:966] _eventCallback(): WiFi Reconnect Running
[ 13498][E][WiFiClient.cpp:423] write(): fail on fd 48, errno: 113, "Software caused connection abort"
[ 13510][D][HTTPClient.cpp:776] sendRequest(): short write, asked for 1460 but got 0 retry...
[ 13515][D][HTTPClient.cpp:797] sendRequest(): short write, asked for 1460 but got 0 failed.
[ 13522][W][HTTPClient.cpp:1469] returnError(): error(-3): send payload failed
[ 13529][D][HTTPClient.cpp:408] disconnect(): tcp is closed
[ 13545][D][WiFiGeneric.cpp:929] _eventCallback(): Arduino Event: 4 - STA_CONNECTED
[ 13706][D][WiFiGeneric.cpp:929] _eventCallback(): Arduino Event: 7 - STA_GOT_IP
[ 13706][D][WiFiGeneric.cpp:991] _eventCallback(): STA IP: 192.168.1.194, MASK: 255.255.255.0, GW: 192.168.1.20

As you can see, WiFi becomes unstable and even reconnects. This happens on 2.0.5 and 2.0.4. It works well on 2.0.3.

When I decrease the buffer size to e.g. 107,520 bytes, it works ok.

In my case, it seems to be related to a memory issue:

me-no-dev commented 2 years ago

@vlastahajek looks like you are running out of memory

vlastahajek commented 2 years ago

@me-no-dev, yes, but with more than 50k free memory ESP32 becomes unstable? 50k looks like enough free space

me-no-dev commented 2 years ago

you are probably measuring that free memory once the connection is closed (or has not yet started). The reality when the connection is active might be different.

vlastahajek commented 2 years ago

Yes, exactly, before starting HTTP POST. But, my surprise is that 50K is not a safe memory margin for ESP32. E.g. ESP8266 works ok even with 10K free RAM.

me-no-dev commented 2 years ago

They use different SSL stacks. That might be the reason. Overall I agree that 50k ought to be enough

vlastahajek commented 2 years ago

I didn't use HTTPS, just HTTP, in my case.

me-no-dev commented 2 years ago

Then next guess would be that the max size of allocatable chunk is not large enough, due to memory fragmentation and such. How do you get max_alloc_heap?

ccrawford commented 2 years ago

The error message indicates that the socket can't accept the message: the send() call would block if it tried to send, and the send() call uses the no-wait flag. My guess is you are trying to push more information in a single call than the send buffer has room to store before sending. Since the call would block while the first chunk is sent before it could send the rest, you get the return code EAGAIN.

I'm unsure if there is a function to increase the size of the send buffer, or if there is a library that supports flexible buffer sizes without blocking but maybe this gets you pointed in a better direction.
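As a rough illustration of the situation described above, a sender can wait for the socket to become writable and retry instead of treating EAGAIN as fatal. This is a desktop POSIX sketch only: `sendAll` is a hypothetical helper, and whether poll() is usable in a given ESP32 build is an assumption that would need checking.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

// Sends `len` bytes with non-blocking send() calls. When the send buffer is
// full (EAGAIN), waits up to `timeoutMs` for the socket to become writable
// and then retries. Returns the number of bytes sent, or -1 on error/timeout.
long sendAll(int fd, const char* data, std::size_t len, int timeoutMs) {
    assert(data != nullptr || len == 0);
    std::size_t sent = 0;
    while (sent < len) {
        ssize_t res = send(fd, data + sent, len - sent, MSG_DONTWAIT);
        if (res > 0) {
            sent += static_cast<std::size_t>(res);  // partial writes are normal
            continue;
        }
        if (res < 0 && errno == EINTR) continue;
        if (res < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            struct pollfd waiter;
            waiter.fd = fd;
            waiter.events = POLLOUT;
            waiter.revents = 0;
            int ready = poll(&waiter, 1, timeoutMs);
            if (ready > 0) continue;                // room again, retry send()
            return -1;                              // timed out or poll failed
        }
        return -1;                                  // any other send() error
    }
    return static_cast<long>(sent);
}
```

The design choice here is to accept partial writes and keep the remainder for the next iteration, so the caller never needs a send buffer as large as the whole payload.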

vlastahajek commented 2 years ago

Then next guess would be that the max size of allocatable chunk is not large enough, due to memory fragmentation and such. How do you get max_alloc_heap?

I use getters of ESP object: ESP.getMaxAllocHeap().

I added a debug print to my stream provider. HTTPClient reads two batches, before it fails. Memory is just slightly decreased:

37.442 BatchStream::readBytes 1460, free_heap 49692, max_alloc_heap 44020
37.445 BatchStream::readBytes 1460, free_heap 46528, max_alloc_heap 44020

ln-12 commented 2 years ago

I can confirm that I see the exact same behaviour as @vlastahajek. Free heap and max alloc heap show plenty of space available, but the connection fails. As shown above, even the minimal example fails after some time.

@vlastahajek do you also use FreeRTOS?

vlastahajek commented 2 years ago

@ln-12, I use this ESP32 Arduino Core, which is built on top of Espressif ESP-IDF, which in turn is built on top of FreeRTOS.

ln-12 commented 2 years ago

If I had to guess, I would say that the problem is located in the FreeRTOS part of the SDK. I did not have any issues in other projects where I did not use it. Can anyone confirm this?

garageeks commented 2 years ago

In my tests I have noticed that when free heap memory goes below 70KB, the HTTP connection fails, and with a much smaller buffer than yours, @vlastahajek. So 50KB is definitely too low, but it would be worth trying to understand why so much memory is needed in the first place.

andychess commented 2 years ago

Hi, I am also experiencing the "No more resources" error originating from WiFiClientSecure.cpp. I have a small app sending pin states and receiving change commands via GET and POST every 2 seconds. Everything was fine for a few hours and then it crashed.

vlastahajek commented 2 years ago

I did a couple of tests with my testing app with different core versions.

It seems that ~50,000 bytes free is the limit for stable WiFi communication on 2.0.5. This happens when I allocate 136,500 bytes and send the data over HTTP using a streaming wrapper.

When a slightly bigger buffer is sent, e.g. 140,000 bytes:

It seems that runtime memory consumption has increased across versions. Free memory at the same point after allocating the buffer:

Version   Free heap   Max alloc block
1.0.6     90,368      68,680
2.0.3     74,468      44,020
2.0.5     53,928      44,020

In the case of the ESP8266 in a similar situation with 2.7.4, it communicates even when only 5,200 bytes are free! The ESP8266 runtime is less hungry in 3.0.2, where there are 16,064 bytes free in the same situation.
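The two columns reported above measure different things: total free heap versus the largest single block that can be allocated. A small helper makes the gap between them explicit. This is only a sketch; `HeapSnapshot` and `fragmentationPercent` are hypothetical names, and the percentage is a rough indicator rather than an established threshold for Wi-Fi stability.

```cpp
#include <cassert>
#include <cstdint>

// One measurement of heap state, in bytes.
struct HeapSnapshot {
    std::uint32_t freeHeap;       // total free heap
    std::uint32_t maxAllocBlock;  // largest contiguous allocatable block
};

// Share of the free heap (in percent) that cannot be obtained as one
// contiguous allocation. 0 means the whole free heap is a single block.
double fragmentationPercent(const HeapSnapshot& snapshot) {
    if (snapshot.freeHeap == 0) return 0.0;
    assert(snapshot.maxAllocBlock <= snapshot.freeHeap);
    double ratio = static_cast<double>(snapshot.maxAllocBlock) /
                   static_cast<double>(snapshot.freeHeap);
    return 100.0 * (1.0 - ratio);
}
```

On the device, a snapshot could be filled from ESP.getFreeHeap() and ESP.getMaxAllocHeap(), taken while the connection is active rather than before it starts, since the figures can differ between those two moments.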

andychess commented 2 years ago

I forgot to mention that I only experienced the issue on the ESP32. I also have two ESP8266 running the same code without problems . I was also able to isolate the exception to the line making a call to http.POST().

ln-12 commented 2 years ago

We can see the exact same behaviour as @vlastahajek in our project. In our test, we are running ESP-NOW instead of WiFi together with FreeRTOS. From what we have observed, the wireless communication fails and/or the ESP freezes somewhere between 55kb and 65kb of free heap memory, so we cannot make out a specific threshold. Max alloc heap was > 40kb in both cases.

Is someone from the espressif team looking into this issue? We really need a solution in the near future or have to switch to another microcontroller.

VojtechBartoska commented 2 years ago

Thanks for the feedback, we will investigate this issue as soon as possible. We are not able to include the investigation in the next release.

Knight13th commented 2 years ago

+1 on this issue with HTTPClient.Get(). 44K+ is a ton of memory when you only have so little to start with.

sudheeshsmadhav commented 2 years ago

Simply restarting the ESP32 to get out of this.

primus192 commented 1 year ago

I am facing the same problem when downloading a file from an SD card via WiFi on my captive portal. On Arduino core 1.0.6 it works perfectly.

davealam commented 1 year ago

Also seeing this issue with HTTPClient.GET(), though it doesn't seem to have any effect beyond displaying the error message.