espressif / arduino-esp32

Arduino core for the ESP32
GNU Lesser General Public License v2.1

ESP32 BLE/WiFi throws an error [E][WiFiClient.cpp:395] write(): fail on fd 54, errno: 11, "No more processes" - still not resolved #6129

Closed sudheeshsmadhav closed 1 year ago

sudheeshsmadhav commented 2 years ago

Board

ESP32 devkit v4

Device Description

I am using ESP32 devkit v4

Hardware Configuration

no GPIO connections, ESP32 scans the BLE tags and sends their id and rssi values over mqtt

Version

latest master

IDE Name

PlatformIO; also Arduino IDE 1.8.57.0

Operating System

Windows10

Flash frequency

80MHz

PSRAM enabled

no

Upload speed

115200

Description

The ESP32 scans the available BLE tags and publishes the collected information over MQTT (a Raspberry Pi 4 B is the broker). It was working fine last month. Now it has started printing an error in the console like [E][WiFiClient.cpp:395] write(): fail on fd 54, errno: 11, "No more processes", and after some time it restarts. The BLE scan runs in a FreeRTOS task. MQTT publishing is in loop().

It works fine for around 5 minutes (the interval is random), then fails and reboots as shown in the log.

[E][WiFiClient.cpp:395] write(): fail on fd 54, errno: 11, "No more processes"
[E][WiFiClient.cpp:395] write(): fail on fd 54, errno: 113, "Software caused connection abort"

Sketch

will provide if requested.

Debug Message

[E][WiFiClient.cpp:395] write(): fail on fd 54, errno: 11, "No more processes"
[E][WiFiClient.cpp:395] write(): fail on fd 54, errno: 11, "No more processes"
[E][WiFiClient.cpp:395] write(): fail on fd 54, errno: 11, "No more processes"
[E][WiFiClient.cpp:395] write(): fail on fd 54, errno: 11, "No more processes"
[E][WiFiClient.cpp:395] write(): fail on fd 54, errno: 11, "No more processes"
[E][WiFiClient.cpp:395] write(): fail on fd 54, errno: 11, "No more processes"
[E][WiFiClient.cpp:395] write(): fail on fd 54, errno: 11, "No more processes"
[E][WiFiClient.cpp:395] write(): fail on fd 54, errno: 11, "No more processes"
[E][WiFiClient.cpp:395] write(): fail on fd 54, errno: 11, "No more processes"
[E][WiFiClient.cpp:395] write(): fail on fd 54, errno: 113, "Software caused connection abort"
PUB Result: 0
number of uploads: 153

ets Jun  8 2016 00:22:57

rst:0xc (SW_CPU_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:2
load:0x3fff0018,len:4
load:0x3fff001c,len:1044
load:0x40078000,len:10124
load:0x40080400,len:5828
entry 0x400806a8

Other Steps to Reproduce

No response

I have checked existing issues, online documentation and the Troubleshooting Guide

mister-byt commented 1 year ago

I have the same problem. Every 5-6 seconds the ESP32 polls a server or sends some information with an HTTP POST request in an RTOS task. For the first 24-48 hours everything works well, but after that I get these errors:

09:54:22.140 -> [140102879][E][WiFiClient.cpp:422] write(): fail on fd 48, errno: 11, "No more processes"
09:54:23.144 -> [140103879][E][WiFiClient.cpp:422] write(): fail on fd 48, errno: 11, "No more processes"
09:54:24.146 -> [140104879][E][WiFiClient.cpp:422] write(): fail on fd 48, errno: 11, "No more processes"
09:54:25.146 -> [140105879][E][WiFiClient.cpp:422] write(): fail on fd 48, errno: 11, "No more processes"
09:54:26.136 -> [140106879][E][WiFiClient.cpp:422] write(): fail on fd 48, errno: 11, "No more processes"
09:54:27.144 -> [140107879][E][WiFiClient.cpp:422] write(): fail on fd 48, errno: 11, "No more processes"
09:54:28.147 -> [140108879][E][WiFiClient.cpp:422] write(): fail on fd 48, errno: 11, "No more processes"
09:54:28.833 -> [140109596][W][WiFiGeneric.cpp:950] _eventCallback(): Reason: 200 - BEACON_TIMEOUT
09:54:28.866 -> [140109597][E][WiFiClient.cpp:422] write(): fail on fd 48, errno: 113, "Software caused connection abort"
09:54:28.866 -> [140109602][W][HTTPClient.cpp:1469] returnError(): error(-2): send header failed

Or these errors:

09:52:10.926 -> [139971660][E][WiFiClient.cpp:67] fillBuffer(): Not enough memory to allocate buffer
09:52:10.926 -> [139971661][E][WiFiClient.cpp:467] read(): fail on fd 48, errno: 11, "No more processes"

Sometimes the request reaches the server, but the more time passes, the less often requests get through, and eventually they stop arriving at all. After that I get this httpResponseCode:

httpResponseCode: -1

And after more time I get one more line in the logs:

14:38:43.682 -> [247777735][E][WiFiGeneric.cpp:1476] hostByName(): DNS Failed for example.com

The part of my code related to WiFiClient and HTTP requests:

DynamicJsonDocument docWiFi(10240);

void WiFiTask(void *pvParameters) {
  for (;;) {
    if (xQueueReceive(queueWiFi, &sendWiFi, portMAX_DELAY) == pdPASS) {
      xSemaphoreTake(xMutex, portMAX_DELAY);

      docWiFi.clear();

      if (WiFi.status() == WL_CONNECTED) {
        HTTPClient http;
        http.setReuse(false);
        http.begin(wifiClient, hostTcp, port, httpRequest.c_str());
        //http.begin(httpRequest.c_str()); // I have already tried this
        http.addHeader("Content-Type", "application/json");
        http.addHeader("Authorization", accessToken.c_str());
        httpResponseCode = http.POST(info);

        if (httpResponseCode == 200) {
          deserializeJson(docWiFi, http.getStream());
        }

        http.end();
        wifiClient.flush();  // I have tried without this
        wifiClient.stop();   // And this
        xQueueSend(queueWiFi, &sendWiFi, portMAX_DELAY);
      }
    }
    vTaskDelay(2000);
  }
}

If I reset the ESP32, everything repeats again.

Is there maybe a temporary workaround for this? For example, programmatically clearing some buffers or memory to get the same effect as a reset. I have tried wifiClient.flush() to clean up resources, but it did not help.

36in36 commented 1 year ago

I don't have much experience with ESP32, so if this is overly simplistic, I apologize.

I was receiving the error fail on fd 48, errno: 11, "No more processes". I noticed the HTTP request was returning a 301.

A change was made on the server to 'convert' everything to https. Changing my esp code to https solved the issue for me.

sudheeshsmadhav commented 1 year ago

I am still restarting the ESP32 whenever I hit this error. I am using MQTT over the local network. @36in36, if you solved it by using https, then at this point I can only say you are lucky.

marmiha commented 1 year ago

@sudheeshsmadhav same issue here.

zotanmew commented 1 year ago

This one has been causing me headaches, so I looked into it more deeply.

errno 11 (EAGAIN) happens when you go to read (via BSD recv function) bytes from a stream with MSG_DONTWAIT flag and there are no bytes to be read. The ESP error string "No more processes" is misleading. It's really "No data available, try again later." or maybe "No data to process".

The problem as I see it is WiFiClient::flush() determines how many bytes to flush by calling available(), but that returns the number of bytes in the buffer plus the number of bytes in the stream. If the stream is empty (in my test case), but the buffer is not, the subsequent call in flush() to "recv(...)" returns the EAGAIN result (because no bytes in the stream to fetch.)

Multiple ways to fix this:

  1. Ignore the EAGAIN error result and keep going. Add: if(errno == EAGAIN) res = a; after the recv call. This feels hacky.
  2. Figure out the bytes to flush based on the number of bytes in the receive stream, not the buffered stream. I like this, but I don't see an obvious way to get that value short of making WiFiClientRxBuffer::r_available() public (inversion of control?) so no.
  3. Rather than using the recv function to clear the buffer/stream, use the read function instead. Has the potential to be slower if we have to read a bunch of data into a buffer we're never going to use, but feels like the right solution to me.

So, in WiFiClient.cpp line 515 I would replace res = recv(fd(), buf, toRead, MSG_DONTWAIT); with: res = read(buf, a);

But, I'm very new to this code, so have not put in a PR for this as yet.
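The mechanism described above can be reproduced off-device. The following is a minimal host-side sketch using POSIX sockets on Linux or macOS, not the ESP32 lwIP stack, and `simulate_flush` and `FlushResult` are hypothetical names introduced only for illustration. It assumes that some of the bytes counted by available() are already sitting in a client-side buffer, then tries to drain the full count from the socket with recv(..., MSG_DONTWAIT), which is the situation the analysis above attributes to WiFiClient::flush().

```cpp
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cerrno>
#include <cstddef>
#include <vector>

// Outcome of one simulated flush: bytes drained from the socket and the
// errno that stopped the loop (0 if the loop finished cleanly).
struct FlushResult {
    size_t drained;
    int    err;
};

// Hypothetical model of the flush loop discussed above (not the library code).
// in_stream: bytes still waiting in the socket.
// buffered:  bytes already pulled into a client-side RX buffer; they are
//            counted by "available" but can no longer be read from the socket.
FlushResult simulate_flush(size_t in_stream, size_t buffered) {
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) {
        return {0, errno};
    }

    // Put in_stream bytes into the socket so the reader side has data.
    if (in_stream > 0) {
        std::vector<char> data(in_stream, 'x');
        ssize_t written = write(sv[0], data.data(), data.size());
        if (written != static_cast<ssize_t>(data.size())) {
            int e = (written < 0) ? errno : EIO;
            close(sv[0]);
            close(sv[1]);
            return {0, e};
        }
    }

    size_t to_flush = in_stream + buffered;  // what available() would report
    size_t drained = 0;
    int err = 0;
    char buf[64];

    while (to_flush > 0) {
        size_t want = to_flush < sizeof(buf) ? to_flush : sizeof(buf);
        ssize_t res = recv(sv[1], buf, want, MSG_DONTWAIT);
        if (res < 0) {   // nothing left in the socket: EAGAIN / EWOULDBLOCK
            err = errno;
            break;
        }
        if (res == 0) {  // peer closed the connection
            break;
        }
        drained  += static_cast<size_t>(res);
        to_flush -= static_cast<size_t>(res);
    }

    close(sv[0]);
    close(sv[1]);
    return {drained, err};
}
```

With 10 bytes in the stream and 5 assumed to be buffered, the loop drains 10 bytes and then stops with errno set to EAGAIN, because the remaining 5 bytes it was asked to flush are not in the socket. With nothing buffered, the same loop finishes without an error.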

@ccrawford Thank you so much for this patch! I managed to override the affected method in the latest release, maybe this is helpful to someone else here:

File wifiFix.h:

#pragma once

#include <WiFi.h>  // provides WiFiClient, so this header can be included on its own

class WiFiClientFixed : public WiFiClient {
public:
    void flush() override;
};

File wifiFix.cpp:

#include <WiFi.h>
#include <lwip/sockets.h>
#include "wifiFix.h"

#define WIFI_CLIENT_FLUSH_BUFFER_SIZE    (1024)

void WiFiClientFixed::flush() {
    int res;
    size_t a = available(), toRead = 0;
    if (!a) {
        return;//nothing to flush
    }
    auto *buf = (uint8_t *) malloc(WIFI_CLIENT_FLUSH_BUFFER_SIZE);
    if (!buf) {
        return;//memory error
    }
    while (a) {
        // override broken WiFiClient flush method, ref https://github.com/espressif/arduino-esp32/issues/6129#issuecomment-1237417915
        res = read(buf, min(a, (size_t)WIFI_CLIENT_FLUSH_BUFFER_SIZE));
        if (res < 0) {
            log_e("fail on fd %d, errno: %d, \"%s\"", fd(), errno, strerror(errno));
            stop();
            break;
        }
        a -= res;
    }
    free(buf);
}

File main.cpp:

#include "wifiFix.h"
WiFiClient* wifi = new WiFiClientFixed();

Finally, instead of http.begin(url);, use http.begin(*wifi, url);

I hope this helps someone out!

zotanmew commented 1 year ago

Update to the above: there was a slight bug in there which caused memory issues, it's now fixed.

NathanSweet commented 1 year ago

Sending 1 byte over ethernet every 6 seconds, I get a crash after a few minutes:

[2619154][E][WiFiClient.cpp:422] write(): fail on fd 49, errno: 113, "Software caused connection abort"

I can't tell if this is the same error in this issue?

My environment:

Used library Version Path
WiFi         2.0.0   C:\Users\username\AppData\Local\Arduino15\packages\esp32\hardware\esp32\2.0.5\libraries\WiFi
Ethernet     2.0.0   C:\Users\username\AppData\Local\Arduino15\packages\esp32\hardware\esp32\2.0.5\libraries\Ethernet
Update       2.0.0   C:\Users\username\AppData\Local\Arduino15\packages\esp32\hardware\esp32\2.0.5\libraries\Update
swicago              C:\projects\esp32\include\swicago

Used platform Version Path
esp32:esp32   2.0.5   C:\Users\username\AppData\Local\Arduino15\packages\esp32\hardware\esp32\2.0.5

dizcza commented 1 year ago

Update to the above: there was a slight bug in there which caused memory issues, it's now fixed.

Is it fixed in the latest 2.0.7 release version?

I'm using the latest platformio espressif32 though I doubt it's on the PlatformIO side.

Connecting to WiFi
WiFi Connected
Started up
[  1510][E][WiFiClient.cpp:517] flush(): fail on fd 48, errno: 11, "No more processes"
[  2506][E][WiFiClient.cpp:517] flush(): fail on fd 48, errno: 11, "No more processes"
[  3587][E][WiFiClient.cpp:517] flush(): fail on fd 48, errno: 11, "No more processes"

In case you were about to ask which platformio version I was using,

So the issue has been there for at least half a year.

jimemo commented 1 year ago

I am also seeing this error but only recently. I'm using PIO. Happening on ARDUINO_ARCH_ESP32 but not on ARDUINO_ARCH_ESP8266.

Starting connection to server: xxxx/postOutputData/
Post text:
(removed - is 355 bytes)
[  5945][E][WiFiClient.cpp:517] flush(): fail on fd 48, errno: 11, "No more processes"
Response code: 200
Sending POST success.
Response text:
(removed - is 457 bytes)
Device RTC clock set: 15/03/23 13:17:59
SSID: ESP-ROM:esp32c3-api1-20210207
Build:Feb  7 2021
rst:0x3 (RTC_SW_SYS_RST),boot:0x8 (SPI_FAST_FLASH_BOOT)
Saved PC:0x403819b6
SPIWP:0xee
mode:DIO, clock div:1
load:0x3fcd5810,len:0x438
load:0x403cc710,len:0x918
load:0x403ce710,len:0x24e4
entry 0x403cc710

This line is where the device resets itself after the error (I have not done this via code): SSID: ESP-ROM:esp32c3-api1-20210207. The program is able to perform additional functions after successfully receiving the response, before the reset happens. I guess there is a delay before resetting somewhere in WifiClient.h. Here is my platformio.ini:

[env:seeed_xiao_esp32c3]
platform = espressif32
board = seeed_xiao_esp32c3
framework = arduino

As suggested above I can't downgrade WifiClient.h to 2.0.1 easily because it's installed by the platform. I tried this to see if it would use the older version but no luck:

lib_deps = 
    WifiClient@^2.0.1

dizcza commented 1 year ago

@jimemo I downgraded the whole platformio with the arduino package like so:

platform = espressif32@3.4.0

jimemo commented 1 year ago

Many thanks @dizcza. I couldn't use that version of espressif32 with this newer board: board = seeed_xiao_esp32c3. However, I had been getting the same error on this older board, which is now working since your suggestion: board = esp32dev. Thanks again.

dizcza commented 1 year ago

Many thanks @dizcza I couldn't use that version of espressif32 with this newer board: board = seeed_xiao_esp32c3

If you know the pinout for a particular board (and if not, just look at the datasheet), you can always use a generic board as a replacement (I don't know for sure, but I'd try esp32c3). It works in 90% of cases. In this case, you may need to specify the pinout manually if it doesn't work out of the box for you, for instance:

Wire.begin(sda, scl)

instead of default

Wire.begin()

felipe-uf commented 1 year ago

This works for me too. For now I'm getting no errors.

LNqueen commented 1 year ago

I have the same problem here, and the only difference is that the chip does not restart, but the wifi will disconnect from time to time.

thorrak commented 1 year ago

Has anyone submitted a PR with the changes suggested by @ccrawford in https://github.com/espressif/arduino-esp32/issues/6129#issuecomment-1237417915 (or @zotanmew in https://github.com/espressif/arduino-esp32/issues/6129#issuecomment-1418051304)?

I've been using @zotanmew 's patch which seems to be pretty stable -- Would this be something that could be incorporated into the next bugfix release @VojtechBartoska (or alternatively the next major release)?

primus192 commented 1 year ago

Any news on this?

VojtechBartoska commented 1 year ago

@PilnyTomas can you please investigate this again? Thanks

@primus192 No updates on this so far, thanks for the reminder

sudheeshsmadhav commented 1 year ago

After updating the libraries and IDE versions, the restarts happen less frequently than before, but the restarting issue is still there.

PilnyTomas commented 1 year ago

@PilnyTomas can you please investigate this again? Thanks

I cannot. All the info in this thread basically says "I have a problem"; no one has yet provided a complete and minimal sketch with sufficient information to reproduce the issue so that I could observe it failing. Without this, it is pointless to wait for a solution.

OptifySudarshanPatil commented 1 year ago

I'm getting the same message with this code. The good news is that the data is received on the server with a "200" code.

#include "Arduino.h"
#include <WiFi.h>
#include <HTTPClient.h>

const char* ssid = "device";
const char* password = "1223334444";

//Your Domain name with URL path or IP address with path
const char* serverName = "http://sample.com/api/";

// the following variables are unsigned longs because the time, measured in
// milliseconds, will quickly become a bigger number than can be stored in an int.
unsigned long lastTime = 0;
// Timer set to 10 minutes (600000)
//unsigned long timerDelay = 600000;
// Set timer to 5 seconds (5000)
unsigned long timerDelay = 5000;

void setup() {
  Serial.begin(115200);

  WiFi.begin(ssid, password);
  Serial.println("Connecting");
  while(WiFi.status() != WL_CONNECTED) {
    delay(500);
    Serial.print(".");
  }
  Serial.println("");
  Serial.print("Connected to WiFi network with IP Address: ");
  Serial.println(WiFi.localIP());

  Serial.println("Timer set to 5 seconds (timerDelay variable), it will take 5 seconds before publishing the first reading.");
}

void loop() {
  // Send an HTTP POST request every timerDelay milliseconds (5 seconds here)
  if ((millis() - lastTime) > timerDelay) {
    //Check WiFi connection status
    if(WiFi.status()== WL_CONNECTED){
      WiFiClient client;
      HTTPClient http;
      // Your Domain name with URL path or IP address with path
      http.begin(client, serverName);
      // If you need Node-RED/server authentication, insert user and password below
      //http.setAuthorization("REPLACE_WITH_SERVER_USERNAME", "REPLACE_WITH_SERVER_PASSWORD");
      // Specify content-type header
      http.addHeader("Content-Type", "application/x-www-form-urlencoded");
      // // Data to send with HTTP POST
      // String httpRequestData = "api_key=tPmAT5Ab3j7F9&sensor=BME280&value1=24.25&value2=49.54&value3=1005.14";           
      // // Send HTTP POST request
      // int httpResponseCode = http.POST(httpRequestData);
      // If you need an HTTP request with a content type: application/json, use the following:
      http.addHeader("Content-Type", "application/json");
      int httpResponseCode = http.POST("{\"Serial_no\":\"12345\",\"id\":\"2\",\"V1\":\"24.25\",\"V2\":\"49.54\",\"value3\":\"1005.14\"}");
      // If you need an HTTP request with a content type: text/plain
      //http.addHeader("Content-Type", "text/plain");
      //int httpResponseCode = http.POST("Hello, World!");
      Serial.print("HTTP Response code: ");
      Serial.println(httpResponseCode);
      // Free resources
      http.end();
    }
    else {
      Serial.println("WiFi Disconnected");
    }
    lastTime = millis();
  }
}

Terminal data image

PilnyTomas commented 1 year ago

Guys, please test PR #8541 and let me know if it works for you. I just hope it doesn't create a new issue :D

OptifySudarshanPatil commented 1 year ago

Hello @PilnyTomas, your changes are working as expected. Thanks! This issue is resolved for me. I hope these changes will show up in the PlatformIO ESP32 core. ✌️

CelliesProjects commented 1 year ago

@PilnyTomas Works like a charm. Thanks for the effort!

Oekologisiert commented 1 year ago

I think I found a workaround until all packages are updated: use espressif32@6.3.1 in platformio.ini. In my case the problem occurs only with espressif32@6.4.

tablatronix commented 1 year ago

Same issue. WiFi also drops out and fails to reconnect, and debugging of course also stopped logging (sigh). Will isolate the version.

VojtechBartoska commented 1 year ago

will be fully addressed again in https://github.com/espressif/arduino-esp32/pull/8699

arturstopa commented 1 year ago

I get the same issue. I'm using WebSockets @ 2.4.1 and WiFi @ 2.0.0. PLATFORM: Espressif 32 (6.4.0) > Espressif ESP32 Dev Module

I'm pretty sure that webSocket.begin() is triggering the error [WiFiClient.cpp:517] flush(): fail on fd 48, errno: 11, "No more processes". CI of WebSockets library is failing, but the error itself is in the WiFi library, so I believe it's worth checking.

sudheeshsmadhav commented 9 months ago

Hi, is this issue fully addressed, and in which version? I am still getting the same error with espressif32@4.3.0, working with PlatformIO. I also tried the following, but with no luck: ; platformio/framework-arduinoespressif32@^3.20014.231204 ; platform = espressif32

This error hits after 20+ minutes of running. After some time the ESP32 crashes and restarts. Please let me know if the fix is released or available in any branch. More than 500 of my hardware units are affected by this.

@VojtechBartoska @PilnyTomas

VojtechBartoska commented 9 months ago

version 2.0.14 of Arduino core

bbhxwl commented 5 months ago

I also have the same problem.

tatulaiot commented 2 weeks ago

hi guys. any update on this issue? I am having the same trouble here, thanks a

boazf commented 4 days ago

This also happened to me. I noticed that sometimes when I attempt to create a new task, the task creation also fails, with the error "not enough memory". I reduced the stack size of the tasks that I'm creating, and that resolved both problems: new tasks now always get created and this message no longer appears. Of course, you should not reduce the stack size too much, or you'll get a stack overflow, which is another annoying problem.

bbhxwl commented 4 days ago

How to completely solve this problem at the bottom level, apart from sending a small amount of data?

boazf commented 3 days ago

How to completely solve this problem at the bottom level, apart from sending a small amount of data?

I think this issue occurs when most of the memory is allocated and the system can't create the new objects it needs to perform its functions. Try to reduce your memory consumption, as I did by reducing the stack depth of my tasks. You can check a task's stack high-water mark using uxTaskGetStackHighWaterMark() and see whether you can reduce the stack depth of your tasks.
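As a rough illustration of that sizing step, here is a minimal sketch of the arithmetic only. `suggest_stack_depth` and its safety margin are hypothetical, not part of FreeRTOS or the Arduino core. It assumes both the configured depth and the high-water mark are in the same unit (on the ESP32 port this is, as far as I know, bytes, but check the ESP-IDF documentation for your version).

```cpp
#include <cstdint>

// Hypothetical helper: propose a smaller stack depth for a task.
// configured:      stack depth the task was created with
// high_water_mark: minimum free stack ever observed for that task, e.g. the
//                  value returned on-device by uxTaskGetStackHighWaterMark(NULL)
//                  when called from inside the task
// margin:          extra headroom to keep (an assumption; choose it yourself)
// Returns peak usage plus margin, but never more than the configured depth.
uint32_t suggest_stack_depth(uint32_t configured,
                             uint32_t high_water_mark,
                             uint32_t margin) {
    if (high_water_mark > configured) {
        return configured;  // inconsistent input, leave the depth unchanged
    }
    uint32_t peak_used = configured - high_water_mark;
    uint32_t suggested = peak_used + margin;
    return suggested < configured ? suggested : configured;
}
```

For example, a task created with 8192 that never had less than 6000 free has used at most 2192, so with a margin of 1024 the helper proposes 3216. A task that is already close to its limit keeps its configured depth. The measurement is only meaningful after the task has run through its heaviest code paths.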