arduino / ArduinoCore-mbed

Every 5th TCP connection fails on Arduino Giga R1. #937

Open schnoberts1 opened 1 month ago

schnoberts1 commented 1 month ago

Platform: Giga R1
Arduino Core Mbed: 4.1.5
HttpClient: 0.6.1

Here's a sequence of events with a crudely instrumented MbedClient. We connect using ArduinoHttpClient every 1s. Every 5th connection fails with error -3005 (NSAPI_ERROR_NO_SOCKET) from static_cast<TCPSocket *>(sock)->open(getNetwork()); in the MbedClient::connect() call. When it fails, the request takes 30s to return. This is in version 4.1.5. In version 4.1.1 it would never recover: it kept failing until the device was rebooted, and there was no 30s delay. The issue occurs with every HTTP server I've tried. I can keep connecting to the same server from Python on my Mac with no problems. If I extend the delay so that I connect to the server once every 5s, the problem goes away. At 4s it reappears.

22:10:12.229 -> Fetching content length from httpforever.com:80
22:10:12.393 -> Return code from connect0
22:10:13.213 -> took=993ms length: 5124
22:10:15.229 -> Fetching content length from httpforever.com:80
22:10:15.394 -> Return code from connect0
22:10:16.216 -> took=979ms length: 5124
22:10:18.198 -> Fetching content length from httpforever.com:80
22:10:18.365 -> Return code from connect0
22:10:19.225 -> took=1008ms length: 5124
22:10:21.203 -> Fetching content length from httpforever.com:80
22:10:21.366 -> Return code from connect0
22:10:22.222 -> took=1004ms length: 5124
22:10:24.202 -> Fetching content length from httpforever.com:80
22:10:24.202 -> Return code from TCPSocket::open-3005
22:10:54.295 -> took=30099ms length: -1

Code:

#include <Arduino.h>
#include <WiFi.h>
#include <HttpClient.h>

void setup() {
  Serial.begin(115200);
  delay(1000);
  Serial.println("HI");
  // put your setup code here, to run once:
  auto wifi = WiFiInterface::get_default_instance();
  wifi->set_dhcp(true);
  wifi->set_credentials("ZZZZ", "ZZZZ", nsapi_security_t::NSAPI_SECURITY_WPA2);
  Serial.println(wifi->connect());
}

void loop() {
  // put your main code here, to run repeatedly:
  WiFiClient client;

  HttpClient httpClient(client, "httpforever.com", 80);
  Serial.println("Fetching content length from httpforever.com:80");
  auto then = millis();
  httpClient.beginRequest();
  httpClient.get("/");
  httpClient.endRequest();
  auto len = httpClient.contentLength();
  Serial.print("took=");
  Serial.print(millis() - then);
  Serial.print("ms length: ");
  Serial.println(len);
  // As posted. Per the report above, the failure goes away when connecting
  // once every 5s and reappears at 4s, so shorten this to reproduce it.
  delay(5000);
}

schnoberts1 commented 1 month ago

... at a 4.5s delay it takes 8 calls before it fails, an interesting multiple of 4. Any hints? I tried making WiFiClient a global, but that swaps the -3005 errors for -3003 errors on the first re-use:

22:25:36.246 -> Fetching content length from httpforever.com:80
22:25:36.448 -> Return code from connect0
22:25:37.243 -> took=1001ms length: 5124
22:25:38.260 -> Fetching content length from httpforever.com:80
22:25:38.260 -> Return code from TCPSocket::open-3003
22:26:09.251 -> took=31000ms length: -1

31s timeout this time.

Since I've included the code this should replicate well.

schnoberts1 commented 1 month ago

... I am also guessing this 30s timeout is in HttpClient since there's no delay on the return from the failed connect.
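
If that guess is right, one plausible source of the wait is ArduinoHttpClient's response timeout (kHttpResponseTimeout, 30s by default as far as I know): the sketch ignores the return value of get(), so contentLength() may end up waiting for a response to a request that was never sent. A minimal sketch of checking the result first is below. It assumes get() returns 0 on success and a negative code on failure; fetchContentLength is a hypothetical helper, written as a template so it accepts HttpClient or any type with the same four methods.

```cpp
// Hypothetical helper: skips the response wait when the request could not
// be started. Assumes Http::get() returns 0 on success, negative on error.
template <typename Http>
long fetchContentLength(Http &http, const char *path) {
  http.beginRequest();
  int err = http.get(path);
  if (err != 0) {
    return err;  // connect/send failed, so do not wait for a response
  }
  http.endRequest();
  return http.contentLength();
}
```

In loop() this would replace the beginRequest()/get()/endRequest()/contentLength() sequence with auto len = fetchContentLength(httpClient, "/");. It only avoids the long wait; it does nothing about the underlying failure to open a socket.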

schnoberts1 commented 1 month ago

Note: calling client.stop() does not resolve the issue

stopandchatchfire commented 2 weeks ago

The maximum number of TCP connections is 4.

As defined in the file: "mbed_config.h"

#define MBED_CONF_LWIP_TCP_SOCKET_MAX 4

I don't understand why it's so low. You need to recompile locally to make more connections available.

schnoberts1 commented 2 weeks ago

I never have 4 TCP sockets open, but I am confident this limit is at the core of the issue. Some cleanup isn’t being executed. Increasing the socket count won’t help, though, as it only postpones the failure.
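
The hypothesis above can be pictured with a toy model. This is not the lwIP or Mbed implementation, and SocketPool and firstFailingAttempt are hypothetical names. It only shows that a table of 4 slots whose entries are never released fails on the 5th open, and that a larger table would move the failure later rather than remove it.

```cpp
#include <array>

// Toy model of a fixed-size socket table. NOT the real lwIP/Mbed code.
constexpr int kMaxSockets = 4;       // mirrors MBED_CONF_LWIP_TCP_SOCKET_MAX
constexpr int kErrNoSocket = -3005;  // NSAPI_ERROR_NO_SOCKET

struct SocketPool {
  std::array<bool, kMaxSockets> inUse{};

  // Returns a free slot index, or kErrNoSocket when the table is full.
  int open() {
    for (int i = 0; i < kMaxSockets; ++i) {
      if (!inUse[i]) {
        inUse[i] = true;
        return i;
      }
    }
    return kErrNoSocket;
  }

  // Frees the slot only if 'releases' is true; false models missing cleanup.
  void close(int slot, bool releases) {
    if (releases && slot >= 0 && slot < kMaxSockets) {
      inUse[slot] = false;
    }
  }
};

// Returns the 1-based attempt on which open() first fails, or 0 if none fail.
int firstFailingAttempt(int attempts, bool closeReleases) {
  SocketPool pool;
  for (int n = 1; n <= attempts; ++n) {
    int slot = pool.open();
    if (slot == kErrNoSocket) {
      return n;
    }
    pool.close(slot, closeReleases);
  }
  return 0;
}
```

With cleanup disabled the model fails on attempt 5, matching the first log. It does not account for the dependence on the delay between connections or for the recovery after the 30s wait, so the real release path is presumably delayed rather than absent.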

JAndrassy commented 2 weeks ago

@schnoberts1 can you try these changes? https://github.com/arduino/ArduinoCore-mbed/pull/912/files

schnoberts1 commented 2 weeks ago

Of course @JAndrassy. I've applied the patch and the issue still remains.