arduino-libraries / Ethernet

Ethernet Library for Arduino
http://arduino.cc/
257 stars 262 forks source link

Socket close function not working in 2.0.0 #82

Open fredlcore opened 5 years ago

fredlcore commented 5 years ago

In order to prevent frozen sockets to lockup network connections, this code here used to work until version 1.1.2 (taken from http://forum.arduino.cc/index.php?topic=291958.msg2040300#msg2040300):

#include <utility/w5100.h>
#include <utility/socket.h>

byte socketStat[MAX_SOCK_NUM];
unsigned long connectTime[MAX_SOCK_NUM];

void loop() {

  if(Serial.available()) {
    if(Serial.read() == 'r') ShowSockStatus();   
  }

  checkSockStatus();

  // rest of your loop code
}

// add these functions

void ShowSockStatus()
{
  for (int i = 0; i < MAX_SOCK_NUM; i++) {
    Serial.print(F("Socket#"));
    Serial.print(i);
    uint8_t s = W5100.readSnSR(i);
    socketStat[i] = s;
    Serial.print(F(":0x"));
    Serial.print(s,16);
    Serial.print(F(" "));
    Serial.print(W5100.readSnPORT(i));
    Serial.print(F(" D:"));
    uint8_t dip[4];
    W5100.readSnDIPR(i, dip);
    for (int j=0; j<4; j++) {
      Serial.print(dip[j],10);
      if (j<3) Serial.print(".");
    }
    Serial.print(F("("));
    Serial.print(W5100.readSnDPORT(i));
    Serial.println(F(")"));
  }
}

void checkSockStatus()
{
  unsigned long thisTime = millis();

  for (int i = 0; i < MAX_SOCK_NUM; i++) {
    uint8_t s = W5100.readSnSR(i);

    if((s == 0x17) || (s == 0x1C)) {
        if(thisTime - connectTime[i] > 30000UL) {
          Serial.print(F("\r\nSocket frozen: "));
          Serial.println(i);
          close(i);
        }
    }
    else connectTime[i] = thisTime;

    socketStat[i] = W5100.readSnSR(i);
  }
}

After adjusting the includes for 2.0.0, the "close(i)" still won't compile because the function is no longer provided for. When I take the 1.1.2 code of "close", the Arduino freezes whereas the same script with library 1.1.2 works perfectly fine.

Is there any way to fix this or at least work around it (without downgrading from version 2.0.0)?

Rotzbua commented 5 years ago

A change from 1.x.x to 2.x.x is a major update. It is normal that code work after upgrade. You have to look into the code. Maybe the function is just renamed.

For more information about versioning read: https://semver.org/

MAJOR version when you make incompatible API changes,

fredlcore commented 5 years ago

Obviously this is a major update and that's why I asked whether this function will be implemented in the foreseeable future.

I did check the code, the function has not (to my knowledge) just been renamed, but is not provided for. In 1.1.2 it was just a few lines of code and these don't work under 2.0.0, resulting in a freeze.

Closing a socket is not an everyday function, I agree, but also not one that is really extraordinary, so I won't be the only one missing it.

SapientHetero commented 5 years ago

Have you tried EthernetClass::socketClose() or EthernetClass::socketDisconnect()? Both are in socket.cpp. socketCDisconnect() closes the socket by sending the TCP peer a FIN packet and waiting for it to respond with an ACK packet. socketClose() immediately closes the socket without informing the peer..

fredlcore commented 5 years ago

Ah, I didn't realize the name change - it was just "close()" in version 1.1.2. However, it doesn't make a difference whether I use socketClose or socketDisconnect - both just consist of three lines of code:

SPI.beginTransaction(SPI_ETHERNET_SETTINGS);
W5100.execCmdSn(s, Sock_CLOSE); // socketDisconnect uses Sock_DISCON instead
SPI.endTransaction();

When I remove the second line, the Arduino does not get stuck. If I let it in - either with Sock_CLOSE or Sock_DISCON, it crashes. As I said, with version 1.1.2 it works fine.

One other thing that I noticed: When I tell the Arduino to list all open sockets, I get eight possible sockets, although my W5100-based shield should only support 4, and this is also the number of entries I get when compiling with 1.1.2. I don't know if this is part of the problem, but the last four sockets always contain a non-existing IP number: Socket#0:0x16 80 D:192.168.1.34(55760) Socket#1:0x17 49153 D:192.168.1.5(2323) Socket#2:0x16 80 D:192.168.1.34(55760) Socket#3:0x14 80 D:192.168.1.20(33132) Socket#4:0x1 511 D:25.105.144.192(43009) Socket#5:0x1 511 D:25.105.144.192(43009) Socket#6:0x1 511 D:25.105.144.192(43009) Socket#7:0x1 511 D:25.105.144.192(43009)

What could be the reason that closing or disconnecting a given socket directly causes the program to hang in 2.0.0 and do as it should in 1.1.2?

SapientHetero commented 5 years ago

Interesting. When I first built for my W5500-based shield with this library, I got 4 sockets instead of the 8 it supports. I ended up hard-coding the number in Ethernet.h (see below).

// Configure the maximum number of sockets to support. W5100 chips can have // up to 4 sockets. W5200 & W5500 can have up to 8 sockets. Several bytes // of RAM are used for each socket. Reducing the maximum can save RAM, but // you are limited to fewer simultaneous connections. //#if defined(RAMEND) && defined(RAMSTART) && ((RAMEND - RAMSTART) <= 2048) //#define MAX_SOCK_NUM 4 //#else

define MAX_SOCK_NUM 8

//#endif

You could do the same, only set it to 4. Perhaps your issue is due to the library's attempt to communicate with registers associated with the 4 sockets that don't exist on your shield.

fredlcore commented 5 years ago

Ok, that might be pointing in the right direction, because EthernetClass::socketBegin has these lines right at the beginning: `#if MAX_SOCK_NUM > 4
if (chip == 51) maxindex = 4; // W5100 chip never supports more than 4 sockets

endif`

I just checked, my W5100 reports back as '51', and since MAX_SOCK_NUM is nevertheless '8', these lines of code seem to work around this. However, even if I limit my search for hanging sockets (the same way actually as socketBegin does it as a last resort in its closemakesocket section) to four, the closeing of any socket still leads to a freeze.

SapientHetero commented 5 years ago

In my experience, when my Arduino "freezes" it's "gone to Coventry", by which I mean the default interrupt handler, because a bug in my code has stomped something on the stack resulting in a function return to an invalid address.

Do you have a hardware debugger you can use to see what's going on? If not, add Serial.print(s); in the socketClose & socketDisconnect functions just before the execCmdSn (maybe followed by a Serial.flush() or a delay to give it time to print before the freeze) to see if you're accessing registers for sockets 4-7. Trying to write to non-existent addresses usually makes bad things happen.

fredlcore commented 5 years ago

I figured out what caused the "freeze": When you close a socket, W5100::execCmdSn is called. This function first sends the given command (here: SOCK_CLOSE or SOCK_DISCON) and then waits for the command to complete by doing a while (readSnCR(s)) ; As I said, I'm using this as a way to free up sockets in case all four sockets are in use. However, if all sockets somehow are still marked as 0x17 (ESTABLISHED), this while loop will never stop, even if no actual data is being transmitted. When I change the function by replacing the while loop with a short delay, the Arduino does not freeze. However, the connection also does not get closed. Does this mean there isn't a way to forcefully close a socket connection?

SapientHetero commented 5 years ago

That check for !W5100.SnCR() is waiting for the command register to be zeroed, which means the 5100 has started processing it. It's not your problem [you're mixing it up with SnSR(s)], though I did change it like this so the library gives the W5x00 done time to work between I/O operations:

do { delayMicroseconds(500); } while (!W5100.SnCR(s));

On Fri, Jan 18, 2019, 6:57 PM fredlcore notifications@github.com wrote:

I figured out what caused the "freeze": When you close a socket, W5100::execCmdSn is called. This function first sends the given command (here: SOCK_CLOSE or SOCK_DISCON) and then waits for the command to complete by doing a while (readSnCR(s)) ; As I said, I'm using this as a way to free up sockets in case all four sockets are in use. However, if all sockets somehow are still marked as 0x17 (ESTABLISHED), this while loop will never stop, even if no actual data is being transmitted. When I change the function by replacing the while loop with a short delay, the Arduino does not freeze. However, the connection also does not get closed. Does this mean there isn't a way to forcefully close a socket connection?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/arduino-libraries/Ethernet/issues/82#issuecomment-455724983, or mute the thread https://github.com/notifications/unsubscribe-auth/AlmII6GloAdAkeEJXB3hBcbaHJ99R8WBks5vEl9kgaJpZM4ZLUX4 .

fredlcore commented 5 years ago

Sorry, I'm confused, the function I'm talking about is this: void W5100Class::execCmdSn(SOCKET s, SockCMD _cmd) { // Send command to socket writeSnCR(s, _cmd); // Wait for command to complete while (readSnCR(s)) ; } How and where should I apply your changes?

SapientHetero commented 5 years ago

I'm away from my office and working from memory. I meant readSnCR(s).

On Fri, Jan 18, 2019, 7:09 PM fredlcore notifications@github.com wrote:

Sorry, I'm confused, the function I'm talking about is this: void W5100Class::execCmdSn(SOCKET s, SockCMD _cmd) { // Send command to socket writeSnCR(s, _cmd); // Wait for command to complete while (readSnCR(s)) ; } How and where should I apply your changes?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/arduino-libraries/Ethernet/issues/82#issuecomment-455726828, or mute the thread https://github.com/notifications/unsubscribe-auth/AlmIIw8cAZT0m6SX-c7EbFzpIzFuk8T_ks5vEmIxgaJpZM4ZLUX4 .

SapientHetero commented 5 years ago

Just add a delay between reads of SnCR

On Fri, Jan 18, 2019, 7:09 PM fredlcore notifications@github.com wrote:

Sorry, I'm confused, the function I'm talking about is this: void W5100Class::execCmdSn(SOCKET s, SockCMD _cmd) { // Send command to socket writeSnCR(s, _cmd); // Wait for command to complete while (readSnCR(s)) ; } How and where should I apply your changes?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/arduino-libraries/Ethernet/issues/82#issuecomment-455726828, or mute the thread https://github.com/notifications/unsubscribe-auth/AlmIIw8cAZT0m6SX-c7EbFzpIzFuk8T_ks5vEmIxgaJpZM4ZLUX4 .

fredlcore commented 5 years ago

Oh, then I appreciate your efforts even more :)... I'll give it a try tomorrow, it's 1:30am here already, need to get some sleep ;)...

SapientHetero commented 5 years ago

Here you go:

void W5100Class::execCmdSn(SOCKET s, SockCMD _cmd) { // Send command to socket writeSnCR(s, _cmd); // Wait for command to complete //while (readSnCR(s)) ; do { delayMicroseconds(100); } while (readSnCR(s)); }

I'm not sure this has anything to do with your problem, but it's a bad idea to flood a peripheral with I/O requests while waiting for it to execute a command. It resolved some EthernetClient.connect() issues I was experiencing; perhaps it will help with yours.

If you don't have a hardware debugger, I'd put Serial.print() statements in W5100.execCmdSn() and socketDisconnect() or socketClose() [whichever you're using] and try to determine where your code gets lost. I use both regularly with my W5500 shield and have no problems with them, so I'd focus your search in your code. If you want to post fragments that exhibit the problem I'll be happy to provide a 2nd set of eyes.

fredlcore commented 5 years ago

Thanks, this is how I did it, basically the same as you suggested, with two log outputs on serial added:

void W5100Class::execCmdSn(SOCKET s, SockCMD _cmd) { // Send command to socket writeSnCR(s, _cmd); // Wait for command to complete // while (readSnCR(s)) ; do { Serial.println("Waiting..."); delayMicroseconds(100); } while (readSnCR(s)); Serial.println("Done."); }

Since this function is called from many others, I can see that it works normally (one or few times "Waiting..." and then "Done.") in all calls except the ones where I want to forcefully close a socket with this code if the socket is in status 0x17 (ESTABLISHED):

      SPI.beginTransaction(SPI_ETHERNET_SETTINGS);
      W5100.execCmdSn(s, Sock_CLOSE);
      SPI.endTransaction();

Here it gets stuck in a seemingly endless loop printing "Waiting...". I don't understand exactly what readSnCR is doing with the socket or what exactly it is waiting for, but apparently readSnCR remains always true if a socket in status 0x17 is given a SOCK_CLOSE or SOCK_DISCON command.

Since it does seem to work if other commands are issued (other than SOCK_CLOSE on a 0x17-status socket, I'm not sure where and how to look further. And I can't find the code for readSnCR anywhere so I could compare Ethernet library version 1.1.2 with 2.0.0 in this respect, because it is working with the older library without a problem.

So if you have any other ideas where and how to address this issue, I'd be really grateful!

SapientHetero commented 5 years ago

Yeah, that puzzled me too when I first looked for it. Someone went far out of their way to write difficult-to-read code.

readSnCR is a macro invocation. The macro is in W5100.h:

#define __SOCKET_REGISTER8(name, address)                    \
  static inline void write##name(SOCKET _s, uint8_t _data) { \
    writeSn(_s, address, _data);                             \
}                                                          \
  static inline uint8_t read##name(SOCKET _s) {              \
    return readSn(_s, address);                              \
}

The Socket n Command Register (SnCR) is defined as follows, also in W5100.h:

 __SOCKET_REGISTER8(SnCR,        0x0001)        // Command

Together, these give us:

static inline void writeSnCR(SOCKET _s, uint8_t _data) { 
    writeSn(_s, address, _data);                             
}

and

static inline uint8_t readSnCR(SOCKET _s) {              
    return readSn(_s, address);                              
}

readSn is an inline function defined in W5100.h:

static inline uint8_t readSn(SOCKET s, uint16_t addr) {
    return read(CH_BASE() + s * CH_SIZE + addr);
  }

Since CH_BASE() returns 0x400 for the W5100 chip and CH_SIZE = 0x0100, readSnCR(s) resolves to:

read(0x400+s*0x100 + 0x0001);

You can find the read() function in W5100.cpp.

Thus,

W5100.execCmdSn(s, Sock_CLOSE); 

writes the Sock_CLOSE command, [0x10, defined in W5100.h] to the command register for the nth socket. When the W5100 sees this it automatically clears SnCR as part of processing the command.

The W5100 datasheet can be downloaded at https://www.wiznet.io/wp-content/uploads/wiznethome/Chip/W5100/Document/W5100_Datasheet_v1.2.7.pdf. Page 27 explains how Socket commands work, and section 3.2 (page 16) shows how registers are mapped to memory addresses. You may want to double-check addresses and command values found in W5100.h; I found a few minor discrepancies between the addresses for the W5500 and the defines in W5100.h that prevented setting connection timeouts.

fredlcore commented 5 years ago

Wow, thanks, this is really extensive and valuable information! I was thinking of first looking into potential differences between the way versions 1.1.2 and 2.0.0 address this matter, only to find out that now after testing it again with 1.1.2, it also gets stuck in this loop when an ESTABLISHED connection is to be CLOSEd, again after all the other calls to this function are completed without any problems. I'll have to see if I just omit closing these status-0x17 sockets or if I manage to find any other reason for this behaviour. In any case, thanks again for your very helpful support.

SapientHetero commented 5 years ago

A status of 0x17 means a TCP connection has been established; the courteous thing to do would be to call sockDisconnect to send a FIN packet. I do this all the time with my W5500 shield using this library. Unless your shield is damaged, I can't imagine any scenario other than a software bug that could cause this. Were you able to confirm that you never try to communicate with a register for a socket number higher than 3?

fredlcore commented 5 years ago

I have changed SOCK_CLOSE to SOCK_DISCON now, and limited the disconnect of a socket to one with status 0x14 now (and no longer 0x17 and 0x14 where it so far had only "caught" sockets with status 0x17). But the same problem occurs also with those sockets in 0x14 state, both in 1.1.2 and 2.0.0. Since MAX_NUM_SOCKETS is correctly set to 4 in 1.1.2, the socket register should be correct. In 2.0.0 I have additionally limited the socket number to 4, to make sure this does not go above this limit. I will try a different Ethernet shield and hope that the trouble comes from this, otherwise I have no clue how to move forward from here...

SapientHetero commented 5 years ago

If you're not constrained to use a W5100 shield, you may want to consider WizNet's own W5500 shield. I've found that it's less expensive than competing W5500-based products and its SPI bus supports higher clock rates.

SapientHetero commented 5 years ago

Just had an idea; are you calling socketClose() directly in your code, or is it being called when your code calls EthernetClient::stop()? If the latter, are you calling setConnectionTimeout() anywhere? If so, what value are you setting the timeout to?

fredlcore commented 5 years ago

I don't think that I can call socketClose() directly, becuse it's a private function, isn't it? That's why I took that code and used it directly, as stated somewhere above: SPI.beginTransaction(SPI_ETHERNET_SETTINGS); W5100.execCmdSn(s, Sock_CLOSE); // Sock_DISCON also has no posiive effect SPI.endTransaction();

It seems that EthernetClient::stop() seems to work fine because when monitoring commands sent to the execCmdSn function, there are a lot of Sock_CLOSE coming in, as normally a number of connections are getting closed with every http request etc. But if there is a way to call socketClose() directly and it would make a difference from using the above code instead, I'd be happy to hear about it.

PaulStoffregen commented 5 years ago

Just wanted to chime in for a moment, say I'm watching this thread, even though I won't be able to put engineering time into this library until late-summer time frame.

I know this is asking a lot, but if possible, please consider making test cases for this problem. Ideally, a test case looks like a complete program which can be copied & pasted into the Arduno IDE and run on various boards with different ethernet shields. Whoever runs this needs to also know what to do for the other end of the communication. Whether the remote side should be on the same LAN (low latency) or elsewhere on the Internet (higher packet latency) may also be relevant. The easier it is for anyone to run the test case and reproduce the problem, the better the odds it will get fixed, and future versions of the library won't regress.

SapientHetero commented 5 years ago

EthernetClient::stop() calls Ethernet.socketDisconnect(), then (apparently unnecessarily) calls Ethernet.socketClose() if the socket status hasn't changed to SnSR::CLOSED within the number of milliseconds specified by the library's _timeout variable (settable via SetConnectionTimeout()). I say it's not necessary because the W5500 datasheet notes on page 48 that:

Regardless of ‘TCP server’ or ‘TCP client’, the DISCON command processes the disconnect-process (‘Active close’ or ‘Passive close’).

Active close: it transmits disconnect-request(FIN packet) to the
connected peer

Passive close: When FIN packet is received from peer, a FIN packet is replied back to the peer.
When the disconnect-process is successful (that is, FIN/ACK packet is received successfully), Sn_SR is changed to SOCK_CLOSED. Otherwise, TCPTO occurs (Sn_IR(3)=‘1)= and then Sn_SR is changed
to SOCK_CLOSED.

It seems to me that a DISCON command will ALWAYS result in the socket status being set to closed whether an ACK is received from the other end or not.

I asked whether you call SetConnectionTimeout() because setting it to zero could cause EthernetClient::stop() to hang forever if the socket status never changes to CLOSED for some reason because millis() - start < _timeout will never be true if _timeout = 0. Perhaps SetConnectionTimeout() should not allow a setting of 0, though my reading of the W5500 datasheet suggests this should never happen. I haven't checked the W5100 and W5200 datasheets to see if they behave the same though.

SapientHetero commented 5 years ago

I was just looking at this part of your code:

{
unsigned long thisTime = millis();

for (int i = 0; i < MAX_SOCK_NUM; i++) {
uint8_t s = W5100.readSnSR(i);

if((s == 0x17) || (s == 0x1C)) {
    if(thisTime - connectTime[i] > 30000UL) {
      Serial.print(F("\r\nSocket frozen: "));
      Serial.println(i);
      close(i);
    }
}
else connectTime[i] = thisTime;

socketStat[i] = W5100.readSnSR(i);
}
}

and realized you don't have your register reads bracketed by SPI begin & end, which sets the CS line.

        SPI.beginTransaction(SPI_ETHERNET_SETTINGS);
        stat = W5100.readSnSR(s);
        SPI.endTransaction();

I suspect the W5100 never gets your read requests.

fredlcore commented 5 years ago

No, I do - as I wrote above, I have substituted the close (i) with SPI.beginTransaction(SPI_ETHERNET_SETTINGS); W5100.execCmdSn(s, Sock_CLOSE); // Sock_DISCON also has no posiive effect SPI.endTransaction();

As far as I understood things, execCmdSn should then do readSnSR, shouldn't it?

SapientHetero commented 5 years ago

No. execCmdSn only reads Sn_CR, which the W5x00 sets to zero once it notices the command.

My point was that you need SPI.beginTransaction and SPI.endTransaction to bracket your register reads because SPI.beginTransaction sets the CS line that tells the W5x00 you are talking to it, and SPI.endTransaction clears it so the SPI bus can be used by other devices again.

fredlcore commented 5 years ago

Ah, ok, got it, will try that...

fredlcore commented 5 years ago

Ok, just tried that, same result, still hangs when waiting for while (readSnCR(s)) to "complete"...

fredlcore commented 5 years ago

...and regarding the _timeout variable: I haven't set it anywhere in the code using SetConnectionTimeout(), so I assume that shouldn't be a problem either...

SapientHetero commented 5 years ago

Does your Ethernet shield have a SD card reader on it? If so, are you configuring its CS as an output and setting it high, even if you aren't using it?

fredlcore commented 5 years ago

Yes, I use the SD card and this is what I do during setup: pinMode(10,OUTPUT); digitalWrite(10,HIGH);

SapientHetero commented 5 years ago

Do you still have the delay in the readSnCR loop at the bottom of execCmdSn?

On Thu, Jan 31, 2019, 3:27 PM fredlcore <notifications@github.com wrote:

Yes, I use the SD card and this is what I do during setup: pinMode(10,OUTPUT); digitalWrite(10,HIGH);

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/arduino-libraries/Ethernet/issues/82#issuecomment-459493567, or mute the thread https://github.com/notifications/unsubscribe-auth/AlmII-ls78bqGsXVfb_7NvvaIGvzBDOjks5vI1HDgaJpZM4ZLUX4 .

fredlcore commented 5 years ago

Currently not, because I was switching back and forth between library versions 1.1.2 and 2.0.0 and every reinstall overwrote those changes...

PaulStoffregen commented 5 years ago

Yes, I use the SD card and this is what I do during setup: pinMode(10,OUTPUT); digitalWrite(10,HIGH);

What about pin 4?

On most ethernet shields, the SD card's CS pin connects to pin 4, not pin 10. For the sake of testing, the simplest thing to do is simply leave the SD socket empty. But if you must have a card in the socket, make sure the card's CS signal is high before you initialize the ethernet library.

fredlcore commented 5 years ago

Sorry, this is the full code of initialization: pinMode(10,OUTPUT); digitalWrite(10,HIGH); Serial.print(F("Starting SD..")); if(!SD.begin(4)) Serial.println(F("failed")); else Serial.println(F("ok"));

SapientHetero commented 5 years ago

Try adding the delay in the execCmdSn loop.

On Thu, Jan 31, 2019, 3:48 PM fredlcore <notifications@github.com wrote:

Sorry, this is the full code of initialization: pinMode(10,OUTPUT); digitalWrite(10,HIGH); Serial.print(F("Starting SD..")); if(!SD.begin(4)) Serial.println(F("failed")); else Serial.println(F("ok"));

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/arduino-libraries/Ethernet/issues/82#issuecomment-459500952, or mute the thread https://github.com/notifications/unsubscribe-auth/AlmII_fzaDfVlJw5XqBO7UCx4x37Y0BLks5vI1aqgaJpZM4ZLUX4 .

fredlcore commented 5 years ago

Ok, this is how the function looks like now, including waits and serial log outputs:

void W5100Class::execCmdSn(SOCKET s, SockCMD _cmd) { // Send command to socket writeSnCR(s, _cmd); // Wait for command to complete // while (readSnCR(s)) ; do { Serial.println("Waiting..."); delayMicroseconds(100); } while (readSnCR(s)); Serial.println("Done."); }

Everything runs fine with "Waiting..." followed by "Done." hundreds of times. But when the socket list comes up with a socket being for example longer than the set amount of time in lets say state 0x14 like this:

Socket#0:0x14 80 D:192.168.1.20(49668)
Socket#1:0x17 49153 D:192.168.1.5(2323)
Socket#2:0x0 80 D:192.168.1.20(49670)
Socket#3:0x0 0 D:0.0.0.0(0)

Socket frozen: 0
Waiting... Waiting...

...then the loop remains stuck with endlless "Waiting..." after it has been issued a SOCK_DISCON like through these commands:

if((s == 0x14) || (s == 0x1C)) {
    if(thisTime - connectTime[i] > 30000UL) {
      Serial.print(F("\r\nSocket frozen: "));
      Serial.println(i);

      SPI.beginTransaction(SPI_ETHERNET_SETTINGS);
      W5100.execCmdSn(s, Sock_DISCON);
      SPI.endTransaction();

      Serial.println(F("Socket freed."));
      ShowSockStatus();
    }
}
SapientHetero commented 5 years ago

I've read that the W5100 SPI implementation isn't completely reliable. Maybe you should add a timeout and put a loop around it to try again if it hangs, and perhaps end and restart the SPI transaction before retrying. I've never seen my W5500 fail to respond here, but without the delay I've seen connects fail. I theorize that pounding the chip's I/O causes it to miss ACK packets, though I don't yet have evidence of that.

Once you figure out what the problem is you can refine your code to eliminate unnecessary stuff.

On Thu, Jan 31, 2019, 5:11 PM fredlcore <notifications@github.com wrote:

Ok, this is how the function looks like now, including waits and serial log outputs:

void W5100Class::execCmdSn(SOCKET s, SockCMD _cmd) { // Send command to socket writeSnCR(s, _cmd); // Wait for command to complete // while (readSnCR(s)) ; do { Serial.println("Waiting..."); delayMicroseconds(100); } while (readSnCR(s)); Serial.println("Done."); }

Everything runs fine with "Waiting..." followed by "Done." hundreds of times. But when the socket list comes up with a socket being for example longer than the set amount of time in lets say state 0x14 like this:

Socket#0:0x14 80 D:192.168.1.20(49668) Socket#1:0x17 49153 D:192.168.1.5(2323) Socket#2:0x0 80 D:192.168.1.20(49670) Socket#3:0x0 0 D:0.0.0.0(0)

Socket frozen: 0 Waiting... Waiting...

...then the loop remains stuck with endlless "Waiting..."

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/arduino-libraries/Ethernet/issues/82#issuecomment-459526111, or mute the thread https://github.com/notifications/unsubscribe-auth/AlmII0tIgQR3v61i6_Vsx8sA6H1Gcit1ks5vI2n5gaJpZM4ZLUX4 .

SapientHetero commented 5 years ago

Came across this today while looking for something else and thought it may interest you. A guy was having the same problem with a W5500 and it ended up being a bad chip.

https://forum.wiznet.io/t/topic/3734

fredlcore commented 5 years ago

Hm, I have another W5100 Ethernet shield which I now swapped against the one I have been using before for several years, but both shields produce the same result :(...

SapientHetero commented 5 years ago

If you post a functioning program that exhibits the problem and explain how to run the test scenario, I'll build it and try it on my system.

SapientHetero commented 5 years ago

Have you ever checked to see if there's data in the RX or TX buffers when you try to close the sockets and your code hangs? I don't see anything in the datasheet about how this is handled. The W5x00 may just discard it, but it could also refuse to close.

On Fri, Feb 1, 2019, 5:42 PM fredlcore <notifications@github.com wrote:

Hm, I have another W5100 Ethernet shield which I now swapped against the one I have been using before for several years, but both shields produce the same result :(...

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/arduino-libraries/Ethernet/issues/82#issuecomment-459893042, or mute the thread https://github.com/notifications/unsubscribe-auth/AlmII4phUaA_gWeorO-wAZNj_adzFAXeks5vJMK8gaJpZM4ZLUX4 .

fredlcore commented 5 years ago

Sorry for not having answered yet, I just returned to office after seven months of parental leave and I'm loaded with work, so it'll take a bit of time for me to have a look at this again, but both yours and Paul's help is highly appreciated!

SapientHetero commented 5 years ago

Congratulations!

I've revised my copy of the library substantially, so it may be interesting to see how your program works with my version. I found a few areas of the library that will hang when stressed and believe I've fixed them. Will post my version when I'm finished testing.

PaulStoffregen commented 5 years ago

Any chance you could save & publish that stress testing code? Maybe write some notes as you do the tests, so others can follow the same steps?

Long-term, a convenient way for everyone to stress test the library in the same, ideally consistent way would probably do the most good.

SapientHetero commented 5 years ago

That would be a great idea, but I stress-tested by pounding on my server with multiple browsers from multiple devices to test the new socket management feature I added to my version of the library. Since browsers pool sockets now it was the most expedient way to create an unnatural demand for sockets, though not at all repeatable. A test suite would need to run on a Windows or Linux-based system, or on multiple MCUs in order to produce a similar load.

Xtreemtec commented 5 years ago

@SapientHetero How did the tests go?
I'm experiencing the same issues with the W5100 chip and Ethernet 2.0 library. I got a webserver running with a webpage that poll's for XML data in the Arduino every 2 seconds. The issue is that the Wiznet will over time stop responding ( due to max sockets reached ) while the code uses client.stop(); to close the connection.
While monitoring the number of used sockets in the main loop the processor starts of with 2 sockets opened.

So i gone a little further and rewrote a part statusreport(). and found some interesting information>>

EthernetSocket begin 3 EthernetServer : 80 MaxSockNum : 8 0: port=21 LISTEN Status=20 Avail=0 1: port=55600 LISTEN Status=20 Avail=0 2: port=80 ESTABLISHED Status=23 Avail=0 3: port=80 ESTABLISHED Status=23 Avail=0 4: port=0 ??? Status=10 Avail=0 5: port=0 ??? Status=10 Avail=0 6: port=0 ??? Status=10 Avail=0 7: port=0 ??? Status=10 Avail=0 Although i have a W5100 It lists 8 ports.. So as reported earlier, MAX_SOCK_NUM does NOT get it right. Which might explain why the Ethernet library fails with the W5100! Because everywhere is checked for MAX_SOCK_NUM because of the W5100 but if that value is 8 instead of 4. Something is going sideways.

I have tried to find were MAX_SOCK_NUM is set. But i'm a novice programmer if it comes down to libraries.. So i probably missed it reading over it. Might want to test with hardcoded NUM = 4; to see what the library does then. --> Which line should i look at for this? Because i did a FIND trough the sources and there are a lot of MAX_SOCK_NUM to find. but i'm not entirely sure which to alter. I think it is set in socket.cpp but not sure at this point.?

At this point i have to say i also have a FTP server library on the unit for updating files on the SD card. But i do not run the FTP service routine. As this will only be run in service mode. So i need to kill Socket 0 and Socket 1 to clear up 2 ports. --> How do i kill / clear a specific socket?

Also when the server stops responding. Socket 2 & 3 are both in Established mode. But hang there and will not revert back to CLOSED.
--> How to timeout a connection?

SapientHetero commented 5 years ago

MAX_SOCK_NUM is set in Ethernet.h, and several users have reported that it doesn't work correctly on their platform. Here's what I did in my local copy:

// Configure the maximum number of sockets to support.  W5100 chips can have
// up to 4 sockets.  W5200 & W5500 can have up to 8 sockets.  Several bytes
// of RAM are used for each socket.  Reducing the maximum can save RAM, but
// you are limited to fewer simultaneous connections.
//#if defined(RAMEND) && defined(RAMSTART) && ((RAMEND - RAMSTART) <= 2048)
//#define MAX_SOCK_NUM 4
//#else
#define MAX_SOCK_NUM 8
//#endif

There is code in the library that enforces a limit of 4 for W5100s. Not sure if it covers every case it should, but you can ensure you get 4 by changing the snippet I posted to make MAX_SOCK_NUM always be 4 instead of 8.

What you want to do is "close" a socket. This is done by calling client.stop() on the client that's using the socket you want to recover. The library doesn't expose a function to close a socket by socket number (there is one, but it's "private"). They'll stay in "ESTABLISHED" state as long as the client on the other end doesn't close them.

You must write code to manage connections. Keep track of which client is doing what and when you don't need to talk to one anymore, use client.stop().

The current library lacks some of the tools needed to do that, so I modified a copy to allow me to do what I wanted. It may not work for everyone, especially those running on platforms with very limited RAM. But looking at the code may give you some ideas for how to write your own. My version also fixes some issues that sometimes allowed the library to hang.

I'll post my beta version here once I figure out how to do that and you can try it out. I'd appreciate any feedback you may care to offer. Make sure to call "manageSockets()" each time your server runs. I didn't worry much about making this work for W5100s since I never expect to use one. Wiznet's W5500-based cards can be had for $22 (delivered price at arrow.com) and a bargain at the price.

Xtreemtec commented 5 years ago

A found it. Was too much looking at the code to mis the #define. 8(

Well the point is that the client.close() is executed. But seems the connection hangs here and there.
I managed to free those FTP ports by changing the place for the FTP init to a different place. So i have 4 sockets available. But still it went good for a few minutes running 2 browsers and have to connections open and closed for a while.. But then within 3 requests it filled from 2 to 4 sockets and they all stay in Established state hanging up the connection.

At this point i'm thinking of checking if all 4 ports are in state 23. If so pull some kind of reset. Dirty way! But looking at the code now. client.stop does only disconnect the current sockindex.
But what if 1 is not closed properly because there was no response. and a new one is opened. It gets a different sockindex. but the other connection is not maintained anymore or do i get the control mechanism wrong?

Also the Ethernet.socketRecvAvailable always return 0. So the ethernetserver.available will fail when all 4 sockets are used. Could it be that the ethernetclass in socket.cpp line 352 is looking to the wrong register for a W5100?

I get your point about those W5500 based boards. But i designed a home plug Switrching monitoring device with the W5100 already soldered onto the pcb. So i have no choice to put a W5500 in anymore.