adafruit / Adafruit_CircuitPython_Wiznet5k

Pure-Python interface for WIZNET 5k Ethernet modules
Other
16 stars 36 forks source link

Throwing an exception, then hitting ctrl-D to reload may leak sockets #109

Closed dgnuff closed 1 year ago

dgnuff commented 1 year ago

This came to light when I was diagnosing https://github.com/adafruit/Adafruit_CircuitPython_Wiznet5k/issues/108

Specifically, that issue causes the DHCP client to throw an exception and terminal the application. What I noticed was that it would only do that the first time after powering on the device. If I hit ctrl-D to reboot, the error changed as follows.

During normal flow of the DHCP client, with debug = True enabled I would typically see the following line repeated half a dozen to a dozen times while waiting for the DHCPOFFER from the server:

* socket_available called on socket 1, protocol 2

However, if I were to wait for the exception and then hit ctrl-D, it would just repeat the following line over and over for ever:

* socket_available called on socket 2, protocol 2

Hitting ctrl-C to interrupt, and another ctrl-D to reload again produces:

* socket_available called on socket 3, protocol 2

I think you can see where this is going now. It eventually reaches:

* socket_available called on socket 7, protocol 2

after which the error changes to:

* DHCP: Failed to allocate socket
Traceback (most recent call last):
  File "code.py", line 41, in <module>
  File "libs/socket.py", line 75, in init
  File "/lib/adafruit_wiznet5k/adafruit_wiznet5k.py", line 230, in __init__
RuntimeError: Failed to configure DHCP Server!

I don't know the inner details of this, but this has a really strong odor of a socket leak to me.

If there's some way to do so, you might want to consider a hard reset of the wiznet5k when the system comes up, to avoid this problem that might be an issue for other situations.

anecdata commented 1 year ago

Hard reset on startup is a good idea, in the library or in code. But the reset pin connection varies by WIZnet board, and may not have been connected by the user (or set up in code).

dgnuff commented 1 year ago

Aha!! There's no hardware problem that a good software engineer can't fix. ;)

During WIZNET5K::__init__() a few strategic calls to self._write_sncr(socket_num, _CMD_SOCK_CLOSE) seem to do the trick. At least with the error in issue https://github.com/adafruit/Adafruit_CircuitPython_Wiznet5k/issues/108 still present, it doesn't go through the socket leak flow with repeated hits on ctrl-D, and instead stays with socket 1 multiple times over.

PR will be incoming, it'll almost certainly need a rewrite, since I put a try: except: pass around the self._write_sncr() calls to deal with any issues of attempting to close a socket that isn't open. That might not be necessary, but it does no harm, and a more experienced engineer will be far better qualified to make that call.

BiffoBear commented 1 year ago

@dgnuff Were you using a W5500 or a W5100s? The software reset on the W5500 is currently unreliable, and I have an open PR that should fix it. So if you are using a W5500 I would be grateful for a quick test because I won't have access to one until late May 😳

dgnuff commented 1 year ago

I was using a Wiznet 5500, on an Ethernet Featherwing. I'm unable to test on a Wiznet 5100 since I don't have a device that uses that chip.

On Thu, Apr 20, 2023, 03:12 Martin Stephens @.***> wrote:

@dgnuff https://github.com/dgnuff Were you using a W5500 or a W5100s?

— Reply to this email directly, view it on GitHub https://github.com/adafruit/Adafruit_CircuitPython_Wiznet5k/issues/109#issuecomment-1516073296, or unsubscribe https://github.com/notifications/unsubscribe-auth/AGK73XEK56WESEVD4UOLLHLXCEDYZANCNFSM6AAAAAAWYFDRJQ . You are receiving this because you were mentioned.Message ID: @.*** com>

BiffoBear commented 1 year ago

I was using a Wiznet 5500, on an Ethernet Featherwing. I'm unable to test on a Wiznet 5100 since I don't have a device that uses that chip.

The W5500 I have with me isn't functioning, so I cannot test against a W5500 until I get home in late May. The fact that both you and @FoamyGuy experience problems after a soft reset on a W5500 makes me think that the issue lies in the chip initialisation for the W5500. I made a best guess edit on the PR that I have pending, but it made things worse.

If you are using the Adafruit Ethernet Feather, the W5500 hardware reset pin is broken out, and is supported in the library:

:param digitalio.DigitalInOut reset: Optional reset pin, defaults to None.

This may solve your problem, if it does, please let me know.

dgnuff commented 1 year ago

I can certainly adjust my code.py to specify the reset pin and see if that makes the problem go away. Even so, I have a software fix available that I'd like to consider submitting, since it will help people who are using W5500 chips that don't have the reset wired up.

BiffoBear commented 1 year ago

I can certainly adjust my code.py to specify the reset pin and see if that makes the problem go away. Even so, I have a software fix available that I'd like to consider submitting, since it will help people who are using W5500 chips that don't have the reset wired up.

Thanks! I purloined a soldering iron and fixed my w5500, so I'm able to test on it. The problem is solved by soft resetting the w5500 during initialisation. The fix works on my pending DHCP PR and I've opened a PR to fix it in the current main branch.