adafruit / circuitpython

CircuitPython - a Python implementation for teaching coding with microcontrollers
https://circuitpython.org

no socket communication with W5500 ethernet wing and feather m4 express #1800

Closed gvcp closed 5 years ago

gvcp commented 5 years ago

With a TCP server listening, I get a valid connection from the ethernet wing, but when I try to send or receive data from the REPL I always get an input/output error. When I use the same sequence in code.py, the Feather M4 just hangs and doesn't react to Ctrl-C. I connected the two modules by piggybacking them via headers, without additional wires. It's not a USB supply problem: it behaves the same with an external power supply. From the REPL, other commands such as .connected and .ifconfig() work, but .close seems to be ignored: the server side still shows the connection.

The code:

import board
import busio
import wiznet
import socket

spi = busio.SPI(clock=board.SCK, MOSI=board.MOSI, MISO=board.MISO)
eth = wiznet.WIZNET5K(spi, board.D10, board.D11)
host = '192.168.1.243'
fam, typ, pro, nam, socketaddr = socket.getaddrinfo(host, 5000)[0]
ss = socket.socket(fam, typ, pro)
ss.connect(socketaddr)
ss.send('Hello')
ss.recv(10)

nickzoic commented 5 years ago

OK, I'm having a look at this since you're not the only one having problems! (See also: https://github.com/adafruit/circuitpython/issues/703#issuecomment-462819747 , #1500 ) I suspect something like a timing error or race condition.

Development branch: https://github.com/nickzoic/micropython/tree/circuitpython-nickzoic-1800-wiznet-socket

nickzoic commented 5 years ago

I've added in a patch to the development branch above to fix a length bug where reads have a bunch of zeros on the end.

Also I noticed while messing with this code that unless you explicitly close them, sockets don't close until they are GCed. There's only 8 available so they run out pretty quick. This results in a not very helpful OSError: 24 message which really should get a translation, etc.
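A minimal sketch of closing a socket explicitly instead of waiting for garbage collection. The helper name exchange and its make_socket parameter are hypothetical, introduced only for illustration; on the board, make_socket would wrap a call such as socket.socket(fam, typ, pro) from the code in the first comment.

```python
def exchange(make_socket, address, payload, bufsize=64):
    """Open a socket, send payload, read one reply, and always close.

    make_socket is a hypothetical zero-argument callable returning a
    socket-like object with connect/send/recv/close methods.
    """
    sock = make_socket()
    try:
        sock.connect(address)
        sock.send(payload)
        return sock.recv(bufsize)
    finally:
        # Close explicitly so the hardware socket slot is released now,
        # not whenever the garbage collector gets around to it.
        sock.close()
```

On the Feather this might be called as exchange(lambda: socket.socket(fam, typ, pro), socketaddr, b'Hello'). Because close() sits in a finally clause, the slot is released even when send or recv raises an OSError.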

I also sometimes get an OSError: [Errno 5] Input/output error on send. Sometimes it helps to add a small time.sleep() between connect and send, but the amount seems to vary depending on the site you're attempting to connect to, perhaps.

Still looking into it ...

nickzoic commented 5 years ago

You're not going to believe how dumb this is :-/ DHCP kept pinching socket 0 away and changing it to a UDP socket, which was causing the SOCKERR_SOCKMODE error (5) to get thrown.

Basic proof-of-concept patch in the branch above, slightly more finessed version on its way.

gvcp commented 5 years ago

I didn't expect DHCP, with its UDP ports, to be used by default, as I had a static IP address on the WIZNET. I didn't call wiznet.ifconfig with parameters, and now I am unclear about how to switch between a static address and DHCP: circuitpython.readthedocs mentions a wiznet.dhcp call. Do I just have to set this to False or True immediately after creating WIZNET5K to achieve this? Without understanding it all: in the code you changed, it seems that after spi, cs, rst there is an optional parameter MP_QSTR_dhcp for this purpose that is not mentioned in the docs?

nickzoic commented 5 years ago

On Thu, 2 May 2019, at 18:36, gvcp wrote:

> I didn't expect DHCP, with its UDP ports, to be used by default, as I had a static IP address on the WIZNET. I didn't call wiznet.ifconfig with parameters, and now I am unclear about how to switch between a static address and DHCP: circuitpython.readthedocs mentions a wiznet.dhcp call. Do I just have to set this to False or True immediately after creating WIZNET5K to achieve this?

Yeah, unless you explicitly disable DHCP (by setting wiznet.dhcp=False) it'll still interfere with socket #0. How much it interferes depends on your local network, etc, which is why we were seeing varying issues.

> Without understanding it all: in the code you changed, it seems that after spi, cs, rst there is an optional parameter MP_QSTR_dhcp for this purpose that is not mentioned in the docs?

Very good point: I only just implemented that and still need to add it to the docs in the PR :-) It just sets an initial value for the 'dhcp' property, so if it's on (by default) DHCP will start up immediately. I was just thinking about it and perhaps setting an address with wiz.ifconfig(address_tuple) should automatically disable DHCP as well.

-----Nick
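A minimal sketch of switching an interface to a static address, based on the dhcp property and ifconfig(address_tuple) call described above. The helper name use_static_address is hypothetical, and the tuple order (address, netmask, gateway, DNS server) is an assumption that should be checked against the wiznet docs for your build.

```python
def use_static_address(eth, ip, netmask, gateway, dns):
    """Turn DHCP off, then assign a fixed address to the interface.

    eth is expected to behave like the WIZNET5K object in this thread:
    a writable 'dhcp' property and an ifconfig() method.
    """
    # Disable DHCP first so it stops competing for hardware socket #0.
    eth.dhcp = False
    # Assumed tuple order: (address, netmask, gateway, DNS server).
    eth.ifconfig((ip, netmask, gateway, dns))
    # Read the settings back so the caller can confirm what was applied.
    return eth.ifconfig()
```

For example, use_static_address(eth, '192.168.1.50', '255.255.255.0', '192.168.1.1', '192.168.1.1'), where the addresses are placeholders for your own network.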

nickzoic commented 5 years ago

OK I've made some builds for revision d97c81b and temporarily made them available at:

[temporary location removed now this is merged back]

If all goes well, I expect this PR will get merged back into CircuitPython proper shortly after PyCon, but these should at least let you all test the new code now and see if it solves your problem!

@gvcp / @notro / @genevanmeter / @turbinenreiter / @brentru / @siddacious / @ladyada

gvcp commented 5 years ago

Partial success with feather M4 Express, but there seem to be 2 different issues, 1 with wiznet, 1 with circuitpython 4.0 RC1 in general:

  1. I was able to send and receive data (connected to a server) via the REPL, executing the code line by line without problems, but when I do the same in code.py everything hangs, nothing is sent and I also don't receive anything on the Feather. Here is the test code:

import board
import busio
import wiznet
import socket

spi = busio.SPI(clock=board.SCK, MOSI=board.MOSI, MISO=board.MISO)
eth = wiznet.WIZNET5K(spi, board.D10, board.D11)
eth.connected
host = '192.168.1.243'
fam, typ, pro, nam, socketaddr = socket.getaddrinfo(host, 5000)[0]
ss = socket.socket(fam, typ, pro)
ss.connect(socketaddr)
eth.ifconfig()
ss.send("Hello")
test = ss.recv(10)
print(test)

I will further investigate this to see how far it comes without hanging.

  2. It was hard to get out of this: after a reset, the circuitpy drive did not appear. I then tried the usual double reset to get to featherboot and tried a) your uf2 image and b) 4.0 RC1; neither had any effect and circuitpy didn't reappear (apparently it didn't erase code.py). It only worked after I used a Feather M4 Express UF2 from a CircuitPython 3 release.

gvcp commented 5 years ago

I added a print statement after each line in the above code. The last output I got was that the socket was created (ss=socket.socket...). After about a minute I got a traceback for ss.connect with an OSError 4 (interrupted system call), then everything hung again in the same way as above. The server side didn't show a connection from the Feather M4 client.

Another try: adding a 3 second sleep before each print statement changed the behavior a bit. The last thing that happened was a connection showing up on the server side. After a reset the circuitpy drive appeared, with code.py on it, but it couldn't be written to, and mu-editor hung too.

nickzoic commented 5 years ago

On Sun, 5 May 2019, at 07:36, gvcp wrote:

> I added a print statement after each line in the above code. The last output I got was that the socket was created (ss=socket.socket...). After about a minute I got a traceback for ss.connect with an OSError 4 (interrupted system call), then everything hung again in the same way as above. The server side didn't show a connection from the Feather M4 client.
>
> Another try: adding a 3 second sleep before each print statement changed the behavior a bit. The last thing that happened was a connection showing up on the server side. After a reset the circuitpy drive appeared, with code.py on it, but it couldn't be written to, and mu-editor hung too.

Hmmm, very weird. I didn't try loading via code.py, I'll give that a go. What it might mean is that we're sending stuff to the wiznet module before it is fully ready. Thanks for the feedback.

gvcp commented 5 years ago

Just to be sure: I assume that, like the Arduino version, the wiznet CircuitPython implementation doesn't need the hardware interrupt available on the separate IRQ pin of the ethernet feather? Or does it have to be connected to some pin of the processor?

nickzoic commented 5 years ago

On Sun, 5 May 2019, at 18:46, gvcp wrote:

> Just to be sure: I assume that, like the Arduino version, the wiznet CircuitPython implementation doesn't need the hardware interrupt available on the separate IRQ pin of the ethernet feather? Or does it have to be connected to some pin of the processor?

This implementation only uses SPI plus CS (chip select) and RST (reset). RST isn't really required either, I think, so I should make that optional.

I'll try and reproduce the code.py problem on Tuesday ...

nickzoic commented 5 years ago

OK, so I can run the ethernet example code fine from code.py, so long as it successfully connects, and it loads example.com a hundred times without errors then gets to "Code done running".

If there's no ethernet connection then the socket.getaddrinfo() fails (slowly) and then it drops out to the "Code done running." when it hits the OSError: -2.

Either way though, a few seconds after it gets done, it locks up hard and the serial port disconnects. If I hard reset it it works again. This suggests to me that I'm not cleaning up / de-initing some resource properly at the end of the code execution and then it's getting forgotten about when the shell restarts.

UPDATE: This problem doesn't seem to occur if DHCP is off and the interface is explicitly configured instead, so, uh, yeah. Looks likely that the culprit is somewhere around there with the network deinit process.

nickzoic commented 5 years ago

OK, so 24934a1 removes nics from the network stack when the network deinits: this seems to fix the problem which I was seeing when loading code from code.py and then dropping out to "Code done running." ... it now drops out to a happy REPL and you can even Ctrl-D to restart or 'import wiznet' to use it from the REPL and it works.

In the process, I noticed that I'd not actually attached the jumper wire for RST anyway and it all worked perfectly with or without, so I've made the RST pin an optional parameter as well (264fc2b) and updated the docs a little (832f07a).

Lastly, I've uploaded new builds to [temporary location removed now this is merged back]

ladyada commented 5 years ago

just popping in to mention that yes, the reset pin is not essential - altho it is handy :)

gvcp commented 5 years ago

should it just be connected to the "Rst" pin of the feather M4 express so that both are reset at the same time or must it be connected to a special IO pin ?

ladyada commented 5 years ago

either

nickzoic commented 5 years ago

The original example uses D10 as CS (chip select) and D11 as RST (reset) and specifies those when constructing the interface, so you can use a different pin for reset if you like:

eth = wiznet.WIZNET5K(spi, board.D10, board.D11)

I've made the rst parameter optional so if you've not hooked up reset, or hooked it up to the reset button, you can construct the interface like:

eth = wiznet.WIZNET5K(spi, board.D10)

Either way you can avoid configuring DHCP at initialization by providing a dhcp=False kwarg:

eth = wiznet.WIZNET5K(spi, board.D10, dhcp=False)

Further changes I'm considering:

gvcp commented 5 years ago

Thanks for the information.

And... great work in a short period of time!

I am still testing: in general it works now, sending as well as receiving in code.py, and it doesn't hang anymore, but there are still some confusing points:

Without looking through the sources: can I use all the functions, methods and constants described under the link below with wiznet, or are there exceptions which currently will not work?

https://circuitpython.readthedocs.io/en/2.x/docs/library/usocket.html

gvcp commented 5 years ago

Found another issue after testing special situations: if the server is not listening at first, I get the error 4 described above and it comes back to the REPL. If I then activate the server socket and run the code again by pressing Ctrl-D, it hangs (no REPL in Mu, and the drive can't be accessed). After a hardware reset of the Feather I get circuitpy back, at least on my computer, and can delete code.py; after relaunching Mu it is able to communicate again.

nickzoic commented 5 years ago

Hmmm, so just to confirm, the situation here is:

I'm thinking that might be related to the "Further changes I'm considering" afterthought above. I'll see if I can reproduce the problem (probably Friday GMT+10, maybe earlier if I get the chance)

nickzoic commented 5 years ago

@gvcp responses to your previous questions:

the startup issue

this works for me: code.py

It's getting an address by DHCP, then connecting to example.com. It spins for a few seconds waiting for the DHCP address. Now, interestingly, socket.getaddrinfo() and socket.socket() work even without a network, if you pass it a numeric address anyway. So it's not so much a question of whether the socket has been successfully created as whether the socket has anything to connect to.

>>> eth.connected
False
>>> socket.getaddrinfo('8.8.8.8',80)
[(2, 1, 0, '', ('1.2.3.4', 80))]
>>> ss = socket.socket(2,1,0)
>>>

If I try to do ss.connect before DHCP has completed, I get OSError: [Errno 13] Permission denied which would fit with your symptoms: DHCP takes a few seconds to complete in the background. The error message is not exactly a picture of clarity though, I think I should change that to an explicit exception.

Also, I'm not really happy with ifconfig()[0] == '0.0.0.0' as an API, I wonder if we should add a readable property like wiznet.ready which returns True if the interface is connected and has a valid address? Though maybe it is more pythonic to just throw an explicit exception if it isn't ready so the user code can retry.
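Until something like a ready property exists, user code can poll for it. A minimal sketch, assuming the interface exposes connected and ifconfig() as used earlier in this thread; the function name wait_until_ready and the injectable sleep/clock parameters are hypothetical and exist mainly so the loop can be tested off-device.

```python
import time

def wait_until_ready(eth, timeout=10.0, poll=0.1,
                     sleep=time.sleep, clock=time.monotonic):
    """Return True once eth is connected and has a non-zero address.

    Returns False if that has not happened within 'timeout' seconds.
    """
    deadline = clock() + timeout
    while True:
        # Same check as ifconfig()[0] == '0.0.0.0', plus the link state.
        if eth.connected and eth.ifconfig()[0] != '0.0.0.0':
            return True
        if clock() >= deadline:
            return False
        # DHCP completes in the background, so just wait and look again.
        sleep(poll)
```

Calling wait_until_ready(eth) before ss.connect(socketaddr) would let code.py back off or report a clear message instead of hitting the Errno 13 described above.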

disconnecting sockets

yeah, another error which is not a shining example. The socket module has a bunch of routines which call mp_raise_OSError(_errno), and these should really have a translation table or something to turn those into friendly error messages. (UPDATE: #1880)
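Pending a proper translation table in the socket module, a user-side lookup can make these numbers readable. This is only a sketch: the name describe_oserror is hypothetical, and the messages are paraphrases of the causes reported in this thread, not text produced by CircuitPython.

```python
# Error numbers seen in this thread, mapped to the causes reported for them.
FRIENDLY_ERRORS = {
    4: "connect was interrupted (was the server listening?)",
    5: "socket in the wrong mode (DHCP may have taken socket 0)",
    13: "interface not ready yet (DHCP still in progress?)",
    24: "no free sockets left (close the ones you are done with)",
}

def describe_oserror(exc):
    """Return a readable message for an OSError raised by the socket calls."""
    # The error number is the first positional argument of the exception.
    code = exc.args[0] if exc.args else None
    return FRIENDLY_ERRORS.get(code, "OSError {}".format(code))
```

Typical use would be to wrap the socket calls in try/except OSError as e and print(describe_oserror(e)); unknown codes fall through to a plain "OSError N" string.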

support for socket API

The right place to look is https://circuitpython.readthedocs.io/en/latest/shared-bindings/socket/__init__.html# ... the URL you posted is for the inherited MicroPython 'usocket' library and we should remove it to avoid confusion, thanks for pointing it out.

All the circuitpython socket calls should work, note though that makefile isn't implemented which is a pain, see #1522

nickzoic commented 5 years ago

OK that branch is merged back in, @gvcp thanks for your help testing this, if you have further issues please raise a new issue for them and mention me, I'll try and look at it ASAP ...