Closed ssube closed 4 years ago
im impressed it works cause we haven't done any ethernet projects yet :D @nickzoic added it so will know the most. we're mostly supporting networking with esp32spi https://github.com/ladyada/Adafruit_CircuitPython_ESP32SPI
I don't have one of the esp32 wifi wings on hand right now, but will look into picking one up, at least for testing. Are they strictly wifi, or is it possible to solder on an rj45 port? Not sure all the final locations for these sensors will have strong wifi signal, and I was hoping to use PoE splitters to power the boards from a single switch.
Not sure if this is related, but on some further testing, excessive load seems to also confuse the socket after a while. Something like this:
> ./wrk -t 1 -c 1 -d 5s http://10.2.2.255:8080/metrics
Running 5s test @ http://10.2.2.255:8080/metrics
1 threads and 1 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 82.34ms 21.98ms 108.68ms 67.24%
Req/Sec 12.04 4.43 20.00 75.56%
58 requests in 5.01s, 45.04KB read
Socket errors: connect 0, read 1, write 21, timeout 0
Requests/sec: 11.59
Transfer/sec: 9.00KB
Will cause an OSError
with errno
3, 4, or 7. Usually, creating a new socket and listening again fixes it, but may leave wrk
hanging (while [[ true ]]; do curl
can also trigger the errors, but is not left hanging).
Headers: {'http': 'HTTP/1.1', 'method': 'GET', 'path': '/metrics'}
Connection: ('10.2.1.193', 8080)
Headers: {'http': 'HTTP/1.1', 'method': 'GET', 'path': '/metrics'}
Error accepting request: [Errno 128] ENOTCONN
Binding: 10.2.2.255:8080
Unable to set socket timeout: [Errno 22] Invalid argument
Error accepting request: [Errno 128] ENOTCONN
Binding: 10.2.2.255:8080
Traceback (most recent call last):
File "code.py", line 102, in <module>
File "code.py", line 97, in main
File "code.py", line 97, in main
File "/lib/prometheus_express/utils.py", line 24, in bind_server
File "/lib/prometheus_express/server.py", line 12, in start_http_server
OSError: 4
With the -c 1 -t 1
flags, wrk
should be running a single connection, but I'm not sure how that client connection would prompt an OSError
(or a few different ones).
the esp's are def just wifi
See also #1880 ... exceptions could be a lot better on 'socket', 'network' and 'wiznet' libraries, we just have to decide what to standardize on ...
@brentru I don't think they are the same problem. The issue I am reporting is that the Wiznet 5500 has no on-board long term memory for a MAC address, so if you don't give it one after boot, it makes up a random number for the MAC, and causes an IP address change every time you reboot. The Arduino API has a function to assign/force a MAC address. The CP API does not. As far as I know, it is stable as long as power is still applied, so should not cause this issue. I didn't go beyond discovering it was hopping IP address every power cycle, and I put it back in the desk drawer. --- graham (discord), aka phrogger(github)
PS: The ESP32 has support for RMII so you can attach an Ethernet interface like the LAN8270 that way, although it is a rather different beast than the way WizNet works.
See: https://github.com/micropython/micropython-esp32/issues/172 and https://github.com/micropython/micropython/issues/4502 and the "network.LAN" class.
Revisited this over the break using v4.1.2, and with a few tweaks to my code (and a fixed IP address), things seem more stable with the Feather M4/Ethernet Wing setup:
[09:38:29] ssube@cerberus ~/code/ssube/prometheus-express (master@93aa0347) > time ./scripts/bench-curl.sh
...
Request: 9800 of 10000
Request: 9900 of 10000
Request: 10000 of 10000
real 6m47.389s
user 1m27.865s
sys 0m51.520s
With the fixed IP, rebooting after an OSError takes a few seconds for the device to come back at the same address, and Prometheus' retry logic can handle that. My use-case is monitoring on a 60s interval, so that simulates a week's worth of requests, which is good enough uptime to start deploying them.
Still seeing issues accepting new connections under load. Specifically, running wrk
(this script) puts the chip into an immediate error loop:
[09:55:14] ssube@cerberus ~/code/ssube/prometheus-express (master@93aa0347) > time ./scripts/bench-wrk.sh
Running 1m test @ http://10.2.1.220:8080/metrics
1 threads and 1 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 0.00us 0.00us 0.00us -nan%
Req/Sec 0.00 0.00 0.00 -nan%
0 requests in 5.80s, 0.00B read
Socket errors: connect 0, read 0, write 64203, timeout 0
Requests/sec: 0.00
Transfer/sec: 0.00B
real 0m5.836s
user 0m0.344s
sys 0m0.971s
produces:
Error accepting request: [Errno 128] ENOTCONN
Error accepting request: [Errno 128] ENOTCONN
Error accepting request: [Errno 128] ENOTCONN
Error accepting request: [Errno 128] ENOTCONN
Error accepting request: [Errno 128] ENOTCONN
Error accepting request: [Errno 128] ENOTCONN
Error accepting request: [Errno 128] ENOTCONN
It doesn't appear to recover from this without a ^C ^D
reboot or hard reset (maybe it would after logging all 64k?). I can only reproduce this by running wrk
, curl
in a tight loop doesn't trigger the same behavior. Changing the socket accept depth doesn't appear to make a difference, but I'm not sure what values would make sense here.
I'm going to test this on an Airlift Wing next and see how it works with the ESP32, then pick up a few of those Olimex chips. Built in POE would save me an adapter :D
Interesting. I'll check it out.
Got a chance to try my code on the ESP32-POE and -EVB today (running https://github.com/ssube/prometheus_express/blob/master/examples/olimex-esp32-random.py), with fantastic results on both wifi and wired. Socket timeouts appear to work there, allowing me to collect sensor information in between requests. Not seeing the ENOTCONN errors from wrk either, those only appear with the Wiznet hardware so far. Chrome, curl, and most important Prometheus are scraping the ESPs just fine.
The two issues raised here have split into a couple of different places, so I'm going to close this one for clarity and work can continue over there:
The docs and source seem to support socket.settimeout but it appears to invoke a method (syscall?) on the NIC, which throws:
See #2097 for work on adding settimeout to WizNet 5500. Once I actually get something working I'll backport it to MicroPython too.
The issue I am reporting is that the Wiznet 5500 has no on-board long term memory for a MAC address, so if you don't give it one after boot, it makes up a random number for the MAC
See #2274 for an approach to providing a stable MAC address
Hi, I'm looking to use the Feather M4 Express and Ethernet FeatherWing as a way to collect I2C sensor data with Prometheus and threw together a very small HTTP server for that purpose. It works pretty well with light traffic, but if the
socket.accept
waits too long for an incoming request, the board seems to fall off the network and my code soft-reboots without raising an error:When no requests are incoming, I would like the
socket.accept
to break after a few seconds to let the network driver run and take another sample from the sensors.The docs and source seem to support
socket.settimeout
but it appears to invoke a method (syscall?) on the NIC, which throws:Is there a way to make the socket timeout, have a timer interrupt it, or work around this somehow?
Running:
My socket setup code is:
With a more complete example here: https://github.com/ssube/prometheus_express/blob/master/examples/feather-m4-i2c.py