adafruit / Adafruit_CircuitPython_Wiznet5k

Pure-Python interface for WIZNET 5k Ethernet modules
Other
16 stars 36 forks source link

Intermittent hangs in NTP.get_time() #56

Closed chabala closed 1 year ago

chabala commented 2 years ago

I'm working with the WIZnet W5100S-EVB-Pico. As part of troubleshooting another issue, I wanted timestamps, so I configured NTP. I saw there was no NTP example code, so here is my short example code:

import board
import busio
import digitalio
import time
import os
import rtc

from adafruit_wiznet5k.adafruit_wiznet5k import WIZNET5K
from adafruit_wiznet5k.adafruit_wiznet5k_ntp import NTP

# SPI0
SPI0_SCK = board.GP18
SPI0_TX = board.GP19
SPI0_RX = board.GP16
SPI0_CSn = board.GP17

# reset
W5x00_RSTn = board.GP20

print('starting...')

# define your own unique MAC address for the network interface
MAC_ADDR = (0xDE, 0xAD, 0xBE, 0xEF, 0xFE, 0xED)

eth_rst = digitalio.DigitalInOut(W5x00_RSTn)
eth_rst.direction = digitalio.Direction.OUTPUT

spi_bus = busio.SPI(SPI0_SCK, MOSI=SPI0_TX, MISO=SPI0_RX)
eth_cs = digitalio.DigitalInOut(SPI0_CSn)
eth = WIZNET5K(spi_bus, eth_cs, reset=eth_rst, is_dhcp=True, mac=MAC_ADDR, hostname="W5100S", dhcp_timeout=30, debug=False)

def syncTimeWithNTP():
    try:
        print("\nconstructing NTP object")
        ntp = NTP(eth, "pool.ntp.org", -5)
        print("setting RTC via NTP request")
        rtc.RTC().datetime = ntp.get_time()
        print("RTC set successfully")
    except OSError as e:
        print("OSError", e)

def currentTime():
    t = time.localtime(time.time())
    return "%04d-%02d-%02dT%02d:%02d:%02d" % (t.tm_year, t.tm_mon, t.tm_mday, t.tm_hour, t.tm_min, t.tm_sec)

print("machine:", os.uname().machine)
print("sysname:", os.uname().sysname)
print("CircuitPython version:", os.uname().version)
print("Chip Version:", eth.chip)
print("MAC Address:", ":".join("%02X" % _ for _ in eth.mac_address))
print("IP Address:", eth.pretty_ip(eth.ip_address))

syncTimeWithNTP()
print(currentTime())

while True:
    time.sleep(5)
    # Maintain DHCP lease
    eth.maintain_dhcp_lease()

This works, often, like so:

code.py output:
starting...
machine: Raspberry Pi Pico with rp2040
sysname: rp2040
CircuitPython version: 7.2.3 on 2022-03-16
Chip Version: w5100s
MAC Address: DE:AD:BE:EF:FE:ED
IP Address: 192.168.1.195

constructing NTP object
setting RTC via NTP request
RTC set successfully
2022-04-03T15:30:12

However, just as often, it hangs in NTP.get_time():

code.py output:
starting...
machine: Raspberry Pi Pico with rp2040
sysname: rp2040
CircuitPython version: 7.2.3 on 2022-03-16
Chip Version: w5100s
MAC Address: DE:AD:BE:EF:FE:ED
IP Address: 192.168.1.195

constructing NTP object
setting RTC via NTP request
Traceback (most recent call last):
  File "code.py", line 55, in <module>
  File "code.py", line 38, in syncTimeWithNTP
  File "/lib/adafruit_wiznet5k/adafruit_wiznet5k_ntp.py", line 58, in get_time
  File "/lib/adafruit_wiznet5k/adafruit_wiznet5k_socket.py", line 294, in recv
KeyboardInterrupt:

line 294 is gc.collect()

I've also seen this:

code.py output:
starting...
machine: Raspberry Pi Pico with rp2040
sysname: rp2040
CircuitPython version: 7.2.3 on 2022-03-16
Chip Version: w5100s
MAC Address: DE:AD:BE:EF:FE:ED
IP Address: 192.168.1.195

constructing NTP object
setting RTC via NTP request
Traceback (most recent call last):
  File "code.py", line 55, in <module>
  File "code.py", line 38, in syncTimeWithNTP
  File "/lib/adafruit_wiznet5k/adafruit_wiznet5k_ntp.py", line 58, in get_time
  File "/lib/adafruit_wiznet5k/adafruit_wiznet5k_socket.py", line 281, in recv
  File "/lib/adafruit_wiznet5k/adafruit_wiznet5k_socket.py", line 406, in available
  File "/lib/adafruit_wiznet5k/adafruit_wiznet5k.py", line 518, in socket_available
  File "/lib/adafruit_wiznet5k/adafruit_wiznet5k.py", line 866, in _get_rx_rcv_size
  File "/lib/adafruit_wiznet5k/adafruit_wiznet5k.py", line 907, in _read_snrx_rsr
  File "/lib/adafruit_wiznet5k/adafruit_wiznet5k.py", line 966, in _read_socket
  File "/lib/adafruit_wiznet5k/adafruit_wiznet5k.py", line 461, in read
KeyboardInterrupt:

This is a bus_device write. Hard to tell if it was stuck there or just stuck in a busy loop in that vicinity.

I'm using the latest stable CP (as shown in the output above), and latest Wiznet5k library.

chabala commented 2 years ago

I see there was a CP 7.2.4 released last week. I have updated, but it is still flaky:

code.py output:
starting...
machine: Raspberry Pi Pico with rp2040
sysname: rp2040
CircuitPython version: 7.2.4 on 2022-03-31
Chip Version: w5100s
MAC Address: DE:AD:BE:EF:FE:ED
IP Address: 192.168.1.195

constructing NTP object
setting RTC via NTP request
Traceback (most recent call last):
  File "code.py", line 55, in <module>
  File "code.py", line 38, in syncTimeWithNTP
  File "/lib/adafruit_wiznet5k/adafruit_wiznet5k_ntp.py", line 58, in get_time
  File "/lib/adafruit_wiznet5k/adafruit_wiznet5k_socket.py", line 291, in recv
KeyboardInterrupt: 

Updated again:

machine: Raspberry Pi Pico with rp2040
sysname: rp2040
CircuitPython version: 7.2.5 on 2022-04-06
Chip Version: w5100s
MAC Address: DE:AD:BE:EF:FE:ED
IP Address: 192.168.1.195

constructing NTP object
setting RTC via NTP request
Traceback (most recent call last):
  File "code.py", line 55, in <module>
  File "code.py", line 38, in syncTimeWithNTP
  File "/lib/adafruit_wiznet5k/adafruit_wiznet5k_ntp.py", line 58, in get_time
  File "/lib/adafruit_wiznet5k/adafruit_wiznet5k_socket.py", line 291, in recv
KeyboardInterrupt: 
chabala commented 2 years ago

My latest theory on this is that since NTP uses UDP, on occasion the reply may be lost in transit, and the socket is stuck in a loop waiting to read a reply that isn't coming.

Coincidentally, I've found a PR that sought to improve Wisnet5k UDP handling, but unfortunately it was opened against a Wiznet demo repository instead of this one. I've applied it locally, and while it doesn't solve the hanging issue, it doesn't seem to cause any ill effects either. I don't fully understand all the changes, but it looks like it's solving a mismatch between the number of sockets available and how the existing library handles tracking UDP globally.

chabala commented 2 years ago

I've setup a ntpd daemon on my local network, and after running for a few days, this has almost completely eliminated any hanging behavior, vs using pool.ntp.org. Not entirely though. But that would suggest that something about how the code handles waiting for the response could be more robust, likely with a bit of timeout and retry logic.