adafruit / Adafruit_CircuitPython_Requests

Requests-like interface for web interfacing
MIT License
51 stars 36 forks source link

gaierror, Out of sockets, Sending Request Failure: due to syntax #145

Closed DJDevon3 closed 7 months ago

DJDevon3 commented 7 months ago

Circuit Python 8.2.7, Adafruit Feather ESP32-S3 I've tracked down a weird syntax issue that caused me a lot of headache.

response = requests.get(DATA_SOURCE).json()

response = requests.get(DATA_SOURCE).close() # valid syntax, doesn't do anything, but will leak sockets.
response.close() # flat out does not work with .json above

I believe I got that syntax from Neradoc or Deshipu a long time ago and I've been using it ever since in my personal projects. It only reduces the code by 1 line but it worked... until 8.2.7 where it exposed a potential bug when my script starting failing in unexpected ways completely deviating from its past stable behavior.

As RimWolf & Anecdata have pointed out the correct way is

response = requests.get(DATA_SOURCE)
owm_response = response.json()
do_something_with(owm_response)
response.close()

Above method "should" automatically close the response and seems to work ok. or

with requests.get(DATA_SOURCE) as response:
    owm_response = response.json()
    do_something_with(owm_response)

Above method with requests will close a socket automatically so response.close() is unneeded but you can still use it explicitly if you want.

The problem that cropped up is if you use response = requests.get(DATA_SOURCE).json() it will slowly leak sockets failing to close them. It will work fine for hours and even days at a time but will eventually fail to close a socket properly which will end up with a -2 gaierror and/or

Traceback (most recent call last):
  File "adafruit_requests.py", line 500, in _get_socket
RuntimeError: Out of sockets

The culprit is response = requests.get(DATA_SOURCE).json() is valid syntax and does not cause an error but will 100% eventually fail. I've gone as long as 4 days without encountering that error but it will happen eventually. This syntax seems to work on the surface at first but also makes using response.close() impossible due to a dict error.

This issue is related to OSError -2 gaierror and RuntimeError: Out of sockets.

I modified my requests script to use the default way as Rimwolf & Anecdata advised and have not had a gaierror, sending request error, or socket error since. The biggest problem with tracking this one down is how randomly intermittent it is. It could take days or weeks depending on your polling rate. My polling rate is typically 15 minutes.

I think this is a syntax issue as requests wants things in a specific way. If you deviate from the norm it will not complain about the syntax and instead will start throwing random errors. There are a lot of response.close() open issues and I'm not sure where this one fits. Just wanted to share my experience fighting against it for weeks trying to track down the gaierror which eventually led to this.

DJDevon3 commented 7 months ago

I've changed syntax to with requests.get(DATA_SOURCE) as owm_request and still getting some kind of socket leak. Here's my serial output. After 3 days it errored with an AdafruitIO error. Currently if that error is detected it does a supervisor.reload which frees everything up again and will work for a couple hours or days but will eventually error again.

===============================
Board Uptime:  3.2 days
Temp:  81.8681
Actual Temp: 81.1
| Connecting to WiFi...
| ✅ WiFi!
| | Attempting to GET Weather!
| | Account within Request Limit
| | ✅ Connected to OpenWeatherMap
| | | Sunrise: 06:50
| | | Sunset: 17:27
| | | Temp: 84.61
| | | Pressure: 1011
| | | Humidity: 68
| | | Weather Type: Clear
| | | Wind Speed: 10.36
| | | Timestamp: 2023/11/26 15:05
| | ✂️ Disconnected from OpenWeatherMap
| | ❌ AdafruitIO_MQTTError: 
 MQTT Error: Unable to connect to Adafruit IO.
Code done running.
soft reboot

Full code for Feather Weather here

anecdata commented 7 months ago

Can you post the full stack trace.. it seems to be coming from adafruit_minimqtt or adafruit_io?

DJDevon3 commented 7 months ago

Because I'm using exception handlers now there is no traceback, only the error message. I tried finding where that error message is coming from but I can't even find that line in mmqtt or adafruit_io library. MQTT Error: Unable to connect to Adafruit IO. I have no idea where that is coming from.

I have it failing reliably, as in it will happen once or twice per 24 hours then automatically reload. I can't speed up the process or I'll get temp banned by OpenWeatherMap. I have to let it ride waiting every 15 minutes. If I remove the part for OpenWeatherMap, that would change the amount of sockets it would normally have.

It's in a state that is usable only because of supervisor.reload(). If I remove that, the script will fail to connect infinitely when that error eventually occurs. It only has to show up once and from then on any attempts to connect to adafruitio fail. I'm not getting any errors in adafruitio dashboard live errors. From AdafruitIO's perspective there is no connection taking place to create an error on their side. OpenWeatherMaps will still connect though... It's just odd. It feels like I'm running out of sockets still but only for AdafruitIO and the error doesn't say that.

If you have any tips on how to better track something like this down I'm all ears. Remove the exception handlers to force a traceback?

anecdata commented 7 months ago

You can print the traceback within the exception handler like:

import traceback
# ...
try:
    # ...
except Exception as ex:  # or tuple of specific exception types)
    traceback.print_exception(ex, ex, ex.__traceback__)
DJDevon3 commented 7 months ago

Thank you for the help here and in discord. I'm on it. So far it's been a little over an hour and successfully run 5 times.

Board Uptime:  3.6 days
Temp:  80.0283
Actual Temp: 79.0
| Connecting to WiFi...
| ✅ WiFi!
| | Attempting to GET Weather!
| | Account within Request Limit
| | ✅ Connected to OpenWeatherMap
| | | Sunrise: 06:50
| | | Sunset: 17:27
| | | Temp: 74.12
| | | Pressure: 1013
| | | Humidity: 93
| | | Weather Type: Clouds
| | | Wind Speed: 3.44
| | | Timestamp: 2023/11/26 23:05
| | ✂️ Disconnected from OpenWeatherMap
| | ✅ Connected to MQTT Broker!
| | | ✅ Publishing BME280-Unbiased: 80.03 | BME280-RealTemp: 79.03 | BME280-Pressure: 1014.3 | BME280-Humidity: 80.3 | BME280-Altitude: 0.04
| | ✂️ Disconnected from MQTT Broker
| ✂️ Disconnected from Wifi
Next Update:  15 minutes

This is normal intended behavior. Because it makes multiple requests with multiple API's if it was a leak I would expect that to show up prior to 5 consecutive runs or be within the same ballpark in terms of runs per failure. It usually fails past 5 runs, anywhere between about 5 to 50 runs. The random failure is what has been so hard to track.

I'm sure it will fail by the morning so will let it run overnight.

DJDevon3 commented 7 months ago

This was not as revealing as I'd hoped.

Traceback (most recent call last):
  File "code.py", line 556, in <module>
  File "adafruit_io/adafruit_io.py", line 116, in connect
AdafruitIO_MQTTError: MQTT Error: Unable to connect to Adafruit IO.

Line 116 in adafruit_io/adafruit_io.py

DJDevon3 commented 7 months ago

Have an idea. Instead of using supervisor.reload() trying io.reconnect with a continue now. Didn't know io.reconnect() was a thing until just now.

        except (AdafruitIO_MQTTError) as ex:
            print("| | ❌ AdafruitIO_MQTTError: \n", ex)
            #traceback.print_exception(ex, ex, ex.__traceback__)
            io.reconnect()
            continue
DJDevon3 commented 7 months ago

Here's a look at my adafruitIO live feed.

Capture

When it fails there are no errors on AdafruitIO's side. Everything looks normal from their perspective. There is no connection on their side which does correlate with a failure to send or failure to connect type of issue. I see nothing abnormal from the AdafruitIO dashboard side.

DJDevon3 commented 7 months ago
Traceback (most recent call last):
  File "code.py", line 586, in <module>
  File "adafruit_io/adafruit_io.py", line 107, in reconnect
AdafruitIO_MQTTError: MQTT Error: Unable to reconnect to Adafruit IO.

was worth a shot. something is very wrong if I have to go to these lengths.

DJDevon3 commented 7 months ago

Tried to remove adafruit_io library from the equation and use only adafruit_minimqtt with the same result.

===============================
Board Uptime:  2 hours
Temp:  76.1119
Actual Temp: 74.7
| Connecting to WiFi...
| ✅ WiFi!
| | Attempting to GET Weather!
| | Account within Request Limit
| | ✅ Connected to OpenWeatherMap
| | | Sunrise: 06:51
| | | Sunset: 17:26
| | | Temp: 68.22
| | | Pressure: 1017
| | | Humidity: 74
| | | Weather Type: Clear
| | | Wind Speed: 8.05
| | | Timestamp: 2023/11/27 17:46
| | ✂️ Disconnected from OpenWeatherMap
| | ✅ Connected to MQTT Broker!
| | | Published DJDevon3/feeds/BME280-Unbiased None
| | | Published DJDevon3/feeds/BME280-RealTemp None
| | | Published DJDevon3/feeds/BME280-Pressure None
| | | Published DJDevon3/feeds/BME280-Humidity None
| | | Published DJDevon3/feeds/BME280-Altitude None
| | ✂️ Disconnected from MQTT Broker
| ✂️ Disconnected from Wifi
Next Update:  15 minutes

There were 6 successful updates and then failure

===============================
Board Uptime:  2 hours
Temp:  75.9843
Actual Temp: 74.6
| Connecting to WiFi...
| ✅ WiFi!
| | Attempting to GET Weather!
| | Account within Request Limit
| | ✅ Connected to OpenWeatherMap
| | | Sunrise: 06:51
| | | Sunset: 17:26
| | | Temp: 66.11
| | | Pressure: 1017
| | | Humidity: 78
| | | Weather Type: Clear
| | | Wind Speed: 5.01
| | | Timestamp: 2023/11/27 18:02
| | ✂️ Disconnected from OpenWeatherMap
| | ❌ Failed to connect, retrying
 Out of sockets
| | Attempting to GET Weather!
| | Account within Request Limit
| | ✅ Connected to OpenWeatherMap
| | | Sunrise: 06:51
| | | Sunset: 17:26
| | | Temp: 65.77
| | | Pressure: 1016
| | | Humidity: 79
| | | Weather Type: Clear
| | | Wind Speed: 5.01
| | | Timestamp: 2023/11/27 18:06
| | ✂️ Disconnected from OpenWeatherMap
| | ❌ Failed to connect, retrying
 Out of sockets
| | Attempting to GET Weather!
| | Account within Request Limit
| | ✅ Connected to OpenWeatherMap
| | | Sunrise: 06:51
| | | Sunset: 17:26
| | | Temp: 65.77
| | | Pressure: 1016
| | | Humidity: 79
| | | Weather Type: Clear
| | | Wind Speed: 3.0
| | | Timestamp: 2023/11/27 18:10
| | ✂️ Disconnected from OpenWeatherMap
| | ❌ Failed to connect, retrying
 Out of sockets

Once it fails it will not reconnect without a soft reboot via supervisor.reload or a hard reset.

Sharing CIRCUITPY contents (except settings.toml) Contents_of_D.zip

DJDevon3 commented 7 months ago

I have a theory that it's looping connect_retries amount of times each connection, successfully, and then fails. Some kind of misplaced loop counter issue? Or it's not resetting connect_retries back to 0 if there was a successful connection made so it's 5 retries per lifecycle of the script and then it permanently fails. That could account for the seemingly random amount of successful connections & disconnects before a permanent fail.

socket_timeout: int = 1,
connect_retries: int = 5,

Line 193 of adafruit_minimqtt.py perhaps?

Which seems to point to _recompute_reconnect_backoff

Though having an explicit error that states it's out of sockets also confirms my suspicions that it's actually running out of sockets.

DJDevon3 commented 7 months ago

Anecdata & Justin have been a tremendous help in Discord going over different iterations to force failures. Think we're getting close. It's definitely some type of failure to close sockets properly issue. The library should be closing the sockets and it's not happening. Upon bisecting more figured out that adafruit_minimqtt will correctly run infinitely. This is not an adafruit_minimqtt or adafruit_io library issue.

OpenWeatherMap only using adafruit_requests will eventually fail with an out of sockets error. This one has taken a long road to circle back and is looking like a requests related failure to close socket issue after all.

Anecdata figured out you get less sockets if using TLS which is how it went from failing after about 10 attempts to 5. TLS uses 2 sockets per connection. A happy discovery as it makes everything fail faster. Issue exists for both port 1883 insecure or 8883 secure (along with is_ssl=True).

DJDevon3 commented 7 months ago

Ran it overnight with great success by adding:

owm_request.close()
requests._free_sockets()
pass

to manually close sockets correctly. why this works I don't know. Anecdata would have to chime in as that's deeper than I go.

requests._default_session._free_sockets() did not work and threw errors. Can consider this a potential temporary work around if anyone is encountering an Out of Sockets issue with 8.2.x I think we were able to narrow it down with enough context to give someone a good shot at fixing this.

DJDevon3 commented 7 months ago

Moved

requests = adafruit_requests.Session(pool, ssl.create_default_context())

to outside the main loop and removed

owm_request.close()
requests._free_sockets()

and it works as intended. I must have moved it inside the loop at some point in time and that's what broke everything. Hard lesson learned. Always keep adafruit_requests.Session() outside of the main loop just like socketpool.

This one is indeed a user and syntax error.

DJDevon3 commented 7 months ago

Been running for half a day with over 30 attempts without a single failure. Closing.