ambient-weather / api-docs

AmbientWeather.net API Documentation

Realtime/Websocket API sometimes fails to return PONG? #30

Closed: bachya closed this 2 years ago

bachya commented 3 years ago

Home Assistant users of the Ambient Weather integration are reporting that the underlying library (python-engineio, as part of aioambient) sometimes emits "packet queue is empty, aborting" error messages.

Some research on what this means:

The "packet queue is empty, aborting" message comes from the underlying python-engineio package. It is triggered when empty payloads are sent and, I believe, when the ongoing PING/PONG exchange between server and client is missed on the server side.

As such, it looks like the main problem you're dealing with is that the client disconnects because the server isn't able to PONG back, so the client eventually gives up and closes the connection.
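To make that failure mode concrete, here is a simplified simulation of a client-side heartbeat. It assumes Engine.IO protocol v3 semantics (used by Socket.IO v2), where the client sends a PING every ping_interval seconds and gives up if no PONG arrives within ping_timeout. The function name and the numbers in the example are illustrative assumptions, not values taken from the Ambient server.

```python
from typing import Optional, Sequence


def first_missed_pong(
    ping_interval: float,
    ping_timeout: float,
    pong_delays: Sequence[Optional[float]],
) -> Optional[float]:
    """Return the time (seconds since connect) at which the client gives up.

    pong_delays[i] is how long the server took to answer the i-th PING,
    or None if it never answered. Returns None if every PONG was on time.
    """
    for i, delay in enumerate(pong_delays):
        # Simplification: the first PING goes out one interval after connect,
        # and later PINGs follow on a fixed schedule.
        sent_at = (i + 1) * ping_interval
        if delay is None or delay > ping_timeout:
            # No PONG inside the grace window: the client declares the link dead.
            return sent_at + ping_timeout
    return None
```

For example, first_missed_pong(25, 5, [0.1, 0.2, None]) returns 80: the third PING goes out at t=75 and the client gives up 5 seconds later, even though the first two PONGs were prompt.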

Not sure how to tie things together, but wanted to share in case there's something that needs tuning in the realtime API.

heyitsyang commented 3 years ago

Here's a related Stack Overflow question: https://stackoverflow.com/questions/66441954/socketio-packet-queue-is-empty-aborting-error-when-sending-message-from-serve

autumnwalker commented 3 years ago

Seeing the same issue with my Ambient integration as well. Ambient - any updates?

tankdeer commented 2 years ago

Any updates? This is very much still an issue.

knwpsk commented 2 years ago

Bump. I'm seeing it, too.

willp commented 2 years ago

I'm seeing it as well using python3 and socketio with the websockets transport.

Here's a workaround that "fixes" the timeouts that cause the "packet queue is empty, aborting" error reported by engineio. If you create your own socketio.Client() named sio, you can do the following in the connect() event handler (before you issue a subscribe request) to increase the timeouts of both the underlying websocket and engineio objects:

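    import socketio

    sio = socketio.Client()

    @sio.event
    def connect():
        # HACK: triple the ping interval/timeout the server advertised at handshake
        print('HACK: Setting ping timeout up 3x from: timeout=', sio.eio.ping_timeout, ' and interval=', sio.eio.ping_interval)
        sio.eio.ping_interval *= 3.0
        sio.eio.ping_timeout *= 3.0
        # Let the raw websocket block on reads for one full (scaled) heartbeat cycle
        sio.eio.ws.settimeout(sio.eio.ping_interval + sio.eio.ping_timeout)
        print('HACK: New values are: timeout=', sio.eio.ping_timeout, ' and interval=', sio.eio.ping_interval)
        # ...then issue your subscribe request here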

(yes, this is an ugly runtime monkey patch for the timeouts)
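The arithmetic behind the hack is simple; the sketch below isolates it in a hypothetical helper (scale_timeouts and HeartbeatTimeouts are illustrative names, not part of python-engineio) so the resulting values are easy to inspect. Both heartbeat timers are multiplied by the same factor, and the socket read timeout is set to their sum, as in the workaround.

```python
from typing import NamedTuple


class HeartbeatTimeouts(NamedTuple):
    ping_interval: float
    ping_timeout: float
    socket_timeout: float  # the value the hack passes to ws.settimeout()


def scale_timeouts(ping_interval: float, ping_timeout: float,
                   factor: float = 3.0) -> HeartbeatTimeouts:
    """Scale both heartbeat timers and derive the websocket read timeout."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    interval = ping_interval * factor
    timeout = ping_timeout * factor
    # The socket must tolerate one full interval plus the PONG grace period.
    return HeartbeatTimeouts(interval, timeout, interval + timeout)
```

For instance, scale_timeouts(25, 5) gives HeartbeatTimeouts(ping_interval=75.0, ping_timeout=15.0, socket_timeout=90.0). The trade-off is that a genuinely dead connection now takes three times as long to detect.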

These timeouts are client-enforced (server-advised, but not server-enforced), so my API client no longer closes the connection due to a missed ping/pong, as far as I understand the websocket protocol.

Unfortunately, the underlying connection seems very unstable: the server drops it after about 2 minutes. The auto-reconnect logic built into socketio+websockets works well enough to paper over this, but it's not a good state to be in. I suspect the server is implementing timeouts incorrectly, or that some misconfiguration is dropping long-lived websocket connections far too early. It wouldn't surprise me if a server timeout parameter that expects milliseconds had been configured as though its unit were seconds, but I have no idea.
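The auto-reconnect mentioned above is typically driven by an exponential backoff with jitter. The sketch below illustrates that general idea only; reconnect_delays and its parameters are hypothetical and this is not python-socketio's exact algorithm.

```python
import random
from typing import Iterator, Optional


def reconnect_delays(base: float = 1.0, cap: float = 5.0,
                     jitter: float = 0.5,
                     rng: Optional[random.Random] = None) -> Iterator[float]:
    """Yield successive wait times (seconds) before each reconnect attempt.

    The delay doubles after every failed attempt, is capped at `cap`, and is
    randomly spread by +/- `jitter` (a fraction) so that many clients dropped
    at the same moment do not all reconnect in lockstep.
    """
    rng = rng or random.Random()
    delay = min(base, cap)
    while True:
        # 2 * rng.random() - 1 lies in [-1, 1), so the spread is +/- jitter.
        yield delay * (1 + jitter * (2 * rng.random() - 1))
        delay = min(delay * 2, cap)
```

With jitter=0.0 the first five delays are 1.0, 2.0, 4.0, 5.0 and 5.0 seconds. Even with backoff, a server that drops every connection after about 2 minutes still forces a fresh handshake and subscribe each time, which is the server-load concern raised next.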

This can't be good for server load, especially since each reconnect and subscribe request produces a new session ID.

owise1 commented 2 years ago

Sorry for the radio silence on this. We'll look into it.

owise1 commented 2 years ago

I've set up a new domain with an upgraded version of socket.io (v4) and an extended ping timeout. Would you mind trying this domain, rt2.ambientweather.net, and letting me know if the problem persists?

willp commented 2 years ago

I'm testing now... Already I'm seeing new (good) behavior on the test URL, and I've disabled my timeout hack. <... time elapsed ...>

Good news! I'm seeing server-sent PING packets now (I didn't before) at exactly 25-second intervals, and my websocket client is automatically responding with PONG replies.
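This matches Engine.IO protocol v4 (used by Socket.IO v3 and later), where the server sends PING packets and the client answers with PONG, the reverse of the older v3 protocol. Below is a minimal sketch of that client-side handling over raw text frames, assuming the standard v4 packet types ("0" open, "2" ping, "3" pong, "4" message). The class name is hypothetical; real clients such as python-engineio do this for you.

```python
import json
from typing import Optional


class EngineIOv4Heartbeat:
    """Tracks the server's heartbeat settings and answers server PINGs."""

    def __init__(self) -> None:
        self.ping_interval_s: Optional[float] = None
        self.ping_timeout_s: Optional[float] = None

    def on_frame(self, frame: str) -> Optional[str]:
        """Handle one text frame; return the frame to send back, if any."""
        packet_type, payload = frame[:1], frame[1:]
        if packet_type == "0":
            # OPEN: the server advertises its timers in milliseconds.
            handshake = json.loads(payload)
            self.ping_interval_s = handshake["pingInterval"] / 1000
            self.ping_timeout_s = handshake["pingTimeout"] / 1000
            return None
        if packet_type == "2":
            # PING from the server: reply with PONG, echoing any payload.
            return "3" + payload
        return None  # "4" messages etc. are left to the Socket.IO layer

    def max_silence_s(self) -> float:
        """How long the client waits for the next PING before giving up."""
        if self.ping_interval_s is None or self.ping_timeout_s is None:
            raise RuntimeError("no OPEN packet received yet")
        return self.ping_interval_s + self.ping_timeout_s
```

With a handshake advertising pingInterval=25000 and pingTimeout=20000 (example values, not confirmed for rt2), the client would tolerate 45 seconds of silence before dropping the connection.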

So far zero connection drops, and the connection has stayed up for 12+ minutes, receiving new data each minute. :-D

Looks great here. Thank you for taking care of this. I appreciate your efforts and this service. I hope the servers are happier too after you can get it into production :-)

willp commented 2 years ago

The test connection is still good 1 hour later, no disconnects and data looks perfect. I'll stop my test client now. I'd call it pretty solid!

owise1 commented 2 years ago

Thanks @willp for testing this. I'm going to scale up this process to a few more nodes to make sure everything continues to be stable.

As far as production goes, because of our internal structure it was necessary to tease out the socket functionality to fix this issue. That means that rt2.ambientweather.net is going to become the production endpoint for the realtime api. The rest api will remain at rt.ambientweather.net. I realize this adds some unwelcome complexity, but so it goes. As soon as we can confirm everything's all good we'll update the documentation accordingly.

bachya commented 2 years ago

Thanks for your work, @owise1! For aioambient, I've been using api.ambientweather.net for both REST and websocket APIs; I assume that once you give the go-ahead from ^^^, I should switch ASAP?

owise1 commented 2 years ago

@bachya We're still supporting api.ambientweather.net but it's probably better to switch your REST endpoint to rt.ambientweather.net. And then yeah, I'd say test out rt2.ambientweather.net for real time and if it seems better make the switch.
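A client following this advice ends up talking to two hosts. The sketch below keeps them in one hypothetical config object. The https scheme, the /v1 REST path, and the api/applicationKey query parameters are assumptions to check against the current api-docs, not something stated in this thread.

```python
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass(frozen=True)
class AmbientEndpoints:
    # Hosts as described in this thread; paths and params below are assumptions.
    rest_host: str = "rt.ambientweather.net"
    realtime_host: str = "rt2.ambientweather.net"
    legacy_host: str = "api.ambientweather.net"  # still supported, per owise1

    def rest_url(self, path: str) -> str:
        # ASSUMPTION: REST resources live under /v1 on the REST host.
        return f"https://{self.rest_host}/v1/{path.lstrip('/')}"

    def realtime_url(self, application_key: str) -> str:
        # ASSUMPTION: the Socket.IO endpoint takes api=1 and applicationKey.
        query = urlencode({"api": 1, "applicationKey": application_key})
        return f"https://{self.realtime_host}/?{query}"
```

Keeping the hosts in one place makes the "test rt2, then switch" step a one-line change.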