FlowFuse / device-agent

An agent to run FlowFuse managed instances of Node-RED on devices
Apache License 2.0

WebSocket is not open: readyState 0 #183

Closed knolleary closed 9 months ago

knolleary commented 1 year ago

Current Behavior

Reported by a user. Happened once, then reconnected without further issue.

Expected Behavior

No response

Steps To Reproduce

Currently unknown

Environment

Steve-Mcl commented 9 months ago

Witnessed a very similar issue today while doing verifications between device and staging:

[AGENT] 09/01/2024 15:21:14 [info] Editor tunnel closed code=1005 reason=
[AGENT] 09/01/2024 15:21:14 [info] Connecting editor tunnel to wss://staging/api/v1/devices/abcdefg/editor/comms/ffde_EjxwLzQyM9_2Pa5ZSc1ujoUIeOK7ysqVw
C:\Users\Stephen\repos\github\flowfuse\dev-env\packages\device-agent\node_modules\ws\lib\websocket.js:442
      throw new Error('WebSocket is not open: readyState 0 (CONNECTING)');
      ^

Error: WebSocket is not open: readyState 0 (CONNECTING)
    at WebSocket.send (C:\Users\Stephen\repos\github\flowfuse\dev-env\packages\device-agent\node_modules\ws\lib\websocket.js:442:13)
    at WebSocket.<anonymous> (C:\Users\Stephen\repos\github\flowfuse\dev-env\packages\device-agent\lib\editor\tunnel.js:105:46)
    at WebSocket.emit (node:events:527:28)
    at Receiver.receiverOnMessage (C:\Users\Stephen\repos\github\flowfuse\dev-env\packages\device-agent\node_modules\ws\lib\websocket.js:1184:20)
    at Receiver.emit (node:events:527:28)
    at Receiver.dataMessage (C:\Users\Stephen\repos\github\flowfuse\dev-env\packages\device-agent\node_modules\ws\lib\receiver.js:541:14)
    at Receiver.getData (C:\Users\Stephen\repos\github\flowfuse\dev-env\packages\device-agent\node_modules\ws\lib\receiver.js:459:17)
    at Receiver.startLoop (C:\Users\Stephen\repos\github\flowfuse\dev-env\packages\device-agent\node_modules\ws\lib\receiver.js:158:22)
    at Receiver._write (C:\Users\Stephen\repos\github\flowfuse\dev-env\packages\device-agent\node_modules\ws\lib\receiver.js:84:10)
    at writeOrBuffer (node:internal/streams/writable:389:12)

While not identical, it backs up the notion that we need additional error handling in the agent's EditorTunnel.socket callbacks.
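As a rough illustration of the kind of guard this implies, here is a minimal sketch, not the agent's actual code. The helper name `safeSend` and its `log` parameter are hypothetical. The only thing it relies on is the standard WebSocket `readyState` numbering (0 CONNECTING, 1 OPEN, 2 CLOSING, 3 CLOSED), which the error in the trace above also reports.

```javascript
// Hypothetical helper for illustration only; not taken from device-agent.
// readyState values follow the WebSocket API:
// 0 CONNECTING, 1 OPEN, 2 CLOSING, 3 CLOSED
const WS_OPEN = 1

function safeSend (socket, data, log = console) {
    // Refuse to send unless the socket exists and is fully open
    if (!socket || socket.readyState !== WS_OPEN) {
        const state = socket ? socket.readyState : 'n/a'
        log.warn(`Dropping message: socket not open (readyState ${state})`)
        return false
    }
    try {
        socket.send(data)
        return true
    } catch (err) {
        // The state can still change between the check and the send
        log.warn(`Send failed: ${err.message}`)
        return false
    }
}
```

Calling something like `safeSend(otherSocket, message)` inside a `message` callback would drop the message and log a warning rather than let the exception escape the event handler and take the process down.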

Steve-Mcl commented 9 months ago

I have confirmed that when a device has an open tunnel and the core application is updated/restarted, this can occur.

Steve-Mcl commented 9 months ago

I can freely recreate this by killing the FF core (no SIGINT and thus no cleanup / no WS close event).

Containment PR incoming

Steve-Mcl commented 9 months ago

Verified against staging.

3 devices with tunnels connected, then staging restarted.

NOTE: the tunnel is still closed (due to core shutting down) but the devices survived/no longer crashed with readyState error.

hardillb commented 9 months ago

NOTE: the tunnel is still closed (due to core shutting down) but the devices survived/no longer crashed with readyState error.

@Steve-Mcl I opened an issue yesterday after testing this locally, as it looks like the device agent is meant to retry with a backoff but only tries once (with only a 0.5 second delay). https://github.com/FlowFuse/device-agent/issues/222
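For reference, a retry with backoff of the kind described could look roughly like the sketch below. This is an assumption-laden illustration, not the agent's implementation. The function names, the attempt limit, and the 30 second cap are all hypothetical; only the 500 ms starting delay mirrors the 0.5 seconds mentioned above.

```javascript
// Hypothetical sketch for illustration only; not taken from device-agent.

// Delay before the next retry: baseMs for attempt 0, doubling on each
// subsequent attempt, capped at maxMs.
function backoffDelay (attempt, baseMs = 500, maxMs = 30000) {
    return Math.min(maxMs, baseMs * 2 ** attempt)
}

// Calls connect(attempt) until it resolves or maxAttempts is reached.
// `sleep` can be injected (e.g. for tests); the default uses setTimeout.
async function retryWithBackoff (connect, options = {}) {
    const { maxAttempts = 5, baseMs = 500, maxMs = 30000, sleep } = options
    const wait = sleep || (ms => new Promise(resolve => setTimeout(resolve, ms)))
    let lastError
    for (let attempt = 0; attempt < maxAttempts; attempt++) {
        try {
            return await connect(attempt)
        } catch (err) {
            lastError = err
            // Only wait if another attempt will follow
            if (attempt < maxAttempts - 1) {
                await wait(backoffDelay(attempt, baseMs, maxMs))
            }
        }
    }
    throw lastError
}
```

With these defaults the waits between attempts would be 500 ms, 1 s, 2 s and 4 s before giving up.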

Steve-Mcl commented 9 months ago

@hardillb Yeah, I saw this (which is why I wrote that comment as a nod to your finding). However, I think that particular retry logic is for recovering from minor network dropouts, Ben.

We have not written any code to recover from a core restart (as new tokens would need to be generated and passed back to the agent).