FlowFuse / device-agent

An agent to run FlowFuse managed instances of Node-RED on devices
Apache License 2.0

Device does not reconnect editor WS tunnel if Forge app restarted #222

Closed: hardillb closed this issue 6 months ago

hardillb commented 7 months ago

Current Behavior

While doing local development (but also after any delay in prod), if a device has an open editor tunnel connection and the forge app is restarted (e.g. by npm run serve), the device agent tries to reconnect after 500ms and fails with connection refused (because the forge app has not started yet). It then never tries again (the code comments imply it should try again after 1.5 seconds and then after 4.5 ....)

[AGENT] 16/01/2024 14:47:41 [info] Enabling remote editor access
[AGENT] 16/01/2024 14:47:41 [info] Connecting editor tunnel to ws://localhost:3000/api/v1/devices/K6jN3Z4ybv/editor/comms/ffde_K6jN3Z4ybv_CZq8xfwZCOb24pBrcD5MLA
[AGENT] 16/01/2024 14:47:41 [info] Editor tunnel connected
[AGENT] 16/01/2024 14:47:43 [debug] Sending status message
[AGENT] 16/01/2024 14:47:43 [debug] Interval Timer Executing
[AGENT] 16/01/2024 14:47:43 [debug] Sending check-in message
[AGENT] 16/01/2024 14:47:49 [info] Editor tunnel closed code=1005 reason=
[AGENT] 16/01/2024 14:47:51 [info] Connecting editor tunnel to ws://localhost:3000/api/v1/devices/K6jN3Z4ybv/editor/comms/ffde_K6jN3Z4ybv_CZq8xfwZCOb24pBrcD5MLA
[AGENT] 16/01/2024 14:47:51 [warn] Editor tunnel error: Error: connect ECONNREFUSED ::1:3000
socket.error Error: connect ECONNREFUSED ::1:3000
    at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1555:16) {
  errno: -111,
  code: 'ECONNREFUSED',
  syscall: 'connect',
  address: '::1',
  port: 3000
}
[AGENT] 16/01/2024 14:47:51 [info] Editor tunnel closed
[AGENT] 16/01/2024 14:48:19 [debug] Interval Timer Executing
[AGENT] 16/01/2024 14:48:19 [debug] Sending check-in message

Expected Behavior

The reconnect logic should step up the back-off timer after each failed attempt until it either reconnects or hits a limit.
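As a rough illustration of the expected behaviour, here is a minimal sketch of a stepped back-off retry loop. It is not the agent's actual code: `backoffDelay`, `reconnectWithBackoff`, the `connect` callback, and the `maxMs` and `maxAttempts` limits are hypothetical names and values chosen for this example. The 500ms base and the factor of 3 are assumptions inferred from the 500ms / 1.5s / 4.5s sequence mentioned above.

```javascript
// Hypothetical helper: delay before retry number `attempt` (0-based).
// With the assumed defaults: 0 -> 500ms, 1 -> 1500ms, 2 -> 4500ms, capped at maxMs.
function backoffDelay (attempt, { baseMs = 500, factor = 3, maxMs = 60000 } = {}) {
    return Math.min(baseMs * Math.pow(factor, attempt), maxMs)
}

// Hypothetical retry loop. `connect` is any async function that resolves once
// the tunnel is open and rejects on failure (e.g. ECONNREFUSED).
// `sleep` is injectable so the loop can be exercised without real timers.
async function reconnectWithBackoff (connect, {
    maxAttempts = 10,
    sleep = (ms) => new Promise(resolve => setTimeout(resolve, ms)),
    ...delayOptions
} = {}) {
    let lastError = new Error('no reconnect attempts were made')
    for (let attempt = 0; attempt < maxAttempts; attempt++) {
        // Wait first, then try: each failure lengthens the next wait
        await sleep(backoffDelay(attempt, delayOptions))
        try {
            return await connect()
        } catch (err) {
            // A failed attempt must not end the loop; remember the error and retry
            lastError = err
        }
    }
    // Limit reached without a successful connection
    throw lastError
}
```

The point of the sketch is that a failed attempt falls through to the next iteration rather than stopping, which is the step that appears to be missing in the log above. A caller would pass in whatever function opens the editor tunnel and decide what to do when the limit is reached.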

Steps To Reproduce

Environment

knolleary commented 7 months ago

Currently the platform generates a non-persistent access token for use by the device editor. When the platform is restarted, those non-persistent tokens are discarded, meaning the device cannot reconnect.

We need to transition to using persistent access tokens in the platform so that the devices can reconnect after a restart. This is also a requirement as part of the scalability work.

Steve-Mcl commented 5 months ago

Verified on staging. @ppawlowski restarted staging while I had a (local) device connected to the staging platform. Team: Steve on Staging; Application: Demo Application; Device: Device 1 (LlxGYqVarE).

I refreshed the device before, during and after.