NRCHKB / node-red-contrib-unifi-os

Nodes to access UniFi data using endpoints and websockets
Apache License 2.0
28 stars 1 forks source link

NR is crashing after update to NR4.0.0 and contrib unifi 0S 1.0.0 #72

Closed canedje closed 3 months ago

canedje commented 5 months ago

Nr is crashing emidiatly after start caused bij the unifi OS pallet. I restarted several times giveing the same error crashing the NR docker. I removed the unifi OS pallet and now NR keeps running NR docker version: 4.0.0 node-red-contrib-unifi-os version: 1.0.0 Error LOG:

21 Jun 15:07:00 - [red] Uncaught Exception:
21 Jun 15:07:00 - [error] TypeError: Cannot read properties of undefined (reading 'includes')
    at init.<anonymous> (/data/node_modules/node-red-contrib-unifi-os/build/nodes/Protect.js:214:35)
    at Generator.next (<anonymous>)
    at /data/node_modules/node-red-contrib-unifi-os/build/nodes/Protect.js:31:71
    at new Promise (<anonymous>)
    at __awaiter (/data/node_modules/node-red-contrib-unifi-os/build/nodes/Protect.js:27:12)
    at Object.handleUpdate [as dataCallback] (/data/node_modules/node-red-contrib-unifi-os/build/nodes/Protect.js:139:40)
    at /data/node_modules/node-red-contrib-unifi-os/build/SharedProtectWebSocket.js:126:42
    at Array.forEach (<anonymous>)
    at WebSocket.<anonymous> (/data/node_modules/node-red-contrib-unifi-os/build/SharedProtectWebSocket.js:124:57)
    at WebSocket.emit (node:events:519:28)
    at Receiver.receiverOnMessage (/data/node_modules/ws/lib/websocket.js:1209:20)
    at Receiver.emit (node:events:519:28)
    at Receiver.dataMessage (/data/node_modules/ws/lib/receiver.js:567:14)
    at Receiver.getData (/data/node_modules/ws/lib/receiver.js:496:10)
    at Receiver.startLoop (/data/node_modules/ws/lib/receiver.js:167:16)
    at Receiver._write (/data/node_modules/ws/lib/receiver.js:94:10)
    at writeOrBuffer (node:internal/streams/writable:564:12)
    at _write (node:internal/streams/writable:493:10)
    at Writable.write (node:internal/streams/writable:502:10)
    at TLSSocket.socketOnData (/data/node_modules/ws/lib/websocket.js:1303:35)
    at TLSSocket.emit (node:events:519:28)
    at addChunk (node:internal/streams/readable:559:12)
Shaquu commented 5 months ago

Looking into it

Shaquu commented 5 months ago

node-red/node-red/issues/4780

marcus-j-davies commented 5 months ago

v1.0.1 released.

This address a missing config parameter that was introduced in v1 opening the Protect Node/saving/deploy will fix it.

but v1.0.1 stops it for future users updating to the v1 release

canedje commented 5 months ago

Still crashing. Not as much as before. Now once a day:

23 Jun 21:00:26 - [error] [change:5bb589225a0ed484] Invalid JSONata expression: Unable to cast value to a number: "-3;;f;l"
23 Jun 21:21:08 - [error] [api-call-service:Notify all] InputError: Invalid JSON: {"title":"LICHT:","message":"Licht woonkamer aangezet door lichtsensor. 
 waarde &#x3D; 12","data":{"ttl":0,"priority":"high"}}
23 Jun 21:25:26 - [error] [change:5bb589225a0ed484] Invalid JSONata expression: Argument 1 of function "number" does not match function signature
23 Jun 22:55:26 - [error] [change:5bb589225a0ed484] Invalid JSONata expression: Argument 1 of function "number" does not match function signature
23 Jun 23:53:23 - [red] Uncaught Exception:
23 Jun 23:53:23 - [error] Error: WebSocket is not open: readyState 0 (CONNECTING)
    at WebSocket.ping (/data/node_modules/ws/lib/websocket.js:361:13)
    at SharedProtectWebSocket.<anonymous> (/data/node_modules/node-red-contrib-unifi-os/build/SharedProtectWebSocket.js:74:72)
    at Generator.next (<anonymous>)
    at fulfilled (/data/node_modules/node-red-contrib-unifi-os/build/SharedProtectWebSocket.js:5:58)
    at runNextTicks (node:internal/process/task_queues:60:5)
    at listOnTimeout (node:internal/timers:540:9)
    at process.processTimers (node:internal/timers:514:7)
24 Jun 05:56:52 - [info] 
marcus-j-davies commented 5 months ago

Strange.

watchDog that ping occurs re-starts its self after a pong & watchDog only occurs after a connection some race condition I have missed I guess.

  private HEARTBEAT_INTERVAL = 10000
  private RECONNECT_TIMEOUT = 15000

  private async watchDog(): Promise<void> {
        setTimeout(async () => {
            await this.updateStatusForNodes(SocketStatus.HEARTBEAT)
            this.ws?.ping()

            const reconnectTimer = setTimeout(async () => {
                await this.updateStatusForNodes(
                    SocketStatus.RECOVERING_CONNECTION
                )
                this.disconnect()
                this.connect()
            }, this.RECONNECT_TIMEOUT)

            this.ws?.once('pong', async () => {
                clearTimeout(reconnectTimer)
                await this.updateStatusForNodes(SocketStatus.CONNECTED)
                this.watchDog()
            })
        }, this.HEARTBEAT_INTERVAL)
    }

RECONNECT_TIMEOUT might have occurred before the pong was received, and ping/pong should be near instant (or at least 1-2s whilst there is an active connection

i.e pong took >15s (that should not be the case), and results in a new watchDog whilst the other was a ticking time bomb that fired at RECONNECT_TIMEOUT

Will try and review it as soon as I can

marcus-j-davies commented 5 months ago

OK,

I'm going to unsubscribe to a pong during a re-connect. To me, it seems for some reason, the pong was received a lot later than should have been.

so the re-connect started (Connecting) and during this - the late pong was received (queuing another ping) whilst still connecting, because the original pong took it's time to reach us (when it should be only 2-5 seconds)

its all I have for now - its a bit weird

How the watchDog works

Screenshot 2024-06-24 at 21 18 24

canedje commented 5 months ago

Thanks for the effort

pakerfeldt commented 4 months ago

I had the same issue. Will try to upgrade this library. Thanks for the efforts! edit/ noticed this hasn't been released yet.

crxporter commented 4 months ago

It's been released as beta. Navigate to nodered install location (usually ~/.node-red on a pi)

Then run:

npm i node-red-contrib-unifi-os@1.1.0-beta.1

Then restart nodered. I started testing last night, it's working pretty well on nodered 4.0.2 running on my pi.

pakerfeldt commented 4 months ago

Nice, trying it out now.

pakerfeldt commented 4 months ago

FWIW, the plugin is still regularly crashing my node-red.

17 Jul 17:05:17 - [red] Uncaught Exception:
17 Jul 17:05:17 - [error] TypeError: Cannot read properties of undefined (reading 'toString')
    at Unzip.cb (/usr/src/node-red/node_modules/node-red-contrib-unifi-os/build/nodes/Request.js:93:49)
    at Unzip.zlibBufferOnError (node:zlib:146:8)
    at Unzip.emit (node:events:517:28)
    at emitErrorNT (node:internal/streams/destroy:151:8)
    at emitErrorCloseNT (node:internal/streams/destroy:116:3)
    at processTicksAndRejections (node:internal/process/task_queues:82:21)
marcus-j-davies commented 4 months ago

That's related to an attempt to parse Gzip responses. Clearly, not working...

We have already decided to not attempt it, will remove it in Beta 2

marcus-j-davies commented 4 months ago

BETA 2 Available npm i node-red-contrib-unifi-os@1.1.0-beta.2

Supergiovane commented 4 months ago

Hi Marcus I’m on node-red-contrib-unifi-os@1.1.0-beta.2 I’m very sorry, NR crashed again after about 24 hours from start.

IMG_8322

marcus-j-davies commented 4 months ago

Oh For f... S...

That seems out of the contrib code, during web socket handshake its self (the ws module) is this reproducible?

I might have to disable the watchDog - clearly Unifi doesn't like it. https://github.com/NRCHKB/node-red-contrib-unifi-os/blob/cdbca0b8282d4e3c37eef801066b42cf9eebd656/src/SharedProtectWebSocket.ts#L189

I can't look at this for the next few days, so if anyone wants to chip in?

We are catching ws error's - so I'm clueless currently. https://github.com/NRCHKB/node-red-contrib-unifi-os/blob/cdbca0b8282d4e3c37eef801066b42cf9eebd656/src/SharedProtectWebSocket.ts#L177

Supergiovane commented 4 months ago

This doesn't make sense. The code seems right. I'm updating the ws to 8.18.0. I'll let you know.

Supergiovane commented 4 months ago

Hi, made some progress. Removing ws.close()(and leaving only ws.terminate()) has improved the uptime to 24 hours. Then, the MaxListenersExceededWarning starts to appear and soon after, the system crashes. The ws.close is an async function, so one would have to wait for the server's response or timeout. The ws.terminate instantly terminates the connection instead. Now i've disabled the this.watchDog() and is up and running for more than 24 hours. Let's see, but i'm thinking it has to do with timing set in the setIntervals functions, maybe they're too short...

marcus-j-davies commented 4 months ago

Thanks @Supergiovane , the Watch Dog seems to be the problem here it seems. Without it, the nodes will need to be restarted, if/whenever Protect decides to crash, or reboots (by the user)

@crxporter Would we truly miss recovery? I need to put more thought into the watch dog if so

Supergiovane commented 4 months ago

Hi guys, disabling the this.watchDog, solved the issue. No more NR crashes. As soon as i return home from holiday, i'll take a look at the watchdog.

crxporter commented 4 months ago

Would we truly miss recovery? I need to put more thought into the watch dog if so

Generally my pi restarts less often than a unifi update - personally I would appreciate a smooth recovery after protect (or network) software update / reboot / other...

pakerfeldt commented 3 months ago

Is there a new beta I can use where it doesn't crash?

marcus-j-davies commented 3 months ago

Not yet...

Im just too busy with other things at the moment. as noted by @Supergiovane - commenting out the watchDog stops any crashing.

https://github.com/NRCHKB/node-red-contrib-unifi-os/blob/cdbca0b8282d4e3c37eef801066b42cf9eebd656/src/SharedProtectWebSocket.ts#L189

Supergiovane commented 3 months ago

Just make sure to comment out the .js file, not the .ts one.

pakerfeldt commented 3 months ago

Understood, thanks. I made the change in the built package inside my docker container for now.

Supergiovane commented 3 months ago

Hi pakerfeldt, Can you try this file? Just decompress the zip and put the .js into the build folder of node-red-contrib-unifi-os. SharedProtectWebSocket.js.zip

This should fix the issue. Please let us know. Thanks.

pakerfeldt commented 3 months ago

Hi pakerfeldt, Can you try this file? Just decompress the zip and put the .js into the build folder of node-red-contrib-unifi-os. SharedProtectWebSocket.js.zip

This should fix the issue. Please let us know. Thanks.

Sure, I'm running it as of now. Will monitor and see how it works. 👍

Supergiovane commented 3 months ago

Hi @pakerfeldt, please use this file yet. I've done an error handler so errors shouldn't crash NR. SharedProtectWebSocket.js.zip

pakerfeldt commented 3 months ago

Sure! 👍

pakerfeldt commented 3 months ago

Crashed earlier ...

8 Aug 16:49:31 - [red] Uncaught Exception:
8 Aug 16:49:31 - [error] Error: Unexpected server response: 500
    at ClientRequest.<anonymous> (/usr/src/node-red/node_modules/node-red-contrib-unifi-os/node_modules/ws/lib/websocket.js:913:7)
    at ClientRequest.emit (node:events:517:28)
    at HTTPParser.parserOnIncomingClient (node:_http_client:700:27)
    at HTTPParser.parserOnHeadersComplete (node:_http_common:119:17)
    at TLSSocket.socketOnData (node:_http_client:541:22)
    at TLSSocket.emit (node:events:517:28)
    at addChunk (node:internal/streams/readable:368:12)
    at readableAddChunk (node:internal/streams/readable:341:9)
    at TLSSocket.Readable.push (node:internal/streams/readable:278:10)
    at TLSWrap.onStreamRead (node:internal/stream_base_commons:190:23)
8 Aug 16:49:33 - [info]

Welcome to Node-RED
===================

8 Aug 16:49:33 - [info] Node-RED version: v3.1.11
8 Aug 16:49:33 - [info] Node.js  version: v18.20.3
8 Aug 16:49:33 - [info] Linux 5.15.0-117-generic x64 LE
...
Supergiovane commented 3 months ago

Hi, no, it shouldn’t crash. Mine is running smoothly. Tomorrow i’ll check wether i’ve zipped the right file!

Supergiovane commented 3 months ago

Hi @pakerfeldt, please try this: SharedProtectWebSocket.js.zip Remember to put it into the "build" folder of your unifi-os node. I've added some logs, to better catch the error, in case it shows up again. Thanks!

pakerfeldt commented 3 months ago

Yeye, it's in the build folder, more specifically /usr/src/node-red/node_modules/node-red-contrib-unifi-os/build inside my docker container.

050e37f2be05:~/node_modules/node-red-contrib-unifi-os/build$ ls -alFh
total 88K
drwxr-xr-x    1 node-red node-red    4.0K Aug  9 09:09 ./
drwxr-xr-x    1 node-red node-red    4.0K Jul 31 08:54 ../
-rw-r--r--    1 node-red node-red     618 Jul 31 08:54 Endpoints.js
-rw-r--r--    1 node-red node-red    6.7K Jul 31 08:54 EventModels.js
-rw-r--r--    1 node-red node-red   11.9K Aug  9 09:09 SharedProtectWebSocket.js
-rw-r--r--    1 node-red node-red   10.3K Aug  5 20:24 SharedProtectWebSocket.js.old
-rw-r--r--    1 node-red node-red   10.3K Aug  7 14:06 SharedProtectWebSocket.js.old2
-rw-r--r--    1 node-red node-red   10.7K Aug  8 12:12 SharedProtectWebSocket.js.old3
drwxr-xr-x    2 node-red node-red    4.0K Jul 31 08:54 lib/
drwxr-xr-x    3 node-red node-red    4.0K Jul 31 08:54 nodes/
drwxr-xr-x    2 node-red node-red    4.0K Jul 31 08:54 types/

Trying your latest revision as of now.

Seems to be logging stuff.

unifi-os: entered reconnectTimer
unifi-os: private disconnect()
unifi-os: private connect()
unifi-os: private disconnect()
unifi-os: entered heartBeatTimer
unifi-os: entered reconnectTimer
unifi-os: private disconnect()
unifi-os: private connect()
unifi-os: private disconnect()
unifi-os: entered heartBeatTimer
unifi-os: entered reconnectTimer
unifi-os: private disconnect()
unifi-os: private connect()
unifi-os: private disconnect()
unifi-os: entered heartBeatTimer
unifi-os: entered reconnectTimer
unifi-os: private disconnect()
unifi-os: private connect()
unifi-os: private disconnect()
Supergiovane commented 3 months ago

Sorry, please don't shot me. Use this updated file: SharedProtectWebSocket.js.zip

pakerfeldt commented 3 months ago

No worries. Swapping to a new version is swift. Done!

marcus-j-davies commented 3 months ago

@Supergiovane

Thank you for your contribution here 🙏

if you can stop the crashing whilst keeping the watchdog intact - please put up a PR against Dev.

I'm away on my hols currently - so will become more interactive when back.

Supergiovane commented 3 months ago

Hi @marcus-j-davies Sure, will do. Thank you to you for your contrib!

pakerfeldt commented 3 months ago

I also want to extend my gratitude to this contrib. Great work! ⭐

Supergiovane commented 3 months ago

Hi PakerFeldt, It’s running good now, isn’t? Mine is rock solid since 5 days.

pakerfeldt commented 3 months ago

I was just about to write. Mine has also been stable since the last version I got. Not a single crash 👍

Supergiovane commented 3 months ago

Good. Time to make a PR. Thank you for your help.

marcus-j-davies commented 3 months ago

Closing as now Beta 3 is Released