MintyTrebor / moonNode

A set of node-red nodes to interface with the Moonraker API for Klipper
18 stars 1 forks source link

multiple websocket connection errors #3

Closed jonferreira closed 3 years ago

jonferreira commented 3 years ago

Hi.

As much as I love your node I think there's something inherently wrong with how it uses websocket connections.

24H after deploying the node my klipper host was down. Upon investigation it was a multiple of errors caused by too many files open caused by too many websocket connections which in turns flooded the logs with errors and filled up 16Gb SD card.

moonraker.log:

Complete error: 2021-08-16 20:34:56,103 [base_events.py:default_exception_handler()] - Exception in callback BaseAsyncIOLoop._handle_events(11, 1) handle: <Handle BaseAsyncIOLoop._handle_events(11, 1)> Traceback (most recent call last): File "/usr/lib/python3.7/asyncio/events.py", line 88, in _run self._context.run(self._callback, *self._args) File "/home/pi/moonraker-env/lib/python3.7/site-packages/tornado/platform/asyncio.py", line 189, in _handle_events handler_func(fileobj, events) File "/home/pi/moonraker-env/lib/python3.7/site-packages/tornado/netutil.py", line 266, in accept_handler connection, address = sock.accept() File "/usr/lib/python3.7/socket.py", line 212, in accept fd, addr = self._accept() OSError: [Errno 24] Too many open files

I've also noticed that when stopping moonraker I get at least 3 duzen entries of [websockets.py:remove_websocket()] - Websocket Removed: 2764687344 (id is different every time).

Here's a small snippet of what I get:

2021-08-17 10:31:22,248 [app.py:log_request()] - 101 GET /websocket?token=B6TJPPMURULFXF37EZ2HL4SOUCBGAXRR (192.168.1.3) [_API_KEY_USER_] 7.68ms 2021-08-17 10:31:22,249 [websockets.py:add_websocket()] - New Websocket Added: 2802658608 2021-08-17 10:31:22,252 [app.py:log_request()] - 101 GET /websocket?token=VKFDPMVCL2QTLSUPI75LKJWVGVYIH5K6 (192.168.1.3) [_API_KEY_USER_] 8.35ms 2021-08-17 10:31:22,252 [websockets.py:add_websocket()] - New Websocket Added: 2791749808 2021-08-17 10:31:22,271 [app.py:log_request()] - 101 GET /websocket?token=NBKV6TTAZVQCZQKUC4LPGJQ6NB7UVVTR (192.168.1.3) [_API_KEY_USER_] 10.95ms 2021-08-17 10:31:22,272 [websockets.py:add_websocket()] - New Websocket Added: 2791751664 2021-08-17 10:31:22,275 [app.py:log_request()] - 101 GET /websocket?token=CE27IU2NT4L2DPMB3LZHB2ZMDGPC6NDK (192.168.1.3) [_API_KEY_USER_] 11.43ms 2021-08-17 10:31:22,275 [websockets.py:add_websocket()] - New Websocket Added: 2791747984 2021-08-17 10:31:22,279 [app.py:log_request()] - 101 GET /websocket?token=T33Y5BGU3LEYOVF5ACKBJ7MILCF7N3ZL (192.168.1.3) [_API_KEY_USER_] 11.39ms 2021-08-17 10:31:22,279 [websockets.py:add_websocket()] - New Websocket Added: 2802542256 2021-08-17 10:31:22,366 [app.py:log_request()] - 101 GET /websocket?token=U46JGGZWKXO2L4KGEP2WEVLHIZINESRY (192.168.1.3) [_API_KEY_USER_] 29.05ms 2021-08-17 10:31:22,366 [websockets.py:add_websocket()] - New Websocket Added: 2802765104 2021-08-17 10:31:22,368 [app.py:log_request()] - 101 GET /websocket?token=VN3MFH3ZK5U4BBTO2AKA3JMSACV5S462 (192.168.1.3) [_API_KEY_USER_] 29.15ms 2021-08-17 10:31:22,368 [websockets.py:add_websocket()] - New Websocket Added: 2791681360 2021-08-17 10:31:22,407 [app.py:log_request()] - 101 GET /websocket?token=3IAEFW5BWUCG55UMYSSXF4LXA475RF53 (192.168.1.3) [_API_KEY_USER_] 29.22ms 2021-08-17 10:31:22,407 [websockets.py:add_websocket()] - New Websocket Added: 2791681936 2021-08-17 10:31:22,409 [app.py:log_request()] - 101 GET /websocket?token=LQZT7257QG3NHZS7ORXEKGBAZ2ALHWVJ (192.168.1.3) [_API_KEY_USER_] 29.42ms 2021-08-17 10:31:22,409 [websockets.py:add_websocket()] - New Websocket Added: 2812511536 2021-08-17 10:31:22,410 [app.py:log_request()] - 101 GET /websocket?token=UPO6PEUFT4IJY2AH4LK6RBMFF2VATHWM (192.168.1.3) [_API_KEY_USER_] 29.63ms 2021-08-17 10:31:22,410 [websockets.py:add_websocket()] - New Websocket Added: 2802764752 2021-08-17 10:31:22,412 [app.py:log_request()] - 101 GET /websocket?token=X2NSO3KXW6BLDUYT54CQUR6V3BHPG37D (192.168.1.3) [_API_KEY_USER_] 30.09ms 2021-08-17 10:31:22,412 [websockets.py:add_websocket()] - New Websocket Added: 2792079568 2021-08-17 10:31:22,413 [app.py:log_request()] - 101 GET /websocket?token=LND6DNRP3XO7GJMLCUHKCQWP3IDSD2TC (192.168.1.3) [_API_KEY_USER_] 30.34ms 2021-08-17 10:31:22,413 [websockets.py:add_websocket()] - New Websocket Added: 2802430768 2021-08-17 10:31:22,415 [app.py:log_request()] - 101 GET /websocket?token=YJBO7FJZDAZ3LWHYFURAE3LRYD26JBLU (192.168.1.3) [_API_KEY_USER_] 30.78ms 2021-08-17 10:31:22,415 [websockets.py:add_websocket()] - New Websocket Added: 2812555760 2021-08-17 10:31:22,416 [app.py:log_request()] - 101 GET /websocket?token=G7OE25E3LOLI3VSAIZJRF3RVFAC2ARRW (192.168.1.3) [_API_KEY_USER_] 31.01ms 2021-08-17 10:31:22,416 [websockets.py:add_websocket()] - New Websocket Added: 2802392848 2021-08-17 10:31:22,417 [app.py:log_request()] - 101 GET /websocket?token=MNG6XNRAYJMM6S6K5N3VUOB2D7OVGQ4S (192.168.1.3) [_API_KEY_USER_] 31.50ms 2021-08-17 10:31:22,418 [websockets.py:add_websocket()] - New Websocket Added: 2802230032 2021-08-17 10:31:22,419 [app.py:log_request()] - 101 GET /websocket?token=7FZWWVXRLRK7KE5BLDKACVTYCSH7TFKC (192.168.1.3) [_API_KEY_USER_] 31.80ms 2021-08-17 10:31:22,419 [websockets.py:add_websocket()] - New Websocket Added: 2791937744 2021-08-17 10:31:22,420 [app.py:log_request()] - 101 GET /websocket?token=NDZQPYRDLGTO6Q6Z5D5FR6P2MEISDLM5 (192.168.1.3) [_API_KEY_USER_] 32.28ms 2021-08-17 10:31:22,421 [websockets.py:add_websocket()] - New Websocket Added: 2791998704 2021-08-17 10:31:22,422 [app.py:log_request()] - 101 GET /websocket?token=23PYKQOGDXZZHKD2QCZJGZG7HZDJCYTN (192.168.1.3) [_API_KEY_USER_] 31.48ms 2021-08-17 10:31:22,422 [websockets.py:add_websocket()] - New Websocket Added: 2792001040 2021-08-17 10:31:22,423 [app.py:log_request()] - 101 GET /websocket?token=T6O374XMZNRTAVFMQCM6YVRDNH7SAIDA (192.168.1.3) [_API_KEY_USER_] 31.89ms 2021-08-17 10:31:22,424 [websockets.py:add_websocket()] - New Websocket Added: 2792000432 2021-08-17 10:31:22,425 [app.py:log_request()] - 101 GET /websocket?token=FIVAWP4GZTS5KMTI7KUEFSIGV56DUCAP (192.168.1.3) [_API_KEY_USER_] 32.39ms 2021-08-17 10:31:22,425 [websockets.py:add_websocket()] - New Websocket Added: 2792000016 2021-08-17 10:31:22,427 [app.py:log_request()] - 101 GET /websocket?token=CBUKB74Q5DRZLUXC4WEIER7PM4TT63WU (192.168.1.3) [_API_KEY_USER_] 32.87ms 2021-08-17 10:31:22,427 [websockets.py:add_websocket()] - New Websocket Added: 2791999600 2021-08-17 10:31:22,428 [app.py:log_request()] - 101 GET /websocket?token=SLNCDWKWDD3MRBEBACUKDZIXAJVXY6KM (192.168.1.3) [_API_KEY_USER_] 33.11ms 2021-08-17 10:31:22,428 [websockets.py:add_websocket()] - New Websocket Added: 2791998928 2021-08-17 10:31:22,430 [app.py:log_request()] - 101 GET /websocket?token=PVW3ERS2I2AFVXJK5ZW3NMUGAZYCR3S2 (192.168.1.3) [_API_KEY_USER_] 33.75ms 2021-08-17 10:31:22,430 [websockets.py:add_websocket()] - New Websocket Added: 2791998256 2021-08-17 10:31:22,431 [app.py:log_request()] - 101 GET /websocket?token=SDETSAO4N7O7XQ5RD3RO53B5XRZM7JDF (192.168.1.3) [_API_KEY_USER_] 34.01ms 2021-08-17 10:31:22,431 [websockets.py:add_websocket()] - New Websocket Added: 2791997616 2021-08-17 10:31:22,433 [app.py:log_request()] - 101 GET /websocket?token=TL5VYH3WRDXEPZZ7YY7P6VP3GRIU5AG2 (192.168.1.3) [_API_KEY_USER_] 34.93ms 2021-08-17 10:31:22,433 [websockets.py:add_websocket()] - New Websocket Added: 2791902064 2021-08-17 10:31:22,435 [app.py:log_request()] - 101 GET /websocket?token=XCDGWVON6GJD4WBN6ZNPUBU5IGPOGRSU (192.168.1.3) [_API_KEY_USER_] 35.05ms 2021-08-17 10:31:22,435 [websockets.py:add_websocket()] - New Websocket Added: 2791900304 2021-08-17 10:31:22,436 [app.py:log_request()] - 101 GET /websocket?token=B62D32TZEOZFWX2OLJIQGV3LBKGZWJKK (192.168.1.3) [_API_KEY_USER_] 35.45ms 2021-08-17 10:31:22,436 [websockets.py:add_websocket()] - New Websocket Added: 2791901840 2021-08-17 10:31:22,438 [app.py:log_request()] - 101 GET /websocket?token=KW3E5WYZN7DEMRKSJWXGYKN5M6OOVTG4 (192.168.1.3) [_API_KEY_USER_] 35.57ms 2021-08-17 10:31:22,438 [websockets.py:add_websocket()] - New Websocket Added: 2791902352 2021-08-17 10:31:22,439 [app.py:log_request()] - 101 GET /websocket?token=VDEOP2LQFQXR6FKCJF32A4Z3ZKUKYTPN (192.168.1.3) [_API_KEY_USER_] 35.70ms 2021-08-17 10:31:22,439 [websockets.py:add_websocket()] - New Websocket Added: 2791902320 2021-08-17 10:31:22,441 [app.py:log_request()] - 101 GET /websocket?token=LYEJ2HJ6KN52RDA4AJNHPFYRL2FJUQAA (192.168.1.3) [_API_KEY_USER_] 35.99ms 2021-08-17 10:31:22,441 [websockets.py:add_websocket()] - New Websocket Added: 2792253584 2021-08-17 10:31:23,280 [moonraker.py:_check_ready()] -

Upon restarting the service immediatly I get a dozen or so of "new websocket connections" all from the same IP (node-red IP)

After deleting the config node and restarting node-red all is back to normal with moonraker/klipper.

MintyTrebor commented 3 years ago

How many monitor nodes have you setup?

jonferreira commented 3 years ago

Two.

Forgot to mention, apparently there was another possible issue where the node would keep asking for updates from moonraker while klipper was still disconnected, i.e. printer turned off for example

2021-08-16 21:39:54,604 [websockets.py:build_error()] - JSON-RPC Request Error: 503 Klippy Disconnected Traceback (most recent call last): File "/home/pi/moonraker/moonraker/websockets.py", line 224, in execute_method result = await method(conn, *args, **kwargs) File "/home/pi/moonraker/moonraker/websockets.py", line 317, in func user=ws.current_user)) File "/home/pi/moonraker/moonraker/moonraker.py", line 478, in make_request return await self._request_standard(web_request) File "/home/pi/moonraker/moonraker/moonraker.py", line 534, in _request_standard return await base_request.wait() File "/home/pi/moonraker/moonraker/moonraker.py", line 704, in wait raise self.response utils.ServerError: Klippy Disconnected

MintyTrebor commented 3 years ago

The monitor node is designed to re-check every 10 seconds if moonraker does not respond, or responds with an error state (eg printer powered off). I will have to check the code, but it my node should close the socket and open a new one on the next check, however if moonraker is not closing the connection at its end that could be an issue... The way I use it - is to turn on the monitor node when I turn on the printer, and turn it off on power off using:

Payload.monitorStateString Set to ON or OFF to start/stop the connection with Moonraker. Useful in scenarios where monitoring is not required 24/7. (EG. control by Start/Stop buttons on a dashboard)

I will have to do some digging

MintyTrebor commented 3 years ago

OK.... So the Moonraker documentation states that for HTTP requests you must send a blank request with the connection id to close the connect at moonrakers end, however it does not say that for websockets...I wonder if it is actually still required for websockets... I'm going to give it a try

jonferreira commented 3 years ago

I'm not an expert in websockets at all but IIRC you must explicitly close the session or re-use an existing connection, much like SSH I suppose.

Why opening a new connection on every check instead of using a single connection through out all the process?

MintyTrebor commented 3 years ago

No. Most host systems will auto expire connections after x seconds (or if in an error state), to explicitly prevent this from happening. Websockets are supposed to be stateless, requiring each end to manage the connection themselves. EG if node-red crashed with two sockets open, should moonraker keep those sockets open indefinitely? ( The answer is no ;-) ). For example I use the same websocket code with Duet Boards, and they auto close connections after 8 seconds....

MintyTrebor commented 3 years ago

I suspect this is just the way moonraker works, and I need to write some code to accommodate it...

jonferreira commented 3 years ago

Hmmm ok so either Moonraker isn't closing connections or its something on nodered side.

I did found weird that I've disables the nodes, deployed nodered, restart the klipper rpi4 and as soon as the pi was up I could see all those dozen websocket connections being opened.

Only after deleting the config node and restarting nodered did the connections were closed for good.

So it seems that maybe nodered is keeping those connections opened even though they weren't being used?


From: Minty Trebor @.> Sent: Tuesday, August 17, 2021 2:16:21 PM To: MintyTrebor/moonNode @.> Cc: jonferreira @.>; Author @.> Subject: Re: [MintyTrebor/moonNode] multiple websocket connection errors (#3)

No. Most host systems will auto expire connections after x seconds (or if in an error state), to explicitly prevent this from happening. Websockets are supposed to be stateless, requiring each end to manage the connection themselves. EG if node-red crashed with two sockets open, should moonraker keep those sockets open indefinitely? ( The answer is no ;-) ). For example I use the same websocket code with Duet Boards, and they auto close connections after 8 seconds....

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHubhttps://github.com/MintyTrebor/moonNode/issues/3#issuecomment-900289420, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AB23L5ACEJUSFP76EIF6A3LT5JOKLANCNFSM5CJY6VJQ. Triage notifications on the go with GitHub Mobile for iOShttps://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Androidhttps://play.google.com/store/apps/details?id=com.github.android&utm_campaign=notification-email.

MintyTrebor commented 3 years ago

Of course because I don't use the autostart feature of the monitor node - its entirely possible I cocked something up ;-)

MintyTrebor commented 3 years ago

Are you using the moonNode CMD node in your flows?

jonferreira commented 3 years ago

Nope, never used the command node.

I do use the autostart feature though. Part of the plan is to get push notifications when a print start/stops/cancels/etc

If I don't use the autostart presumably I'd have to manually start it and stop it each time? That would kinda defeat the purpose


From: Minty Trebor @.> Sent: Tuesday, August 17, 2021 3:05:06 PM To: MintyTrebor/moonNode @.> Cc: jonferreira @.>; Author @.> Subject: Re: [MintyTrebor/moonNode] multiple websocket connection errors (#3)

Are you using the moonNode CMD node in your flows?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHubhttps://github.com/MintyTrebor/moonNode/issues/3#issuecomment-900327832, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AB23L5GGMQGHAV7XUYNRSD3T5JUBFANCNFSM5CJY6VJQ. Triage notifications on the go with GitHub Mobile for iOShttps://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Androidhttps://play.google.com/store/apps/details?id=com.github.android&utm_campaign=notification-email.

MintyTrebor commented 3 years ago

Yes that is what the autostart is for, your use case is fine. My printer power is on a smart socket, so I turn it on with node-red (as part of my home automation flows), which also starts the monitor node, and stops it when I turn the printer off. You can use whichever method works for your use case. I have found of couple of possible reasons why it is erroring, but I cannot be sure. Still working on reproducing it..

In your scenario - you leave klipper/moonraker running and just turn off the printer? So both klipper and moonraker are "always on"?

jonferreira commented 3 years ago

Yes, I basically do the other way around. RPI with klipper is always on and it turns on/off the printer using a Shelly (it has native integration within Moonraker) before / after a print


From: Minty Trebor @.> Sent: Tuesday, August 17, 2021 3:33:47 PM To: MintyTrebor/moonNode @.> Cc: jonferreira @.>; Author @.> Subject: Re: [MintyTrebor/moonNode] multiple websocket connection errors (#3)

Yes that is what the autostart is for, your use case is fine. My printer power is on a smart socket, so I turn it on with node-red (as part of my home automation flows), which also starts the monitor node, and stops it when I turn the printer off. You can use whichever method works for your use case. I have found of couple of possible reasons why it is erroring, but I cannot be sure. Still working on reproducing it..

In your scenario - you leave klipper/moonraker running and just turn off the printer? So both klipper and moonraker are "always on"?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHubhttps://github.com/MintyTrebor/moonNode/issues/3#issuecomment-900352337, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AB23L5BMFBG6OKMAFE5PIZTT5JXMXANCNFSM5CJY6VJQ. Triage notifications on the go with GitHub Mobile for iOShttps://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Androidhttps://play.google.com/store/apps/details?id=com.github.android&utm_campaign=notification-email.

MintyTrebor commented 3 years ago

can you confirm what is returned if you post the below url (with your IP) in firefox when the printer is turned off please? http://[yourip]:7125/printer/info

I think I have caught the potential area it was failing but I want to do some other checks too.

MintyTrebor commented 3 years ago

Mine looks like this if just the printer is turned off: image

jonferreira commented 3 years ago

Will do as soon as I get home, thanks for the quick answer


From: Minty Trebor @.> Sent: Tuesday, August 17, 2021 4:11:51 PM To: MintyTrebor/moonNode @.> Cc: jonferreira @.>; Author @.> Subject: Re: [MintyTrebor/moonNode] multiple websocket connection errors (#3)

Mine looks like this if just the printer is turned off: [image]https://user-images.githubusercontent.com/26109080/129752355-188a9b08-5367-4c50-ab7b-be112b4a7303.png

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHubhttps://github.com/MintyTrebor/moonNode/issues/3#issuecomment-900384730, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AB23L5GBEYN7O46BAD5MSA3T5J33PANCNFSM5CJY6VJQ. Triage notifications on the go with GitHub Mobile for iOShttps://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Androidhttps://play.google.com/store/apps/details?id=com.github.android&utm_campaign=notification-email.

MintyTrebor commented 3 years ago

Sorry about issues. So being a klipper noob I have a question about your setup. When you use moonraker to power on the printer, does it automatically restart klipper to reconnect the service to the firmware? Because if I manually power cycle the printer, I have to ask moonraker to restart klipper before it will connect...

MintyTrebor commented 3 years ago

OK. I think I found the issue and resolved it. But to be sure I've added some additional checks, and changed the error handling. Before I closed and re-open the web socket for any error condition or if the printer became "unready". Now I only close and re-open if the socket reports an error, if moonraker responds then it keeps the socket open and just re-checks the printer status every 10 seconds. The trade off is that if moonraker stops responding it can take over a minute for the web socket to timeout and cause an error, and trigger a new web socket connection. I will publish it shortly, and let you know in here once I have done some further checks. The monitor node will now only start publishing updates once moonraker confirms the printer is "ready". (You should be able to see this by adding a debug msg node to the 2nd output connector and set it to full msg)

jonferreira commented 3 years ago

Mine looks like this if just the printer is turned off: image

same here:

{"result": {"state_message": "mcu 'mcu': Unable to connect\nOnce the underlying issue is corrected, use the\n\"FIRMWARE_RESTART\" command to reset the firmware, reload the\nconfig, and restart the host software.\nError configuring printer\n", "klipper_path": "/home/pi/klipper", "config_file": "/home/pi/klipper_config/printer.cfg", "software_version": "v0.9.1-679-g7fddc5a8", "hostname": "fluiddpi", "cpu_info": "4 core ARMv7 Processor rev 3 (v7l)", "state": "error", "python_path": "/home/pi/klippy-env/bin/python", "log_file": "/home/pi/klipper_logs/klippy.log"}}

jonferreira commented 3 years ago

Sorry about issues. So being a klipper noob I have a question about your setup. When you use moonraker to power on the printer, does it automatically restart klipper to reconnect the service to the firmware? Because if I manually power cycle the printer, I have to ask moonraker to restart klipper before it will connect...

Yes it does,just add this to moonraker.cfg


[power BLV]
type: shelly
#   The type of device.  Can be either gpio, tplink_smartplug, tasmota
#   shelly, homeseer, homeassistant, or loxonev1.
#   This parameter must be provided.
address: 192.168.1.23
#user: 
#password: 
off_when_shutdown: True
#   If set to True the device will be powered off when Klipper enters
#   the "shutdown" state.  This option applies to all device types.
#   The default is False.
locked_while_printing: True
#   If True, locks the device so that the power cannot be changed while the
#   printer is printing. This is useful to avert an accidental shutdown to
#   the printer's power.  The default is False.
restart_klipper_when_powered: True
#   If set to True, Moonraker will issue a "FIRMWARE_RESTART" to Klipper
#   after the device has been powered on.  The default is False, thus no
#   attempt to made to restart Klipper after power on.
restart_delay: 2
#   If "restart_klipper_when_powered" is set, this option specifies the amount
#   of time (in seconds) to delay the restart.  Default is 1 second.```
jonferreira commented 3 years ago

ublish it shortly, and let you know in here once I have done som

brilliant, happy to test that and should be pretty easy to find out if it works: if my instance doesn't die within 24h then it should be OK lmao

MintyTrebor commented 3 years ago

OK 0.0.16 is pushed. (npm install node-red-contrib-moonnode@0.0.16 to install manually)

I appreciate you sticking with it. Its very difficult to test for all error conditions when I only have 1 real instance to test against.

(I got very confused with error msgs in the moonraker log. Mainsail causes loads of errors if you open it and the printer is turned off !!)

I'll re-open this if it fails to resolve your issue.