Closed coldtobi closed 3 years ago
Thanks for reporting this! I think this might be a bug in the now vendored sockjs dependency. Due to a tornado update it had to be ported to asyncio(-ish) and this might be an issue with that implementation that I need to root out and fix.
After just having spent the majority of the day trying to understand what might be going on here, I think I found at least the reason for the "Future exception was never retrieved" log entry, and possibly for the whole storm of them.
The actual result of writing to the web socket is asynchronous: write_message returns a future. The error handling inside send_pack, however, assumes synchronous behaviour: https://github.com/OctoPrint/OctoPrint/blob/1.4.2/src/octoprint/vendor/sockjs/tornado/transports/websocket.py#L94 (same for rawwebsocket.py). Thus the exception handler there will not trigger if there's an error while writing, and the connection will not have on_close called.
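To illustrate the mismatch described above, here is a minimal sketch using plain asyncio futures. It is not the vendored sockjs code: write_message, send_pack_sync_style and the ConnectionError are hypothetical stand-ins chosen for the example.

```python
import asyncio


def write_message(loop):
    # Hypothetical stand-in for a websocket write: it returns a future
    # immediately and the failure only shows up later, on the future.
    fut = loop.create_future()
    loop.call_soon(fut.set_exception, ConnectionError("stream closed"))
    return fut


def send_pack_sync_style(loop, log):
    # Error handling that assumes the write fails synchronously.
    try:
        return write_message(loop)
    except ConnectionError:
        # Never reached: the call itself does not raise.
        log.append("caught in except")
        return None


async def demo():
    loop = asyncio.get_running_loop()
    log = []
    fut = send_pack_sync_style(loop, log)
    await asyncio.sleep(0.01)  # give the scheduled failure time to happen
    # The error lives on the future; retrieving it here also avoids the
    # "exception was never retrieved" warning for this demo.
    log.append(type(fut.exception()).__name__)
    return log


result = asyncio.run(demo())
print(result)
```

The except branch never runs, so any cleanup placed there (such as calling on_close) is skipped, and a future whose exception nobody reads is what gets reported as "never retrieved".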
The above commit works around this through a done_callback on the future, which should hopefully solve it. I say should because even though I did my best in trying to simulate your error scenario, I was never able to trigger even the "retrieved" message, let alone a whole storm of them.
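The general shape of such a workaround can be sketched as follows. This is an illustration under assumptions, not the actual commit: Session, failing_write and send_pack are hypothetical names, and plain asyncio futures stand in for Tornado's.

```python
import asyncio


class Session:
    # Hypothetical stand-in for the connection/session object.
    def __init__(self):
        self.closed = False

    def on_close(self):
        self.closed = True


def failing_write(loop):
    # Hypothetical write that fails asynchronously.
    fut = loop.create_future()
    loop.call_soon(fut.set_exception, ConnectionError("stream closed"))
    return fut


def send_pack(loop, session):
    fut = failing_write(loop)

    def on_done(f):
        # Runs once the future resolves. Calling f.exception() retrieves
        # the error, so it is no longer reported as "never retrieved".
        if f.cancelled():
            return
        if f.exception() is not None:
            session.on_close()

    fut.add_done_callback(on_done)
    return fut


async def demo():
    loop = asyncio.get_running_loop()
    session = Session()
    send_pack(loop, session)
    await asyncio.sleep(0.01)  # let the write fail and the callback run
    return session.closed


closed = asyncio.run(demo())
print(closed)
```

The callback moves the error handling to the point where the outcome of the write is actually known, which is why the connection gets closed even though the original call returned without raising.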
So... Just in case this isn't it after all (because I haven't been able to reproduce the actual error and just have been going by code), I'm going to throw a bunch more links in here to code sections possibly related to this, so I don't have to start again every time I try to wrap my head around Tornado internals in order to debug this...
exception clears traceback log: https://github.com/tornadoweb/tornado/blob/v5.1.1/tornado/concurrent.py#L275
write calls exception on write future: https://github.com/tornadoweb/tornado/blob/v5.1.1/tornado/iostream.py#L583
gen.coroutine around future resolving wrapper rewraps it into a future: https://github.com/tornadoweb/tornado/blob/v5.1.1/tornado/websocket.py#L871
Hey 👋
This issue was reproducible in 1.4.2 by:
If you don't reconnect, it seems never to get logged.
With the current maintenance (e781e9c3e8) the issue is not reproducible after 5 cycles of the above procedure. So it seems fixed 👍
Perfect, thank you for reporting back on this! Considering this solved then and will close once 1.5.0 rolls out.
What were you doing?
Unfortunately I did not find a reliable way to reproduce this bug. I saw it (only) a few times, about 2 times in a couple dozen prints. I have 1.4.1 installed, but saw it before that, likely with 1.4.0.
Because I have no way to trigger it on demand, I could not reproduce this in safe mode.
My setup:
Bananapi (with Debian 10) hosting OctoPrint. OctoPrint was installed via pip long ago and is updated via OctoPrint's own update mechanism. The printer is a self-made Marlin 2.0 based Ultimaker clone.
OctoPrint runs as its own user using a systemd service file. It runs with nice level = -4
Another Debian-based PC with Firefox to connect to OctoPrint. (the IP is 10.x.x 68, host isildor)
The observed problem: I observed the problem while printing, when the printer suddenly started to stutter. By stutter I mean that it seems not to get the gcode fast enough: it makes tiny moves, then tiny pauses, and then continues…
What I did (it might not be related, I try to be as verbose as possible):
Examining the journal for the octoprint.service when I guess the problem started:
The Tornado future exception is then repeated several times a second and brings the CPU utilization to 100%, as glances showed me.
This is around the point in time where the stuttering stopped. Again, this could be a coincidence, but I believe it is related:
Looking at the PC's journal, S2R (suspend-to-RAM) was entered there on:
and resumed on
What did you expect to happen?
see above
What happened instead?
see above
Did the same happen when running OctoPrint in safe mode?
Due to the low occurrence, I unfortunately could not reproduce it in safe mode.
Version of OctoPrint
at least 1.4.0 and 1.4.1 (unknown if it happens on earlier versions, my logs are not old enough to check this)
Operating System running OctoPrint
Printer model & used firmware incl. version
Ultimaker clone, heavily modified; Marlin 2.0.x
Browser and version of browser, operating system running browser
OS: Debian (Testing/Unstable) Browser: Firefox Quantum 68.11.0esr (64-Bit)
Link to octoprint.log
systemd's journal*: https://sviech.de/s/XtsCjWkfxCZFBNs/download
OctoPrint's logs directory: https://sviech.de/s/CP5zgwTbKCNeLXm/download (tar.gz)
(* Be aware that the systemd log starts way in the past… I didn't want to limit it to "today")
Link to contents of terminal tab or serial.log
sorry, did not record that information.
Link to contents of Javascript console in the browser
sorry, did not record that information.
Screenshot(s)/video(s) showing the problem:
n/a
I have read the FAQ.