OctoPrint / OctoPrint

OctoPrint is the snappy web interface for your 3D printer!
https://octoprint.org
GNU Affero General Public License v3.0
8.3k stars 1.67k forks source link

tornado.application - ERROR - Future exception was never retrieved (makes printer stutter) #3689

Closed coldtobi closed 3 years ago

coldtobi commented 4 years ago

What were you doing?

Unfortunatly I did not find a safe way to reproduce this bug, I saw it (only) a few times, like 2 times in a couple dozens of prints … I've got 1.4.1 installed, but saw that before, likely with 1.4.0.

Due to the lack of a way to reproduce, I could not reproduce this in safe mode.

My setup:

The observed problem: I obeserved the problem while printing, when the printer suddendly started to stutter. With stutter I mean that it seems not to get the gcode fast enough, making tiny moves and then tiny pauses and then continues…

What I did (it might not be related, I try to be as verbose as possible):

Examining the journal for the octoprint.service when I guess the problem started:

Aug 13 08:39:01 bananapi2 octoprint[580]: 2020-08-13 08:39:01,742 - octoprint.plugins.detailedprogress - INFO - Printing started. Detailed progress started.
    Aug 13 08:39:01 bananapi2 octoprint[580]: 2020-08-13 08:39:01,767 - octoprint.plugins.stats - INFO - Printer Stats - on_event
    Aug 13 08:39:03 bananapi2 octoprint[580]: 2020-08-13 08:39:03,279 - octoprint.plugins.stats - INFO - Printer Stats - on_event
    Aug 13 08:39:08 bananapi2 octoprint[580]: 2020-08-13 08:39:08,542 - octoprint.server.util.flask.PreemptiveCache - INFO - Adding entry for / and {'query_string': u'l10n=de', 'path': u'/
    ', u'_count': 1, u'_timestamp': 1597300748.541694, 'base_url': u'http://bananapi2:5000/'}
    Aug 13 08:39:48 bananapi2 octoprint[580]: 2020-08-13 08:39:48,527 - octoprint.server.util.sockjs - INFO - New connection from client: 10.x.x.68
    Aug 13 08:39:48 bananapi2 octoprint[580]: 2020-08-13 08:39:48,990 - octoprint.server.util.flask - INFO - Passively logging in user _admin from 10.x.x.68
    Aug 13 08:39:49 bananapi2 octoprint[580]: 2020-08-13 08:39:48,998 - octoprint.access.users - INFO - Logged in user: _admin
    Aug 13 08:39:57 bananapi2 octoprint[580]: 2020-08-13 08:39:57,700 - octoprint.filemanager.analysis - INFO - Analysis of entry local:smd-60mm-container-lid.gcode finished, needed 68.44s
    Aug 13 08:39:58 bananapi2 octoprint[580]: 2020-08-13 08:39:58,113 - octoprint.server.util.sockjs - INFO - User _admin logged in on the socket from client 10.x.x.68
    Aug 13 08:50:52 bananapi2 octoprint[580]: 2020-08-13 08:50:52,806 - octoprint.server.heartbeat - INFO - Server heartbeat <3
    Aug 13 08:51:43 bananapi2 octoprint[580]: 2020-08-13 08:51:43,630 - octoprint.plugins.tracking - INFO - Sent tracking event ping, payload: {'octoprint_uptime': 958}
    Aug 13 09:05:52 bananapi2 octoprint[580]: 2020-08-13 09:05:52,808 - octoprint.server.heartbeat - INFO - Server heartbeat <3
    Aug 13 09:06:43 bananapi2 octoprint[580]: 2020-08-13 09:06:43,596 - octoprint.plugins.tracking - INFO - Sent tracking event ping, payload: {'octoprint_uptime': 1858}
    Aug 13 09:20:52 bananapi2 octoprint[580]: 2020-08-13 09:20:52,817 - octoprint.server.heartbeat - INFO - Server heartbeat <3
    Aug 13 09:21:43 bananapi2 octoprint[580]: 2020-08-13 09:21:43,716 - octoprint.plugins.tracking - INFO - Sent tracking event ping, payload: {'octoprint_uptime': 2758}
    Aug 13 09:22:14 bananapi2 octoprint[580]: 2020-08-13 09:22:14,870 - tornado.general - WARNING - Write error on <socket._socketobject object at 0xa578b0d8>: [Errno 113] No route to host
    Aug 13 09:22:16 bananapi2 octoprint[580]: 2020-08-13 09:22:16,721 - octoprint.server.util.sockjs - INFO - Client connection closed: 10.x.x.68
    Aug 13 09:22:19 bananapi2 octoprint[580]: 2020-08-13 09:22:19,914 - tornado.application - ERROR - Future exception was never retrieved: Traceback (most recent call last):
    Aug 13 09:22:19 bananapi2 octoprint[580]:   File "/home/octoprint/OctoPrint/venv/local/lib/python2.7/site-packages/tornado/gen.py", line 1141, in run
    Aug 13 09:22:19 bananapi2 octoprint[580]:     yielded = self.gen.throw(*exc_info)
    Aug 13 09:22:19 bananapi2 octoprint[580]:   File "/home/octoprint/OctoPrint/venv/local/lib/python2.7/site-packages/tornado/websocket.py", line 876, in wrapper
    Aug 13 09:22:19 bananapi2 octoprint[580]:     raise WebSocketClosedError()
    Aug 13 09:22:19 bananapi2 octoprint[580]: WebSocketClosedError
    Aug 13 09:22:20 bananapi2 octoprint[580]: 2020-08-13 09:22:20,009 - tornado.application - ERROR - Future exception was never retrieved: Traceback (most recent call last):
    Aug 13 09:22:20 bananapi2 octoprint[580]:   File "/home/octoprint/OctoPrint/venv/local/lib/python2.7/site-packages/tornado/gen.py", line 1141, in run
    Aug 13 09:22:20 bananapi2 octoprint[580]:     yielded = self.gen.throw(*exc_info)
    Aug 13 09:22:20 bananapi2 octoprint[580]:   File "/home/octoprint/OctoPrint/venv/local/lib/python2.7/site-packages/tornado/websocket.py", line 876, in wrapper
    Aug 13 09:22:20 bananapi2 octoprint[580]:     raise WebSocketClosedError()
    Aug 13 09:22:20 bananapi2 octoprint[580]: WebSocketClosedError

The tornardo future exectpion is then repeated several times a second and brings the CPU utilization to 100%, as glances showed me.

This is around the point of time where the stuttering stopped. Again, this could be coincidents, but I believe it is related:

Aug 13 09:30:27 bananapi2 octoprint[580]: WebSocketClosedError
Aug 13 09:30:27 bananapi2 octoprint[580]: 2020-08-13 09:30:27,192 - tornado.application - ERROR - Future exception was never retrieved: Traceback (most recent call last):
Aug 13 09:30:27 bananapi2 octoprint[580]:   File "/home/octoprint/OctoPrint/venv/local/lib/python2.7/site-packages/tornado/gen.py", line 1141, in run
Aug 13 09:30:27 bananapi2 octoprint[580]:     yielded = self.gen.throw(*exc_info)
Aug 13 09:30:27 bananapi2 octoprint[580]:   File "/home/octoprint/OctoPrint/venv/local/lib/python2.7/site-packages/tornado/websocket.py", line 876, in wrapper
Aug 13 09:30:27 bananapi2 octoprint[580]:     raise WebSocketClosedError()
Aug 13 09:30:27 bananapi2 octoprint[580]: WebSocketClosedError
Aug 13 09:30:29 bananapi2 octoprint[580]: 2020-08-13 09:30:29,138 - octoprint.server.util.sockjs - INFO - New connection from client: 10.x.x.68
Aug 13 09:30:29 bananapi2 octoprint[580]: 2020-08-13 09:30:29,855 - octoprint.server.util.flask - INFO - Passively logging in user _admin from 10.x.x.68

Looking at the PC's journal, S2R was entered there on:

Aug 13 09:06:00 isildor systemd-sleep[17216]: Suspending system...

and resumed on

Aug 13 09:25:37 isildor systemd-sleep[17216]: System resumed.

What did you expect to happen?

see above

What happened instead?

see above

Did the same happen when running OctoPrint in safe mode?

Due to the low occurence, I could not reproduce it unfortunatly in safe mode.

Version of OctoPrint

at least 1.4.0 and 1.4.1 (unknown if it happens on earlier versions, my logs are not old enough to check this)

Operating System running OctoPrint

Printer model & used firmware incl. version

Ultimake Clone, heavily modified; Marlin 2.0.x

Browser and version of browser, operating system running browser

OS: Debian (Testing/Unstable) Browser: Firefox Quantum 68.11.0esr (64-Bit)

Link to octoprint.log

systemd's journal*: https://sviech.de/s/XtsCjWkfxCZFBNs/download octoprint's logs directoy: https://sviech.de/s/CP5zgwTbKCNeLXm/download (tar.gz)

(* Be aware that the systemd log starts way in the past… I didnt want to limit it to "today")

Link to contents of terminal tab or serial.log

sorry, did not record that information.

Link to contents of Javascript console in the browser

sorry, did not record that information.

Screenshot(s)/video(s) showing the problem:

n/a

I have read the FAQ.

foosel commented 4 years ago

Thanks for reporting this! I think this might be a bug in the now vendored sockjs dependency. Due to a tornado update it had to be ported to asyncio(-ish) and this might be an issue with that implementation that I need to root out and fix.

foosel commented 4 years ago

After just having spent the majority of the day trying to understand what might be going on here, I think I found at least the reason for the Future exception was never retrieved log entry, and possible for the whole storm of them.

The actual result of writing to the web socket is async and a future, since write_message returns a future. The error handling inside send_pack however assumes synchronous behaviour: https://github.com/OctoPrint/OctoPrint/blob/1.4.2/src/octoprint/vendor/sockjs/tornado/transports/websocket.py#L94 (same for rawwebsocket.py). Thus the exception handler there will not trigger if there's an error while writing, and thus the connection will not have on_close called.

The above commit works around this through a done_callback on the future, which should hopefully solve it. I say should because even though I did my best in trying to simulate your error scenario, I was never able to trigger even the "retrieved" message, let alone a whole storm of them.

So... Just in case this isn't it after all (because I haven't been able to reproduce the actual error and just have been going by code), I'm going to throw a bunch more links in here to code sections possibly related to this, so I don't have to start again every time I try to wrap my head around Tornado internals in order to debug this...

crysxd commented 4 years ago

Hey 👋

This issue was reproducible in 1.4.2 by:

If you don't reconnect, it seems to get never logged.

With the current maintenance (e781e9c3e8) the issue is not reproducible after 5 cycles of the above procedure. So seems fixed 👍

foosel commented 4 years ago

Perfect, thank you for reporting back on this! Considering this solved then and will close once 1.5.0 rolls out.