OctoPrint / OctoPrint

OctoPrint is the snappy web interface for your 3D printer!
https://octoprint.org
GNU Affero General Public License v3.0
8.15k stars 1.67k forks source link

Print job frozen mid print. Octoprint server offline. Hotend and bed remain heated. #2647

Open devsfan1830 opened 6 years ago

devsfan1830 commented 6 years ago

What were you doing?

Printing a model

Run a print

What did you expect to happen?

The print to finish.

What happened instead?

The print was hung mid print, hotend and bed fully heated. Touch UI on my Pi 7" touchscreen reported server offline. Web access via any other computer failed. Access via Octoprint anywhere and Printoid also failed.

This is the same as previously reported and closed issue here: https://github.com/foosel/OctoPrint/issues/2118

Only difference is that the print did not finish and had only been active for a couple hours. I followed the advice in that thread and posted the results there. In case that cannot be reopened, this is what I posted there:

I can reach the Pi via SSH. But web ui via any computer and the pi touchscreen wont respond. Says the server is offline. , I use Octoprint Anywhere for remote access and it will not connect. Printoid I use for my phone access while home since the printer is now in my basement. That too will not connect.

So, via SSH i get the following: "ps -ef | grep -i octoprint" outputs this: pi 15441 14056 0 21:54 pts/0 00:00:00 grep --color=auto -i octoprint

"ps -ef | grep haproxy" outputs the following: root 925 1 0 12:10 ? 00:00:00 /usr/sbin/haproxy-systemd-wrapper -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid haproxy 927 925 0 12:10 ? 00:00:00 /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds haproxy 928 927 0 12:10 ? 00:00:06 /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds pi 19468 14056 0 22:05 pts/0 00:00:00 grep --color=auto haproxy

"ps -ef | grep -i octoprint | grep -i python" returns nothing

-Worth noting I do not have haproxy configured anymore. This is a few months old reflash of octopi and i since closed off public internet access to my network, Hence Octoprint Anywhere just for basic remote view and i can kill a failed job.

Using "sudo service octoprint restart" did indeed restart octoprint without a full reboot.

Did the same happen when running OctoPrint in safe mode?

Not tested in safe mode. The issue is extremely intermittent and seemingly random. Last case of the issue was a few months ago.

Version of OctoPrint

1.3.8

Operating System running OctoPrint

OctoPi 0.14.0

Printer model & used firmware incl. version

Prusa i3 MK2S, Firmware 3.1.0

Browser and version of browser, operating system running browser

Google Chrome 66.0.3359.139, Windows 10 64-bit

Link to octoprint.log

octoprint.log

Link to contents of terminal tab or serial.log

unavailable. serial.log not enabled.

Link to contents of Javascript console in the browser

not a UI issue

Screenshot(s)/video(s) showing the problem:

I have read the FAQ. Yes

foosel commented 6 years ago

So, this looks like a server crash, and one that doesn't even log any kind of exceptions to the log anymore. Notice how the heartbeat just stops, without any kind of errors being logged.

The problem is that since it doesn't even log anything, this makes it tricky to determine what might be the cause here. There's another issue about the Python process in which OctoPrint runs simple segfaulting, see #2424. That also contains steps on how to figure out what's making it crash - please try that and when it happens next provide the resulting log here.

If at all possible, running a lengthy period in safe mode would also help narrow this down further. It might be pure coincidence, but I notice a lot of people running into weird crash issues the past couple of months all had the Anywhere plugin installed and had it run into backoff log messages. Maybe those log messages have the same cause as the crashes or something like that.

JabawokJayUK commented 6 years ago

Same happening for me.

has happened on multiple prints, both in safe mode and normal.

Syslog would indicate that the serial driver is dropping: May 20 18:53:05 octopi kernel: [194758.424374] ftdi_sio ttyUSB0: usb_serial_generic_read_bulk_callback - urb stopped: -32 May 20 18:53:05 octopi kernel: [194758.424479] ftdi_sio ttyUSB0: usb_serial_generic_read_bulk_callback - urb stopped: -32 May 20 18:53:05 octopi kernel: [194758.562635] usb 1-1.3: USB disconnect, device number 9 May 20 18:53:05 octopi kernel: [194758.562930] ftdi_sio ttyUSB0: error from flowcontrol urb May 20 18:53:05 octopi kernel: [194758.563302] ftdi_sio ttyUSB0: FTDI USB Serial Device converter now disconnected from ttyUSB0 May 20 18:53:05 octopi kernel: [194758.563339] ftdi_sio 1-1.3:1.0: device disconnected May 20 18:53:05 octopi kernel: [194758.855333] usb 1-1.3: new full-speed USB device number 10 using dwc_otg May 20 18:53:05 octopi kernel: [194759.013986] usb 1-1.3: New USB device found, idVendor=0403, idProduct=6001 May 20 18:53:05 octopi kernel: [194759.013995] usb 1-1.3: New USB device strings: Mfr=1, Product=2, SerialNumber=3 May 20 18:53:05 octopi kernel: [194759.013999] usb 1-1.3: Product: FT232R USB UART May 20 18:53:05 octopi kernel: [194759.014002] usb 1-1.3: Manufacturer: FTDI May 20 18:53:05 octopi kernel: [194759.014006] usb 1-1.3: SerialNumber: A907S16M May 20 18:53:05 octopi kernel: [194759.022120] ftdi_sio 1-1.3:1.0: FTDI USB Serial Device converter detected May 20 18:53:05 octopi kernel: [194759.022310] usb 1-1.3: Detected FT232RL May 20 18:53:05 octopi kernel: [194759.023110] usb 1-1.3: FTDI USB Serial Device converter now attached to ttyUSB0

This is also supported by the serial.log (taken in safe mode):

018-05-20 18:53:04,982 - Send: N114890 G1 X222.788 Y103.174 E0.2691*110 2018-05-20 18:53:04,993 - Recv: ok 2018-05-20 18:53:04,997 - Send: N114891 G1 X222.657 Y103.318 E0.2769*98 2018-05-20 18:53:05,163 - Unexpected error while reading serial port, please consult octoprint.log for details: SerialException: 'device reports readiness to read but returned no data (device disconnected or multiple access on port?)' @ comm.py:_readline:2417 2018-05-20 18:53:05,165 - Please see https://faq.octoprint.org/serialerror for possible reasons of this. 2018-05-20 18:53:05,177 - Changing monitoring state from "Printing" to "Offline (Error: SerialException: 'device reports readiness to read but returned no data (device disconnected or multiple access on port?)' @ comm.py:_readline:2417)" 2018-05-20 18:53:05,189 - Connection closed, closing down monitor This behavior has only started since the most recent update. The server seems to remain up though so a different situation than previously diagnosed with the same behavior.

c10l commented 6 years ago

I'm seeing a similar issue but the server stays up and running. I was running it on a tmux session and there's no segfault or anything weird apart from the serial error.

This is octoprint's output:

2018-05-21 20:55:23,141 - octoprint.util.comm - INFO - Changing monitoring state from "Offline" to "Opening serial port"
2018-05-21 20:55:23,145 - octoprint.util.comm - INFO - Changing monitoring state from "Opening serial port" to "Connecting"
2018-05-21 20:55:23,166 - octoprint.util.comm - INFO - M110 detected, setting current line number to 0
2018-05-21 20:55:25,011 - octoprint.util.comm - INFO - M110 detected, setting current line number to 0
2018-05-21 20:55:25,572 - octoprint.util.comm - INFO - Changing monitoring state from "Connecting" to "Operational"
2018-05-21 20:55:25,577 - octoprint.util.comm - INFO - M110 detected, setting current line number to 0
2018-05-21 20:55:25,635 - octoprint.util.comm - INFO - Printer reports firmware name "Marlin TH3D U1.R1.8c"
2018-05-21 20:55:33,465 - octoprint.filemanager.analysis - INFO - Analysis of entry local:E3Dv6_EZABL5.5.gcode finished, needed 21.42s
2018-05-21 20:55:34,768 - octoprint.filemanager.analysis - INFO - Starting analysis of local:_E3Dv6_EZABL5.5.gcode
2018-05-21 20:55:34,776 - octoprint.filemanager.analysis - INFO - Invoking analysis command: /home/pi/oprint/bin/python2 -m octoprint analysis gcode --speed-x=12000 --speed-y=12000 --max-t=10 --throttle=0.0 --throttle-lines=100 /home/pi/.octoprint/uploads/_E3Dv6_EZABL5.5.gcode
2018-05-21 20:55:35,840 - octoprint.filemanager.analysis - INFO - Analysis of entry local:_E3Dv6_EZABL5.5.gcode finished, needed 1.07s
2018-05-21 20:56:27,759 - octoprint.util.comm - INFO - Externally triggered heatup detected
2018-05-21 21:01:26,651 - octoprint.util.comm - INFO - Externally triggered heatup detected
2018-05-21 21:01:59,873 - octoprint.server.heartbeat - INFO - Server heartbeat <3
2018-05-21 21:06:15,600 - octoprint.util.comm - INFO - Changing monitoring state from "Operational" to "Printing"
2018-05-21 21:06:15,647 - octoprint.util.comm - INFO - M110 detected, setting current line number to 0
2018-05-21 21:07:26,057 - octoprint.util.comm - INFO - Printer seems to support the busy protocol, telling it to set the busy interval to our "communicationBusy" timeout - 1s = 2s
2018-05-21 21:16:59,876 - octoprint.server.heartbeat - INFO - Server heartbeat <3
2018-05-21 21:31:59,878 - octoprint.server.heartbeat - INFO - Server heartbeat <3
2018-05-21 21:46:59,879 - octoprint.server.heartbeat - INFO - Server heartbeat <3
2018-05-21 21:56:10,107 - octoprint.server.util.sockjs - INFO - Client connection closed: ::ffff:192.168.192.24
2018-05-21 21:56:19,885 - octoprint.server.util.sockjs - INFO - New connection from client: ::ffff:192.168.192.24
2018-05-21 21:56:26,259 - octoprint.util.pip - INFO - Using "/home/pi/oprint/bin/python2 -m pip" as command to invoke pip
2018-05-21 21:56:26,449 - octoprint.util.pip - INFO - Using "/home/pi/oprint/bin/python2 -m pip" as command to invoke pip
2018-05-21 21:56:26,484 - octoprint.util.pip - INFO - Using "/home/pi/oprint/bin/python2 -m pip" as command to invoke pip
2018-05-21 21:56:30,811 - octoprint.util.pip - INFO - pip installs to /home/pi/oprint/lib/python2.7/site-packages/ (writable -> yes), --user flag needed -> no, virtual env -> yes
2018-05-21 21:56:30,813 - octoprint.util.pip - INFO - ==> pip ok -> yes
2018-05-21 21:56:31,042 - octoprint.plugins.softwareupdate - INFO - Saved version cache to disk
2018-05-21 22:01:59,881 - octoprint.server.heartbeat - INFO - Server heartbeat <3
2018-05-21 22:16:59,883 - octoprint.server.heartbeat - INFO - Server heartbeat <3
2018-05-21 22:31:59,885 - octoprint.server.heartbeat - INFO - Server heartbeat <3
2018-05-21 22:46:59,887 - octoprint.server.heartbeat - INFO - Server heartbeat <3
2018-05-21 22:48:58,264 - octoprint.util.comm - INFO - Communication timeout while printing, trying to trigger response from printer.
2018-05-21 22:49:01,271 - octoprint.util.comm - INFO - Communication timeout while printing, trying to trigger response from printer.
2018-05-21 22:49:04,274 - octoprint.util.comm - INFO - Communication timeout while printing, trying to trigger response from printer.
2018-05-21 22:49:07,280 - octoprint.util.comm - INFO - Communication timeout while printing, trying to trigger response from printer.
2018-05-21 22:49:10,287 - octoprint.util.comm - INFO - Communication timeout while printing, trying to trigger response from printer.
2018-05-21 22:49:13,295 - octoprint.util.comm - INFO - No response from printer after 6 consecutive communication timeouts, considering it dead.
2018-05-21 22:49:13,323 - octoprint.util.comm - INFO - Changing monitoring state from "Printing" to "Offline (Error: Too many consecutive timeouts, printer still connected and alive?)"

This is /var/log/syslog:

[12138.110707] usb 1-1.1.3.1: Product: FT232R USB UART
[12138.110716] usb 1-1.1.3.1: Manufacturer: FTDI
[12138.121349] ftdi_sio 1-1.1.3.1:1.0: FTDI USB Serial Device converter detected
[12138.121512] usb 1-1.1.3.1: Detected FT232RL
[12138.122306] usb 1-1.1.3.1: FTDI USB Serial Device converter now attached to ttyUSB0
[19130.659012] ftdi_sio ttyUSB0: usb_serial_generic_read_bulk_callback - urb stopped: -32
[19130.659107] ftdi_sio ttyUSB0: usb_serial_generic_read_bulk_callback - urb stopped: -32

There's nothing of more value in dmesg or other log files. I didn't have serial.log enabled unfortunately.

foosel commented 6 years ago

@JabawokJayUK @cassianoleal if the server is staying up for you but you are seeing serial disconnects, it is not the same issue as reported by @devsfan1830. The fact that the serial line is vanishing on the OS level is something that OctoPrint can't do anything about. As always with such problems, make sure you've read this and this FAQ entry and that your power supply is decent. It could be that the update from 1.3.6 to 1.3.8 put just that much more resource consumption on your setup that a struggling PSU is now overwhelmed and causing issues like this.

This ticket is explicitly about the server going fully offline due to crashing and possibly a duplicate of the aforementioned #2424, but that won't be possible to determine until @devsfan1830 has tried what was suggested in that ticket in order to narrow things down.

devsfan1830 commented 6 years ago

Still working on it. So far ive run some 6 hour prints in safe mode or with Octoprint Anywhere disabled only and they all finished. I will re-enable OA and continue long prints until failure and use what was suggested to provide more info. This seems to be extremely intermittent. Almost wonder if its OA poorly handling a sporadic failed connection to its cloud service.

devsfan1830 commented 6 years ago

I think I'm going to remove the Octoprint Anywhere plugin. I'm now observing odd behavior where the 2.5GHz wifi that the Pi and two other devices that are on it goes "wonky". I have a garage door and an ecobee themostat on the 2.4ghz band of my tri band router. They both lose their cloud connections. Octoprint becomes unreachable from my laptop, desktop and phone. I can still interact with the local 7" touchscreen. Oddly enough, the Octoprint Anywhere site still gets its data. I reboot my router and it all comes back. The octoprint.log shows what seems to be ALOT of frequent check ins with the OA servers. My current theory is that its overwhelming my router and either congesting or locking up the 2.5GHz channel. Could explain the octoprint server crashing too somehow. I figure better safe than sorry and just remove it. Go back to a Haproxy based remote access. Thus way nothing is actively pinging the internet other than the connectivity check. I've posted my latest octoprint.local to see if you agree with my assessment of the plugin. That is, if it shows the needed info.

devsfan1830 commented 6 years ago

octoprint.log

foosel commented 6 years ago

Certainly looks a bit spammy based on the errors it logs, but can't see if during normal (error free) operation it would also be this ping-heavy.

If your server runs stable without that plugin, I guess this should be raised with the plugin's author as well.

kennethjiang commented 6 years ago

@devsfan1830 I'm the author of OctoPrint Anywhere. When there is an active print job OA will consume ~3mbps uplink bandwidth (assuming your webcam is 640x480. it can be much higher if the webcam is higher resolution). Depending on your ISP there is a possibility that your router will be overwhelmed.

I have released a version a few months ago to reduce the bandwidth consumption to about 5% when the printer is idle. I have not heard many reports on router issue since then.

Your octoprint.log indicates a strong possibility that the router is overwhelmed. OA is quiet when there are no connectivity issues.

If your ISP can't support the uplink bandwidth it requires, it's a good idea to uninstall OA.

kennethjiang commented 6 years ago

Oh OA is pure Python so it's high unlikely that it'd cause any seg fault.

devsfan1830 commented 6 years ago

@kennethjiang Thanks for checking in. Yeah I'm actually using a logitch C920 @ 1080p. The 2.4GHz wifi does indeed fail once the print starts. So it seems yea, the router and possibly the pi is getting overwhelmed with the uplink traffic. ISP has nothing to do with things in this case. Its gigabit fios (900down/800up) but its my own router connected to the FIOS hub via ethernet. Unless theres a way to reduce video quality for JUST OA then I think yeah OA is no good for my use. Since I'd prefer 1080p for my Octolapse timelapses.

foosel commented 6 years ago

OA is pure Python so it's high unlikely that it'd cause any seg fault.

Its dependencies too? OctoPrint itself is also 100% pure Python, but I can't vouch for the third party dependencies that I rely on.

I'm also not saying your plugin is causing issues with regards to stability, I just noticed that it is installed in a lot of recent crash reports, and that has me worried there might be something up here.

kennethjiang commented 6 years ago

The dependencies of OA plugin are simple enough that I almost want to vouch for them. ;)

Winfieldmd commented 6 years ago

My apologies as I am a newb but Octoprint anywhere is my saving grace to monitor prints as port fowarding has been challenging however I have done 4 print since installing OA and two of them failed in exactly this way - server suddenly offline with no access from any device using any program, bed and hotend remain hot with print head just sitting there hung midprint. I am unable to restart the print, give up and resetting my pi (by unplugging) restores everything to normal. I tried upgrading my Pi power supply to 3.5a and plugged in the ethernet cord directly to router after the first time it happened just to try something but it failed again (but the cord is plugged into a router extender). My laptop wifi to the same router extension next to my printer is getting 76 download and 12Mbps upload if that matters or helps. I am not saying it is OA and I need and love OA, but with my sudden hangs and a few months of no problems with octoprint or my pi and this thread, it is suspicious. I am too much of a newb to begin to know how to get the logs etc. but I can try but I thought maybe my experience might add insight. Maybe OA works too well! It is amazing the videos of my prints instantly appearing on my cell phone on LTE. Oh my camera is the logitech c270 with basic settings (have not changed anything).

kennethjiang commented 6 years ago

It does sound concerning @Winfieldmd . What is the make and model of your printer? Do you mind sharing the octoprint.log and serial.log files that include the times when the crashes happened?

Another thing that you can help me with - next time when you are printing and OA is active, can you ssh to the Pi and check the CPU % using top? The very first version of OA was very CPU-demanding (>80%) and it took quite a bit of optimization to bring the CPU % under 15% (it shares the same process with OctoPrint so look for the python process) on my Pi 3 B+. But depending on your hardware (Pi + webcam) it may be different for you.

Winfieldmd commented 6 years ago

Ok let me try to help. I am using a Creality CR-10s and slicing with Simplify3D. I think I found the logs but its a bit confusing to decipher the ones I need. The last print that failed was started friday night 6/8 pretty late and failed in the night a few hours later which would be on 6/9. I see 1 octoprint log and 2 serial log files around then so I will try and attach them all I suppose but I cannot open them on my computer and dragging them here gives me a message that the file type is not supported. I am using Win 10. How do I open or convert them? As far as SSh into my Pi that will take some time and work as I know zero about doing that other than the one time I did it to set up my Pi using youtube video tutorials lol. Sorry you are dealing with a newb. Sorry for the late reply, I was out of town. Thanks for your help.

kennethjiang commented 6 years ago

Can you put all of them into a zip file and upload the zip file here? @Winfieldmd

Winfieldmd commented 6 years ago

@kennethjiang see if this works.

octoprint.log.zip

kennethjiang commented 6 years ago

Thank you @Winfieldmd for getting the log files to me. And sorry about the delay.

From the log files, it looks like you rebooted OctoPi (did the previous print job crashed just before that?) at 22:32 on 2018-06-08. And the print job lasted until about 02:34 on 2018-06-09 when OctoPi crashed. Does the timeline sound right to you?

For this period (22:32 on 2018-06-08 - 02:34 on 2018-06-09), OA reported twice that the connection to cloud timed out. So it had trouble connecting to the backend in the cloud for about 6 minutes during the period of 4 hours. It does seem a bit strange since you said that OctoPi was wired to your router so the connection should have been solid. But it doesn't seem too alarming either, especially because the connectivity issues were at least 2 hours before the crash.

I suspect the crash has to do with the serial communication between OctoPrint and the 3D printer. Please turn on serial logging by going to Settings > Serial Connection > Log communication so that we have a bit more info next time when crash happens again.

Winfieldmd commented 6 years ago

@kennethjiang yes that looks like the correct timeline. I will turn on serial logging and start a print as soon as I can. I will have to be a bit more careful as it crashes with everything staying hot which makes me a bit nervous. I will let you know the results. Thanks.

Winfieldmd commented 6 years ago

@kennethjiang I went ahead a reprinted the failed print above. It completed with no problems this time with no changes other than activating serial logging. I don't know if it will be helpful but here are the log files with serial logging activated on this print. Not that I would know what to do anyway yet, but I unfortunately I have not been able to log into my Pi to try and check the CPU utilization as for some crazy reason it will not accept my username/password which makes no sense. I may have to do re-image of octopi. Greatly appreciate your help.

octoprint log files.zip

kennethjiang commented 6 years ago

@Winfieldmd This log file looks normal, which is not surprising.

Don't try to figure out a way to log into your Pi, or re-image it. If the CPU % is high, the print head movement will become jerky, which doesn't seem to be the case.

Just keep an eye on this. In case it crashes again, please upload the octoprint.log and serial.log.

Thanks for providing all these info to debug, and your kind words! :)

kennethjiang commented 6 years ago

The same crash happened to me today too!

I attached all the logs that seem relevant, including the files in /var/log.

I googled it and it seems like many similar issues are related to insufficient power supply. @foosel A question for you - in the cases of insufficient power supply, would the whole Raspberry Pi reboot? In my case the Pi itself stayed on the whole time.

This is the 1st time such crash happened to me. I highly doubt it has to do with OA since, obviously, I have been using OA from the very beginning (about 1 year now).

octoprintlogs 2.zip

foosel commented 6 years ago

in the cases of insufficient power supply, would the whole Raspberry Pi reboot?

Only in really bad cases. But it starts browning out and doing all kind of nasty things.

Anyone encountering this, please follow all the steps outlined in here and report back on the results - that way we might be able to figure out why it's crashing. Right now it's just "it's crashing" not "because of x".

I highly doubt it has to do with OA since, obviously, I have been using OA from the very beginning (about 1 year now).

I'm with you there, I don't think it's OA per se. But it's either some cause that also causes OA to log "connection lost" messages just before the crash, or that OA makes more prone to show up. Considering that pretty much all users I got crash reports from over the past couple months (apart from one, who was running Octolapse) were running OA with such log messages preceding the crash. Not all of them were running Octolapse. I feel like there is some kind of correlation here. As I said, not necessarily a causation, but something's up.

Your own log also shows those messages:

2018-06-14 08:17:53,411 - backoff - ERROR - Backing off __forward_ws__(...) for 0.6s (WebSocketConnectionClosedException: socket is already closed.)
2018-06-14 08:17:54,046 - octoprint.plugins.anywhere - WARNING - Not connected to server ws or connection lost
2018-06-14 08:17:54,047 - backoff - ERROR - Backing off __forward_ws__(...) for 0.9s (WebSocketConnectionClosedException: socket is already closed.)
2018-06-14 08:17:54,915 - octoprint.plugins.anywhere - WARNING - Not connected to server ws or connection lost
2018-06-14 08:17:54,916 - backoff - ERROR - Backing off __forward_ws__(...) for 1.0s (WebSocketConnectionClosedException: socket is already closed.)
2018-06-14 08:17:55,907 - octoprint.plugins.anywhere - WARNING - Not connected to server ws or connection lost
2018-06-14 08:17:55,907 - backoff - ERROR - Backing off __forward_ws__(...) for 1.2s (WebSocketConnectionClosedException: socket is already closed.)
2018-06-14 08:17:57,140 - octoprint.plugins.anywhere - WARNING - Not connected to server ws or connection lost
2018-06-14 08:17:57,142 - backoff - ERROR - Backing off __forward_ws__(...) for 2.6s (WebSocketConnectionClosedException: socket is already closed.)
2018-06-14 08:17:59,695 - octoprint.plugins.anywhere - WARNING - Not connected to server ws or connection lost
2018-06-14 08:17:59,695 - backoff - ERROR - Backing off __forward_ws__(...) for 0.4s (WebSocketConnectionClosedException: socket is already closed.)
2018-06-14 08:18:00,146 - octoprint.plugins.anywhere - WARNING - Not connected to server ws or connection lost
2018-06-14 08:18:00,147 - backoff - ERROR - Backing off __forward_ws__(...) for 3.6s (WebSocketConnectionClosedException: socket is already closed.)
2018-06-14 08:18:03,788 - octoprint.plugins.anywhere - WARNING - Not connected to server ws or connection lost
2018-06-14 08:18:03,790 - backoff - ERROR - Backing off __forward_ws__(...) for 10.5s (WebSocketConnectionClosedException: socket is already closed.)
2018-06-14 08:18:14,348 - octoprint.plugins.anywhere - WARNING - Not connected to server ws or connection lost
2018-06-14 08:18:14,349 - backoff - ERROR - Giving up __forward_ws__(...) after 8 tries (WebSocketConnectionClosedException: socket is already closed.)
2018-06-14 08:18:14,350 - octoprint.plugins.anywhere - WARNING - Reached max backoff in waiting for server ws connection

The crash however appears to only have come over an hour later. What I find interesting in your case is that the crash apparently happened right at an Octolapse pause:

2018-06-14 09:44:31,997 - Send: N180495 G1 X198.000 Y2.000 F3000*127
2018-06-14 09:44:32,006 - Recv: ok 180495
2018-06-14 09:44:32,008 - Send: N180496 M400*21
2018-06-14 09:44:32,028 - Recv: ok 180496
2018-06-14 09:44:32,031 - Send: N180497 M114*20
2018-06-14 09:44:32,081 - Recv: ok 180497

The printer still had a chance to acknowledge the M114 position request, but the output didn't reach OctoPrint. The last server heartbeat is from 2 minutes before the end of the log, so that's probably the last sound the server made before dying (I'm assuming here it's actually not running, please verify that).

Winfieldmd commented 6 years ago

@kennethjiang ok I will keep my current setup as is and see if I have another print error this weekend when I can monitor it and I will then have a good serial log too submit if it happens again. Oh, and, I cannot believe you just had the same crash! Sorry to hear that actually. Weird.

kennethjiang commented 6 years ago

@Winfieldmd it's actually not a bad thing that I have a crash. At least it's slightly easier for me to find the info I need to debug it.

@foosel Agreed that the crash may be related to OA in a very subtle way. Some wild guess - OA is a bit unique as it consumes quite a bit of bandwidth (to stream webcam feed to the cloud). There were reports in the past that some routers/ISPs had issue with it, manifested by shaky connections. Any possibility that the shaky network connection will somehow cause the crashes?

When the crash happened I did check and confirm that the python process was gone, while Raspberry Pi didn't reboot (by using uptime).

I have started another print using screen. Hopefully it'll catch something in case it crashes again.

foosel commented 6 years ago

Any possibility that the shaky network connection will somehow cause the crashes?

Not with stock OctoPrint. My ISP has bad days some times and I've also happily developed while airborne, so a lack of a network connection shouldn't be an issue (and that's also what the connectivity checker is for, example). I can't vouch though for any plugins that might be installed and go nuts if a connection fails.

As a safety measure, @gege2b just noted in a ticket on a different issue that M85 with a larger timeout might be helpful here to at least make sure the heaters shut off if OctoPrint crashes.

foosel commented 6 years ago

To repeat myself from an earlier comment, it would seriously help if anyone running into this could please run these steps I already linked to in an earlier comment to figure out why the server is crashing. So far it's still just "it's crashing and nobody has any logs of that" - please help change that.

Just to repeat that here:

It might also be worth a shot to have OctoPrint run directly from within a screen instance instead of as a service:

sudo apt-get update && sudo apt-get install screen
sudo service octoprint stop
screen
/home/pi/oprint/bin/octoprint serve

That way everything printed to stdout (even a segfault of the underlying python process) will be recoverable by simply reattaching to the screen session after a crash (screen -x).

CoderFX commented 5 years ago

Got the same crash octoprint2.log

foosel commented 5 years ago

@CoderFX please read my comment from earlier. octoprint.log doesn't help here, the linked steps might.

AndKe commented 5 years ago

to all affected: for safety, use M85 in your custom gcode...

reclaimer5146 commented 5 years ago

@foosel

I ran into this issue ever since I blew out my sd card and moved to 1.3.8. I didn't have this issue on prior releases, which I believe I had stopped on 1.3.6 I pulled dmesg and all of the syslogs. I've had the issue multiple times, but never collected the dmesg. I collected it for the most recent crash.

I've also started octoprint in a screen, so if I run into it I'll post it up as well.

Please let me know if you need anything else. logs.tar.gz

reclaimer5146 commented 5 years ago

@foosel

I experienced another crash today. This time I used M85 to terminate it after 5 minutes. It was happening every 2-3 prints, but now its happening every print. I had screen running, but in my infinite wisdom, I terminated the detached screen. I did save the output of it from MobaXterm, but as Rich Text, so use with a Rich Text editor. Here is the last few lines of what I see: 2018-07-28 08:09:18,213 - octoprint.server.heartbeat - INFO - Server heartbeat <3 2018-07-28 08:09:30,820 - octoprint.util.connectivity_checker - INFO - Connectivity changed from online to offline 2018-07-28 08:09:30,829 - octoprint.plugins.ipOnConnect - INFO - ipOnConnectPlugin: 192.168.50.97 2018-07-28 08:24:18,215 - octoprint.server.heartbeat - INFO - Server heartbeat <3 stack smashing detected : /home/pi/oprint/bin/python2 terminated Aborted pi@octoprintCR10:~/logs $

In an effort to be more organized, I've resorted to naming the folders with the date of the crash. Then creating an archive of said folders. I included the output of the screen session in the 2018-07-28 folder. Going forward, each date will have the associated set of logs included.

Please let me know if there is anything else you need. 2018-07-28_python_crash.tar.gz

VA2XJM commented 5 years ago

I do also encounter this problem and I think I found out why on my side (I have no log to display right now, I will try to reproduce later...)

First, I had a rPI B (yes the good old rPI B V1) that had nothing to do at home so I installed OctoPi and tried it out. At first, the printer was pausing mid-print making lots of plastic blob everywhere(#2771 ?). Then I made tests monitoring with TOP. I know the rPi B is not the best device for that, but having it ready to use with a 32gb card to store files was just perfect for me...

At first I reniced octoprint to be nice -19 (sudo crontab -e : /5 * renice -19 $(pidof -x octoprint)). This made things faster and usable. I was able to print the test file I first upload without any pause.

So I uploaded my next print and started it right away. It started and few minutes later, the rpi ssh disconnected, the printer stopped and the temperatures settings remained. I disconnected and reconnected the rPI power supply to regain access. That is when I noticed a python script running even while not printing...

For unknown reason, even with niceness sets in favour of octoprint, if the python script that analyze the gcode to gives the various estimation is runing while printing it freeze and bring everything down. Without niceness, it seems to be the same script that makes OctoPrint pause while printing.

When I am done doing my projects I will makes more tests if nobody else has been able to replicate it.

VA2XJM commented 5 years ago

A long printing job (4days) just died at 32%... Lots of material going to the dumpster. Anyway here are some helping point I guess...

` ps -ef | grep -i octoprint ==> Returns nothing. Octoprint is well dead.

ps x ==> Camera stream is still active however. `

USB device attached to the pi : CR-10 3D printer & APC UPS data cable that gives status of the UPS.

The only thing going on was the printing and I had only my laptop monitoring the print. No action of any kind as been done other than viewing the camera display or the temp chart.

Good point is that for once, SSH remained accessible... So logs are available! Both files are TXT,

DMESG: https://paste.jmvien.net/?f959a6d839782521#xDbw+kvHWKTfpfAvspCbKJJNZ1CD1LLklhQ7qmELy9A= Syslog: https://paste.jmvien.net/?f837dd15853516ea#XNcO2MImjkeBXJHUZba8a6jIzY+iQyIgqTWtBPIuRvE=

Next step is to restart the print from zero :/ but this time using screen.

EDIT: Around 0200 I've seen the printer working. It happened between 0200 and 0230.

kennethjiang commented 5 years ago

It sounds like painful! Sorry to hear that @xJMV. Did you have OctoPrint Anywhere plugin installed and active when the printer died?

@foosel I was looking at the code of my plugin and I started to wonder if thread-safety may be the culprit here. Like I said my plugin is simple but it spawns up a few threads. I use Python synchronized Queue to communicate between these threads but I realized that these threads also use shared variables, such as _printer. I don't "write" (calling functions that obviously mutate internal states) to these variables. Just query status and settings. But I'm wondering:

VA2XJM commented 5 years ago

@kennethjiang negative. I have a VPN if I ever need to check things out when not on my home WIFI... Always using the Octoprint web guy.

VA2XJM commented 5 years ago

Here is one... I stopped OctoPrint, flushed the output file and restarted it. I let it stand by cause I prefered to be around when starting the print... So I got this error while on stand by.

pi@redcap:~ $ /home/pi/oprint/bin/octoprint serve > octoprint_output.txt Exception in thread Thread-7: Traceback (most recent call last): File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner self.run() File "/usr/lib/python2.7/threading.py", line 754, in run self.__target(*self.__args, **self.__kwargs) File "/home/pi/oprint/local/lib/python2.7/site-packages/octoprint/filemanager/analysis.py", line 227, in _work self._analyze(entry, high_priority=high_priority) File "/home/pi/oprint/local/lib/python2.7/site-packages/octoprint/filemanager/analysis.py", line 261, in _analyze result = self._do_analysis(high_priority=high_priority) File "/home/pi/oprint/local/lib/python2.7/site-packages/octoprint/filemanager/analysis.py", line 401, in _do_analysis raise RuntimeError("No analysis result found") RuntimeError: No analysis result found

After I restarted the server once again and let it stand by while I was doing something else, I came back to this:

2018-08-05 16:59:53,961 - octoprint.server.heartbeat - INFO - Server heartbeat <3 2018-08-05 17:14:53,969 - octoprint.server.heartbeat - INFO - Server heartbeat <3 2018-08-05 17:29:53,977 - octoprint.server.heartbeat - INFO - Server heartbeat <3 2018-08-05 17:40:27,335 - octoprint.filemanager.analysis - INFO - Analysis of entry local:Raspberry_Pi/rpi3-top_25mm-fan-4x-PLA.gcode finished, needed 3304.54s 2018-08-05 17:40:28,397 - octoprint.filemanager.analysis - INFO - Starting analysis of local:Address_Plate_-_Black.gcode 2018-08-05 17:40:28,419 - octoprint.filemanager.analysis - INFO - Invoking analysis command: /home/pi/oprint/bin/python2 -m octoprint analysis gcode --speed-x=6000 --speed-y=6000 --max-t=10 --throttle=0.01 --throttle-lines=100 /home/pi/.octoprint/uploads/Address_Plate_-_Black.gcode 2018-08-05 17:44:53,991 - octoprint.server.heartbeat - INFO - Server heartbeat <3 2018-08-05 17:46:11,720 - octoprint.filemanager.analysis - INFO - Analysis of entry local:Address_Plate_-_Black.gcode finished, needed 343.32s Those files analyzed have been analyzed and printed already so it should no have happened again. I really feel that the analysis script is the culprit in that...

VA2XJM commented 5 years ago

On mobile. While printing there was a segmentation fault without any other informations. The process output file do not show anything not even the seg fault. The screen output just display heartbeat then : Segmentation Fault.

Now home, Here is the result from screen: pi@redcap:~ $ /home/pi/oprint/bin/octoprint serve > octoprint_output.txt ::ffff:127.0.0.1 - - [05/Aug/2018 16:44:55] "GET /sockjs/info?t=1533487494056 HTTP/1.1" 404 - ::ffff:127.0.0.1 - - [05/Aug/2018 16:45:03] "GET /sockjs/info?t=1533487502092 HTTP/1.1" 404 - ::ffff:127.0.0.1 - - [05/Aug/2018 16:45:15] "GET /sockjs/info?t=1533487514109 HTTP/1.1" 404 - Segmentation fault

VA2XJM commented 5 years ago

I don't know if this can be related... Actually printing, all going well, the print did not pause or anything...

2018-08-07 18:16:08,130 - octoprint.server.heartbeat - INFO - Server heartbeat <3 2018-08-07 18:17:57,418 - tornado.general - ERROR - Uncaught exception, closing connection. Traceback (most recent call last): File "/home/pi/oprint/local/lib/python2.7/site-packages/tornado/iostream.py", line 523, in _handle_events self._handle_write() File "/home/pi/oprint/local/lib/python2.7/site-packages/tornado/iostream.py", line 872, in _handle_write del self._write_buffer[:self._write_buffer_pos] BufferError: Existing exports of data: object cannot be re-sized 2018-08-07 18:17:57,422 - tornado.application - ERROR - Exception in callback None Traceback (most recent call last): File "/home/pi/oprint/local/lib/python2.7/site-packages/tornado/ioloop.py", line 888, in start handler_func(fd_obj, events) File "/home/pi/oprint/local/lib/python2.7/site-packages/tornado/stack_context.py", line 277, in null_wrapper return fn(*args, **kwargs) File "/home/pi/oprint/local/lib/python2.7/site-packages/tornado/iostream.py", line 523, in _handle_events self._handle_write() File "/home/pi/oprint/local/lib/python2.7/site-packages/tornado/iostream.py", line 872, in _handle_write del self._write_buffer[:self._write_buffer_pos] BufferError: Existing exports of data: object cannot be re-sized 2018-08-07 18:17:57,440 - octoprint.server.util.sockjs - INFO - Client connection closed: ::ffff:192.168.0.100 2018-08-07 18:17:57,455 - octoprint.server.util.sockjs - WARNING - Could not send message to client None: 'NoneType' object does not support item deletion 2018-08-07 18:18:13,944 - octoprint.server.util.sockjs - INFO - New connection from client: ::ffff:192.168.0.100 2018-08-07 18:31:08,139 - octoprint.server.heartbeat - INFO - Server heartbeat <3

VA2XJM commented 5 years ago

Got it again, on purpose this time, and twice!

uploaded the gcode file via Slic3r and started the printing right away without waiting for it to be fully evaluated. It seems to stop and bring OctoPrint service down every time.

Here are the second attempt log output (part of it, because there is nothing interesting to display...) The file analysis beginning appears twice in the log because I made a small update and re-uploaded it. First time I did not and the print stopped just like this one.

Can anybody reproduce this ?

It feels like having the service reading for printing and having the evaluation script reading the file too cause a problem...

^Cpi@redcap:~ $ /home/pi/oprint/bin/octoprint serve > octoprint_output.txt Segmentation fault

pi@redcap:~ $ tail octoprint_output.txt 2018-08-13 22:16:15,415 - octoprint.server.heartbeat - INFO - Server heartbeat <3 2018-08-13 22:26:49,715 - octoprint.filemanager.analysis - INFO - Starting analysis of local:vertical_spool_holder.gcode 2018-08-13 22:26:49,809 - octoprint.filemanager.analysis - INFO - Invoking analysis command: /home/pi/oprint/bin/python2 -m octoprint analysis gcode --speed-x=6000 --speed-y=6000 --max-t=10 --throttle=0.0 --throttle-lines=100 /home/pi/.octoprint/uploads/vertical_spool_holder.gcode 2018-08-13 22:27:17,010 - octoprint.filemanager.analysis - INFO - Starting analysis of local:vertical_spool_holder.gcode 2018-08-13 22:27:17,266 - octoprint.filemanager.analysis - INFO - Invoking analysis command: /home/pi/oprint/bin/python2 -m octoprint analysis gcode --speed-x=6000 --speed-y=6000 --max-t=10 --throttle=0.0 --throttle-lines=100 /home/pi/.octoprint/uploads/vertical_spool_holder.gcode 2018-08-13 22:29:17,531 - octoprint.util.comm - INFO - Changing monitoring state from "Operational" to "Printing" 2018-08-13 22:29:17,564 - octoprint.util.comm - INFO - M110 detected, setting current line number to 0 2018-08-13 22:31:15,428 - octoprint.server.heartbeat - INFO - Server heartbeat <3 2018-08-13 22:44:51,114 - octoprint.server.util.sockjs - INFO - New connection from client: ::ffff:192.168.0.190 2018-08-13 22:46:15,442 - octoprint.server.heartbeat - INFO - Server heartbeat <3

pi@redcap:~/.octoprint/logs $ tail serial.log 2018-08-13 22:47:08,814 - Recv: ok 2018-08-13 22:47:08,820 - Send: N2382 M10528 2018-08-13 22:47:08,878 - Recv: ok T:235.8 /235.0 B:40.4 /40.0 T0:235.8 /235.0 @:65 B@:0 2018-08-13 22:47:08,886 - Send: N2383 G1 X211.070 Y163.574 E21.56021108 2018-08-13 22:47:08,909 - Recv: ok 2018-08-13 22:47:08,918 - Send: N2384 G1 X209.696 Y164.397 E21.64392106 2018-08-13 22:47:08,926 - Recv: ok 2018-08-13 22:47:08,935 - Send: N2385 G1 X208.247 Y165.083 E21.72765110 2018-08-13 22:47:08,974 - Recv: ok 2018-08-13 22:47:08,982 - Send: N2386 G1 X206.739 Y165.622 E21.81133*107

pi@redcap:~/.octoprint/logs $ tail octoprint.log | plugins: | octopi_support: | model: 1B | revision: 000e | version: 0.15.1 | python: | pip: 9.0.3 | version: 2.7.13 | virtualenv: /home/pi/oprint 2018-08-13 22:46:15,598 - octoprint.server - INFO - ------------------------------------------------------------------------------

M00nPatrolz commented 5 years ago

@kennethjiang log.zip Just had the same issue. Print froze midway through. Could only ssh and reboot. The only changes I have made are Octopi Anywhere and Octolapse. Although the logs are semi foreign to me, after inspecting them it appears I have been having some network related drops since I installed OA although I never lost a print or even noticed until today. Yesterday I did change the resolution and fps which could explain the extra load that caused things to finally fail. I was also heavily monitoring the print today. Logs attached.

kennethjiang commented 5 years ago

@M00nPatrolz thank you for sending the log files. In the log files there are lot of error messages about OA being unable to connect to the cloud server.

It looks like the error started to occur around 2018-09-25 15:25:37 UTC. Did you change webcam's resolution before or after that time?

What's your webcam's current resolution? Do you always have a page open that shows OctoPrint Anywhere dashboard? The reason why I ask this question is when the printer is not watched on OA's dashboard, its bandwidth consumption is reduced by 90% and is usually low enough to not cause any problem.

Balduc commented 5 years ago

Hi everyone,

I've just experienced the same issue.

3d printer: cr-10 mini 12v Hardware for Octopi: raspberry pi 3 connection printer/raspberry: usb wire from cr-10 Version of Octopi and octoprint: The most up to date as this hour ( 06 october 2018, 3h02 PM) model print time: 22 hours total, 86% done... Slicer: Simplify 3d 4.0.0.0 Connection service via Wi-fi

Something new? yes: Octoprint Anywhere. But, since i'm using octoprint for 2 weeks, it's tricky to say that OA have something to do with it...

I've connected octopi to avoid the issue we all experienced with a PC: serial error that stop the print in the middle or more. Ironically, the same thing happend. Coincidence?

So for 7 days no issue without OA.

With OA, for 3 print, i have 3 different ends: One fine and print perfectly, one evolve badly ( i've lost connection with the octoserver in the middle and there were a lot of blobs, lines not aligned, etc, I was thinking that i did a bad slice, now i think it's the pi) and the last one just stop, heated bed and nozzle still hot.

I did not power off the pi, but i did power off the printer. Need to print this piece ASAP so I switch to a SD card print. No, i will not use the pi again to monitor/service my printer. Too risky.

But my pi is still up and running since the crash, and i can ssh into it. So if you need me to do something to help you, i'm waiting instructions.

kennethjiang commented 5 years ago

Can you upload octoprint.log file that includes the periods when OctoPrint crashed?

Balduc commented 5 years ago

I've tried to. But Octo did not find the log. And the python script was off. I've tried any kind of manipulation Foosel ask, but nothing works. So, the bad news is: I redownload the image, format the sd card and write a new image on it, I shouldn't turn the printer off in the first place maybe...

While i'm writing this, i'm printing off a sd card with a new Octopi/octoprint up and running to monitor the print while doing something else, without any plug-ins. I was thinking to forward the octoprint server port using a virtual static-ip service to monitor the print if i'm out of range of my network. I will, step by step, get close to OA (without it) to see if something change.

By the way, i'm using an ten+ years old 2 mp logitech usb webcam. And to put more data, the server stop each time after i first monitor the print using OA, using my android smartphone, for the 2 prints that failed. Did not change any fps variable. Did not use the low bandwith option.

Hope it helps, even just a little.

kennethjiang commented 5 years ago

@Balduc When you open OA dashboard on your smartphone, the OA plugin will start to stream webcam at 5x faster rate, to make it a more responsive webcam feed. This probably explains why you see difference when you open OA on your phone.

2mp resolution is not very low. Official Pi cam has default resolution to be 640x480 (~300kp).

What is your internet upload bandwidth?

Balduc commented 5 years ago

http://www.speedtest.net/result/7699476492.png

So there's a correlation?

kennethjiang commented 5 years ago

Your network speed is really good. I'm puzzled why there seems to be a network-related issue. Can you send me email k@getanywhere.io so that we can coordinate a time for phone call or google hangout? @Balduc

VA2XJM commented 5 years ago

Ok, so for sometimes now I changed my printing routine and did not encounter that bug again...

At first, I was leaving OctoPrint tab open 24/7 on my project laptop so I could monitor from my office.

Now I open OctoPrint in my browser to start the printing and then close the browser tab. I monitor it once in a while but always clause the tab when done.

No frozen prints since then...