Duet3D / RepRapFirmware

OO C++ RepRap Firmware
GNU General Public License v3.0

Duet Locks Up Randomly During Web Interface Connections #148

Closed Alfaa123 closed 4 years ago

Alfaa123 commented 6 years ago

1.20 Beta 10

I have 10 Duet Ethernets all experiencing the same issue. At random, a machine will lock up in the middle of a print. The hot end and bed stay hot, and their closed-loop control still seems to be running, since they hold their target temperatures (as measured with an IR probe; the PanelDue doesn't update when this happens). The stationary hot end then melts a hole in whatever was being printed at the time.

Duet will NOT respond to pings, web interface requests, or gcode manually entered on the PanelDue. PanelDue simply stops updating with "Printing..." still on the screen. ALL PanelDue commands stop working after this happens. Only a reset with the button on the board will allow us to use the machine again.

After having this issue for a few weeks, I finally narrowed the trigger down to new web interface connections. The web page will load most of the text, but the rest of the interface doesn't load. Pressing refresh then results in the Duet refusing subsequent connections. This DOES NOT happen every time, and a machine usually runs for a few days before it does this, but it's much more likely to happen when the web interface is used more frequently.

The ONLY other information I could gather is one error on the PanelDue which says:

"Error: Cannot read file. Error: Failed to write to file. Drive may be full."

All of our SD cards are 4GB and significantly less than 50% full.

So far, this has happened at least once on all 10+ of our running Duets.
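The symptom described above (no response to pings or web interface requests while the heaters stay on) could be watched for across a fleet with a small reachability probe. The following is only a minimal sketch and not part of the original report: it assumes the web interface listens on TCP port 80, and any hostnames passed to it are hypothetical. A successful TCP connect does not prove the web interface is fully working, since the report notes that pages can partially load before the lock-up.

```python
import socket


def is_reachable(host, port=80, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def unreachable_printers(hosts, port=80, timeout=3.0):
    """Return the subset of hosts that did not accept a TCP connection."""
    return [h for h in hosts if not is_reachable(h, port, timeout)]
```

For example, calling `unreachable_printers(["duet-01.local", "duet-02.local"])` (hypothetical names) from a scheduled job would list the machines that need attention before a stalled hot end damages the print.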

dc42 commented 6 years ago

Already fixed in beta 11. See the 1.20beta10 thread on the duet3d.com forum.

Alfaa123 commented 6 years ago

Bad news:

I just had this same issue with another machine running beta 11.

This time, the extruder moved away from the part like it was paused deliberately. There is nothing written in the log about this event, however.

The PanelDue also did NOT show the usual errors, so it seems like this might be caused by something else.

dc42 commented 6 years ago

If you had a filament sensor or stall detection configured, then it could have been a system-generated pause. These events were not logged in 1.20beta11, but they are logged in 1.20RC1.

Alfaa123 commented 6 years ago

Neither of those is set up in our current configuration.

At the very least, this is the first "random pause" we've had in a few days, so the issue is definitely not as severe, but it's still upsetting as we just lost almost an entire roll of polycarbonate.

I will be updating all the machines to RC1 as my maintenance schedule permits, so maybe that will uncover some information.

dc42 commented 6 years ago

Did the Duet reset itself? If so, then running M122 without powering down or resetting again will produce a reset report containing valuable information about the cause of the reset.

Alfaa123 commented 6 years ago

It did not.

Similarly to the previous incidents, it dropped all existing connections, refused all new ones (so no web interface) and stopped responding to the PanelDue.

Right after the reset, I did run M122:

=== Diagnostics ===
Used output buffers: 3 of 32 (9 max)
=== Platform ===
RepRapFirmware for Duet Ethernet version 1.20beta11 running on Duet Ethernet 1.0
Board ID: 08DDM-9FAMU-JW4S4-6JKD8-3S06T-T3WHS
Static ram used: 11984
Dynamic ram used: 97376
Recycled dynamic ram: 1232
Stack ram used: 1192 current, 4484 maximum
Never used ram: 15996
Last reset 00:01:24 ago, cause: reset button or watchdog
Last software reset reason: not available
Error status: 0
Free file entries: 9
SD card 0 detected, interface speed: 20.0MBytes/sec
SD card longest block write time: 3.4ms
MCU temperature: min 42.4, current 43.9, max 44.3
Supply voltage: min 24.2, current 24.4, max 24.5, under voltage events: 0, over voltage events: 0
Driver 0: standstill
Driver 1: standstill
Driver 2: standstill
Driver 3: standstill
Driver 4: standstill
Date/time: 2017-12-08 16:11:31
Cache data hit count 212787473
Slowest main loop (seconds): 0.607009; fastest: 0.000044
=== Move ===
MaxReps: 0, StepErrors: 0, FreeDm: 240, MinFreeDm 240, MaxWait: 0ms, Underruns: 0, 0
Scheduled moves: 0, completed moves: 0
Bed compensation in use: none
Bed probe heights: 0.000 0.000 0.000 0.000 0.000
=== Heat ===
Bed heater = 0, chamber heater = 2
=== GCodes ===
Segments left: 0
Stack records: 1 allocated, 0 in use
Movement lock held by null
http is idle in state(s) 0
telnet is idle in state(s) 0
file is idle in state(s) 0
serial is idle in state(s) 0
aux is idle in state(s) 0
daemon is idle in state(s) 0
queue is idle in state(s) 0
autopause is idle in state(s) 0
Code queue is empty.
=== Network ===
State: 5
HTTP sessions: 1 of 8
Responder states: HTTP(1) HTTP(0) HTTP(0) HTTP(0) FTP(0) Telnet(0)
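When comparing reports like this across several machines, it may help to split the text into its `=== Section ===` blocks and pull out individual fields such as the reset reason. The following is a rough sketch, not an official tool: it assumes the report is available with one field per line, and the helper names `split_m122_sections` and `find_value` are made up for illustration. The sample text is a short excerpt of the report above.

```python
import re

# Short excerpt of the report above, one field per line (assumed layout).
SAMPLE_REPORT = """\
=== Diagnostics ===
Used output buffers: 3 of 32 (9 max)
=== Platform ===
Last reset 00:01:24 ago, cause: reset button or watchdog
Last software reset reason: not available
=== Network ===
State: 5
HTTP sessions: 1 of 8
"""


def split_m122_sections(report):
    """Group report lines under their '=== Name ===' section headers."""
    sections = {}
    current = None
    for raw in report.splitlines():
        line = raw.strip()
        if not line:
            continue
        header = re.fullmatch(r"===\s*(.+?)\s*===", line)
        if header:
            current = header.group(1)
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return sections


def find_value(lines, prefix):
    """Return the text following prefix on the first matching line, else None."""
    for line in lines:
        if line.startswith(prefix):
            return line[len(prefix):].strip()
    return None


if __name__ == "__main__":
    parsed = split_m122_sections(SAMPLE_REPORT)
    print(find_value(parsed["Platform"], "Last software reset reason:"))
```

Lines that appear before the first section header are ignored, which keeps the sketch tolerant of any preamble a particular firmware version might print.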

I did notice you added a watchdog and the ability to have automatic crash dumps in Beta 11. Is it possible that's on the SD card somewhere?

Alfaa123 commented 6 years ago

One more update for you. The same issue happened on a machine running RC3 today.

Obviously this is lower priority now that it's happening much less, but I just figured I'd update this.

dc42 commented 6 years ago

Do you have a USB connection to the board as well as an Ethernet connection? If so, please see the important notes on using a USB connection on the wiki.

Alfaa123 commented 6 years ago

Nope. All machines are on our local network. Ethernet is the only connection we use.

Although, I will take a look at said wiki as it might come in handy to know in the future.

mydevpeeps commented 4 years ago

This used to happen to me all the time when I first ran 3.1.1, but it was due to some network settings on my ASUS router. It doesn't happen to me anymore. NOTE: This only happened to me on the Duet WiFi.

T3P3 commented 4 years ago

Please test with the latest stable version and open a new issue if this is still a problem.