Doodle3D / print3d

The application that runs on a Doodle3D WiFi box that communicates with printers.
www.doodle3d.com
GNU General Public License v2.0

Printer progress stops without any errors #44

Open woutgg opened 8 years ago

woutgg commented 8 years ago

While performing stability tests with very large prints, it has been observed several times that at some point printing just stops, without any errors. Progress status stays at the last printed line, and temperature checks no longer work within print3d, so it keeps reporting the same temperature.

It was first seen here. Later it was observed several times in https://github.com/Doodle3D/WiFi-Box/issues/3 (the tests on 22-04 called 'Test from tablet 0.10.10-e'). A later test on a different tablet (Test from galaxy tab tablet 0.10.10-e 2) might show the same behaviour: it printed only two lines and afterwards kept checking temperatures and adding code but did not print anymore (as seen in print3d-ttyACM0.log.1#7796).

So this happened on two tablets, but not on PCs. Could there be a connection, or is it coincidence?

woutgg commented 8 years ago

Printing stops because somehow an 'ok' response from the printer is missing (for a G1 command). Since readResponseCode() is the event pump that drives sending commands to the printer, the server will wait for an 'ok' indefinitely. Excerpt from this attempt showing the last printed line not getting an 'ok' response:

22-04 08:20:25 [ABSD] (bulk)   : printNextLine(): 53061/584207
22-04 08:20:25 [MLND] (bulk)   : sendCode(): G1 X35.144 Y146.438 Z3.869 F4200.000 E624.850
22-04 08:20:25 [MLND] (bulk)   : sendCode(): M105
22-04 08:20:25 [MLND] (bulk)   : readResponseCode(): 'ok'
22-04 08:20:25 [ABSD] (bulk)   : printNextLine(): 53062/584207
22-04 08:20:25 [MLND] (bulk)   : sendCode(): G1 X35.107 Y146.524 Z3.869 F4200.000 E624.851
22-04 08:20:25 [MLND] (bulk)   : readResponseCode(): 'ok T:228.7 /230.0 B:70.3 /70.0 @:127 B@:127'
22-04 08:20:25 [MLND] (bulk)   : readResponseCode(): 'ok'
22-04 08:20:25 [ABSD] (bulk)   : printNextLine(): 53063/584207
22-04 08:20:25 [MLND] (bulk)   : sendCode(): G1 X35.048 Y146.742 Z3.869 F4200.000 E624.855
22-04 08:20:25 [MLND] (bulk)   : readResponseCode(): 'ok'
22-04 08:20:25 [ABSD] (bulk)   : printNextLine(): 53064/584207
22-04 08:20:25 [MLND] (bulk)   : sendCode(): G1 X35.032 Y146.998 Z3.869 F4200.000 E624.859
22-04 08:20:27 [IPC ] (bulk)   : command: [>>getState]
22-04 08:20:27 [CMDH] (bulk)   : get state cmd
22-04 08:20:27 [IPC ] (bulk)   : command: [<<ok]
    (... no sendCode()/readResponseCode() logs after this point ...)

woutgg commented 8 years ago

A further observation: in the second attempt (print3d-ttyACM0.log#7514), just before the print gets stuck, both a G1 and an M105 are sent and neither gets a response. In fact, no response at all is received anymore, even though the server keeps sending M105s.
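To spot this pattern in other logs, something like the following could help. It is only a rough sketch, not part of print3d: it assumes the log format shown in the excerpt earlier in this thread, and it assumes the printer answers commands in the order they were sent (FIFO), so every 'ok' is paired with the oldest unanswered sendCode(). The function name `unanswered_commands` is made up for this example.

```python
import re
from collections import deque

# Patterns based on the log excerpt in this issue (assumed format).
SEND_RE = re.compile(r"sendCode\(\): (.+)$")
RESP_RE = re.compile(r"readResponseCode\(\): '(ok.*)'$")


def unanswered_commands(log_lines):
    """Return the sent commands that never got an 'ok', oldest first.

    Assumes responses arrive in send order, so each 'ok' answers the
    oldest command that is still pending.
    """
    pending = deque()
    for line in log_lines:
        line = line.rstrip()
        sent = SEND_RE.search(line)
        if sent:
            pending.append(sent.group(1))
            continue
        if RESP_RE.search(line) and pending:
            pending.popleft()
    return list(pending)
```

Run over the excerpt from the first attempt, this should leave only the final G1 (X35.032 ...) as unanswered. For the second attempt it would presumably list the G1 plus every M105 sent afterwards.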

Perhaps this just happens every now and then, and it is something we would do better to work around than to try to fix. Cura, for instance, has a workaround in the form of a 5-second timeout: if no 'ok' has been received by then, the next line is sent anyway - here.
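Roughly, the idea of such a timeout could look like the sketch below. This is not Cura's code and not print3d's either; the class `OkPacedSender`, its methods, and the injectable `clock` are all invented for illustration. It also ignores the interleaved M105 temperature checks, whose 'ok T:...' replies a real implementation would have to keep apart from the replies to print lines.

```python
import time


class OkPacedSender:
    """Send one G-code line per 'ok'; after `timeout` seconds of
    silence, give up waiting and send the next line anyway."""

    def __init__(self, lines, write, timeout=5.0, clock=time.monotonic):
        self._lines = iter(lines)
        self._write = write        # callable that writes one line to the printer
        self._timeout = timeout
        self._clock = clock        # injectable so the logic can be tested
        self._last_sent = None
        self.timeouts = 0
        self.done = False

    def start(self):
        self._send_next()

    def on_response(self, response):
        # Normal path: an 'ok' from the printer releases the next line.
        if response.startswith("ok"):
            self._send_next()

    def poll(self):
        # Call periodically; handles the case where the 'ok' never arrives.
        if self.done or self._last_sent is None:
            return
        if self._clock() - self._last_sent >= self._timeout:
            self.timeouts += 1
            self._send_next()

    def _send_next(self):
        line = next(self._lines, None)
        if line is None:
            self.done = True
            return
        self._write(line)
        self._last_sent = self._clock()
```

The `timeouts` counter is there so that skipped acknowledgements can be logged, which would at least show how often this happens.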

peteruithoven commented 8 years ago

Very interesting.

Not sure, but it looks like OctoPrint might also do this: https://github.com/foosel/OctoPrint/blob/master/src/octoprint/printer/__init__.py#L100-L106 https://github.com/foosel/OctoPrint/blob/master/src/octoprint/util/comm.py#L1038-L1042

It is quite a bad sign, though, that the M105s aren't answered either. I'm hoping the serial communication isn't broken completely, like we experienced before a kernel patch in: https://github.com/Doodle3D/doodle3d-client/issues/216

Do you think we should run two or three tests through a USB hub, using the tablet this went wrong with most often, to rule out the old serial issue?

woutgg commented 8 years ago

Ah yes, interesting that OctoPrint also does this. Instead of sending the next line after a timeout, it attempts to trigger a response by sending an M105. It seems the default communication timeout is 30 seconds.
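For comparison, that variant could be sketched as below. Again this is only an illustration of the idea as I read it, not OctoPrint's actual code; `PokeOnSilence` and its methods are hypothetical names, and only the 30-second default comes from the observation above.

```python
import time


class PokeOnSilence:
    """If nothing is received for `timeout` seconds, send an M105 to
    try to provoke a response instead of sending the next print line."""

    def __init__(self, write, timeout=30.0, clock=time.monotonic):
        self._write = write        # callable that writes one line to the printer
        self._timeout = timeout
        self._clock = clock        # injectable so the logic can be tested
        self._last_activity = clock()
        self.pokes = 0

    def on_response(self, response):
        # Any response counts as a sign of life and resets the timer.
        self._last_activity = self._clock()

    def poll(self):
        # Call periodically; pokes the printer after a silent period.
        if self._clock() - self._last_activity >= self._timeout:
            self._write("M105")
            self.pokes += 1
            self._last_activity = self._clock()
```

Note that in the second attempt the M105s went unanswered as well, so a poke like this would probably not have revived that print on its own.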

Although there does not seem to be a direct relation between print-stopping and USB errors (i.e. often no errors at all), it might be worthwhile to test some more for the unstable USB anyway. By the way, a search for '-145' revealed that it also occurred in Test from galaxy tab tablet 0.10.10-e 2; so together with the first occurrence that makes for two cases so far.

I will add a timeout mechanism to the print server similar to Cura's.

olijf commented 8 years ago

I do not think that there is a relation between the M105 timeout and the USB problem. As for the -145, this seems to be quite normal on the AR9xxx chipset. See: https://dev.openwrt.org/ticket/16505

peteruithoven commented 8 years ago

I'm removing the 0.10.10 milestone from this issue, because we're afraid it's out of its scope.