Doodle3D / doodle3d-client

The Doodle3D web interface that people can access on computers or tablets to draw doodles and control the Doodle3D WiFi box.
www.doodle3d.com
GNU General Public License v2.0
3 stars 8 forks source link

"seq_num_mismatch" when drawing on the Wanhao duplicator 4 #302

Closed jeroensomers closed 8 years ago

jeroensomers commented 8 years ago

When testing the Beta version 0.10.10-a on the Wanhao I inserted the following code first in the web-console:

MAX_POINTS_TO_PRINT = Printer.MAX_GCODE_SIZE = 999999999999999999999;
Printer.MAX_LINES_PER_POST = 2000;

Now when I send a huge drawing to the printer it stops in the middle of the print and gives the following error in the web-console

Printer:sendPrintPart: unexpected failure response for API endpoint printer/print (seq_num_mismatch)

I attached the WiFi-Box logfiles and the web-console logfiles.

wifibox-logs.zip

192.168.5.1-1457540465022.txt

woutgg commented 8 years ago

It seems this happened because the client saw a failed transmission (chunk 34, the first one after continuing from a buffer-full-wait). The failure seemed to be network related, since it is the ajax fail handler which triggers the resend. From 192.168.5.1-1457540465022.txt (300-309):

Printer:waitForBufferSpace: load ratio dropped below 75%, calling sendPrintPart...
Printer:sendPrintPart: sendIndex=68000/266957, sendLength=2000, sequence numbers: 34/134
progress:  17149/266957 (50850) (printing)
Printer:sendPrintPart: failed
Printer:startStatusCheckInterval
progress:  17299/266957 (52700) (printing)
request printer:sendPrintPart failed retry
Printer:sendPrintPart: sendIndex=68000/266957, sendLength=2000, sequence numbers: 34/134
progress:  17399/266957 (52600) (printing)
Printer:sendPrintPart: unexpected failure response for API endpoint printer/print (seq_num_mismatch)

wifibox.log (910-924) contains this:

03-09 15:43:43 [ENTR] (info)   : POST request for printer/print (remote IP/port: 192.168.10.179/52689)
03-09 15:43:43 [APRN] (info)   : API:printer/print
03-09 15:43:43 [APRN] (info)   :   hasControl: true
03-09 15:43:43 [ENTR] (info)   : GET request for info/status (remote IP/port: 192.168.10.179/52693)
03-09 15:43:47 [ENTR] (info)   : GET request for info/status (remote IP/port: 192.168.10.179/52693)
09-03 15:43:43 [COMM] (info)   : starting transmit of 93100 bytes of gcode data, maximum packet size is 1016
09-03 15:43:48 [COMM] (info)   : gcode data sent in 94 packets (4357 msecs)
03-09 15:43:48 [ENTR] (info)   : GET request for info/status (remote IP/port: 192.168.10.179/52693)
03-09 15:43:50 [ENTR] (info)   : POST request for printer/print (remote IP/port: 192.168.10.179/52693)
03-09 15:43:50 [APRN] (info)   : API:printer/print
03-09 15:43:50 [APRN] (info)   :   hasControl: true
03-09 15:43:52 [ENTR] (info)   : GET request for info/status (remote IP/port: 192.168.10.179/52694)
09-03 15:43:50 [COMM] (info)   : starting transmit of 93100 bytes of gcode data, maximum packet size is 1016
09-03 15:43:54 [COMM] (error)  : gcode data sent in 94 packets (3817 msecs) until error -2
03-09 15:43:54 [HRSP] (warning): Response status: fail (could not add gcode)

It looks like the same chunk (because sizes os 93100 are identical, which is unlikely to be accidental) is sent twice.

There is no error with the first one however, so it's a mystery to me where the client-side error came from, unless the response got lost due to a network error.

Also after printing for a while (on an um2, but that should not matter here), the problem did not occur again.

peteruithoven commented 8 years ago

Could it be a timeout? That sending part 34 the first time took to long? This wouldn't result in a message in the wifibox.log and trigger a general fail on the client side. Could we log more info on the fail on the client?

On the firmware, where it logs Response status: fail (could not add gcode) I don't see the reason? Could we log the reason / more info there too?

peteruithoven commented 8 years ago

O wait, I missed the until error -2 part. Looking at https://github.com/Doodle3D/print3d/blob/develop/src/drivers/GCodeBuffer.cpp#L52 this probably means buffer full?

woutgg commented 8 years ago

The -2 means there was a gcode append failure but does not say why in itself. Some improvements to logging have been made: https://github.com/Doodle3D/doodle3d-client/commit/656e37670eef13f891cefa020240ce63a81fd2d4 and https://github.com/Doodle3D/doodle3d-firmware/commit/260fac9cd45403ec8f6316ef657f7c49b11d0923.

peteruithoven commented 8 years ago

We have a new image (and some separate tweaks) with the above mentioned changes so we can test this again. The relevant logs seem to be made in the client (web console) and in the firmware logs that also show up under the default info log level.

jeroensomers commented 8 years ago

With the new image the "seq_num_mismatch" still occurs, I attached the logfile: wifibox-logs.zip

peteruithoven commented 8 years ago

I'm afraid those last logs aren't useful without the logrotated logs.

peteruithoven commented 8 years ago

@jeroensomers Let's try this again,with the latest released beta version. Since we have log rotation we might as try this the first time in bulk mode. But we should also try this with the default info log level at one time.

jeroensomers commented 8 years ago

With latest released beta I tried another big print still got a "seq_num_mismatch" both webconsole and logfiles (with rotated logs) are attached:

woutgg commented 8 years ago

In all three sets of logfiles there was a seq_num_mismatch error (as can be seen in the wifibox log files). Only in the first case there is evidence in the web console log that the client tried to resend a failed chunk, so that can most likely be attributed to #304? I think it's likely that the other two attempts failed due to the same cause.

So now that we have a recovery mechanism (see #304), if we do not see these errors again, I think this issue can be closed.

peteruithoven commented 8 years ago

That's good news.