MarlinFirmware / Marlin

Marlin is an optimized firmware for RepRap 3D printers based on the Arduino platform. Many commercial 3D printers come with Marlin installed. Check with your vendor if you need source code for your specific machine.
https://marlinfw.org
GNU General Public License v3.0

Serial stalling mid print #9179

Closed token47 closed 5 years ago

token47 commented 6 years ago

After switching from ODROID + Ubuntu + Marlin 1.1.7 to Raspberry Pi + OctoPi 0.14 + Marlin 1.1.8, I started having print freezes. After debugging a little, I suspect the problem is in the serial command acknowledgement. Every time the print stops, the serial log shows something like this:

2018-01-14 16:32:37,897 - Recv: ok
2018-01-14 16:32:37,901 - Send: N25375 G1 X212.023 Y137.323 E0.4230*86
2018-01-14 16:32:37,909 - Recv: ok
2018-01-14 16:32:37,912 - Send: N25376 G1 X212.429 Y137.623 E0.4431*89
2018-01-14 16:32:38,530 - Recv: N25 T:240.23 /240.00 B:99.95 /100.00 @:46 B@:85
2018-01-14 16:32:40,534 - Recv:  T:240.39 /240.00 B:100.08 /100.00 @:42 B@:53
2018-01-14 16:32:42,540 - Recv:  T:240.47 /240.00 B:100.05 /100.00 @:40 B@:73
2018-01-14 16:32:44,545 - Recv:  T:240.16 /240.00 B:100.12 /100.00 @:46 B@:61

Note that the last sent command never gets an "ok"; instead, a stray "N25" appears in front of the temperature report. OctoPrint keeps waiting for the ok, so the print freezes.
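A stall of this kind can be located mechanically in a serial log. The following is a minimal Python sketch, assuming log lines in the timestamped "Send:" / "Recv:" format shown above; the names `parse_line` and `find_stalls` and the 2-second threshold are hypothetical choices for illustration, not part of OctoPrint.

```python
import re
from datetime import datetime

# Matches lines such as "2018-01-14 16:32:37,901 - Send: N25375 G1 ...*86"
LINE_RE = re.compile(
    r"^(\d{4}-\d\d-\d\d \d\d:\d\d:\d\d,\d{3}) - (Send|Recv): (.*)$"
)


def parse_line(raw):
    """Return (timestamp, direction, payload), or None for unrecognized lines."""
    match = LINE_RE.match(raw)
    if match is None:
        return None
    stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
    return stamp, match.group(2), match.group(3)


def find_stalls(lines, max_wait=2.0):
    """List sent commands that saw other traffic but no "ok" within max_wait seconds."""
    stalls = []
    pending = None  # (timestamp, command) of the last Send still awaiting "ok"
    for raw in lines:
        parsed = parse_line(raw)
        if parsed is None:
            continue
        stamp, direction, payload = parsed
        if direction == "Recv" and payload.strip().startswith("ok"):
            pending = None
        elif pending is not None and (stamp - pending[0]).total_seconds() > max_wait:
            stalls.append(pending[1])
            pending = None  # report each stalled command only once
        if direction == "Send":
            pending = (stamp, payload)
    return stalls
```

The threshold is only a heuristic: commands that legitimately take a long time (homing, waiting for temperature) would also exceed it, so the result is a list of candidates to inspect rather than proof of a lost "ok".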

Any ideas why this is happening?

Thank you

GroovyDrifter commented 6 years ago

I'm experiencing a similar problem with Cura 15.4 on two printers (Anet A8 and TronXY X3A) that were recently updated to Marlin 1.1.8. The Anet rarely freezes, but the TronXY has not been able to finish a long print (meaning more than one hour of printing) since the update. I'll try to reproduce the failure with Pronterface and get a log to confirm whether it's the same issue. I'm also getting random Unknown command "xx" errors, where "xx" looks like a partial command, and sometimes the printers pause waiting for a user keypress after executing a G29, which may not be related to the former problem. None of this happened on the exact same hardware with Marlin 1.1.6. I'm reverting both to 1.1.6 to confirm the problem goes away, then I'll upgrade both to 1.1.7 to check whether the problem is already there (I guess that if it is the same problem, both will work with 1.1.7).

As token47 said, it looks related to serial communications. I run both boards at 115200 bps, and tried 57600 on the TronXY with no success.

If I find time, I can replace the Melzi board with a 2560 + RAMPS 1.4 to also check whether it is a problem with Sanguinololu-derivative boards.

token47 commented 6 years ago

FTR, I use 250000 as the serial speed on an original Arduino Mega board. The cable is short (approx. 20 cm) and shielded, the same one that worked well before.
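For context on the baud rates mentioned in this thread (57600, 115200, 250000), here is a small Python sketch of how closely a 16 MHz AVR can hit each rate. It assumes the USART runs in double-speed mode, where the actual rate is f_cpu / (8 * divisor) for an integer divisor; whether a given firmware build uses that mode is an assumption here, and the function names are hypothetical.

```python
def actual_baud(requested, f_cpu=16_000_000):
    """Closest rate reachable with an integer divisor (double-speed mode assumed)."""
    divisor = round(f_cpu / (8 * requested))
    return f_cpu / (8 * divisor)


def baud_error_percent(requested, f_cpu=16_000_000):
    """Relative difference between the reachable rate and the requested one."""
    return (actual_baud(requested, f_cpu) - requested) / requested * 100.0


for rate in (57600, 115200, 250000):
    print(rate, round(baud_error_percent(rate), 1))
```

Under these assumptions 250000 divides the clock exactly, while 115200 comes out roughly 2% fast, so a lower baud rate is not automatically a more accurate one on this hardware.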

token47 commented 6 years ago

I have commented on OctoPrint's bug https://github.com/foosel/OctoPrint/issues/2363 and opened a new one at https://github.com/foosel/OctoPrint/issues/2380

Lots of info there. I'm not sure whether the problem is in Marlin, in the Linux serial driver, or in OctoPrint. Looking for advice on how to dig deeper.

thinkyhead commented 6 years ago

Please test with the latest bugfix-1.1.x (and/or bugfix-2.0.x) branch to see if we fixed this issue. If the problem has been solved then we can close it. If you still see the bad behavior we should investigate further.

GroovyDrifter commented 6 years ago

I've since replaced the Melzi board with a RAMPS 1.4 and it works better, but there are still issues. I have also built a new i3 clone with RAMPS and I get the same kind of errors. Sometimes it's an "unknown command", sometimes a "line number missing"... it all looks like part of the line is missing (serial buffer corrupt?).

I changed the Z-axis geometry on this printer just this morning (swapped the M8 screws for T8 ones), so I need to reflash it anyway. I'll try the bugfix version and report back here.

thinkyhead commented 6 years ago

Try various USB cables, and try lowering BAUD_RATE.

CmOsPL commented 6 years ago

Hey guys! I had the same issue in 1.1.7 and 1.1.8, and the problem wasn't there in 1.1.1. I've rebased my printer-specific changes onto bugfix-1.1.x, and after ~16h of printing using OctoPrint 1.3.6 running on OctoPi 0.14.0 it seems safe to assume that the issue is gone.

@token47, can you confirm that?

token47 commented 6 years ago

Unfortunately I could not get near my printer for weeks. I just had to print something today and, to be on the safe side, I plugged the USB cable into my computer and went through ~10h of printing with no issues. At least this confirms the problem is somewhere on the Raspberry Pi side (the hardware itself, the Linux OS, or OctoPrint). I will test again with both the current version (which should give me problems) and the bugfix one, and report back as soon as possible.

thinkyhead commented 6 years ago

To prevent temperature reports from mixing in with other output, we've updated the code to use delay rather than safe_delay in places where it could cause trouble (since safe_delay can invoke the temperature auto-report). We have an additional change pending that will suspend temperature auto-reporting (and any other auto-reporting) when the output needs to be unbroken, such as when printing the "map" for unified bed leveling.
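The idea of suspending auto-reports while a block of output must stay unbroken can be modeled in a few lines. This is a toy Python illustration only, not Marlin's code (Marlin is C++); the class and method names are hypothetical, and dropping (rather than deferring) suppressed reports is an assumption of the sketch.

```python
from contextlib import contextmanager


class SerialOutput:
    """Toy model of an output stream with periodic auto-reports."""

    def __init__(self):
        self.lines = []
        self._suspend_depth = 0  # > 0 while auto-reports are suspended

    def emit(self, text):
        """Write a regular output line (always goes through)."""
        self.lines.append(text)

    def auto_report(self, text):
        """Write a periodic report unless reporting is currently suspended."""
        if self._suspend_depth == 0:
            self.lines.append(text)

    @contextmanager
    def unbroken_output(self):
        """Suspend auto-reports for the duration of a multi-line block."""
        self._suspend_depth += 1
        try:
            yield
        finally:
            self._suspend_depth -= 1
```

Wrapping a multi-line dump in `with out.unbroken_output():` keeps any `auto_report` call made during the dump from landing between its rows.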

GroovyDrifter commented 6 years ago

I've tried the bugfix branch on one of my printers and yes, the printer no longer complains about incomplete commands, etc. But I had to revert to the 1.1.8 release because it started ignoring the M109 that comes right after the M190; Cura 15.04.6 uses this pair to bring the bed and the extruder up to printing temperature. As an example:

M190 S65.000000
M109 S205.000000
;Sliced at: Thu 08-03-2018 21:55:27
;Basic settings: Layer height: 0.2 Walls: 0.4 Fill: 20
;Print time: 32 minutes
;Filament used: 2.282m 6.0g
;Filament cost: 0.14
;M190 S65 ;Uncomment to add your own bed temperature line
;M109 S205 ;Uncomment to add your own temperature line
G21        ;metric values
G90        ;absolute positioning
M82        ;set extruder to absolute mode
M107       ;start with the fan off
G28
G29
G1 Z15.0 F7800 ;move the platform down 15mm
G92 E0                  ;zero the extruded length
G1 F200 E3              ;extrude 3mm of feed stock
G92 E0                  ;zero the extruded length again
G1 F7800
;Put printing message on LCD screen
M117 Printing...

;Layer count: 100
;LAYER:0
M107
G0 F7800 X100.849 Y101.518 Z0.300
;TYPE:SKIRT
G1 F1500 X102.127 Y100.614 E0.08849
G1 X102.529 Y100.458 E0.11286
G1 X103.266 Y100.084 E0.15958
G1 X104.002 Y99.799 E0.20419

etc...

What you see is that the bed heats up and, as soon as it's done, the printer goes straight into homing and bed leveling instead of waiting for the extruder. I'll run more tests with other USB cables, slicers, etc., and if the problem persists I'll open another issue.

token47 commented 6 years ago

Did another test today.

I printed for many hours from a Dell notebook running Windows 10 and Simplify3D, with the printer's USB cable plugged directly into the PC. Simplify3D has checksums and line numbering ON (so as far as I can tell, it's sending exactly the same lines to the printer as OctoPrint does). Not a single problem.
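For reference, the trailing "*NN" on each numbered line in these logs is the checksum. A minimal Python sketch, assuming the conventional RepRap scheme (XOR of every byte before the asterisk); the helper names are hypothetical:

```python
def gcode_checksum(body):
    """XOR of all bytes in the line body (everything before the '*')."""
    checksum = 0
    for byte in body.encode("ascii"):
        checksum ^= byte
    return checksum


def frame(line_number, command):
    """Prefix a command with its line number and append the checksum."""
    body = f"N{line_number} {command}"
    return f"{body}*{gcode_checksum(body)}"


def is_valid(framed):
    """Check that the number after '*' matches the checksum of the body."""
    body, sep, given = framed.rpartition("*")
    return sep == "*" and given.isdigit() and gcode_checksum(body) == int(given)


print(frame(535, "G92 E0"))  # N535 G92 E0*68, as in the log further down
```

A line that loses characters in transit will usually fail this check, which is how a truncated command gets rejected instead of executed.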

Then I exported the same G-code, loaded it into OctoPrint running on the RPi 3, plugged the printer into it, and tried to print. As soon as I started, I got problems.

From observation, the problem seems to trigger more often during very fast/short movements, when a lot of data goes through the serial link. There are missing "ok"s. The funny thing is that while the ok's don't come back, the temperature auto-reports keep arriving normally.

This is a normal part:

2018-03-08 22:02:49,962 - Recv: ok
2018-03-08 22:02:49,965 - Send: N524 G1 X104.754 Y99.804 E327.5683*96
2018-03-08 22:02:50,000 - Recv: ok
2018-03-08 22:02:50,003 - Send: N525 G1 X104.075 Y99.804 E327.6089*106
2018-03-08 22:02:50,252 - Recv:  T:205.08 /205.00 B:60.63 /60.00 @:53 B@:0
2018-03-08 22:02:50,266 - Recv: ok
2018-03-08 22:02:50,269 - Send: N526 G1 X101.003 Y96.732 E327.8690*104
2018-03-08 22:02:50,532 - Recv: ok
2018-03-08 22:02:50,535 - Send: N527 G1 X101.003 Y97.411 E327.9097*106
2018-03-08 22:02:50,577 - Recv: ok
2018-03-08 22:02:50,580 - Send: N528 G1 X103.396 Y99.804 E328.1123*103
2018-03-08 22:02:50,806 - Recv: ok
2018-03-08 22:02:50,810 - Send: N529 G1 X102.718 Y99.804 E328.1529*107
2018-03-08 22:02:51,053 - Recv: ok
2018-03-08 22:02:51,056 - Send: N530 G1 X101.003 Y98.090 E328.2981*100
2018-03-08 22:02:51,089 - Recv: ok
2018-03-08 22:02:51,092 - Send: N531 G1 X101.003 Y98.769 E328.3387*105
2018-03-08 22:02:51,527 - Recv: ok
2018-03-08 22:02:51,531 - Send: N532 G1 X102.039 Y99.804 E328.4264*110
2018-03-08 22:02:51,565 - Recv: ok
2018-03-08 22:02:51,568 - Send: N533 G1 X101.360 Y99.804 E328.4670*98
2018-03-08 22:02:51,954 - Recv: ok
2018-03-08 22:02:51,957 - Send: N534 G1 X101.003 Y99.447 E328.4972*101
2018-03-08 22:02:51,994 - Recv: ok
2018-03-08 22:02:51,997 - Send: N535 G92 E0*68
2018-03-08 22:02:52,258 - Recv:  T:204.97 /205.00 B:60.81 /60.00 @:55 B@:0
2018-03-08 22:02:53,739 - Recv: X:101.00 Y:99.45 Z:0.30 E:0.00 Count X:8080 Y:7956 Z:127
2018-03-08 22:02:53,745 - Recv: ok
2018-03-08 22:02:53,748 - Send: N536 G1 E-1.0000 F2100*26
2018-03-08 22:02:53,757 - Recv: ok

This is a missing-ok part:

2018-03-08 22:03:06,715 - Send: N625 G1 X122.358 Y104.510 E4.2920*80
2018-03-08 22:03:06,785 - Recv: ok
2018-03-08 22:03:06,788 - Send: N626 G1 X123.457 Y105.609 E4.3850*87
2018-03-08 22:03:06,823 - Recv: ok
2018-03-08 22:03:06,826 - Send: N627 G1 X124.136 Y105.609 E4.4257*89
2018-03-08 22:03:08,304 - Recv:  T:205.20 /205.00 B:62.40 /60.00 @:50 B@:0
2018-03-08 22:03:10,307 - Recv:  T:205.04 /205.00 B:62.29 /60.00 @:53 B@:0
2018-03-08 22:03:12,313 - Recv:  T:205.12 /205.00 B:62.27 /60.00 @:52 B@:0
2018-03-08 22:03:14,316 - Recv:  T:205.23 /205.00 B:62.14 /60.00 @:49 B@:0
<here I pressed "FAKE ACK" on octoprint which simulates an "ok" received>
2018-03-08 22:03:15,164 - Send: N628 G1 X123.037 Y104.510 E4.5187*84
2018-03-08 22:03:15,183 - Recv: Error:Line Number is not Last Line Number+1, Last Line: 626
2018-03-08 22:03:15,185 - Recv: Resend: 627
2018-03-08 22:03:15,190 - Recv: ok
2018-03-08 22:03:15,191 - Send: N627 G1 X124.136 Y105.609 E4.4257*89
2018-03-08 22:03:15,203 - Recv: ok
2018-03-08 22:03:15,205 - Send: N628 G1 X123.037 Y104.510 E4.5187*84
2018-03-08 22:03:15,215 - Recv: ok

There were a lot of these; eventually I cancelled the print.

I'm not sure what to make of this; I'm just reporting it to contribute to the discussion.
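The error/resend exchange in the missing-ok log above follows from the line-number check on the receiving side. Below is a toy Python model of that check, written only to reproduce the replies seen in the log; it is not Marlin's implementation, the class name is hypothetical, and checksum validation is deliberately left out.

```python
import re


class LineNumberChecker:
    """Toy receiver that insists on consecutive N-numbers."""

    def __init__(self, last_line):
        self.last_line = last_line  # number of the last accepted line

    def receive(self, framed):
        """Return the reply lines for one incoming command."""
        match = re.match(r"N(\d+) ", framed)
        if match is None:
            return ["ok"]  # unnumbered command: accepted as-is in this sketch
        number = int(match.group(1))
        if number != self.last_line + 1:
            return [
                "Error:Line Number is not Last Line Number+1, "
                f"Last Line: {self.last_line}",
                f"Resend: {self.last_line + 1}",
                "ok",
            ]
        self.last_line = number
        return ["ok"]


checker = LineNumberChecker(last_line=626)
# N627 never arrived, so after the fake ack the next line seen is N628.
print(checker.receive("N628 G1 X123.037 Y104.510 E4.5187*84"))
print(checker.receive("N627 G1 X124.136 Y105.609 E4.4257*89"))
print(checker.receive("N628 G1 X123.037 Y104.510 E4.5187*84"))
```

In this model the receiver still considers 626 the last good line, which is consistent with N627 never having been accepted: the first reply asks for 627 again, and the two retransmitted lines are then acknowledged in order.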

thinkyhead commented 6 years ago

We recommend using a lower baud rate if communication issues come up a lot. And/or get a more robust USB cable with ferrite beads. (I assume you're using a different USB cable for Octoprint than for tethered printing.)

token47 commented 6 years ago

Same cable. And it's got ferrite. A very good cable. I printed another 8 hours with it again today (from the PC). What I find curious is that it's the same Marlin version (1.1.8; I'm still looking for the time to upgrade it and test rcbugfix).

boelle commented 5 years ago

@token47 Is the problem still there with the latest bugfix 2.0? Are you sure this is not hardware related?

token47 commented 5 years ago

I recently moved to a completely different setup so I'll not be able to test that anymore, sorry. But I'm sure that was not hardware related.

boelle commented 5 years ago

Given that no one else has chimed in, maybe click close and let people open new issues if they find the problem?

boelle commented 5 years ago

I mean, it has been a year without any reports, so it's fair to assume the issue has been fixed.

token47 commented 5 years ago

Agreed.

drphil3d commented 5 years ago

I'm getting reports from multiple people of this issue on Marlin 1.1.9.

github-actions[bot] commented 4 years ago

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.