MarlinFirmware / Marlin

Marlin is an optimized firmware for RepRap 3D printers based on the Arduino platform. Many commercial 3D printers come with Marlin installed. Check with your vendor if you need source code for your specific machine.
https://marlinfw.org
GNU General Public License v3.0

Creatr printer stopping in mid-print problem #451

Closed DavidH137 closed 10 years ago

DavidH137 commented 11 years ago

I have not been able to create a large print using this Marlin code without the printer just stopping in mid-print. The computer does not hang but the print job is ruined.

I patched the code (the patch is described after the debugging notes) and would love to better understand why this fix solves the stopping problem. The patch serializes get_command() so that it does not run until process_commands() has finished with the previous command.

I did some debugging, as described below and found:

The following was found to be a common problem when the printing stopped. These characters (or a variation of them) were in the log:

&$?ok ?ok ok LLNM$?ok

I had seen some of these garbled “ok” messages before, but most of the time I did not see one, so I inserted some code that caught every stop in mid-print, and every stop had the garbled “ok” message. To debug, do the following:

In module Marlin_main.cpp, insert the following and reload the firmware:

    void process_commands()
    {
      unsigned long codenum; // throw away variable
      char *starpos = NULL;
      // insert debug code below the char statement above
      SERIAL_ECHO_START;
      SERIAL_ECHOPGM("Command: ");
      SERIAL_ECHO(cmdbuffer[bufindr]);
      SERIAL_ECHO(" Last N: ");
      SERIAL_ECHOLNPGM("");
      // end debug code

This will result in messages in the log just before the stop like:

    20:21:11.302 : N814 G1 X29.251 Y1.984 E2563.721 64
    20:21:11.305 : echo:Command:N813 G1 X156.142 Y128.875 E2552.526 125813″
    20:21:16.652 : N815 G1 X30.036 Y1.985 E2563.77 127
    20:21:16.684 : echo:Command:N814 G1 X29.251 Y1.984 E2563.721 64814″ ok
    20:21:16.688 : echo:Command:N815 G1 X30.036 Y1.985 E2563.77 *127815″

    18:26:08.799 : echo:Command:N311 G1 X23.858 Y200.94 E135.052 74311″
    18:26:08.828 : N313 G1 X51.72 Y228.016 E137.5592 115
    18:26:09.713 : echo:Command:N312 G1 X23.858 Y200.154 E135.1011 6631&$?ok
    18:26:09.717 : echo:Command:N313 G1 X51.72 Y228.016 E137.5592 115313″

    16:44:52.400 : N1196 G1 X32.757 Y1.459 E5.05497 74
    16:44:52.510 : echo:Command:N1195 G1 X23.333 Y10.882 E4.47453 122LLNM$?ok
    16:44:52.514 : echo:Command:N1196 G1 X32.757 Y1.459 E5.05497 741196″
    16:45:41.100 : N1197 M105 57
    16:45:41.106 : echo:Command:N1197 M105 571197″
    16:45:41.114 : echo:Exit M105
    16:45:41.115 : N1198 G1 X33.58 Y1.459 E5.09082 113
    16:45:41.126 : echo:Command:N1198 G1 X33.58 Y1.459 E5.09082 *1131198″

You will notice that the echoed commands are out of step with the normal log of the commands being sent. I found this to be the key to solving the problem. I believe the Marlin code has a problem in how it handles the buffer that is supposed to ensure each line of G-code is executed before the next line is sent to the printer.

I came up with a fix for this as described below.

The objective of this patch is to make Marlin process one command at a time and not start on another command until the previous one has completed. I’m not sure why this works, as the Arduino board should only execute code in sequence (I think), but it worked for me. I suspect it fixes an error in the way the command buffer is filled, one that causes a garbled “ok” message to be sent and the print to halt.

In Marlin_main.cpp after the #includes put the following statement:

boolean tnext = 0;

In the same module, add a line in void setup() after the first { (opening brace), as shown below:

void setup() { tnext=0;

In the same module, add the following after the first { in void get_command():

while (tnext != 0) { delay(1); }

This causes get_command() to loop until the previous command has been processed.

In the same module, in void process_commands(), do the following:

After the first {, enter the following line:

tnext = -1;

Before EVERY “return;” and before EVERY “break;”, enter a line with:

tnext=0;

If you miss one “tnext=0;”, the flag will not be cleared and the loop in get_command() will never end.

In the same module, in void ClearToSend(), add the following just before the } (closing brace):

tnext=0;
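
The step above that asks for a reset before every “return;” is easy to get wrong by hand. A minimal sketch of one alternative, assuming a C++11 compiler: a small scope guard that sets the flag on entry and clears it on every exit path. This is not Marlin code. `BusyGuard` and `processCommandSketch` are hypothetical names, the command codes are illustrative, and the sketch does not change what the patch does or answer the question of why it helps.

```cpp
// Hypothetical stand-in for the patch's global flag.
bool tnext = false;

// Sets the flag on construction and clears it when the enclosing scope
// exits, whichever return statement is taken.
class BusyGuard {
public:
    explicit BusyGuard(bool& flag) : flag_(flag) { flag_ = true; }
    ~BusyGuard() { flag_ = false; }
    BusyGuard(const BusyGuard&) = delete;
    BusyGuard& operator=(const BusyGuard&) = delete;

private:
    bool& flag_;
};

// Simplified stand-in for process_commands().
// Returns whether the flag was set while the command was being handled.
bool processCommandSketch(int code) {
    BusyGuard guard(tnext);
    const bool busyDuring = tnext;
    if (code < 0) {
        return busyDuring;  // early return: the guard still clears the flag
    }
    switch (code) {
        case 105:
            break;          // break leaves the switch, not the function
        default:
            return busyDuring;
    }
    return busyDuring;      // normal exit: cleared here as well
}
```

With a guard like this, a single line at the top of the function would replace the per-return edits. A “break;” inside a switch does not leave the function, so the flag stays set until the function actually returns.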

daid commented 11 years ago

I'm not seeing something wrong in the log. The code is designed so it can receive multiple messages in a FIFO buffer and process those.
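
To illustrate the FIFO design described here, the following is a minimal sketch of a fixed-size ring buffer of command lines. It is not the Marlin source: `CommandFifo`, `writeIndex`, `count`, and the sizes are hypothetical, and only `readIndex` corresponds loosely to the `bufindr` index seen in the debug code earlier in the thread. With a queue like this, the host can already have sent line N814 while the firmware is still executing N813, which would be consistent with the echo order in the logs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Illustrative sizes; the real values are a firmware configuration matter.
constexpr std::size_t kSlots = 4;
constexpr std::size_t kMaxLen = 96;

struct CommandFifo {
    char slots[kSlots][kMaxLen] = {};
    std::size_t readIndex = 0;   // next slot to execute
    std::size_t writeIndex = 0;  // next slot to fill (hypothetical name)
    std::size_t count = 0;       // number of queued lines (hypothetical name)

    bool full() const { return count == kSlots; }
    bool empty() const { return count == 0; }

    // Queues a complete line. Returns false when there is no free slot or
    // the line does not fit, so the caller can retry later.
    bool push(const char* line) {
        if (full() || std::strlen(line) >= kMaxLen) return false;
        std::strcpy(slots[writeIndex], line);
        writeIndex = (writeIndex + 1) % kSlots;
        ++count;
        return true;
    }

    // Returns the oldest queued line without removing it.
    const char* front() const {
        assert(!empty());
        return slots[readIndex];
    }

    // Drops the oldest line once it has been executed.
    void pop() {
        assert(!empty());
        readIndex = (readIndex + 1) % kSlots;
        --count;
    }
};
```

In this sketch the receive side calls `push()` whenever a full line has arrived and the execute side calls `front()` and `pop()`, so receiving and executing are decoupled by up to `kSlots` lines.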

I think your patch just counters this design in a bad way. I'm not sure what software you use to send GCode to the Creatr. But Cura has extensive logging and error case catching in its GCode sender. And the only case that I see where communications go wrong on an Ultimaker with Cura is when the USB connection itself craps out.

DavidH137 commented 11 years ago

daid,

Thanks for the reply. You are right about the patch countering the design in a bad way; I would say VERY BAD HACK. I'm using a recent version of Repetier. But last night I was successful in printing an 81-layer part that I had tried to print at least 10 times, with the printer stopping in mid-print each time. When it stops, the only way to talk to the printer again is to disconnect and reconnect.

I will give your CURA a try without the patched code and see if the debugging can better locate the source of the problem I and others are having, see https://github.com/ErikZalm/Marlin/issues/436

Thanks

DavidH137 commented 11 years ago

Daid, BTW, is it normal to see garbage characters preceding the "ok" message like those below, which are also in the log:

&$?ok ?ok [a little arrow pointing up and to the right]ok LLNM$?ok

The interesting thing I noticed about the log is that I put in a string search for "ok" after printing the debug message and never found one. This means it is being sent from another routine. I put a debug message in front of every "ok" in the code, and the garbage "ok" would appear without the debug message, so unless the code is making it up on the fly, it is not coming from the code. I suspect it is the communication protocol with the COM port for the USB, and it may be an FTDI problem (I'm running Windows 7 SP1 64-bit, 64GB memory, Intel P6 @ 2700MHz core processor with dual ATI Radeon HD 5700 and a few terabytes of free space, so I doubt it is anything wrong with my computer).

Actually, as I was typing the above it occurred to me that the problem may be related to the 6 cores and multiple threads, which may be why some people have the problem and not others.

Thoughts?

daid commented 11 years ago

Maybe a buffer-overflow problem in Repetier or Marlin?

I've yet to see it with Cura, but Cura wouldn't choke on those strings. Serial messages will get corrupted, known fact. So you need to be able to handle that.

I'm looking for an "ok" inside a line with Cura, not just at the start, but anywhere. This will help in case newlines get corrupted.
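
As an illustration of the tolerant matching described here, a minimal sketch of a host-side check that accepts an "ok" anywhere in a received line. This is not Cura's actual code, and `containsOk` is a hypothetical name.

```cpp
#include <string>

// Returns true if the received line contains an "ok" acknowledgement
// anywhere, even when stray bytes precede it or a newline was lost.
bool containsOk(const std::string& line) {
    return line.find("ok") != std::string::npos;
}
```

A check like this would accept the garbled variants quoted earlier in the thread, such as `&$?ok` and `LLNM$?ok`. The trade-off is that any other message that happens to contain the letters "ok" is also treated as an acknowledgement.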

nophead commented 11 years ago

Serial messages should not get corrupted because the USB transport layer has checksums, retries, etc. The data should either get there correctly, or the USB connection breaks down and has to be re-established. The wires from the FTDI chip to the ATMEGA should be short and on the same board, so there should be no corruption there either.

Smells like a bug somewhere, perhaps the USB drivers, FTDI drivers, or maybe a hardware fault.

daid commented 11 years ago

In theory they should not get corrupted, but in practice I've seen data corruption happen, and even though the USB->ATMega wires are short, they are pretty close to some high emission wires from the steppers.
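
For host-to-printer traffic, this kind of corruption is what the line number and checksum framing is meant to catch. In the logs earlier in this thread the host line "N1197 M105 57" appears to have lost its `*` separator in formatting: the XOR of the bytes of "N1197 M105 " (including the trailing space) is 57. A minimal sketch of that check follows. `gcodeChecksum` and `checksumMatches` are hypothetical names, not functions from the Marlin source.

```cpp
#include <cstddef>
#include <cstdint>
#include <exception>
#include <string>

// XOR of every byte before the '*' separator (or of the whole string if
// there is no separator).
std::uint8_t gcodeChecksum(const std::string& line) {
    std::uint8_t sum = 0;
    for (char c : line) {
        if (c == '*') break;
        sum ^= static_cast<std::uint8_t>(c);
    }
    return sum;
}

// Verifies a framed line such as "N1197 M105 *57".
bool checksumMatches(const std::string& framed) {
    const std::size_t star = framed.find('*');
    if (star == std::string::npos) return false;
    try {
        return std::stoi(framed.substr(star + 1)) == gcodeChecksum(framed);
    } catch (const std::exception&) {
        return false;  // nothing numeric after the '*'
    }
}
```

A single corrupted byte changes the XOR, so the receiver can reject the line and ask for it again. Note that this framing covers lines sent to the printer. The "ok" replies coming back carry no checksum, which is why the host has to be tolerant of garbage around them.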

nophead commented 11 years ago

Faulty PCB design then. If you can't send TTL signals from one chip to another on the same board 100% reliably you have a hardware fault.

DavidH137 commented 11 years ago

Thanks for the reply. The USB cable is 4 feet long with a ferrite and is routed about 1 inch from the nearest stepper driver. I have tried 3 MEGA boards, all with the same results. I've printed the test model with CURA without a problem. I'm now printing the model that failed so many times. I have seen two ok�ok in the log so far, with 6 hours to go. The special character is a diamond with the bottom filled in; in Word it is a dark diamond with a ?.

Good news: CURA did not choke on this like Repetier 0.85b did, so maybe it is a Repetier problem with how it handles serial errors.

ODDepot commented 11 years ago

I wonder if AUX-1 is a problem on the newer RAMPS. The traces for TX and RX snake around and run over some high current traces of the motor drivers to get to the other side of the board. Regardless of whether you use AUX-1 it's still going to be a problem because they're always plugged into the Mega unless you snip the pins going in to D0 and D1. Just my thoughts...

Dustin

DavidH137 commented 11 years ago

Dustin, good thought, but I rewired so the motor wires and the wires to the stepper drivers are all separated by at least 10cm, based on a post in the Creatr forum that pointed out this could be a problem. I even separated the heating, temperature, and motor wires going to the print heads.

You are correct that the stepper wires are close to the power wires on the Mega, but I also moved these 10cm away. What do you mean by snipping D0 and D1?

ODDepot commented 11 years ago

I meant a flaw or weak design in RAMPS 1.3/1.4 itself, if that's what you're using. I know it's been like this for a while, but it still seems like a bad idea to route communication across the board and through the mess in the middle. It could be a contributing factor for interference that not everyone is experiencing, since it's only on the recent RAMPS versions. If you're frustrated and want to be a guinea pig, you could cut the leads of your RAMPS board which go into pins 0 and 1 on the Mega. That will isolate those traces on RAMPS and keep noise from feeding back into the Mega and onto the lines between the microcontroller and FTDI chip. Of course you couldn't use AUX-1 afterward.

daid commented 11 years ago

Good news, CURA did not choke on this like Repetier 0.85b did so maybe a Repetier problem with how it handles serial errors.

Like I said, I've taken special care with the USB serial communication in Cura. I used most of the tricks found in Pronterface and added a few enhancements. There could be an issue in the firmware, but I think host software should be able to handle as much as possible.

Improving all areas is good imho. So if there is a bug somewhere in the Marlin code, find it and fix it. But a workaround for the bug is not really the solution (and when I see a "delay" in timing-sensitive firmware code I kind of start crying). At the same time, the host software should handle as much as possible.

thinkyhead commented 11 years ago

It might be Marlin. I've had the same issue with Pronterface for some time, and finally switched to SD card printing exclusively, which doesn't exhibit the issue. I feel like it could be a latency/timing problem. Notice the difference when printing circles and other groups of many small moves. It is choppier via USB than via SD. I don't think it's my RAMPS hardware, but there is some buzz that USB can get flaky if there's too much EM noise.

michielha commented 11 years ago

I didn't want to say it but: I love this hack. I have been struggling with Marlin's terrible serial comms ever since I installed it. I used Sprinter before, and never had any comm issues. Now I have a Rostock, so I almost HAVE to use Marlin (which certainly is not a bad thing). But I could barely connect to it to activate an SD print. The connection would stall, and Repetier host would sometimes crash. Sometimes I had to connect more than 10 times in order for it to work. Most of the time it would say "5 command waiting", and no information from the printer was ever received. I tried all baud rates, which didn't seem to matter much.

EDIT: comms still seem to 'hang' pretty much all the time :(

With this fix, I can connect and issue commands just fine. From there, the SD takes over, so the delays don't really bother me (I'm on 9600 baud). Thanks so much for the fix. Something certainly must be done to improve comms on Marlin. I'm quite sure it's not the USB.

DFTBA.

ErikZalm commented 11 years ago

The Marlin serial functions are almost the same as the Sprinter serial functions. I think you changed more than just Sprinter to Marlin. Can you test with a simple terminal program? Check whether it prints "start" when you reset the board, and check whether it responds to M105.

michielha commented 11 years ago

I can see one of the serial LEDs on the Arduino constantly lit from the moment it stalls. It does not respond to M105. As for resetting the board: I'm currently printing, so I'm not going to check what happens if I press the reset button, but I'm pretty sure it doesn't do anything. I'll check it with a serial monitor as soon as this print is done (or fails).

github-actions[bot] commented 2 years ago

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.