Printer is dropping steps

jbernardis commented 9 years ago

I finally got around to isolating the G-code that causes dropped steps on my printer. I haven't tried slowing things down too much, but I'm not exactly speeding through the print either. I can reliably get the printer to drop steps at a given point. I have narrowed the G-code down to about 25 lines and it still occurs.

So now I am trying to build Teacup under my config tool for the simulator. I already have the simulavr directory set up as instructed, and I build with the -DSIMINFO option specified on the gcc command line.

My question is: Is this all that is necessary to build for the simulator? When I subsequently run "run-in-simulavr.sh", this is what I get:

$ ./run-in-simulavr.sh droptest1l.gcode 
make: *** No rule to make target `size'.  Stop.
Assuming pin configuration for a Gen7-v1.4 + debug LED on DIO21.
STEPS_PER_M_X not found, assuming 80'000.
STEPS_PER_M_Y not found, assuming 80'000.
Taking STEPS_PER_M_X = 80000 and
       STEPS_PER_M_Y = 80000 for mm/min calculation.
WARNING: file avrreadelf.cpp: line 440: unknown signature in ELF file: 0xffffffff

MESSAGE Device name is unknown
FATAL: file avrfactory.cpp: line 57: Device type not specified, use -d | --device TYPE or insert a SIMINFO_DEVICE(name) macro into your source to specify the device name

./run-in-simulavr.sh: line 149: droptest1l.vcd: No such file or directory
./run-in-simulavr.sh: line 153: gnuplot: command not found
./run-in-simulavr.sh: line 294: droptest1l.vcd: No such file or directory

droptest1l.gcode statistics:
$

Traumflug commented 9 years ago

WARNING: file avrreadelf.cpp: line 440: unknown signature in ELF file: 0xffffffff

This is a bug I fixed last week, so it might be a good idea to git fetch && git rebase origin/traumflug and build again. It didn't appear to be critical, though.

FATAL: file avrfactory.cpp: line 57: Device type not specified, use -d | --device TYPE or insert a SIMINFO_DEVICE(name) macro into your source to specify the device name

Did you build from my repo instead of the official one? https://github.com/Traumflug/simulavr

If yes, you likely missed this part of the Makefile-AVR:

ifneq ($(realpath ../simulavr/src/simulavr_info.h),)
  # Necessary for simulavr support, doesn't hurt others.
  CFLAGS += -DSIMINFO
  LDFLAGS += -Wl,--section-start=.siminfo=0x900000
else
  # Doesn't work for simulavr, can't allow dead code removal.
  LDFLAGS += -Wl,--gc-sections
endif

This means an additional CFLAGS entry and an additional LDFLAGS entry are needed, and --gc-sections must not be used when building for SimulAVR.

jbernardis commented 9 years ago

OK - I got a lot further. This time it seems to run ok, except standard error is flooded with these messages:

WARNING: file rwmem.cpp: line 222: Invalid read access from IO[0x21fb], PC=0x3a38
WARNING: file rwmem.cpp: line 231: Invalid write access to IO[0x5c]=0x0, PC=0x71c

I also need to override the default of 80 steps/mm for x and y.

Traumflug commented 9 years ago

This is a topic currently discussed on the SimulAVR mailing list: http://nongnu.13855.n7.nabble.com/Support-for-atmega2560-td197139.html#a197293

The trick is to compile for an ATmega644 and a Gen7 v1.4 board, no matter what your real hardware is. In case it's an algorithm bug it'll misbehave the same way on any hardware.

BTW, it wouldn't hurt to see these 25 lines of G-code here :-)

jbernardis commented 9 years ago

Here is the G code. It seems to stumble just after the move around line 25 or so.

G92 E0
G28
G1 Z2.780 F7800.000
G1 E0.00000 F1800.00000
G1 X98.098 Y112.359 E2.75132
G1 X98.098 Y113.100 E2.76211
G1 X93.612 Y117.586 E2.85450
G1 X94.353 Y117.586 E2.86529
G1 X98.098 Y113.841 E2.94242
G1 X98.098 Y114.582 E2.95321
G1 X95.094 Y117.586 E3.01507
G1 X95.835 Y117.586 E3.02586
G1 X98.098 Y115.323 E3.07247
G1 X98.098 Y116.064 E3.08326
G1 X96.576 Y117.586 E3.11460
G1 X97.317 Y117.586 E3.12539
G1 X98.098 Y116.805 E3.14148
G1 X98.098 Y117.546 E3.15227
G1 X98.058 Y117.586 E3.15309
G1 E1.15309 F1800.00000
G92 E0
G1 X96.378 Y108.322 F7800.000
G1 E2.00000 F1800.00000
M106 S255
G1 X96.418 Y108.282 E2.00314 F3600.000
G1 X96.050 Y107.570 E2.04764
G1 X95.299 Y108.322 E2.10669
G1 X94.931 Y107.610 E2.15119
G1 X94.971 Y107.570 E2.15433
M106 S0
G1 X93.298 Y108.322 F7800.000
G1 X94.050 Y107.570 E2.16961 F3450.000
G1 X92.222 Y107.570 E2.19589
G1 X91.470 Y108.322 E2.21117
G1 X90.394 Y107.570 E2.23004
G1 X89.642 Y108.322 E2.24532
G1 X88.565 Y107.570 E2.26420
G1 X87.814 Y108.322 E2.27948
G1 X87.248 Y107.059 E2.29936
G1 X85.986 Y108.322 E2.32503

Traumflug commented 9 years ago

Thanks. This G28 won't work because there is no means to trigger the endstops. It's pointless in simulation anyway, because there's no hardware to synchronize with.

Don't forget to raise the movement queue length to something like 32. run-in-simulavr.sh doesn't do handshaking (waiting for the "ok" before sending the next line), so everything coming in over the serial line and not processed immediately or fitting into the serial buffer gets lost.

A patch to implement this handshaking would be welcome, of course.
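
For anyone picking this up, the handshaking idea can be sketched in a few lines of Python. This is only an illustration, not the code in run-in-simulavr.sh: the function name send_with_handshake and the FakeFirmware stand-in are made up for the example, and it assumes the firmware answers each accepted line with a line starting with "ok".

```python
def send_with_handshake(gcode_lines, port):
    """Send G-code one line at a time, waiting for "ok" after each line.

    `port` is any object with write(str) and readline() -> str, for example
    a wrapper around a serial port or a pipe to the simulator (assumption).
    Returns the number of lines that were acknowledged.
    """
    acknowledged = 0
    for raw in gcode_lines:
        line = raw.split(";", 1)[0].strip()   # drop comments and whitespace
        if not line:
            continue                          # nothing to send
        port.write(line + "\n")
        while True:
            reply = port.readline()
            if reply == "":
                raise IOError("connection closed while waiting for ok")
            if reply.strip().lower().startswith("ok"):
                acknowledged += 1
                break                         # safe to send the next line
            # anything else (debug output, temperature reports) is ignored
    return acknowledged


class FakeFirmware:
    """Hypothetical stand-in for the firmware: acknowledges every line."""

    def __init__(self):
        self.received = []
        self._replies = []

    def write(self, text):
        self.received.append(text.strip())
        self._replies.extend(["debug: queued\n", "ok\n"])

    def readline(self):
        return self._replies.pop(0) if self._replies else ""
```

Because the sender never has more than one line in flight, nothing depends on the size of the serial buffer or the movement queue, at the cost of some throughput.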

P.S.: there's a "save" file for GtkWave in the testcases folder. Use Menu -> File -> Open save file and it'll be the only file available. It'll format the signal analysis for X and Y with these nice speed curves and also give an idea on how to do the same for the E stepper.

jbernardis commented 9 years ago

FYI I have made no progress on this. Running the simulator provided no information - even after removing the G28 and increasing the queue length. Basically what the output told me was that the signal levels hadn't changed for the 60 second interval of the simulation. I haven't had the time to look further into it, and sadly have re-flashed Marlin just so I could get back to reliable printing. I hope to get back to Teacup, but just have too much on my plate right now to troubleshoot this.

I will continue to monitor activity here, and am willing to maintain the config tool.

Traumflug commented 9 years ago

Basically what the output told me was that the signal levels hadn't changed for the 60 second interval of the simulation.

Does one of the G-code files in the testcases/ folder work? This is what I run on about every commit which could affect performance:

$ cd testcases
$ ./run-in-simulavr.sh short-moves.gcode smooth-curves.gcode triangle-odd.gcode

Traumflug commented 9 years ago

FWIW, I've just implemented G-code sending handshaking. Now one can simulate arbitrarily big G-code files. The 60 seconds simulated time limit is still in place to avoid endless simulations.

Traumflug commented 9 years ago

I get this in SimulAVR, Teacup compiled with config.h.Profiling (see testcases folder) as config.h:

[Image: issue 123 simulation]

jbernardis commented 9 years ago

Maybe it's a matter of how I've been building the code. My directory is a hybrid of the official branch merged with my config tool. I haven't been using the makefile. I didn't notice, though, that there was a config.h for profiling. I have some time now to try to get back to this.

Traumflug commented 9 years ago

One thing to consider is certainly that if you compile for a different controller, you likely also use different pins. That means run-in-simulavr.sh observes the wrong pins. See lines 40 to 58 in this script.

Traumflug commented 9 years ago

Can we pick up on this one? I'm not exactly comfortable with knowing that a user suffers step losses. To run the same G-code on my own controller I'd need your printer and board config. You can open a branch and push them there or email them to mah@jump-ing.de. Thanks.

Looking at the G-code, I see some E-only commands. I vaguely remember there was an issue with speed calculation of such moves a long time ago.

Also ... which axis suffers these losses and approx. how many steps (one, half a millimetre, 10 millimetres)?

Traumflug commented 9 years ago

Still looking for evidence of step losses actually happening. I mounted an indicator on a bare stepper connected to the Y axis, and it works just fine.

One thing though, while adding a few lines of debug code here and there I found and fixed a bug in delay_us().

Another thing I found is that your configuration limits X and Y to 2500 mm/min, while the given G-code asks for up to 7800 mm/min. As a result, most movements run into the speed limitation. Looking at the code, this shouldn't do any harm, though.
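
A quick way to see which moves ask for more than the configured limit is to scan the G-code for F words. The sketch below is an illustration only: the function name is made up, the 2500 mm/min default is simply the figure mentioned here, and it assumes F is modal (it stays in effect until changed), as is usual for G-code.

```python
import re

F_WORD = re.compile(r"F([0-9]*\.?[0-9]+)", re.IGNORECASE)
AXIS_WORD = re.compile(r"[XY]-?[0-9.]", re.IGNORECASE)
MOVE = re.compile(r"\s*G0*[01]\b", re.IGNORECASE)   # G0, G1, G00, G01

def moves_over_limit(gcode_lines, limit_mm_min=2500.0):
    """Return (line_number, feedrate) for each X/Y move requesting more than the limit."""
    feedrate = 0.0
    hits = []
    for number, raw in enumerate(gcode_lines, start=1):
        line = raw.split(";", 1)[0]       # strip comments
        if not MOVE.match(line):
            continue                      # only look at G0/G1 moves
        match = F_WORD.search(line)
        if match:
            feedrate = float(match.group(1))   # F stays in effect afterwards
        if AXIS_WORD.search(line) and feedrate > limit_mm_min:
            hits.append((number, feedrate))
    return hits
```

Called with the lines of a G-code file, it returns the line numbers where the firmware would have to clamp the requested speed.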

I've put all this onto a new topic branch, y-step-losses. Hmm. Jeff, could you perhaps try again, if just to try to describe better what's actually going wrong?

jbernardis commented 9 years ago

Some good news!

I brought down the experimental branch, and flashed it to my printer. I then printed the object that was giving me so much trouble, and it printed fine. Then I edited my config to raise my X and Y limits to 5000 mm/min. Again, it printed perfectly - no dropped steps. I don't know if your delay_us fix is in this branch or not, but if not, I can't explain it.

It looks like I'm back printing with Teacup again.

I'm still having issues with temperature stability on my hot end, but I'm going to post that on the wiki since it has nothing to do with this issue.

Traumflug commented 9 years ago

Excellent! Uhm, this delay_us() fix isn't even on the experimental branch, yet, it's on y-step-losses.

I'm still having issues with temperature stability on my hot end

David Forrest did quite some work on this, which resulted in at least 3 branches starting with "issue74". Looks like I should rebase them next and also sort out what they're doing. And yes, there's issue #74 which discusses this.

Traumflug commented 9 years ago

Looks like this is solved.

phord commented 8 years ago

I have been suffering sporadic Y-step losses for a while now. I had thought it was a mechanical or electrical issue, but I can't find anything more to tighten up. I'm going to try swapping my X/Y axes drivers (rotating my printer output) and see if it follows the swap over to the X-axis.

I tried plotting the file in the simulator and graphing the acceleration, but I don't see it yet. Acceleration actually looks pretty bad, with big spikes in a dozen or so places, but I see the same spikes in X and in Y. Maybe it's only that my Y-axis can't handle this kind of acceleration, but that seems unexpected to me.

I thought this might be tied to some bug @Traumflug found when implementing lookahead, but I can't find that change right now. I seem to recall something about an extra delay between moves, maybe caused by having both an end-delay and a start-delay. But maybe I am remembering it wrong.

If I find anything definitive I'll open a new issue. But I wanted to mention this here in case someone else is seeing this, too.

Traumflug commented 8 years ago

Hmm. These uncatchable step losses. Most important thing would be to have a repeatable case.

Acceleration actually looks pretty bad

It shouldn't. With the SimulAVR simulator acceleration looks as fine as it can get, see picture above.

I seem to recall something about an extra delay between moves

This is no secret: when one move ends and the next one starts, not only dda_step() is run, but also dda_start() is executed. dda_step() alone takes somewhere between 300 and 400 clocks, both take some 700 to 800 clocks. This limits the allowed speed at a junction.
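
To put these clock counts in perspective, here is a back-of-the-envelope calculation. It assumes a 20 MHz CPU clock and 80 steps/mm (the default run-in-simulavr.sh reported earlier in this thread); both values are assumptions for illustration, and the result ignores everything else the CPU has to do.

```python
def max_feedrate_mm_min(clocks_per_step, f_cpu_hz=20_000_000, steps_per_mm=80.0):
    """Upper bound on feedrate if every step costs `clocks_per_step` CPU clocks."""
    max_steps_per_second = f_cpu_hz / clocks_per_step
    return max_steps_per_second / steps_per_mm * 60.0

# Mid-move: only dda_step() runs (upper figure of 300 to 400 clocks).
print(max_feedrate_mm_min(400))   # 37500.0
# At a junction: dda_step() plus dda_start() (upper figure of 700 to 800 clocks).
print(max_feedrate_mm_min(800))   # 18750.0
```

Under these assumptions the junction roughly halves the achievable step rate compared to the middle of a move, which is the limit described above.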

Wurstnase commented 8 years ago

How many clocks are there between the Dir and Step signals?

Traumflug commented 8 years ago

Dir is set only once at the start of the movement, in dda_start().

Wurstnase commented 8 years ago

Yes, but maybe there is not enough time between dir and the first step. And then it looks like you miss a step.

Traumflug commented 8 years ago

Watched too much Marlin, eh? :-) Direction is set even before the timer.

Wurstnase commented 8 years ago

The timer doesn't count. Only the time between setting Dir and the first step does. With a DRV8825 you need ~2μs. I think Teacup is very fast. Maybe too fast ;)

Traumflug commented 8 years ago

The timer does count, of course. If the timer is scheduled to do the first step of a new move in 50 ms, the time span between Dir and Step being set is these 50 ms.

Wurstnase commented 8 years ago

Ok, the dir is always set just behind the last step with the opposite direction?

Btw, the last time I reassembled my printer I thought I had everything well made. But I had big issues with losing steps. I didn't believe that this was a mechanical issue, but it was: in my case, resonance. These frequencies are nasty and sometimes hard to reproduce.

Traumflug commented 8 years ago

Ok, the dir is always set just behind the last step with the opposite direction?

In a consecutive sequence of movements, yes. It's always set, no matter what the previous direction was. Writing the flag is faster than finding out whether it should be written.

phord commented 8 years ago

I don't have a good repeatable case. I have some test cases which fail regularly, but the failures are inconsistent. In looking through the simulated output so far, I do not see the cause. But here are some observations so far:

Turning off LOOKAHEAD did not help.
Slowing the gcode speed down may have helped. More testing needed.
Slowing printer acceleration down (from 1000 to 200) seems to have helped, but my test failed for other reasons, so I need to repeat this.
Failures seem to occur mostly on "skin" layers where lots of angled parallel lines are drawn in both long and short lengths. It's possible this is only where it turns out to be visible due to averaging, but I think not.

We can lose axis precision, causing this sort of shift, for many reasons:

:ballot_box_with_check: Acceleration set too high (lost motor steps)
:ballot_box_with_check: Max speed set too high (lost motor steps)
:ballot_box_with_check: Jerk set too high (lost motor steps)
:ballot_box_with_check: Stepper driver overheats and shuts down momentarily (lost motor steps)
:ballot_box_with_check: Belt slips on gears
:ballot_box_with_check: Wiring problems (intermittent breaks, wires too thin, dirty connector, etc.)
:ballot_box_with_check: Oscillation causing motor to lose torque at some frequencies
:ballot_box_with_check: Firmware errors sending steps incorrectly causing driver to miss them
:ballot_box_with_check: Firmware exceeds acceleration/jerk limits sporadically (lost driver steps)

I've been looking for evidence of the last one (exceeding accel limits) but I haven't found anything Teacup is doing wrong there yet in the simulator. I do see lots of wide-band step frequencies like in the picture above (caused by Bresenham slow-axis line tracking), but I assume these are not a problem. Firstly the inductance of the motor coils should eat some of the variance; but more importantly, I would expect the stepper driver to absorb these differences.

Because of the way steppers work, I expect I need to "lose" two whole motor steps before the stepper actually is commanded to the wrong position because of any step-loss mistakes resulting from physical limitations of the machine. That is, the stepper driver interpreted the step correctly but the stepper motor cannot move quickly enough to attain it. Since I am using 1/8th microstepping on this axis, it seems I would need to be wrong by 16 steps before this loss became visible in the print. Most of the wide-band paths I see in the simulations do appear to average out to the correct target speed in less than 8 steps. Does this line of thinking make sense?

I am pretty sure I am not overheating my driver. I have tried several different current settings and three different drivers. I have fans and heat sinks on the driver at all times.

There could be an oscillation issue. I'm skeptical, though, again because of the 16 steps of the microstepper. But the microstepper can also do a poor job of delivering the right torque when u-stepping, so it could be there still.

:bulb: I will try to build some torture tests this weekend to illuminate the problem and find the culprit.

:bulb: I also have a new SilentStepStick to try out (1/256th steps interpolated in the driver chip), but I haven't soldered it up yet.

:bulb: I still have not tried swapping the X/Y axes wiring (rotating the print) to see if the problem follows the swap. If it does not, it would seem I need to look at some physical problem on my Y-axis. If it does, then I need to swap the wiring and drivers. If it remains on X, then I need to look closer at the firmware, I think.

Traumflug commented 8 years ago

Since I am using 1/8th microstepping on this axis, it seems I would need to be wrong by 16 steps before this loss became visible in the print.

Yes, with microstepping the stepper driver kind of catches up if one step or another is done too quickly. I'd count four microsteps here, though, half of a full step. In the worst case, five microsteps will cause the motor to swap over to the next full step.

Vibrations are a beast. My shiny Mantis Electron here uses trapezoidal spindles with a 3 mm pitch. This should be fine for above 2000 mm/min. However, around 1200 mm/min these spindles sometimes start to flutter radially, which causes very high friction for split seconds, which in turn results in lost steps or even a stalled stepper. Speeds above 1300 mm/min work just fine, but this doesn't help as the area around 1200 mm/min can't be entirely avoided.

If you have a frequency counter you could hook it up to the Y step signal to see whether the number of commanded steps is always the same. Then it'd be clear whether it's a mechanical or a software problem.
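
For comparison with the counter reading, the number of steps the firmware should command can be estimated from the G-code itself. This is a minimal sketch, assuming absolute coordinates, a known start position and a hypothetical steps_per_mm value. It rounds each target position to whole steps, which may differ slightly from how Teacup rounds, so small deviations are to be expected.

```python
import re

MOVE = re.compile(r"\s*G0*[01]\b", re.IGNORECASE)   # G0, G1, G00, G01

def expected_steps(gcode_lines, axis="Y", steps_per_mm=80.0, start_mm=0.0):
    """Sum the steps commanded on one axis by G0/G1 moves (absolute coordinates)."""
    word = re.compile(axis + r"(-?[0-9]*\.?[0-9]+)", re.IGNORECASE)
    position = round(start_mm * steps_per_mm)   # current position in steps
    total = 0
    for raw in gcode_lines:
        line = raw.split(";", 1)[0]             # strip comments
        if not MOVE.match(line):
            continue                            # G92, M106 etc. don't move
        match = word.search(line)
        if match:
            target = round(float(match.group(1)) * steps_per_mm)
            total += abs(target - position)     # steps count in both directions
            position = target
    return total
```

If the counter shows the same total on every run while the print still shifts, the firmware is sending what it should and the losses happen further down the chain.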

Another test would be to find out whether lost steps always add up to a full step. With a software fault it's likely that only microsteps are lost; momentary mechanical stalls would always result in full steps being lost. This could be measured with a dial indicator touched before and after the print.
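
The dial indicator reading can be converted to microsteps and checked against the full-step size. This is a rough sketch only: the steps/mm and microstepping defaults are hypothetical, and the tolerance of half a microstep is an arbitrary choice, not a value from any real configuration.

```python
def classify_offset(offset_mm, microsteps_per_mm=80.0, microstepping=8, tolerance=0.5):
    """Convert a measured offset to microsteps and test for a whole number of full steps.

    Returns (microsteps, is_full_step_multiple).
    """
    microsteps = offset_mm * microsteps_per_mm
    # Nearest whole number of full steps, expressed in microsteps.
    nearest_full = round(microsteps / microstepping) * microstepping
    is_multiple = nearest_full != 0 and abs(microsteps - nearest_full) <= tolerance
    return microsteps, is_multiple
```

With these defaults an offset of 0.2 mm corresponds to 16 microsteps, which is two full steps, while 0.15 mm (12 microsteps) is not a whole number of full steps.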

With all this one can run without filament, of course.

phord commented 8 years ago

I don't think the full-step occurring too early is the problem because this would be compensated (undone) when the stepper driver steps back the other direction, even if it has to step 5 microsteps backwards before it falls back. But this does not occur. Once my Y-axis "slips", the whole layer is off. Sometimes the slip appears to be as much as 1mm or more, which is about 50 whole steps on my machine. More often it is only off by 0.1mm or less, and then it often corrects itself on subsequent layers, presumably by slipping in the other direction on the next layer.

If someone else were telling me these details, I would tell them they have their acceleration set too high and they should tone it down until it works reliably. But this is my printer which has only ever run Teacup for 3 or 4 years and which has not had this problem so specifically before.

It may be a mechanical problem on my printer, of course. I sort of expect this is the case since I can't find the issue in the trace plots. Maybe I need new bearings on my Y-axis. Maybe there is a zip-tie dragging under a bearing somewhere new.

I do look suspiciously at the code because so much was changed for LOOKAHEAD and some things were "cleaned up" in the process. But also I tried a very old version of Teacup while testing this and I still saw the problem, I think. So this points me back to investigating mechanical issues.

There are other possibilities. I am using a newer Slic3r release these days, and maybe it produces more frenetic movements than the ones I used in the past. Or maybe my infill speeds are higher than I had set before. I can try some old gcode to see if the problem occurs there.

I will try to run more tests this weekend and report back here. Thanks for the suggestions.

phord commented 8 years ago

I dropped my ACCELERATION to 500 and my Y-step losses have vanished. I think my problem is just a sticky axis, but I haven't had a chance to experiment as deeply as I wanted to. 500 ain't bad, though!

Traumflug / Teacup_Firmware, issue #123