MarlinFirmware / Marlin

Marlin is an optimized firmware for RepRap 3D printers based on the Arduino platform. Many commercial 3D printers come with Marlin installed. Check with your vendor if you need source code for your specific machine.
https://marlinfw.org
GNU General Public License v3.0

[BUG] Ramps/Re-Arm losing steps with DRV8825 (layer shifts) #11047

Closed kAdonis closed 4 years ago

kAdonis commented 6 years ago

Hi there

With the current bugfix version I experience lost steps on all axes with 8825 stepper drivers. I tried the versions from June 16 and June 14. The bugfix version from June 3 works fine. I use a self-made CoreXY printer with 8825 drivers in 1/32 mode on X/Y and 1/16 mode on the Z axis. After the update the X/Y motors made a strange noise, and when I tried to move in X it also moved Y. I played around with parameters and tried MINIMUM_STEPPER_PULSE 4, which helped, but when printing there are layer shifts on every layer. I tried different MINIMUM_STEPPER_PULSE values in combination with MAXIMUM_STEPPER_RATE down to 65000 with no luck. Disabling Linear Advance also helps, but there are still layer shifts. Disabling Adaptive Step Smoothing does not change anything.

I swapped the 8825 drivers for spare A4988s in 1/16 mode and it prints fine again, but now I can see that the Z axis with 8825 drivers also loses steps.

I'll keep investigating, but I need help. Thanks in advance.

configs.zip

Bob-the-Kuhn commented 6 years ago

There's a possibility that this is related to the layer shift problem we've been having for a long time. I've changed the title so that people from that thread will look at this one.

kAdonis commented 6 years ago

It's possible that it is related, but I didn't have this problem before yesterday. The firmware from June 3 is working fine.

[Image: img_2429] Two test prints with the current firmware.

[Image: img_2430] The same G-code as above with Bugfix 2.0 from June 3, printed after the parts above, so I know the drivers are still working.

italocjs commented 6 years ago

If you are sure that it isn't related to the mechanics / drivers, try looking at the feedrates, max accelerations and jerk configs. Maybe there is something different there.

kAdonis commented 6 years ago

I checked feedrate, acceleration and jerk. I can print without problems using the same settings on the unchanged hardware if I upload the firmware from June 3. The layer shifts also occur when printing very slowly.

p3p commented 6 years ago

Do you have the hardware to check how long the step pulse is? I'll try to get time to look into this. Lots has changed in the planner/step generation recently, but I'm not sure what, if anything, has changed since the 3rd.

kAdonis commented 6 years ago

I'm sorry, I don't have the hardware for this, and I don't know much about coding, but there are several commits that changed stepper timing since the 3rd. I have time, so just tell me what I could test (without special equipment).

Bergerac56 commented 6 years ago

Could it be that the current of the drivers is just too low or too near the limit? When playing with new versions of Bugfix and 8825s, I also once experienced skipped steps (but not on all axes) and had to increase the current a bit. To test that, I issued G1 on X, Y or Z to move the head along an axis and put a "light" hand in the way of the head to add a bit of resistance. If it skipped steps, I increased the current of that driver a bit.

Perhaps nothing to do with your issue but who knows...

kAdonis commented 6 years ago

I checked the drivers' Vref and it is 0.5V, exactly what it should be. This gives me 1A max current for 1.5A motors (67%). The A4988 drivers I tested with were also adjusted to 1A and the motors were running fine.
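The arithmetic behind those numbers can be sketched as follows. This assumes a DRV8825 carrier with 0.1 ohm sense resistors, which is common but not guaranteed on clone boards; the function names are made up for illustration.

```cpp
// Sketch only: current-limit arithmetic for a DRV8825-style driver.
// ASSUMPTION: 0.1 ohm sense resistors (check the resistors on your board).
constexpr double drv8825_current_limit_a(double vref_v, double r_sense_ohm = 0.1) {
  // DRV8825 chopping current: I = Vref / (5 * R_sense)
  return vref_v / (5.0 * r_sense_ohm);
}

// Current limit expressed as a percentage of the motor's rated current.
constexpr double percent_of_rated(double limit_a, double rated_a) {
  return 100.0 * limit_a / rated_a;
}
```

With Vref = 0.5V this gives a limit of about 1A, which is roughly 67% of a 1.5A rating, matching the figures quoted above.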

Bergerac56 commented 6 years ago

It was the case for me too. Nevertheless, I had to increase the current a bit on my X and Y axes after the little test explained above.

kAdonis commented 6 years ago

Okay, I tried to stop the carriage with my finger, with no effect. The motors have a lot of torque. So the current is fine, but thank you Bergerac.

I still suspect the stepper timing is off just a little bit. The Z axis is driven by 3 motors, each connected to its own 8825 driver on an extra board. All 3 drivers get their Step, Dir and Enable signals from shared pins. After some up and down moves of the Z axis, the bed is tilted a little bit; this never happened before. So the 8825s are individually susceptible to the (supposed) signal error. It might be that the actual step pulse duration is, let's say, 1.8 us, which is okay for driver 1, but driver 2 catches maybe only 99% of the pulses. But this is just a guess. I tried to understand the code in stepper.cpp but I had to give up.

ejtagle commented 6 years ago

@kAdonis : Try to increase pulse width a bit. How long are the cables between the main board and the stepper extra board ?... You are probably running into signal integrity issues.... Or ground issues...

kAdonis commented 6 years ago

I tried to increase pulse width up to 5us with no effect. The cables to the extra board are 10cm long, but the extra board is only for the z-axis. The X/Y drivers are on the RAMPS as usual. I will try to connect the z motors to only 1 driver directly on the RAMPS and test again.

kAdonis commented 6 years ago

Okay, 3 motors are probably too much for one 8825 driver at 12V, so I tested with dual Z drivers directly on the RAMPS. The issue is still there with pulse width 2us and max stepper rate 250000. I will test with a higher pulse width.

ejtagle commented 6 years ago

Humm... How many pulses per mm are you using for each of your Z axis ? I had problems when using 12v and fast step rates. 12v are just insufficient for fast moves...

italocjs commented 6 years ago

ejtagle, noob question: if I use 24v, will I be able to go faster before the motor stops? I'm trying to increase the speed of my 400x400 printer. The first step was the Volcano hotend, but now I'm looking into faster motion.

ejtagle commented 6 years ago

24v allows more precise control of motor coil currents. But, you must be very careful, as most RAMPs do not tolerate such increase of voltage, and if they do, then the problem could be the arduino itself, as it also powers its 5v regulator from that voltage.

The drivers themselves have no problem going up to 30...40volts

ejtagle commented 6 years ago

In case of RAMPS/REARM, i dont know, but if the board tolerates being powered from 24v, then i would certainly recommend it.

kAdonis commented 6 years ago

On my Z axis I have 1600 steps/mm but only 4mm/s max speed, i.e. 6400 steps/s, much lower than X and Y. But I have layer shifts even when printing at 30mm/s. With older Marlin versions I could print at 70mm/s, so 12V is not ideal but it works.
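For reference, the step-rate figure above is just steps/mm multiplied by feedrate. A minimal sketch (helper names are hypothetical, not Marlin functions):

```cpp
// Sketch only: relate steps/mm and feedrate to the step frequency the
// driver has to follow. Names are illustrative, not taken from Marlin.
constexpr double step_rate_hz(double steps_per_mm, double feedrate_mm_s) {
  return steps_per_mm * feedrate_mm_s;
}

// Time available per step (one high + one low phase) at a given rate.
constexpr double step_period_us(double rate_hz) {
  return 1e6 / rate_hz;
}
```

At 1600 steps/mm and 4mm/s this gives 6400 steps/s, i.e. about 156us per step, which leaves a lot of headroom over a pulse width of a few microseconds.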

ejtagle commented 6 years ago

I had reliability problems when moving my Z axis at more than 1mm/s. I have 4000 pulses per mm. And it is not software. It is hw related.

One of the problems we have now is that the old firmware placed a hidden limitation on the maximum pulse speed... We removed that limitation (because the Arduino is able to keep up without problems at higher speeds). But that seems to be triggering hw problems...

kAdonis commented 6 years ago

I'll reduce the steps/mm on the Z axis and do more tests tomorrow, but the X and Y axes are more important.

ejtagle commented 6 years ago

I have learned that deducing the maximum feedrate for an axis is very complex... It is not enough to test a plain move. The first step is always to raise the driver current, but then several printing tests must be done to be sure.

p3p commented 6 years ago

If, as @kAdonis says, it is possible to go back to the June 3rd build and have no problems with the same configuration then we must have changed something that is either reducing the maximum rate he can drive the axis, creating a pulse train his drivers do not like, .. or running the axis harder at the same specified rate. When it's reproducible like that I can't see it just being the hardware even if lowering speeds fixes it.

ejtagle commented 6 years ago

I think there was a change... After June 3rd stepper pulse period was fixed. Previously, the pulse period was 8x larger than the desired configured one. That IS a change - The bug only triggered when doing double/quad stepping...

ruggb commented 6 years ago

For what it is worth, I have a similar arrangement with high-current motors and 8825s. Some time ago I stumbled upon some flyback diode boards on AliExpress. They were cheap, so I decided to try them. I absolutely could not believe the change. At that point I realized that many of the artifacts on my prints were a result of missing steps. No more missing steps, and the noise decreased significantly. Everybody should use them.

teemuatlut commented 6 years ago

@ejtagle Were you able to scope the pulse width when making the changes? Because if you didn't I can take a quick look.

kAdonis commented 6 years ago

Update: The printer prints fine with pulse width set to 5us and a max stepper rate of 140000 Hz (I don't know if this rate makes sense). I tried 4us pulse width and had layer shifts again. 8825 drivers are on X/Y and on the Z axis, with the Z axis still powered by dual Z drivers on the RAMPS. I think I identified the "worst" (of 5) drivers and am not using it at the moment. Initially (see first post) 5us didn't work, but that test included the "worst" driver.
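For anyone wanting to reproduce this, the two settings would look roughly like the fragment below. It assumes both options live in Configuration_adv.h, as they normally do in Marlin bugfix-2.0; the values are the ones from this comment, not a general recommendation.

```cpp
// Configuration_adv.h (fragment). Values reported in this comment only.
#define MINIMUM_STEPPER_PULSE 5       // minimum step pulse width, in microseconds
#define MAXIMUM_STEPPER_RATE  140000  // maximum step rate, in Hz (steps/s)
```

For scale: a 5us high time followed by an equal low time would occupy 10us per step, i.e. 100000 steps/s, so whether 140000 is actually reachable depends on how short the low time is kept. That is a back-of-the-envelope bound, not something measured in this thread.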

kAdonis commented 6 years ago

I didn't want to give up my extra board easily, because (I think) it's cleaner to use one driver per motor. So I tried again with three A4988 drivers for the Z axis installed on the extra board, to be safe from step losses, while still using 8825 drivers in 1/32 mode for X/Y. I again used a pulse width setting of 5us and a max stepper rate of 140000 Hz. It prints fine! So there are no signal integrity or ground issues, at least no obvious ones.

I'm really curious how long the pulse actually is.

Update : There are still layer shifts

ejtagle commented 6 years ago

@teemuatlut: I haven't scoped the pulse width yet. I have the hardware required to do it, but as printing was and is working pretty well (using 2us pulses on DRV8845), I didn't do it.

kAdonis: Try the following: Use a HEAVY copper wire (2.5mm^2 or even more) to join the ground of your external board with the ground of RAMPS/ReARM.

Also, place 2 capacitors in parallel between GND and +V of your external board, AT the external board connector. One of the capacitors should be 1000uF/25v (or more), the other one should be 100nF ceramic disc

kAdonis commented 6 years ago

@ejtagle Thank you for the advice about the capacitors, but I need 5us pulse width even with the external board disconnected. Do you mean 8845 drivers or 8825? Are yours from Pololu? My 8825s are cheap Chinese ones.

ejtagle commented 6 years ago

8825 drivers, Chinese clones.

ruggb commented 6 years ago

those Smoother Kit Addon Module for 3D Printer Motor Drivers are about 3 for the cost of 1 driver. I can't recommend them enough. Of course it will take 2 weeks to get them.......... I also have the cheap Chinese 8825 drivers. Though, since they are all made in China, the expense is everyone in between.

p3p commented 6 years ago

I've confirmed the step pulse duration to be reasonable under test conditions (3-axis move):

LPC1768 MINIMUM_STEPPER_PULSE setting

| Set duration (us) | X (ns) | Y (ns) | Z (ns) |
| --- | --- | --- | --- |
| 0 | 689 | 563 | 502 |
| 1 | 1064 | 937 | 812 |
| 2 | 1938 | 1811 | 1690 |
| 4 | 3940 | 3811 | 3674 |
| 6 | 5940 | 5687 | 5562 |

They're a bit low; we should try to overshoot if anything. But it shouldn't be an issue here: the DRV8825 needs 2us, so at 5us it is definitely enough.

ejtagle commented 6 years ago

Yes, rounding is using a truncating operator. But it is pretty close to the calculations

p3p commented 6 years ago

By reasonable I mean it gets longer for higher values, so it is (more than likely) not the cause of this issue. But it is still an issue: if a user sets the appropriate MINIMUM_STEPPER_PULSE for their drivers, it will not be sufficient for all of them. E steppers will more than likely be getting even shorter pulses than Z.

ejtagle commented 6 years ago

Let me explain: the timing code uses part of the execution time as delay. The difference you see is caused by that execution time delay. The difference in timing between axes is constant; it does not increase when the pulse width is increased.

p3p commented 6 years ago

yes but the Z pulse is 1690ns when MINIMUM_STEPPER_PULSE = 2, this will not drive a stepper with a minimum pulse of 2us reliably
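To make that point concrete, here is a small sketch that checks the LPC1768 measurements posted earlier in this comment chain against a required minimum pulse. The 2000ns requirement is the "2us" figure quoted in this thread; the struct and function names are made up for illustration.

```cpp
// Sketch only: which measured setting first satisfies a driver's minimum
// pulse width on every axis? Data copied from the LPC1768 measurements above.
struct PulseMeasurement { int setting_us; int x_ns; int y_ns; int z_ns; };

constexpr PulseMeasurement kMeasured[] = {
  {0, 689, 563, 502},    {1, 1064, 937, 812},   {2, 1938, 1811, 1690},
  {4, 3940, 3811, 3674}, {6, 5940, 5687, 5562},
};

// Shortest pulse seen on any of the three axes for one setting.
constexpr int shortest_ns(const PulseMeasurement& m) {
  return m.x_ns < m.y_ns ? (m.x_ns < m.z_ns ? m.x_ns : m.z_ns)
                         : (m.y_ns < m.z_ns ? m.y_ns : m.z_ns);
}

// Smallest measured setting whose shortest pulse meets the requirement,
// or -1 if none of the measured settings does.
constexpr int smallest_safe_setting_us(int required_ns) {
  for (const PulseMeasurement& m : kMeasured)
    if (shortest_ns(m) >= required_ns) return m.setting_us;
  return -1;
}
```

With a 2000ns requirement, the setting of 2 falls short on all three axes, and the first measured setting that clears it everywhere is 4.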

ejtagle commented 6 years ago

The alternative should be to round up. To be honest, the delay is placed at the proper place. I suspect the compiler could be moving code around the waiting point... (But at least on Due i did not see that)
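As an illustration of the rounding point only (this is not Marlin's actual delay code, and the function names are hypothetical), converting a requested delay in nanoseconds to CPU cycles can either truncate or round up:

```cpp
#include <cstdint>

// Sketch only: truncating conversion. The resulting delay can be slightly
// shorter than requested when ns * f_cpu is not a whole number of cycles.
constexpr uint32_t ns_to_cycles_trunc(uint32_t ns, uint32_t f_cpu_hz) {
  return static_cast<uint32_t>(
      (static_cast<uint64_t>(ns) * f_cpu_hz) / 1000000000ULL);
}

// Rounding up instead: the delay is never shorter than requested.
constexpr uint32_t ns_to_cycles_ceil(uint32_t ns, uint32_t f_cpu_hz) {
  return static_cast<uint32_t>(
      (static_cast<uint64_t>(ns) * f_cpu_hz + 999999999ULL) / 1000000000ULL);
}
```

At 84 MHz, for example, a 1900ns request is 159.6 cycles: truncation waits 159 cycles, rounding up waits 160. The difference is at most one cycle, so rounding alone would not explain a shortfall of several hundred nanoseconds.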

p3p commented 6 years ago

have you measured the pulse durations on Due? is this a LPC176x specific problem?

ejtagle commented 6 years ago

No, I didn't measure it yet. But the ARM Cortex-M3 core used is the same, so I expect minor differences.

The original idea (in stepper::stepper_pulse_phase_isr) was that the START_PULSE macro takes more or less the same amount of time to execute as the STOP_PULSE macro. If that were the case, the delays would compensate and the proper pulse width would be output. By your own measurements, it seems the STOP_PULSE macro is faster than START_PULSE, so they do not compensate, and the Z and E steppers are getting a slightly shorter pulse width than expected. The C compiler tends to cache values in registers, because on ARM reading from memory is slower than fetching the address and then reading the value from SRAM (static variables are MUCH slower on ARM, and accessing them through pointers is much faster; on AVR it is exactly the opposite scenario)...

That could explain the timing asymmetry.

The fix could be to add that extra offset to the calculations, or enforce timing in 4 points instead of 2, to make sure low and high times are always enforced ...

ejtagle commented 6 years ago

To be honest, i was quite disappointed when i disassembled the ARM generated code for the stepper isr. 64 cycles for each stepper was too much. But that is exactly what it takes...

p3p commented 6 years ago

[Image: scope] Indeed, you can see the asymmetry and the longer delay before Z, bearing in mind it's 62.5ns per division, so it is skewed a bit.

p3p commented 6 years ago

Well... as I have the probes set up, I decided to investigate the "spamming buttons in OctoPrint causes skipping" issue. This doesn't look right at all: why is the planner putting a cruise block at max feed rate directly after a finished deceleration?

[Images: scope2, scope3]

ejtagle commented 6 years ago

The difference you see there is caused by the execution speed of the macros START_PULSE and STOP_PULSE.

Basically, they can be written as

    delta_error += advance_dividend;
    if (delta_error >= 0) {
      // set step pin (pseudocode)
      count_position += count_direction;
    }

And then, on stop:

    if (delta_error >= 0) {
      delta_error -= advance_divisor;
      // clear step pin (pseudocode)
    }

When the ifs do not execute, then the timing of the other pulses change. The only way to "fix" the timing would be to add an "else" clause and compensate when the condition is not true, by adding a delay...
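The branch behaviour described above can be sketched as a small simulation. The initial values (error starting at minus the event count, dividend and divisor doubled) are an assumption about how the Bresenham state is set up, and the function names are hypothetical.

```cpp
// Sketch only: simulate the Bresenham error accumulator for one axis over a
// block of step events, counting how often the "if" branch is taken.
// ASSUMPTION: delta_error starts at -step_event_count, advance_dividend is
// 2 * axis_steps and advance_divisor is 2 * step_event_count.
constexpr int count_pulses(int axis_steps, int step_event_count) {
  const int advance_dividend = axis_steps * 2;
  const int advance_divisor = step_event_count * 2;
  int delta_error = -step_event_count;
  int pulses = 0;
  for (int i = 0; i < step_event_count; ++i) {
    delta_error += advance_dividend;
    if (delta_error >= 0) {
      ++pulses;                        // pulse start: step pin would be set
      delta_error -= advance_divisor;  // pulse stop: step pin would be cleared
    }
    // else: no pulse on this axis for this event. This is where a
    // compensating delay would go to keep the other axes' timing constant.
  }
  return pulses;
}

// Number of events in which this axis skips the branch entirely.
constexpr int count_skipped(int axis_steps, int step_event_count) {
  return step_event_count - count_pulses(axis_steps, step_event_count);
}
```

For an axis that needs 3 steps in a 10-event block, the branch is taken 3 times and skipped 7 times, so the code path (and therefore its execution time) differs in most events.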

Regarding the fully accelerated block after a slowdown, I absolutely agree that it should not happen. But only OctoPrint is able to do that: sending G-codes directly does not produce the problem.

I do suspect that a precise timing is required to cause this problem... I have dumped the blocks being queued and never saw it.... But it obviously exists...

Some possibilities i think of:

The first block becomes busy when the planner tries to join it to the next one. So the planner can´t update the executing block speed profile, but it does update the following ones

ejtagle commented 6 years ago

That last possibility could be the reason. Imagine the planner doing the whole plan, trying to merge a 1st block with a 2nd block. Once it is planned, it tries to update the first block, but it is unable to do it because the block now is executing... There is something to prevent that: That block is marked as RECALCULATE, and the stepper should not take it... Maybe there is a race condition and the bit is being cleared before the block is used.. or the order of calculations... let me see...
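A toy model of that suspected race is sketched below. It is not Marlin code: the struct, the field names and the "ISR fires between the two writes" scenario are simplifications made up to illustrate why the order of setting the flag and updating the speed matters.

```cpp
// Toy model, NOT Marlin code: a planner updates a block in two writes and an
// ISR may fire between them. The ISR only takes blocks that are not flagged.
struct Block {
  bool recalculate = false;         // stands in for BLOCK_BIT_RECALCULATE
  float entry_speed_sqr = 100.0f;   // value the planner is about to change
  float trapezoid_for_sqr = 100.0f; // entry speed the trapezoid was built for
};

enum class WriteOrder { SpeedThenFlag, FlagThenSpeed };

// Returns true if an ISR firing after the planner's FIRST write would pick up
// a block whose entry speed no longer matches its precomputed trapezoid.
constexpr bool isr_gets_stale_block(WriteOrder order) {
  Block b{};
  const float new_speed_sqr = 400.0f;
  if (order == WriteOrder::SpeedThenFlag)
    b.entry_speed_sqr = new_speed_sqr;  // first write: speed, flag still clear
  else
    b.recalculate = true;               // first write: flag, speed unchanged
  // ISR fires here, before the planner's second write.
  const bool taken = !b.recalculate;
  const bool inconsistent = b.entry_speed_sqr != b.trapezoid_for_sqr;
  return taken && inconsistent;
}
```

In this model, writing the speed first leaves a window in which the ISR can grab an inconsistent block, while setting the flag first closes it. Real firmware would additionally have to make sure the compiler and CPU do not reorder the two writes (for example with volatile accesses or a short critical section); that part is an assumption, not something established in this thread.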

p3p commented 6 years ago

@kAdonis can you try commits from before and after the 10th, just in general try to narrow this down a bit.

kAdonis commented 6 years ago

@p3p Okay, I'll try. I was able to finish a 2 hour print with the pulse width set to 6us.

AnHardt commented 6 years ago

There are at least two small gaps where the ISR could jump in:

@@ -842,14 +842,13 @@ void Planner::reverse_pass_kernel(block_t* const current, const block_t * const

       const float new_entry_speed_sqr = TEST(current->flag, BLOCK_BIT_NOMINAL_LENGTH)
         ? max_entry_speed_sqr
         : MIN(max_entry_speed_sqr, max_allowable_speed_sqr(-current->acceleration, next ? next->entry_speed_sqr : sq(MINIMUM_PLANNER_SPEED), current->millimeters));
       if (current->entry_speed_sqr != new_entry_speed_sqr) {
-        current->entry_speed_sqr = new_entry_speed_sqr;
-
         // Need to recalculate the block speed
         SBI(current->flag, BLOCK_BIT_RECALCULATE);
+        current->entry_speed_sqr = new_entry_speed_sqr;
       }
     }
   }
 }

@@ -905,18 +904,18 @@ void Planner::forward_pass_kernel(const block_t* const previous, block_t* const
       const float new_entry_speed_sqr = max_allowable_speed_sqr(-previous->acceleration, previous->entry_speed_sqr, previous->millimeters);

       // If true, current block is full-acceleration and we can move the planned pointer forward.
       if (new_entry_speed_sqr < current->entry_speed_sqr) {

+        // We need to recompute the trapezoidal shape
+        SBI(current->flag, BLOCK_BIT_RECALCULATE);
+
         // Always <= max_entry_speed_sqr. Backward pass sets this.
         current->entry_speed_sqr = new_entry_speed_sqr; // Always <= max_entry_speed_sqr. Backward pass sets this.

         // Set optimal plan pointer.
         block_buffer_planned = block_index;
-
-        // And mark we need to recompute the trapezoidal shape
-        SBI(current->flag, BLOCK_BIT_RECALCULATE);
       }
     }

     // Any block set at its maximum entry speed also creates an optimal plan up to this
     // point in the buffer. When the plan is bracketed by either the beginning of the

kAdonis commented 6 years ago

@p3p After a lot of testing I found out that the issue was introduced in commit 6f14bca.

Testing was more difficult than I thought. I had to apply the workaround for the PlatformIO linker issue #11008, and there were compiler errors with TMC2130 library 2.4, so I needed to install version 2.3.

p3p commented 6 years ago

@kAdonis Well that certainly narrowed it down

@ejtagle Indeed, before that commit the pulse durations were always more than the specified minimum: at the 2us pulse setting, X 2600ns, Y 2400ns, Z 2200ns. After the commit they are much lower: X 1600ns, Y 1400ns, Z 1200ns. They must have been improved in another commit after that, but we still need to update so they are always above the minimum on all axes.

@kAdonis I'm still unsure why you would need to go all the way up to 6us pulse duration though, even 3us should have fixed it if it is just this problem.

kAdonis commented 6 years ago

Yes, it's a mystery. Maybe my drivers are from a bad batch?