[BUG] Z2 Stops at random times during print

lornem commented 3 years ago

Did you test the latest `bugfix-2.0.x` code?

Yes, and the problem still exists.

Bug Description

Sometimes I can get through 3 or 4 prints without any issue, and then it might happen twice in a row. The extruder stepper motor also fails at random times, but less often. I am pretty sure it is not related to heat, as it may fail in the first 10 minutes or 45 minutes later. I checked TMC debug (M122) after it failed but could not see anything. It never happens while starting the printer, homing, aligning the Z axis, or bed levelling; it only happens during a print.

Bug Timeline

It has always happened, but less often on older versions of Marlin (2.0.7).

Expected behavior

No response

Actual behavior

No response

Steps to Reproduce

It is very random, so it is hard to reproduce. It happened twice in a row with the latest bugfix version.

Version of Marlin Firmware

2.0.7 and above

Printer model

Custom build (Prusa style frame)

Electronics

SKR 1.4 Turbo with TMC5160 drivers

Add-ons

No response

Bed Leveling

UBL Bilinear mesh

Your Slicer

Prusa Slicer

Host Software

No response

Additional information & file uploads

config.zip

Tannoo commented 3 years ago

Z2 just stops or is it losing steps?

Z1 is ok?

Have you tried moving it outside a print? (G0 Z10 F300 and G0 Z20 F3000)

lornem commented 3 years ago

Z2 stops and the print continues with Z1 behaving normally. After it fails, when I stop the print, the motor will not respond until I power cycle the printer. I tried issuing the enable and disable commands to try to get it to move without power cycling.

Tannoo commented 3 years ago

Sounds like a hardware issue. Are you using different drivers for Z1 and Z2? Try swapping the cables and see if the issue moves to the other motor.

lornem commented 3 years ago

Yes, it did seem like a hardware issue, but I have tried everything: different steppers, different wires, etc. I waited 6 months to post this bug, trying to find anything I could, since it is so random. The main reason I posted it as a bug is that it is much worse on versions above 2.0.7. When I used the bugfix version it failed twice in a row, and now, back on 2.0.7.2, I am on my third print without failure.

Tannoo commented 3 years ago

So, possibly with the current setting of the Z2? Or something with the TMC5160 setup?

I have the same SKR 1.4 Turbo but have TMC2209s. I only run one driver for both Z motors.

Have you tried running just one driver?

thinkyhead commented 3 years ago

When this occurs are you able to move the Z2 motor by hand? In other words, is it powered down? If you have DISABLE_INACTIVE_* or DISABLE_INACTIVE_EXTRUDER enabled, try disabling / setting those options to false to see if it has any effect. After testing with those off, try turning off all optional features to see if something else might be causing the issue. Also, see if turning SQUARE_WAVE_STEPPING on/off makes any difference.

lornem commented 3 years ago

Yes, I can move it by hand. It is powered down and, depending on when I notice that it has stopped, cool compared to Z1. DISABLE_E is false. I will try commenting out DISABLE_INACTIVE_EXTRUDER and create a basic configuration to test with.

thinkyhead commented 3 years ago

Also be sure to disable DISABLE_INACTIVE_Z for your testing.

thinkyhead commented 3 years ago

Note that for testing DISABLE_INACTIVE_Z and DISABLE_INACTIVE_E you can simply do a single Z or E move, wait for the period defined by DEFAULT_STEPPER_DEACTIVE_TIME (try a very low value like 5 for quicker testing) and check that the motors are powered off, then do another Z or E move and check that the motors are now powered back on. If the Z2 motor is not powered on, that likely indicates a code problem.

You may get a similar effect just by sending M18 Z followed by M17 Z.
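The manual check described here can be written out as a short command sequence. The following is only a sketch in Python that builds the G-code lines. The function name is hypothetical, the Z heights and feedrate are borrowed from the `G0 Z10 F300` suggestion earlier in this thread, `M400` (wait for moves to finish) is an added assumption, and absolute positioning is assumed. It does not talk to the printer; paste the printed lines into your host console.

```python
def z_reenable_test(first_z=10, second_z=20, feedrate=300):
    """Build G-code lines for a manual Z disable/re-enable check (hypothetical helper)."""
    return [
        f"G0 Z{first_z} F{feedrate}",   # move Z so the Z drivers are energized
        "M400",                         # assumption: wait for the move to finish first
        "M18 Z",                        # disable the Z steppers; both should turn freely by hand
        "M17 Z",                        # re-enable the Z steppers; both should hold again
        f"G0 Z{second_z} F{feedrate}",  # move again and watch whether Z2 follows Z
    ]


if __name__ == "__main__":
    print("\n".join(z_reenable_test()))
```

If Z2 stays limp after `M17 Z` while Z holds, that would point toward the enable handling rather than the driver.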

I see that the Z motors are using 1200 for their current. If that value is not absolutely needed to match the resistance of the motor coils, try 1000 or lower. This will help to prevent stepper drivers shutting down from overheating.
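To get a rough feel for how much a lower current helps, resistive heating scales with the square of the current. The sketch below assumes conduction (I²R) losses dominate and that resistance stays constant; it ignores switching losses, so treat the numbers as an estimate only. The function name is made up for illustration.

```python
def relative_heating(old_ma, new_ma):
    """Ratio of resistive (I^2 * R) heating at new_ma versus old_ma."""
    return (new_ma / old_ma) ** 2


if __name__ == "__main__":
    # 1200 mA is the configured Z current; 1000 mA is the suggested value.
    ratio = relative_heating(1200, 1000)
    print(f"1200 mA -> 1000 mA: about {ratio:.0%} of the original resistive heating")
```

Under these assumptions, dropping from 1200 to 1000 leaves roughly 69% of the original resistive heating in the driver and motor coils.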

lornem commented 3 years ago

I set DISABLE_INACTIVE_Z to false and changed the Z stepper current to 1000, but it still failed on the first print using the latest bugfix version. I needed to get some prints finished before doing this test and was able to get 14 small prints finished before I had a failure on version 2.0.7.2.

After the bugfix version failed I ran the TMC debug command and now I am wondering if it is Stallguard as I noticed an asterisk under Z2. Is Stallguard even active during a print or just for homing?

SENDING:M122 S0
axis:pwm_scale/curr_scale/mech_load|flags|warncount
        X   Y   Z   Z2  E
Enabled     false   false   true    true    true
Set current 760 1000    1000    1000    620
RMS current 754 990 990 990 612
MAX current 1063    1396    1396    1396    863
Run current 14/31   19/31   19/31   19/31   11/31
Hold current    7/31    9/31    9/31    9/31    5/31
Global scaler   135/256 133/256 133/256 133/256 138/256
CS actual   7/31    9/31    9/31    9/31    5/31
PWM scale   27  25  30  26  131102
vsense
stealthChop true    true    true    true    true
msteps      32  32  32  32  32
interp      true    true    true    true    true
tstep       max max max max max
PWM thresh. 79  79  658 658 60
[mm/s]      100 100 3   3   30
OT prewarn  false   false   false   false   false
triggered
 OTP        false   false   false   false   false
off time    3   3   3   3   3
blank time  24  24  24  24  24
hysteresis
 -end       -2  -2  -2  -2  -2
 -start     6   6   6   6   6
Stallguard thrs 2   2   0   0   0
uStep count 300 196 196 196 644
DRVSTATUS   X   Y   Z   Z2  E
sg_result   0   0   0   0   0
stallguard              *
fsactive
stst
olb
ola
s2gb
s2ga
otpw
ot
Driver registers:
        X   0x80:07:40:00
        Y   0x80:09:40:00
        Z   0x80:09:40:00
        Z2  0x81:09:40:00
        E   0x80:05:40:00
Testing X connection... OK
Testing Y connection... OK
Testing Z connection... OK
Testing Z2 connection... OK
Testing E connection... OK

lornem commented 3 years ago

I built a new version and enabled the stallguard sensitivity for Z and Z2, and it is reporting zero, as it should since I am using a probe for Z. But now I am wondering if the asterisks are just under the wrong columns or if they have a different meaning. I assume, since this is before printing, that it is stating that it is enabled for X & Y, not Z & Z2.

Stallguard thrs 2   2   0   0   0
uStep count 724 524 388 28  1020
DRVSTATUS   X   Y   Z   Z2  E
sg_result   0   0   0   0   0
stallguard          *   *

(When I paste the status text in, the spaces are lost; in the console the asterisks are under Z & Z2.)

lornem commented 3 years ago

I set the Z_STALL_SENSITIVITY and Z2_STALL_SENSITIVITY just to see if it eliminated the issue and have printed 3 items successfully on the bugfix version but have had the last two prints fail with the extruder stopping mid-print on both.

thinkyhead commented 3 years ago

> (When I paste the status text in, the spaces are lost; in the console the asterisks are under Z & Z2.)

We are typing Markdown in these GitHub comment boxes, not plaintext. Use Markdown formatting to get the style of text you want. I have edited your comments above to add the correct formatting characters.

thinkyhead commented 3 years ago

> Stallguard even active during a print or just for homing?

Stallguard is only used during homing, when it changes the state of a DIAG pin or gives feedback over SPI/Serial. During a print the drivers will shut themselves down in response to overheating or to bad DIR signals, at which point the stst (standstill) flag is triggered. See if it helps to put a stronger fan blowing on your stepper drivers' heatsinks.

lornem commented 3 years ago

After failure:
X temperature prewarn triggered: false
Y temperature prewarn triggered: false
Z temperature prewarn triggered: false
Z2 temperature prewarn triggered: false
E temperature prewarn triggered: false

lornem commented 3 years ago

Looks like the issue may be resolved. I put a larger fan over the TMC drivers (thanks, thinkyhead). Also, on my bugfix version I forgot to copy the lower current settings for the Z motors, which explains why it was happening more often on that version.

I am wondering why the TMC drivers shut down but the prewarn was not triggered and there was nothing showing in the status for them, maybe this part is a bug.

lornem commented 3 years ago

It failed again after a few prints. Here is the TMC debug output from just before I stopped the print.

>>> M122 S0
        X   Y   Z   Z2  E
Enabled     true    true    true    true    true
Set current 760 760 760 760 620
RMS current 754 754 754 754 612
MAX current 1063    1063    1063    1063    863
Run current 14/31   14/31   14/31   14/31   11/31
Hold current    7/31    7/31    7/31    7/31    5/31
Global scaler   135/256 135/256 135/256 135/256 138/256
CS actual   14/31   14/31   14/31   14/31   11/31
PWM scale   33488930    65574   393263  40  60
vsense
stealthChop true    true    true    true    true
msteps      32  32  32  32  32
interp      true    true    true    true    true
tstep       1169    3699    max max 10740
PWM thresh. 79  79  658 658 60
[mm/s]      100 100 3   3   30
OT prewarn  false   false   false   false   false
triggered
 OTP        false   false   false   false   false
off time    3   3   3   3   3
blank time  24  24  24  24  24
hysteresis
 -end       -2  -2  -2  -2  -2
 -start     6   6   6   6   6
Stallguard thrs 2   2   0   0   0
uStep count 1016    813 772 388 225
DRVSTATUS   X   Y   Z   Z2  E
sg_result   0   0   0   0   0
stallguard                  *
fsactive
stst        *   *           *
olb
ola
s2gb
s2ga
otpw
ot
Driver registers:
        X   0x00:0E:40:00
        Y   0x00:0E:40:00
        Z   0x80:0E:40:00
        Z2  0x80:0E:40:00
        E   0x01:0B:40:00
Testing X connection... OK
Testing Y connection... OK
Testing Z connection... OK
Testing Z2 connection... OK
Testing E connection... OK
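Because the flag rows in the dump above lost their column alignment when pasted, the `Driver registers` lines are a useful cross-check. The sketch below decodes them, assuming each value is the 32-bit DRV_STATUS register printed most significant byte first, with bit positions as described in the Trinamic TMC5160 datasheet (stst = bit 31, stallguard = bit 24, CS actual = bits 20..16, and so on). The function and dictionary names are made up for illustration and are not Marlin code.

```python
# Bit positions assumed from the Trinamic TMC5160 DRV_STATUS register description.
FLAG_BITS = {
    "stst": 31, "olb": 30, "ola": 29, "s2gb": 28, "s2ga": 27,
    "otpw": 26, "ot": 25, "stallguard": 24, "fsactive": 15, "stealth": 14,
}


def decode_drv_status(text):
    """Decode an M122 register string such as '0x80:0E:40:00' (hypothetical helper)."""
    # Assumption: the four colon-separated bytes are printed most significant first.
    value = int(text.replace("0x", "").replace(":", ""), 16)
    status = {name: bool((value >> bit) & 1) for name, bit in FLAG_BITS.items()}
    status["cs_actual"] = (value >> 16) & 0x1F  # bits 20..16
    status["sg_result"] = value & 0x3FF         # bits 9..0
    return status


if __name__ == "__main__":
    # Register values copied from the dump above.
    dump = {"X": "0x00:0E:40:00", "Y": "0x00:0E:40:00", "Z": "0x80:0E:40:00",
            "Z2": "0x80:0E:40:00", "E": "0x01:0B:40:00"}
    for axis, raw in dump.items():
        status = decode_drv_status(raw)
        flags = [name for name in FLAG_BITS if status[name]]
        print(f"{axis:<3} cs_actual={status['cs_actual']:>2} flags={','.join(flags)}")
```

Under these assumptions, the decoded `cs_actual` values (14, 14, 14, 14, 11) agree with the `CS actual` row, which supports the byte order. In this register snapshot `stst` decodes as set for Z and Z2 only, and `stallguard` as set for E only. The register line and the flag rows may come from separate reads of the driver, so small disagreements between them are possible.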

thinkyhead commented 3 years ago

Looks like stst was triggered. That is sometimes caused by strange timing on the DIR signals. Were you using any form of bed leveling at the time of the failure?

lornem commented 3 years ago

It only fails partway into a print. It has never stopped while levelling the bed or using the Z-axis levelling.

lornem commented 3 years ago

Since I lowered the current and added the larger cooling fan it has only failed once, so I am thinking that the drivers were shutting down due to heat. Knowing that it was Z2 and E1 that would stop also makes sense: they are the top drivers on the board, so heat rising from the other drivers would add to the heat on these too. The main issue is that the firmware is not reporting the issue; it just fails silently.

thinkyhead commented 2 years ago

> The main issue is that the firmware is not reporting the issue; it just fails silently.

I agree, it would be a really good idea to STOP or KILL the machine whenever a standstill or overheated driver condition occurs, since at that point the print is almost certain to fail. Overheating steppers will quickly lead to skipped steps and shifted XY layers, so it seems sensible to stop the print under that condition. I wonder how other firmwares respond to stepper driver errors….
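As a way to make the idea concrete, here is a minimal sketch of such a stop-on-error policy. It is purely hypothetical and is not how Marlin implements anything: the function name, the flag dictionary, and the `motion_expected` input are all assumptions for illustration. Marlin's `Configuration_adv.h` does have a `MONITOR_DRIVER_STATUS` option with a `STOP_ON_ERROR` sub-option; the thread does not say whether it was enabled in this configuration.

```python
def driver_action(flags, motion_expected):
    """Pick a response to one driver's status flags: 'kill', 'warn', or 'ok'."""
    # Hard faults: overtemperature shutdown or a short to ground on either coil.
    if flags.get("ot") or flags.get("s2ga") or flags.get("s2gb"):
        return "kill"
    # The driver reports standstill while the firmware expects it to be stepping.
    if flags.get("stst") and motion_expected:
        return "kill"
    # Overtemperature pre-warning: still running, but worth reporting to the user.
    if flags.get("otpw"):
        return "warn"
    return "ok"


if __name__ == "__main__":
    print(driver_action({"stst": True}, motion_expected=True))   # standstill mid-move
    print(driver_action({"stst": True}, motion_expected=False))  # idle axis, nothing wrong
```

The `motion_expected` input matters because a driver also reports standstill whenever its axis is legitimately idle, for example Z between layer changes.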

MarlinFirmware / Marlin