MarlinFirmware / Marlin

Marlin is an optimized firmware for RepRap 3D printers based on the Arduino platform. Many commercial 3D printers come with Marlin installed. Check with your vendor if you need source code for your specific machine.
https://marlinfw.org
GNU General Public License v3.0
16.22k stars 19.22k forks

[BUG] SKR 2: TMC connection error if `DISABLE_DRIVER_SAFE_POWER_PROTECT` commented-out (i.e. protection is enabled) #22701

Closed VanessaE closed 3 years ago

VanessaE commented 3 years ago

Did you test the latest bugfix-2.0.x code?

Yes, and the problem still exists.

Bug Description

In Configuration_adv.h, one finds this section:

/**
 * Stepper Driver Anti-SNAFU Protection
 *
 * If the SAFE_POWER_PIN is defined for your board, Marlin will check
 * that stepper drivers are properly plugged in before applying power.
 * Disable protection if your stepper drivers don't support the feature.
 */
//#define DISABLE_DRIVER_SAFE_POWER_PROTECT

This is the normal state. But, it also keeps TMC drivers from working at all.

At first I couldn't figure out why, then I looked at the SKR 2 schematic today (my board is a Rev. B and matches it):

Screenshot_2021-09-02_19-14-27

MOT_POWER is the control line from the µC, PGND is the power supply ground rail.

Screenshot_2021-09-02_19-13-37

Clearly, the driver power control MOSFET Q1 cuts the ground connection to the motor drivers when it's turned off. I mean ALL of the driver modules' ground circuits, the lot of them, and not just whatever's needed for motor power (though I'm not sure if a TMC chip "splits" its grounding in this fashion, not that it matters).

No ground to the driver means unreliable UART comms, which means TMC connection errors (UART generally requires both +V and ground) or gibberish data being fed to the drivers.

It also means trying to turn a motor on can start to BURN UP the chip! I'm guessing here that the motor drive circuitry is trying to pull a ground reference through the rest of the driver chip from somewhere else inside it other than its primary ground rail. I am 100% certain that this is what killed three of my TMC2208 drivers.

In my not so humble opinion, this is a major hardware design flaw. However, that's out of Marlin's hands.

If I uncomment that line, the drivers work, UART works, etc., presumably because the ground connection is turned on by default in this case. However, my TMC2208's then can't be disabled - idle timeout and M84 do nothing on them (my two A4988's do turn off, however). I'm not sure why that is, but Marlin's doing it, as they start out turned off, and only turn on when Marlin finishes its boot logo animation.

My proposal:

Marlin appears to be calling upon the TMCStepper code before it has done its anti-SNAFU checks and turned the power control circuit on.

So, don't try to query the drivers, and definitely do not allow any STEP/DIR/EN output to a driver that doesn't report all good, until after the anti-SNAFU code is happy and the Q1 power control circuit has been turned on, and thus chip has a valid ground reference.
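The ordering proposed above can be sketched as a small gate object. This is a minimal illustration, not Marlin code: `DriverGate`, `Stage` and `pins_allowed_after` are hypothetical names, and real firmware would switch actual pins rather than flip a stage flag.

```cpp
// Hypothetical sketch, not Marlin code: only allow UART queries and
// STEP/DIR/EN output once the driver has power (and thus a ground reference).
enum class Stage { PowerOff, PowerOn, Verified };

class DriverGate {
public:
  // Call after the anti-SNAFU check has passed and the power MOSFET is on.
  constexpr void power_enabled() {
    if (stage_ == Stage::PowerOff) stage_ = Stage::PowerOn;
  }

  // Record the result of a UART status query. Ignored while unpowered,
  // because nothing read from an unpowered driver can be trusted.
  constexpr void uart_result(bool ok) {
    if (stage_ == Stage::PowerOff) return;
    stage_ = ok ? Stage::Verified : Stage::PowerOn;
  }

  constexpr bool may_query_uart() const { return stage_ != Stage::PowerOff; }
  constexpr bool may_drive_pins() const { return stage_ == Stage::Verified; }

private:
  Stage stage_ = Stage::PowerOff;
};

// Walk one driver through the proposed sequence and report whether
// STEP/DIR/EN output would be allowed at the end of it.
constexpr bool pins_allowed_after(bool power_on, bool uart_ok) {
  DriverGate gate{};
  if (power_on) gate.power_enabled();
  gate.uart_result(uart_ok);
  return gate.may_drive_pins();
}
```

With this shape, `pins_allowed_after(false, true)` is false: a driver that was never powered cannot reach the `Verified` stage, whatever the UART appears to report.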

If that means that some user can't print anymore because some other issue makes their drivers throw UART errors (that they've perhaps been ignoring), then so be it. They need to fix their hardware. If they're in a hurry, they could of course put those axes into A4988 or standalone mode.

Bug Timeline

Unknown timeline, but new to me since I'm working with new hardware.

Expected behavior

As described above.

Actual behavior

Bad UART comms causing TMC connection errors, high risk of driver burn-out.

Steps to Reproduce

No response

Version of Marlin Firmware

bugfix-2.0.x branch, commit 01d1192a

Printer model

Custom-built Hypercube

Electronics

brand new SKR v2, TMC2208's; no other electronics of significance

Add-ons

No add-ons of significance.

Bed Leveling

ABL Bilinear mesh

Your Slicer

Slic3r

Host Software

Pronterface

Additional information & file uploads

No response

thisiskeithb commented 3 years ago

This is the normal state. But, it also keeps TMC drivers from working at all.

This is not true.

I have several TMC-based SKR2 builds here that all work fine with DISABLE_DRIVER_SAFE_POWER_PROTECT commented out/disabled so the feature is enabled. I updated one on my bench to 682d6c9 and confirmed that it still works correctly with the latest bugfix-2.0.x code.

Have you tried replacing the board? Based on https://github.com/MarlinFirmware/Marlin/issues/22691, I suspect you may have damaged more than a single component.

VanessaE commented 3 years ago

I already talked to someone at BTT, they insist that the board is fine.

thisiskeithb commented 3 years ago

I already talked to someone at BTT, they insist that the board is fine.

I'd order a replacement. There is something not working correctly on your board.

VanessaE commented 3 years ago

But that doesn't make any sense - clearly Marlin's turning that MOSFET on and off. I can print, as long as that option is uncommented. Besides, this is a brand new board, only a couple of days old.

thisiskeithb commented 3 years ago

Besides, this is a brand new board, only a couple of days old.

Doesn't matter that a board is new out of the box. It's not working correctly. I suggest ordering a replacement.

VanessaE commented 3 years ago

I will do so, but let's leave this issue open.

thisiskeithb commented 3 years ago

I just updated my B1 SE & SE Plus (both running SKR2s) and this feature is working correctly.

I'm sure an exchange will get you sorted. 👍

ellensp commented 3 years ago

There have been a bunch of TMC stepper drivers of late that were not cleaned properly. The flux residue causes all sorts of problems with them. Are your TMC drivers nice and clean and flux-free?

VanessaE commented 3 years ago

Please see also, https://github.com/bigtreetech/SKR-2/issues/63#issuecomment-915323480

@thisiskeithb as can be seen in that comment, exchanging for a new board did not fix my issue. The new hardware behaves identically to what it replaced.

Can you please confirm if you're using 2208's in UART mode on those SKR 2's?

I stand by my theory that the protection feature is causing drivers to ground themselves through some unintended internal path other than their primary ground rails.

@ellensp My drivers were and are clean, or certainly clean enough that I'd expect them to work normally.

thisiskeithb commented 3 years ago

Can you please confirm if you're using 2208's in UART mode on those SKR 2's?

I've used TMC2225s, TMC2208s, TMC2226s, and TMC2209s - all in UART mode and they work fine.

VanessaE commented 3 years ago

@thisiskeithb All I know is this thing just killed three more brand spanking new 2208's, on a brand new SKR 2 (i.e. the replacement), in exactly the same manner as on the first SKR board. That makes 8 dead driver modules. Fortunately they're cheap enough to replace.

Can you please try testing 01d1192a with my config (Marlin-configs-2021-09-08.zip) and at least 3 or 4 2208's?

looxonline commented 3 years ago

@thisiskeithb All I know is this thing just killed three more brand spanking new 2208's, on a brand new SKR 2 (i.e. the replacement), in exactly the same manner as on the first SKR board. That makes 8 dead driver modules. Fortunately they're cheap enough to replace.

Can you please try testing 01d1192 with my config (Marlin-configs-2021-09-08.zip) and at least 3 or 4 2208's?

There must be something unique about your setup. I'm running SKR2 boards and have not had this issue. Remember that the high side and the low side are both switched so the only possible source would be the UART line and that would not likely have the power to cause any damage if the current tried to pass through the IC via an alternate path. It would likely just act as a high impedance with no power applied to the rest of the IC. I'm running SKR2s and have not had this issue at all.

looxonline commented 3 years ago

@thisiskeithb All I know is this thing just killed three more brand spanking new 2208's, on a brand new SKR 2 (i.e. the replacement), in exactly the same manner as on the first SKR board. That makes 8 dead driver modules. Fortunately they're cheap enough to replace.

Can you please try testing 01d1192 with my config (Marlin-configs-2021-09-08.zip) and at least 3 or 4 2208's?

Maybe you could try something to remove as many variables as possible. Power the board outside of any machine with driver(s) inserted and a clean PSU. Check what happens on the bench and then move from there.

thisiskeithb commented 3 years ago

In any event, this isn't a Marlin bug.


This Issue Queue is for Marlin bug reports and development-related issues, and we prefer not to handle user-support questions here. (As noted on this page.) For best results getting help with configuration and troubleshooting, please use the following resources:

After seeking help from the community, if the consensus points to a bug in Marlin, then you should post a bug report.

gzalo commented 3 years ago

Sucks to hear that!

Some possible causes I'm thinking of:

VanessaE commented 3 years ago

@looxonline

There must be something unique about your setup

That was my first thought before I started yelling in here, but with today's tests/failures, we're talking about using only a power supply (genuine Mean Well SE-600-24), a handful of run-of-the-mill TMC2208 driver modules, and a 12864 display module.

I even avoided plugging-in the USB cable initially (only once it came time to look at M122).

I don't see how it can get any more generic than that.

I'm running SKR2 boards and have not had this issue.

But did you try the same combo of branch/commit, drivers, and motherboard version, with a build using my config files?

I'll share my copy of the source tree if you like, but I take no responsibility if my firmware.bin causes your board to fry a driver. 😛

Check what happens on the bench and then move from there.

That's effectively what I did.

Since the only 24v supply I have available to use for this is the one in the printer, the printer was the "bench" during these tests.

I'm not sure it matters, though, because at the end of that first round of testing, before I sent the first SKR v2 back, I reconfigured to replace the two definitely-dead 2208's with a couple of old A4988's, just to see what would happen. I figured if one of those dies, whoop-de-doo, I have several more.

That got the machine working enough to print a Benchy as a sanity check (but the remaining 2208's were on X/Y and were damaged as well, so the quality was mediocre).

Note that I was especially careful to avoid even the smallest change to the rest of the setup when I swapped those A4988's in. Fewest variables, and all that.

Remember that the high side and the low side are both switched so the only possible source would be the UART line and that would not likely have the power to cause any damage

One wouldn't think so, but here we are. Those three drivers crapped-out today without my even trying to move their respective axes. All I had to do was plug them in, turn on the power, and wait. And yes, I plugged them in properly.

@thisiskeithb

In any event, this isn't a Marlin bug

Either Marlin is misusing the SKR v2's hardware in such a way that even a Rev. B will burn-up TMC drivers under certain conditions, or the board has a major design flaw.

In either case, since the motherboard's design is basically set in stone, isn't it the job of the firmware to work-around the motherboard's flaws, or to at least throw errors at compile-time if there's a risky combo of settings?
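A compile-time guard of the kind suggested here could look roughly like the sketch below. Everything in it is hypothetical: `BuildOptions`, `is_suspect_combo` and the rule itself are illustrative only, and nothing in this thread confirms which combination of settings is actually risky. In Marlin itself such a check would more likely be a preprocessor `#error` test on configuration macros.

```cpp
// Illustrative only: these flags stand in for build options. The names are
// hypothetical and the "suspect" rule is an unconfirmed assumption.
struct BuildOptions {
  bool has_safe_power_pin;   // the board defines a driver power-control pin
  bool protection_enabled;   // DISABLE_DRIVER_SAFE_POWER_PROTECT is NOT defined
  bool tmc_uart_drivers;     // at least one TMC driver is configured for UART
};

// The combination reported as problematic in this issue.
constexpr bool is_suspect_combo(const BuildOptions &o) {
  return o.has_safe_power_pin && o.protection_enabled && o.tmc_uart_drivers;
}

// Example build: protection disabled, so the check passes. Setting the
// second field to true would stop the build with the message below.
constexpr BuildOptions kExampleBuild{true, false, true};
static_assert(!is_suspect_combo(kExampleBuild),
              "Suspect combination: driver power protection enabled with UART TMC drivers");
```

A warning rather than a hard error may be the safer choice, since other users in this thread report the same combination working.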

Besides, since it could print once I swapped in those A4988's as mentioned above, that proves beyond any doubt that everything else about the machine works fine, and that I did not make any mistakes in the rest of the hardware setup (certainly nothing that should lead to a critical failure, otherwise I would think those A4988's would have burned-out too).

Also, didn't anyone look at my configs to double-check my work? If my configs are reasonable, and two brand new sets of hardware failed in the same ways, how can I point to anything BUT Marlin here?

@gzalo

stepper_driver_safety writes to some pins (as outputs) when the supply and ground to the steppers is effectively disconnected. Maybe some stray current is killing the digital part of the TMCs?

That's my theory as well.

the SAFE_POWER_PIN connects both the ground and vmotor using different methods. Maybe there is a slight delay of a few µs, and thus Marlin could end up trying to send commands before everything has settled?

That's possible. I brought up a similar idea earlier.
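The settling idea can be sketched as a simple elapsed-time guard. This is an illustration only: `power_settled` is a made-up helper, and the real settling time of the SKR 2 power switch is not known from this thread.

```cpp
#include <cstdint>

// Hypothetical helper: true once `settle_us` microseconds have passed since
// the power pin was switched on. Unsigned subtraction keeps the result
// correct even if the microsecond counter wrapped around in between.
constexpr bool power_settled(std::uint32_t now_us,
                             std::uint32_t power_on_us,
                             std::uint32_t settle_us) {
  return static_cast<std::uint32_t>(now_us - power_on_us) >= settle_us;
}
```

Firmware would call something like this with its own microsecond counter and hold back the first UART transaction until it returns true. For example, `power_settled(1000, 900, 500)` is false because only 100 µs have elapsed; the 500 µs value is an arbitrary placeholder.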

are the drivers robust enough to be powered on without connecting them to the motors?

I would think so, but since I never tried to move any axes for this round of tests, the drivers shouldn't turn on yet anyway, and besides that, when I was working with the first board/drivers, I had the motors connected then.

I remember I burned a few TMC drivers when using a ramps and it supplied logic voltage without the motor power. Maybe something similar happens when connecting the board through USB?

Not possible with the SKR boards -- they have a jumper that feeds 5v/3.3v either from USB or 12/24v in, but never both at the same time. Either way, I never had that kind of thing happen on my old 2560/RAMPS, nor on SKR v1.1, v1.3, or v1.4. Sure, boards fail and whatnot, but I don't recall having ever burned out even one driver module since I started back in 2016, let alone this.

VanessaE commented 3 years ago

@thisiskeithb I'm sure you mean well, but please... stop with that "this is not a help forum" boilerplate. That kind of reply is NOT helpful at all.

I get it, this is a bug tracker (among other things), not a general forum, but that kind of reply is barely more than a "you're wrong, now go away" sort of response, and it assumes that you can't be wrong, that no one else "in the know" has a solution, and that I didn't do my due diligence.

In other words, it's written for n00bs, which I absolutely am not.

Plus, since when has it ever been proper to look for consensus outside of a bug tracker?

I'm wrong a lot, I'll admit, but to give that sort of reply after I wasted a bunch of my time and (less-so) money trying to make this work... to put it politely, I'm less than pleased.

thisiskeithb commented 3 years ago

Either Marlin is misusing the SKR v2's hardware in such a way that even a Rev. B will burn-up TMC drivers under certain conditions, or the board has a major design flaw.

I don’t know if you just had bad luck or have a very specific combination of firmware settings and physical hardware/wiring/etc., but if this feature did not work as intended, then there would be a lot more complaints in regards to TMC2225s/2208s on the SKR2 Rev B. since that combination is used in the Biqu B1 SE and SE Plus, not to mention all of the boards currently in use with TMC drivers.

Like I mentioned above, there are other places to seek help with your hardware issues.

VanessaE commented 3 years ago

very specific combination of firmware settings and physical hardware/wiring/etc

Again, it's just a power supply, drivers, and the display module. Two thick DC wires for the power and two ribbons for the display. That's it, unless you're counting the mains power cord and/or the USB cord.

And I shared my firmware settings. Don't want to look? Fine, at least let someone else have a crack at it, rather than brushing me off.

[...] there would be a lot more complaints in regards to TMC2225s/2208s on the SKR2 Rev B. [...]

Except there have been some complaints, for example:

https://github.com/bigtreetech/SKR-2/issues/63

https://www.reddit.com/r/MarlinFirmware/comments/ontmdo/tmc_connection_error_skr_2_tmc_2209/

https://www.reddit.com/r/BIGTREETECH/comments/o5hnr3/btt_skr_2_rev_b_tmc_2209_driver_backwards/

since that combination is used in [...]

And how many of them have the protection feature enabled, and are running in UART mode?

Like I mentioned above, there are other places to seek help with your hardware issues.

Again with the dismissive attitude.

What's to help? Marlin's doing something wrong here, and I have spelled it out in every possible way. I didn't invent the SKR v2 so it's not like I can fix the design.

It works with dusty old A4988's that I've had since the last ice age, but not with brand new TMC2208's that have barely dried out from the fab. How is this "my hardware issue", and not something Marlin is or isn't doing?

VanessaE commented 3 years ago

...and, I just re-checked:

if I uncomment DISABLE_DRIVER_SAFE_POWER_PROTECT and compile, I can put my last two 2208's (which weren't killed in the last rounds) on the board, and they seem to be willing to work - motion and UART seem right, spew from M122 is quick and the drivers both report good, and they only get a bit warm when driving their respective motors.

That is, their behavior seems to be completely normal with that line uncommented.

thinkyhead commented 3 years ago

Marlin appears to be calling upon the TMCStepper code before it has done its anti-SNAFU checks and turned the power control circuit on.

I suggest enabling MARLIN_DEV_MODE for some additional log output during setup() so we can see the exact point in the startup procedure where certain things are done, and then perhaps you can try moving this block to different points within setup() to see if it makes any difference:

  #if PIN_EXISTS(SAFE_POWER)
    #if HAS_DRIVER_SAFE_POWER_PROTECT
      SETUP_RUN(stepper_driver_backward_check());
    #else
      SETUP_LOG("SAFE_POWER");
      OUT_WRITE(SAFE_POWER_PIN, HIGH);
    #endif
  #endif

I've examined your configs to see if anything stands out, and nothing seems problematic. I was curious about all three serial ports being in use, and wondering if any of those could be stepping on the TMC UART. You might try disabling EEPROM_SETTINGS, MONITOR_DRIVER_STATUS, TMC_DEBUG, STEALTHCHOP_*, and HYBRID_THRESHOLD options as part of testing to see if any of those could be involved.

Of course, be careful with all these things in testing.

Have you checked the current from your PSU to make sure it is stable and correct? I'm sure the board can handle more than 24V but it can't hurt to double-check.

Also, try enabling PINS_DEBUGGING and running M43 to make sure we don't have any odd pin conflicts.

The important thing is to narrow this down and determine the most direct cause. I don't see anything that looks potentially harmful in stepper_driver_backward_check itself, although it does leave the *_ENABLE_PINs in INPUT state briefly, just until stepper.init() is called. I don't know if that could cause any trouble. The settings are loaded from EEPROM / defaults after the backward-check but before stepper.init() so it would be good to make sure nothing in settings.load() could mess up TMC drivers either.
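The startup order described above can be written down as a small model, which makes it easier to reason about what runs while the enable pins are still inputs. This is a hypothetical sketch, not Marlin source: the `Step` names and `kStartupOrder` merely restate the order given in this comment.

```cpp
#include <cstddef>

// Hypothetical model of the startup order described above.
enum class Step { BackwardCheck, SettingsLoad, StepperInit };

// Assumed order: backward check, then settings load, then stepper init.
constexpr Step kStartupOrder[] = {
  Step::BackwardCheck, Step::SettingsLoad, Step::StepperInit
};

// Index of `s` within `order`, or -1 if it is not present.
template <std::size_t N>
constexpr int position(const Step (&order)[N], Step s) {
  for (std::size_t i = 0; i < N; ++i)
    if (order[i] == s) return static_cast<int>(i);
  return -1;
}

// True if `s` runs after the backward check has left the *_ENABLE_PINs as
// inputs and before stepper init reconfigures them.
template <std::size_t N>
constexpr bool runs_with_enable_as_input(const Step (&order)[N], Step s) {
  const int p = position(order, s);
  return p > position(order, Step::BackwardCheck) &&
         p < position(order, Step::StepperInit);
}
```

With the assumed order, only the settings load falls inside that window. Reordering `kStartupOrder` is a cheap way to think through the effect of moving the block before trying it on hardware.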

We can continue to troubleshoot over on Discord until we have a fix on the exact cause of your troubles.

VanessaE commented 3 years ago

perhaps you can try moving this block to different points within setup()

I'd try that and the other tests you mentioned, but I'm kinda low on TMC2208's, and this issue burns them out quickly (seconds to minutes), if it's allowed to happen.

I've examined your configs to see if anything stands out, and nothing seems problematic.

That's a relief.

I was curious about all three serial ports being in use, and wondering if any of those could be stepping on the TMC UART

I found that in BTT's default config, but I didn't put it into place until after the first drivers died. It was only for testing, but to be safe, I've disabled the third port.

I doubt there's a connection there though, since TMC2208 UART is bit-banged over GPIO, one pin per driver slot, rather than using a µC-managed serial bus like 2130's do.
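For context on how corrupted UART traffic would be noticed at all: TMC2208 datagrams end in a CRC-8 byte. The following is a minimal sketch, assuming the datagram layout and CRC described in the TMC2208 datasheet (sync byte 0x05, polynomial 0x07, bits fed LSB first). The helper names are hypothetical, register 0x06 in the usage note is only an example address, and this is not the TMCStepper implementation.

```cpp
#include <cstddef>
#include <cstdint>

// CRC-8 with polynomial 0x07, each byte processed least significant bit first.
constexpr std::uint8_t tmc_crc8(const std::uint8_t *data, std::size_t len) {
  std::uint8_t crc = 0;
  for (std::size_t i = 0; i < len; ++i) {
    std::uint8_t byte = data[i];
    for (int bit = 0; bit < 8; ++bit) {
      const bool mix = ((crc >> 7) ^ (byte & 0x01)) != 0;
      crc = static_cast<std::uint8_t>(crc << 1);
      if (mix) crc ^= 0x07;
      byte >>= 1;
    }
  }
  return crc;
}

// A frame is accepted only if its last byte matches the CRC of the rest.
constexpr bool frame_ok(const std::uint8_t *frame, std::size_t len) {
  return len >= 2 && tmc_crc8(frame, len - 1) == frame[len - 1];
}

// Four-byte read request: sync, node address, register address, CRC.
struct ReadRequest { std::uint8_t bytes[4]; };

constexpr ReadRequest make_read_request(std::uint8_t node, std::uint8_t reg) {
  ReadRequest r{{0x05, node, static_cast<std::uint8_t>(reg & 0x7F), 0}};
  r.bytes[3] = tmc_crc8(r.bytes, 3);
  return r;
}

// Flip one bit, as noise on the line might, and report whether the frame
// would still be accepted.
constexpr bool survives_bit_flip(ReadRequest r, std::size_t byte_index, unsigned bit) {
  r.bytes[byte_index] ^= static_cast<std::uint8_t>(1u << bit);
  return frame_ok(r.bytes, 4);
}
```

A frame built with `make_read_request(0, 0x06)` passes `frame_ok`, while the same frame with any single bit flipped does not. That would be consistent with a driver lacking a solid ground reference producing connection errors rather than silently wrong values.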

I suggest enabling MARLIN_DEV_MODE

Also, try enabling PINS_DEBUGGING and running M43

Both are now enabled. The result of the latter is:

```
>>> m43
SENDING:M43
PIN: PA8 M42 P8 Z_DIR_PIN protected
PIN: PA9 M42 P9 Alt Function: 7 - USART1..3
PIN: PA10 M42 P10 Alt Function: 7 - USART1..3
PIN: PA11 M42 P11 Alt Function: 10 - OTG
PIN: PA12 M42 P12 Alt Function: 10 - OTG
PIN: PA13 M42 P13 Alt Function: 0 - system (misc. I/O)
PIN: PA14 M42 P14 Alt Function: 0 - system (misc. I/O)
PIN: PA15 M42 P15 Z_STEP_PIN protected
PIN: PB2 M42 P18 BTN_EN2 Input = 1
PIN: PB3 M42 P19 HEATER_0_PIN protected
PIN: PB4 M42 P20 HEATER_1_PIN Output = 0
PIN: PB5 M42 P21 CONTROLLER_FAN_PIN protected
. FAN2_PIN protected
PIN: PB6 M42 P22 E0_AUTO_FAN_PIN protected
. FAN1_PIN protected
PIN: PB7 M42 P23 FAN_PIN protected
PIN: PB8 M42 P24 Input = 1
PIN: PB9 M42 P25 Input = 1
PIN: PB10 M42 P26 ESP_WIFI_MODULE_GPIO0_PIN Output = 1
PIN: PB11 M42 P27 Input = 1
PIN: PB12 M42 P28 Input = 1
PIN: PB13 M42 P29 Input = 1
PIN: PB14 M42 P30 Input = 1
PIN: PB15 M42 P31 Input = 0
PIN: PC6 M42 P38 E0_SERIAL_TX_PIN Input = 1
. E0_SERIAL_RX_PIN Input = 1
PIN: PC7 M42 P39 E0_ENABLE_PIN protected
PIN: PC8 M42 P40 Alt Function: 12 - FSMC, SDIO, OTG
PIN: PC9 M42 P41 Alt Function: 12 - FSMC, SDIO, OTG
PIN: PC10 M42 P42 Alt Function: 12 - FSMC, SDIO, OTG
PIN: PC11 M42 P43 Alt Function: 12 - FSMC, SDIO, OTG
PIN: PC12 M42 P44 Alt Function: 12 - FSMC, SDIO, OTG
PIN: PC13 M42 P45 Output = 1
PIN: PC14 M42 P46 ESP_WIFI_MODULE_RESET_PIN Output = 1
PIN: PC15 M42 P47 POWER_LOSS_PIN Input = 1
PIN: PD0 M42 P48 Z_SERIAL_TX_PIN Input = 1
. Z_SERIAL_RX_PIN Input = 1
PIN: PD1 M42 P49 Z_ENABLE_PIN protected
PIN: PD2 M42 P50 Alt Function: 12 - FSMC, SDIO, OTG
PIN: PD3 M42 P51 Y_SERIAL_TX_PIN Output = 1
. Y_SERIAL_RX_PIN Output = 1
PIN: PD4 M42 P52 Y_DIR_PIN protected
PIN: PD5 M42 P53 Y_STEP_PIN protected
PIN: PD6 M42 P54 Y_ENABLE_PIN protected
PIN: PD7 M42 P55 HEATER_BED_PIN protected
PIN: PD8 M42 P56 Alt Function: 7 - USART1..3
PIN: PD9 M42 P57 Alt Function: 7 - USART1..3
PIN: PD10 M42 P58 E1_DIR_PIN Output = 0
PIN: PD11 M42 P59 E1_STEP_PIN Input = 0
PIN: PD12 M42 P60 E1_CS_PIN Input = 1
. E1_SERIAL_TX_PIN Input = 1
. E1_SERIAL_RX_PIN Input = 1
PIN: PD13 M42 P61 E1_ENABLE_PIN Output = 1
PIN: PD14 M42 P62 E0_DIR_PIN protected
PIN: PD15 M42 P63 E0_STEP_PIN protected
PIN: PE0 M42 P64 X_SERIAL_TX_PIN Output = 1
. X_SERIAL_RX_PIN Output = 1
PIN: PE1 M42 P65 X_DIR_PIN protected
PIN: PE2 M42 P66 X_STEP_PIN protected
PIN: PE3 M42 P67 X_ENABLE_PIN protected
PIN: PE4 M42 P68 Z_MIN_PROBE_PIN protected
PIN: PE5 M42 P69 SERVO0_PIN Output = 0
PIN: PE6 M42 P70 NEOPIXEL_PIN Input = 0
PIN: PE7 M42 P71 BTN_EN1 Input = 1
PIN: PE8 M42 P72 Input = 0
PIN: PE9 M42 P73 LCD_PINS_RS Output = 0
PIN: PE10 M42 P74 LCD_PINS_D4 Output = 1
PIN: PE11 M42 P75 LCD_PINS_D5 Input = 1
PIN: PE12 M42 P76 LCD_PINS_D6 Input = 1
PIN: PE13 M42 P77 BTN_ENC_EN Input = 1
. LCD_PINS_D7 Input = 1
PIN: PE14 M42 P78 Output = 1
PIN: PE15 M42 P79 Input = 1
PIN: PH0 M42 P80 Input = 0
PIN: PH1 M42 P81 Input = 0
```

Nothing jumps out at me.

Have you checked the current from your PSU to make sure it is stable and correct? I'm sure the board can handle more than 24V but it can't hurt to double-check.

While I'd need a clamp-on ammeter to check the current properly (my regular meter wouldn't handle the load), it does seem to be fine on the surface. Voltage is a steady 24v when under load -- last night I heated up the bed and hotend without trouble just for a test. Between those and the lighting, at the very least the PSU is solid at 400 or so watts (which is probably the most the machine can actually pull).

although it does leave the *_ENABLE_PINs in INPUT state briefly, just until stepper.init() is called

I thought about this last night actually. While I struggled to understand the code (I don't know Marlin's codebase at all, and C/C++ is not exactly my forte), it is at the very least leaving all four used driver slots in that state before stepper.init() comes back around. I wonder if that isn't creating a current leak that gets exploited by the UART code when it goes to twiddle its respective lines?

I'd test, but again, low on 2208's.

it would be good to make sure nothing in settings.load() could mess up TMC drivers

If it helps, before putting drivers on the new board, I made a point to do M502, M500 just to make sure the EEPROM was clean (just in case manufacturer testing left something behind).

github-actions[bot] commented 2 years ago

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.