bigtreetech / SKR-2


Rev A and Rev B boards, anti reversal protection using TMC drivers #15

Open b-desconocido opened 3 years ago

b-desconocido commented 3 years ago

Hi, I have several questions regarding the recent TMC driver mess:
  1. If I want to use TMC drivers, I MUST disable anti-reversal protection in firmware, whatever version of the board I have, correct?
  2. The hardware on the Rev A board still needs to be fixed even though anti-reversal protection is disabled in firmware, correct?
  3. Does this issue affect any other kind of driver, or is it TMC-specific?
  4. How exactly is it damaging the stepsticks? Is it due to a high voltage drop across the MOSFET (so the motors see a 'higher' GND level), bad timing, high voltage spikes, the MCU not being able to fully open the MOSFET, or something else? I'm not an engineer, but I'm still curious what the cause is.
  5. Can it damage the board itself?
  6. According to the schematic, exactly the same MOSFETs are used to control the heaters and bed. Should I worry about that?
  7. What is the preferred and most bulletproof method of fixing the board: a bodge wire/resistor or transistor replacement?

Thank you in advance.

Biqu's Google Drive doesn't have answers to all of my questions, so please don't close this issue immediately. PS: I hope I didn't destroy any of my drivers, but I can't check them right now.

Rodov commented 3 years ago
  1. Disable for rev.A
  2. Correct
  3. Any drivers, uses sensorless (with extra pins)
  4. MOSFET replacement or replacing the board itself, but a jumper works fine too. https://youtu.be/SjVkqXaQKtc
dzhaparoff commented 3 years ago
  6. According to the schematic, exactly the same MOSFETs are used to control the heaters and bed. Should I worry about that?

I'm very concerned about those MOSFETs too.

b-desconocido commented 3 years ago

I'm very concerned about those MOSFETs too.

Could you please check the part numbers or take a photo? I'm quite far from home on holidays :) and can't look at my board right now. If the MCU can't fully open a MOSFET, they must start an RMA procedure right now. In that case, a bodge wire across the anti-reversal circuitry won't prevent the board from catching fire while it is heating a bed or a hotend.

b-desconocido commented 3 years ago

Their workaround suggests disabling anti-reversal for both PCB versions, so this needs clarification from an official representative. Why do we need to disable a feature in firmware if it is fixed on the Rev B board?

dzhaparoff commented 3 years ago

Tomorrow :) I'm not at home either :)

b-desconocido commented 3 years ago

There is also a mistake in the schematic: Q2 must be a P-channel MOSFET, but they used the wrong symbol (or am I wrong?). It switches the high side for the always-on fans, heaters and motors. Q1 connects PGND to MGND. Both transistors are controlled directly by the MCU.

But there is an issue. MOT_POWER is connected to PC13, which is supplied from the RTC power domain and has very limited current capability. ST doesn't recommend using it that way. It should not be used to drive a MOSFET and an optocoupler directly at the same time; it might not supply enough power to open the Q2 transistor, which might lead to a disaster. I'm just an amateur hobbyist, so correct me if I'm wrong.

b-desconocido commented 3 years ago

So, here is my theory:
  1. MOT_POWER is connected to PC13, and this pin is located in the RTC power domain.
  2. It drives both the optocoupler's LED and the gate of Q1.
  3. Due to its low output capability, it is unable to open transistor Q1: the voltage is too low or the dV/dt is too low. This might damage the drivers if MGND rises above the IO voltage. This might also blow up Q1 and the MCU!
  4. You should use some kind of gate driver, or another pin with higher current sink/source capability.
  5. Please read the documentation provided by ST.
  6. The board should be redesigned.

Is there someone competent to prove or debunk this theory? Cuz I'm bad at electronics. And English.

b-desconocido commented 3 years ago

PC13, PC14, PC15 and PI8 are supplied through the power switch. Since the switch only sinks a limited amount of current (3 mA), the use of GPIOs PC13 to PC15 and PI8 in output mode is limited: the speed should not exceed 2 MHz with a maximum load of 30 pF and these I/Os must not be used as a current source (e.g. to drive an LED).

  1. You're driving an LED.
  2. You're close to or over the 3 mA current limit.
  3. You're driving a MOSFET.

There is no 5V buffer or level shifter for Q1, which is connected to the weakest pin of the MCU. But you've used 5V buffers for LED_PWM and the heaters (thank God). Now I'm sure it is a design flaw, not a supply shortage or any other BS. Probably this anti-reversal circuitry was added as another gimmick feature at the last moment, with no thought put into it. Waiting for revision C and a board replacement.

[Screenshot attached: Screenshot_20210502-013237]

dzhaparoff commented 3 years ago

The exact same MOSFETs are used for the hotend heaters (G090N06), but the heatbed MOSFET is different (G017N04).

[Photos attached: DSC01923, DSC01925, DSC01927]

b-desconocido commented 3 years ago

Exact same MOSFETs used for hotend heaters

These are driven by 5V logic. The one used in the anti-reversal circuitry is not, due to bad design.

dzhaparoff commented 3 years ago

These are driven by 5V logic. The one used in the anti-reversal circuitry is not, due to bad design.

But the hotend MOSFETs also differ from the schematic, which specifies HY1904C2.

b-desconocido commented 3 years ago

But the hotend MOSFETs also differ from the schematic, which specifies HY1904C2.

Mate, there is an inherent error in the board design. Yes, they used MOSFETs from a different manufacturer, but they are within spec. Those specs might deviate from batch to batch even for the same manufacturer. The problems are as follows:

  1. BTT omitted 5V buffers for Q1, although they used buffers for the heaters/fans and even the LEDs (lol).
  2. They connected the gate of Q1 and the optocoupler directly to the MCU (3.3V supply), using the weakest GPIO pin possible. It just can't supply enough current to open the MOSFET quickly and reliably, and that is what ST's datasheet explicitly says (see my replies above). They ignored the documentation and went ahead with a dumb decision.
  3. Both types of MOSFET have the same rating: a gate threshold voltage of up to 3V. They must use gate drivers, 5V level shifters or buffers to drive Q1.
  4. The gate voltage ramp might be too slow, or the gate voltage may not be high enough to open the MOSFETs even statically! That is a design flaw, even with the MOSFETs specified in the schematic.
  5. MOSFET replacement won't solve the issue! There is a chance you will destroy the stepper drivers, Q1, the MCU and anything else connected to this board.
  6. All other switches are fine, as BTT drives their gates with 5 volts.
  7. Q2 is also slightly concerning, as its voltage divider has far too high a resistance. That might not be a problem, though.
  8. The only reliable solution is to connect MGND to PGND with a wire, bypassing the Q1 transistor.

dzhaparoff commented 3 years ago

@b-desconocido, about hotend mosfets – it's just my observation. I totally agree with you!

b-desconocido commented 3 years ago

@b-desconocido, about hotend mosfets – it's just my observation. I totally agree with you!

I'm curious how they fixed this issue on the revision B board. If they just switched to another transistor, it wouldn't help much, and neither would replacing the Q1 transistor on a Rev A board. That's very concerning.

EsserPrototyping commented 3 years ago

Speaking of 3V to 5V for the "LED": technically it's good to step up the logic to 5V for driving WS LEDs, but I think it's the wrong part, at least in the schematic, and this is carried over from the Turbo. The 74LVC1G125 can be used with 5V, but according to its specs it requires 0.7 × VCC (3.5V when VCC = 5V) for a logic HIGH. The better part is the 74AHCT1G125: it runs on 5V and requires 2V for a HIGH. Maybe it works most of the time, but it's not a clean solution. However, this has nothing to do with the problem here. Thanks for your investigations @b-desconocido. I wish they had never touched the "driver reverse stuff"; if you plug in drivers the wrong way, you probably should not be handling these electronics anyway. One thing I'm also not sure about: the TMC2130, for example, needs to have VMOT enabled before logic VCC. How would this work if VMOT is switched on later by the MCU? Maybe it's not a problem, but it could be another trap.

However, I'm very happy with my other SKRs, and I hope BTT will handle this issue seriously. Better to scrap this and make an SKR 2.1 if the "MOSFET fix" is not 100% working.
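The buffer concern above boils down to comparing the MCU's output-high level with each buffer's minimum input-high threshold. A minimal sketch, using only the threshold figures quoted in the comment (0.7 × VCC for the 74LVC1G125 at VCC = 5 V, 2 V for the 74AHCT1G125) and assuming an ideal, unloaded 3.3 V output; a loaded pin sits lower, so check the real datasheets before relying on it:

```python
# Compare an (assumed ideal) 3.3 V MCU output-high level with each buffer's
# minimum input-high threshold when the buffer is powered from 5 V.
# Threshold figures are the ones quoted in the comment above; verify against the datasheets.

VCC_BUFFER = 5.0   # buffer supply voltage
VOH_MCU = 3.3      # assumption: unloaded 3.3 V output; a loaded pin will sit somewhat lower


def vih_min(part: str, vcc: float) -> float:
    """Minimum voltage the buffer is guaranteed to read as logic HIGH."""
    if part == "74LVC1G125":
        return 0.7 * vcc  # CMOS-style threshold quoted for VCC = 5 V
    if part == "74AHCT1G125":
        return 2.0        # TTL-compatible threshold, independent of VCC here
    raise ValueError(f"unknown part: {part}")


for part in ("74LVC1G125", "74AHCT1G125"):
    threshold = vih_min(part, VCC_BUFFER)
    verdict = "guaranteed HIGH" if VOH_MCU >= threshold else "NOT guaranteed HIGH"
    print(f"{part}: VIH(min) = {threshold:.2f} V, 3.3 V input is {verdict}")
```

With these numbers the 74LVC1G125 misses its guaranteed HIGH level by about 0.2 V, which fits the "works most of the time" observation: many parts will still switch, but nothing guarantees it.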

b-desconocido commented 3 years ago

@EsserPrototyping they could sell the remaining boards with a jumper resistor instead of Q1, call it "SKR 2 Lite", remove the anti-reversal feature from their advertisements, etc. You're definitely right, there might be some other power-related quirks. Mistakes are made; we're just human, after all.

antoniolr95 commented 3 years ago

Would it have been possible to use U5 optocoupler to control Q1 and Q2 gates?

b-desconocido commented 3 years ago

@antoniolr95 no, you can't connect the gates of Q1 and Q2 together. Q1 is N-channel and Q2 (the symbol on the schematic is wrong) is P-channel; you would blow up Q1.

looxonline commented 3 years ago

Mate, there is an inherent error in the board design. Yes, they used MOSFETs from a different manufacturer, but they are within spec. Those specs might deviate from batch to batch even for the same manufacturer. The problems are as follows:

  1. BTT omitted 5V buffers for Q1, although they used buffers for the heaters/fans and even the LEDs (lol).
  2. They connected the gate of Q1 and the optocoupler directly to the MCU (3.3V supply), using the weakest GPIO pin possible. It just can't supply enough current to open the MOSFET quickly and reliably, and that is what ST's datasheet explicitly says (see my replies above). They ignored the documentation and went ahead with a dumb decision.
  3. Both types of MOSFET have the same rating: a gate threshold voltage of up to 3V. They must use gate drivers, 5V level shifters or buffers to drive Q1.
  4. The gate voltage ramp might be too slow, or the gate voltage may not be high enough to open the MOSFETs even statically! That is a design flaw, even with the MOSFETs specified in the schematic.
  5. MOSFET replacement won't solve the issue! There is a chance you will destroy the stepper drivers, Q1, the MCU and anything else connected to this board.
  6. All other switches are fine, as BTT drives their gates with 5 volts.
  7. Q2 is also slightly concerning, as its voltage divider has far too high a resistance. That might not be a problem, though.
  8. The only reliable solution is to connect MGND to PGND with a wire, bypassing the Q1 transistor.

Unfortunately, many of these statements are not true. I'm an electronic engineer and have been designing products for 15 years. I still make design mistakes, so I may be wrong on some of the points below, but to the best of my knowledge and experience, here are some pointers:

1.) 5V buffers are simply not required for Q1 since it is a trench FET. It has a maximum gate threshold voltage (Vgs(th)) of 3V, which ensures that at a 3.3V drive it will always be past the pinch-off point and acting as a switch. Figure 5 of the datasheet shows that even with 10A passing through it at that Vgs, there would only be around 0.1V across drain and source. The reason a buffer is used on the other FETs is that they are a variety of different types. Some need a higher Vgs as they are not trench FETs. Since they were going to put a buffer in, they no doubt decided to make the most of it and connect it to the remaining FETs. This would likely also help with switching speeds on PWM signals.

2.) The IO pin is more than capable of supplying the needed current. The optocoupler's LED current is limited by the 1k resistor which, accounting for the ~1V forward drop, means it will draw about 2.3mA. The FET gate will draw essentially no static current since it is a voltage-driven device. There may be a touch of inrush current to charge the pF-scale gate-source capacitance, but that would still be insignificant.

3.) This is incorrect. Again, they are trench FETs. These are specifically designed to be switched by standard 3V3 CMOS IO pins, which is why they have a maximum Vgs(th) of 3V.

4.) There is no meaningful voltage ramp time involved. Apart from the FET's own small gate capacitance, there is no capacitive load on the gate, so it will switch essentially as soon as a voltage is applied. I will gladly prove this on a scope later today.

5.) This is a very bold statement with little to back it up. The alternate FET that was causing the issues simply had too high an Rds(on), so negative voltages could be presented to the drivers, which their internal buffers could not handle. Using a FET with a lower Rds(on) resolves this.

6.) See earlier point on this.

7.) I am not sure why you say that the divider has too high a resistance. Are you worried about noise? If so, that is not a concern at all given that the divider is driven by 1mA. If you look at the divider values, you will see that they have been specifically selected to keep wasted current low while producing the voltage drop required to drive Q2. They did use the incorrect symbol for Q2, which I informed them about some weeks back.

8.) No, using the correct FET is a reliable solution.
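Points 2 and 4 and the earlier list disagree mainly about how much current PC13 can deliver to the gate. One rough way to size the turn-on time is t = Qg / I for a gate charged by a constant current. The sketch below is an illustration only: the 10 nC gate charge is a hypothetical value, not taken from the Q1 datasheet, and treating the 3 mA power-switch figure quoted from ST as the total current available (with the opto's ~2.3 mA taken first) is an assumption. It ignores the pin's output resistance and the shape of the Miller plateau, and only accounts for total charge.

```python
# First-order gate-charge timing for Q1, treating the driving pin as a constant-current source.
# ASSUMPTIONS (not taken from the SKR 2 schematic or the Q1 datasheet):
#   - Q_GATE_C is a hypothetical total gate charge; read the real value from the FET datasheet.
#   - The 3 mA power-switch figure quoted from ST's documentation is treated as the total
#     current available to PC13, and the optocoupler LED takes its ~2.3 mA share first.

Q_GATE_C = 10e-9       # hypothetical total gate charge at Vgs = 3.3 V (10 nC)
I_BUDGET_A = 3e-3      # 3 mA limit of the backup-domain power switch
I_OPTO_A = 2.3e-3      # optocoupler LED current from point 2


def gate_charge_time_s(q_gate_c: float, i_available_a: float) -> float:
    """Estimate turn-on time as t = Qg / I for a constant charging current."""
    if i_available_a <= 0:
        raise ValueError("no current left to charge the gate")
    return q_gate_c / i_available_a


t_full = gate_charge_time_s(Q_GATE_C, I_BUDGET_A)
t_shared = gate_charge_time_s(Q_GATE_C, I_BUDGET_A - I_OPTO_A)

print(f"full 3 mA budget:        {t_full * 1e6:.1f} us")
print(f"after the opto's 2.3 mA: {t_shared * 1e6:.1f} us")
```

Under these assumed values the turn-on lands in the microsecond range even with only the opto's leftover share, which would be short for an enable signal that is not switched at PWM rates (itself an assumption about how MOT_POWER is used).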

b-desconocido commented 3 years ago

@looxonline I'm glad someone experienced is here. So the pin selection was smart, there is nothing wrong with the schematic, and replacing the MOSFET is a perfect solution? The 3 mA limit is not per pin (PC13); it is for the entire RTC domain. It might be enough to drive CMOS logic, but the output level might be well below Vth, which is 3V for both the wrong and the correct MOSFETs. I thought a few BJTs or a 5V buffer would reliably solve any issues. Alternatively, use a strong pull-up and open-drain mode for this pin. Could you please measure the voltage at the gate?

Portzal commented 3 years ago

I think I will be going for the removal of the MOSFET and joining 7 of 8 pads with solder as per the self fix options. Along with firmware changes this should in my opinion be a solid fix.

looxonline commented 3 years ago

Even though the 3mA is shared between those pins, the other pins are not drawing current. One is an input and will not need to source or sink anything meaningful due to its high input resistance, and the other is driving an input, so it too will not need to source or sink anything meaningful. This holds even during the brief periods when the ESP is being reset, when that pin will be sinking 3V3 to GND via a 12K resistor, which only adds another 0.275mA to the budget.

On that basis, the pin would be able to drive the gate up to the full 3V3 without any problems, and as such the FET would have a hard turn-on, capable of handling all of the current that needs to flow through it without developing a Vds that would breach the -0.5V limit of the driver's IO pins. This was really the issue with the other FET: it simply breached that threshold.

I'll pop the scope on a little later and post the traces. Got a busy evening until late so I may actually have to do it tomorrow evening.
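The budget argument in this comment is simple addition, so it can be tallied explicitly. A minimal sketch: it treats the 3 mA figure as one shared budget, as both commenters do, and uses only the load values given in the thread (the ~1 V LED drop and 1 kΩ resistor from point 2, the 12 kΩ ESP reset path). It does not account for leakage or for transient gate-charging current.

```python
# Tally of the shared backup-domain current budget (PC13-PC15, PI8), in mA,
# using the figures from the thread. Volts divided by kOhm gives mA.
# The 12 kOhm ESP reset path only draws current while the ESP is held in reset.

BUDGET_MA = 3.0

loads_ma = {
    "PC13: optocoupler LED, (3.3 V - 1 V) / 1 kOhm": (3.3 - 1.0) / 1.0,
    "PC13: Q1 gate, static (voltage-driven)": 0.0,
    "pin configured as input (high impedance)": 0.0,
    "pin holding ESP reset low, 3.3 V / 12 kOhm": 3.3 / 12.0,
}

total_ma = sum(loads_ma.values())
headroom_ma = BUDGET_MA - total_ma

for name, current_ma in loads_ma.items():
    print(f"{name:48s} {current_ma:6.3f} mA")
print(f"{'worst-case total':48s} {total_ma:6.3f} mA (headroom {headroom_ma:.3f} mA)")
```

With these values the worst case is about 2.575 mA, leaving roughly 0.4 mA of the 3 mA budget, so the argument holds on paper but without a large margin.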

looxonline commented 3 years ago

With the TMC2130 (and indeed other TMC drivers), it is in fact the exact opposite sequence that the driver requires. The logic needs to be enabled before Vmot is applied. I quote from the datasheet:

"When cutting VCC from 5VOUT, make sure that the VCC supply comes up before or synchronously with the 5VOUT supply to ensure a correct power up reset of the internal logic."

Since all of the ground paths go via MGND, a synchronous turn-on will take place when Q1 turns on. In the case where the protection circuit kicks in because the driver is the wrong way around, you may find that some of the IO pins are low, which would provide a path to ground for leakage current through the driver, from whatever pin the 3V3 rail is powering (probably M1) to the low IOs. This may put the driver into an undefined state, due to the RAM and other internal circuitry not being correctly powered, but it is not likely to damage it.

b-desconocido commented 3 years ago

I'll ask again: would this even happen if Q1's gate were driven by 5V logic, like the other MOSFETs on this board? What would be the safest fix, replacing the MOSFET or bridging S and D of Q1? I'd bet my kidney the Q1 gate voltage is 3.0-3.2V max, barely above Vth. UPD: I checked the schematic, and you're right: it will receive 3.3V. For some reason I thought Q1 was pulled down with 1k.

EsserPrototyping commented 3 years ago

@looxonline Now I´m really confused.. the FAQ from Watterott (https://learn.watterott.com/silentstepstick/faq/).. says:

"At power-up the motor supply voltage VM should come up first and then the logic supply voltage VIO. On power-down the logic supply voltage VIO should turned off at first and then the motor supply voltage VM, because the internal logic of the TMCxxxx driver is powered from VM. To ensure the correct powering a schottky diode from VIO (anode) to VM (cathode) can be added. The v2 Protectors for SilentStepSticks include this schottky diode."

looxonline commented 3 years ago

I just re-read the datasheet, and that is actually correct. There are a number of different supplies on the TMC driver (VS, VCC, VCC5V, VIO). I had VCC mixed up with VIO in my earlier comment. The correct statement from the datasheet is:

"A third variant uses the VCC_IO supply to ensure power-on reset. This is possible, if VCC_IO comes up synchronously with or delayed to VCC. Use a linear regulator to generate a 3.3V VCC_IO from the external 5V VCC source. This 3.3V regulator will cause a certain voltage drop. A voltage drop in the regulator of 0.9V or more (e.g. LD1117-3.3) ensures that the 5V supply already has exceeded the lower limit of about 3.0V once the reset conditions ends. The reset condition ends earliest, when VCC_IO exceeds the undervoltage limit of minimum 2.1V."

So VIO actually acts as the logic reset switch. On this basis VIO should be powered up post VCC and VS. However, since the driver never has a path to ground until Q1 is driven on it follows that VS, VCC and VIO all effectively start up in a synchronous manner. According to the statement above, a synchronous or delayed startup of VIO should do the trick so I don't imagine that there would be any issues with the power sequencing on the 2130s.
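The LD1117 example in the quoted datasheet text rests on a simple inequality: when VCC_IO first crosses the 2.1 V reset-release level, VCC must already be above its ~3.0 V lower limit. A minimal sketch, assuming VCC_IO simply tracks VCC minus the regulator's drop while both ramp. It models only that regulator example, not the SKR 2 situation described above, where closing Q1 makes all of the driver's rails appear at once.

```python
# Sketch of the VCC_IO power-on-reset argument quoted from the TMC2130 datasheet.
# Model (an assumption): while the 5 V VCC ramps up, a linear regulator produces
# VCC_IO = VCC - drop, so VCC_IO lags VCC by the regulator's voltage drop.

VCC_IO_RESET_RELEASE_V = 2.1   # reset ends at the earliest when VCC_IO exceeds 2.1 V
VCC_LOWER_LIMIT_V = 3.0        # approximate lower limit the 5 V VCC must have exceeded


def vcc_when_reset_ends(regulator_drop_v: float) -> float:
    """VCC at the moment VCC_IO first crosses the reset-release level."""
    return VCC_IO_RESET_RELEASE_V + regulator_drop_v


def power_on_reset_is_clean(regulator_drop_v: float) -> bool:
    """True if VCC is already above its lower limit when the reset condition ends."""
    # Small tolerance so the 0.9 V boundary case is not lost to float rounding.
    return vcc_when_reset_ends(regulator_drop_v) >= VCC_LOWER_LIMIT_V - 1e-9


for drop in (0.5, 0.9, 1.2):
    print(f"drop {drop:.1f} V -> VCC = {vcc_when_reset_ends(drop):.1f} V at release, "
          f"clean reset: {power_on_reset_is_clean(drop)}")
```

This is why the datasheet asks for a drop of 0.9 V or more: 2.1 V + 0.9 V lands exactly on the 3.0 V limit.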

EsserPrototyping commented 3 years ago

Thanks for clarification @looxonline .. I totally overlooked that MGND is the GND for the whole driver. Now this makes sense to me. Following your explanation, I think replacing the mosfet is a viable solution for me.

looxonline commented 3 years ago

I just wanted to follow up on this with a bit of extra info that may help to explain things. The VBB rail is the rail used to power anything that runs off the primary voltage rail. This means that anything drawing a meaningful amount of current draws it from VBB. Peaks of 15A off the 12/24V rail are very possible, especially if you are running a 12V printer.

So what impact does this have on the drivers?

Let's have a look at the datasheet of each FET. First a glance at the Vds as a function of Id curve for the part that was used as a replacement:

[Screenshot 2021-05-04 at 07 03 29: Vds vs. Id curve of the replacement FET]

Notice how soft the curve is. In particular, notice that as we approach 20A at the roughly 3.3V Vgs we would be applying, we breach the 0.5V Vds point. This means that any of the IO pins fed to the driver from the MCU, which is referenced to PWRGND, would be below -0.5V. This would likely blow the UART pins, which is what we see happening.

Now let's have a look at the curve for the original part:

[Screenshot 2021-05-04 at 07 03 36: Vds vs. Id curve of the original FET]

Notice how aggressive the curve is. In fact, we can push double the current at the same Vgs for even less Vds. Certainly a far more efficient FET. With this in mind, we can clearly see that the replacement FET was not spec-for-spec compatible, and the BTT engineers would have done well to give it more scrutiny. Nevertheless, something tells me it's the kind of lesson you only need to learn once... Maybe that's also because I had to head up a $2m product recall when a "drop-in" component was used that also did not undergo sufficient testing ;)
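The two curves can be reduced to a crude linear model of the ground offset Q1 introduces. The effective Rds(on) values below are back-calculated from the readings described in this comment (about 0.5 V at about 20 A for the replacement part; under 0.5 V at about 40 A for the original), not taken from either datasheet. Real Vds vs. Id curves are not linear, especially the "soft" one, and the sketch assumes the whole motor-side return current flows through Q1.

```python
# Rough linear model of the ground offset Q1 introduces between MGND and PGND.
# The driver's ground (MGND) sits Id * Rds(on) above PGND, so an MCU output at 0 V
# (referenced to PGND) looks like -Vds to the driver's logic pins.

IO_NEGATIVE_LIMIT_V = 0.5  # driver IO pins must not go below -0.5 V (from the thread)

# Effective Rds(on) at Vgs of about 3.3 V, back-calculated from the curve readings
# described above. These are rough estimates, NOT datasheet values, and the real
# Vds-vs-Id curves are not linear.
RDS_ON_OHM = {
    "replacement FET": 0.5 / 20.0,  # about 0.5 V at about 20 A -> ~25 mOhm
    "original FET": 0.5 / 40.0,     # under 0.5 V at about 40 A -> at most ~12.5 mOhm
}


def ground_offset_v(i_d_a: float, rds_on_ohm: float) -> float:
    """Voltage by which MGND rises above PGND while i_d_a flows through Q1."""
    return i_d_a * rds_on_ohm


def io_margin_v(i_d_a: float, rds_on_ohm: float) -> float:
    """Remaining margin before the driver's IO pins see less than -0.5 V."""
    return IO_NEGATIVE_LIMIT_V - ground_offset_v(i_d_a, rds_on_ohm)


for current_a in (10.0, 15.0, 22.0):
    for name, rds in RDS_ON_OHM.items():
        margin = io_margin_v(current_a, rds)
        status = "OK" if margin > 0 else "exceeds the -0.5 V limit"
        print(f"{current_a:4.0f} A  {name:15s}  offset {ground_offset_v(current_a, rds):.3f} V  {status}")
```

Under this model the replacement part uses up the whole 0.5 V margin a little above 20 A, while the original part still has roughly half of the margin left at that current; that is the comparison the two curves make graphically.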

b-desconocido commented 3 years ago

My Q1 measures exactly 3.2V on its gate. There are no drop-in replacements available in local shops. Anyway, I don't know how to remove this little piece of silicon without a heat gun :(

According to the seller, I don't need to send the board back to China to get a full refund for the board. The new one will be available around 20 May.

looxonline commented 3 years ago

Yeah, you won't need to send it back. They are being really reasonable about it. I've been so busy that I haven't had a chance to scope the turn-on sequence. Hopefully I will have a chance tonight. Regarding drop-in replacements, I was doing research yesterday: I basically sorted by quantity available on Digikey and then found the most available part that is a drop-in. It's this one: https://www.digikey.com/en/products/detail/fairchild-semiconductor/FDMS8025S/13515244

Getting it off the board will be a mission with an iron but not impossible. Just make two massive blobs on either side and keep alternating the iron between them with a pair of tweezers tugging on the top of the part until it releases.

b-desconocido commented 3 years ago

@looxonline for some countries, like the one I live in, the shipping fee will be $30+ (both Mouser & Digikey). That is comparable to a new SKR-2 board :) I found SIDR402DP-T1-GE3 in stock locally; is it a drop-in replacement, though? UPD: just after I added it to the shopping cart, it became "unavailable". Great.

looxonline commented 3 years ago

Pity because that FET is a little beast. It would easily handle the job and then some.