MarlinFirmware / Marlin

Marlin is an optimized firmware for RepRap 3D printers based on the Arduino platform. Many commercial 3D printers come with Marlin installed. Check with your vendor if you need source code for your specific machine.
https://marlinfw.org
GNU General Public License v3.0

[BUG] SKR Mini stop mid print #18117

Closed adams79 closed 4 years ago

adams79 commented 4 years ago

Bug Description

After replacing my MP Select Mini board with an SKR Mini E3 board, the board sometimes freezes completely during a print, with the heater and fan still on. I tried printing both from SD and from OctoPrint. I replaced the board and got the same problem. Additionally, when I run my Z-axis motor in StealthChop I get a sort of "constant" step skipping. I tried raising the current, but nothing changed. How can I track down the source of the problem? I'm going to change the PSU as well, but I'm running out of ideas. Driver monitoring is disabled. After replacing the board I was able to print for about 5-6 hours without problems, but then the problem resurfaced. I have installed the latest build available.

Thank you

My Configurations

Required: Please include a ZIP file containing your Configuration.h and Configuration_adv.h files.

Steps to Reproduce

Start a print from SD or OctoPrint

Expected behavior:

The print should complete

Actual behavior:

The print stops randomly Archivio.zip

swilkens commented 4 years ago

You have:

  #define DISABLE_X true
  #define DISABLE_Y true
  #define DISABLE_Z true

This disables each stepper motor when it is not in active use, which can cause some accuracy issues (step skipping?).

You are also not using the most recent bugfix. Try the most recent one, including the latest configuration files. Then do an M502 followed by an M500 after flashing.
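As a minimal sketch of the change suggested above, assuming the `DISABLE_*` options live in Configuration.h as they do in the attached configuration, the steppers can be kept energized like this:

```cpp
// Configuration.h (sketch): keep the steppers energized between moves.
#define DISABLE_X false
#define DISABLE_Y false
#define DISABLE_Z false
```

After flashing, M502 loads the firmware defaults and M500 saves them to EEPROM, so stale stored settings do not override the new build.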

I also notice you are using this board for a Malyan M200 - I assume you found these motor currents for the Malyan M200 in the original firmware?

adams79 commented 4 years ago

Hi swilkens, OK, I'm going to test with the latest bugfix and config. Yes, the currents are from the original Malyan board. I will also try not disabling the motors; however, the step skipping on the Z-axis is not related to this. Once I activate StealthChop on this axis there seems to be a "fixed" amount of skipping, since objects are always about 20% lower. The weird thing is that when I move the Z-axis with a direct command the movement is correct, but during a print it seems to skip about 20% of the steps... I'm thinking of switching the Z-axis motor.

adams79 commented 4 years ago

OK, I tried with the latest build and config, and it stops after about 30 minutes... I'm trying to enable SD logging to check whether some error is logged in the console...

kain0m commented 4 years ago

Check this one out: https://github.com/MarlinFirmware/Marlin/issues/15337 It is related to SKR 1.3 (not mini), but the issue is the same. Random freeze of the print, heaters on.

My solution was a different SD card. Also, try reformatting the SD card(s) as FAT32. I have also read that the Micro SD card supplied by BigTreeTech is of poor quality in some cases.

adams79 commented 4 years ago

This does not seem to be my problem, since it also happens when printing from OctoPrint with no SD card inserted.

swilkens commented 4 years ago

Since the Malyan M200 by default does not have a fan in the electronics case, could it be that the drivers are overheating? In that case I would expect one driver to stop, rather than all of them.

Try to enable TMC driver monitoring and see if the driver reports an overtemp when it stalls. Do you still have serial communication when this happens or does the board fully lock up?

Which environment are you using to build the firmware? 256K? USB composite?
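For reference, a hedged sketch of how driver monitoring is typically switched on in Marlin 2.0.x. It assumes both options sit in the Trinamic section of Configuration_adv.h; check your own copy, since option placement can differ between versions.

```cpp
// Configuration_adv.h (sketch, Trinamic driver section)
#define MONITOR_DRIVER_STATUS  // poll the drivers for error conditions such as overtemperature
#define TMC_DEBUG              // enable the M122 driver diagnostics command
```

With these enabled, sending M122 while the print is running (or right after a stall, if the board still answers) should show whether any driver has reported an over-temperature prewarn.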

cooldudie2 commented 4 years ago

I can confirm the exact same errors, as can a lot of MP Select Mini users using the SKR Mini E3 1.2 mod I uploaded to Thingiverse (cooldudie2). All the users I have been in contact with have experienced the same freeze-up issue, both with the firmware I have written and with their own firmware compiled from the bugfix-2.0.x branch and the newest Marlin 2.0.5.3 release.

After replacing my mp select mini board with an e3 mini skr board sometimes during a print the board completely freeze with heater on and fan on.

Tried both via SD and OctoPrint. Replaced the board, same problem. I have tried different suppliers and the same problem occurs.

Sometimes it freezes; sometimes it finishes prints just fine.

Note that the jumper cap is removed from the board, since the mode is set in firmware and the cap is not required when using UART mode. Some people have not taken the cap off, but they confirm the problem still persists.

I have tried a more powerful and more reliable monitored 12 V, 750 W ATX PSU and noticed no voltage drop when the print fails; the supply fluctuates by at most 0.2 V either side of 12 V when in use.

* Driver monitoring doesn't report any overheat prewarn triggers.

* Linear advance and hybrid threshold are turned off to rule those out completely.

* I have also tried setting the extruder to SpreadCycle mode to rule out the problem where StealthChop triggers a lock-up when moving too fast in one direction and then the other, as suggested by others on the forum.

* The 256K and 512K versions show the same fault, regardless of the environment used.

* The board fully locks up and doesn't respond to USB commands from Pronterface or from the attached Malyan M200 screen.

I will report back if the problem recurs after some more prints.

cooldudie2 commented 4 years ago

I have also just considered that the screen draws its 3.3 V logic power from the SWD header on the board. I wonder if this could be causing the lock-up. What is the SWD header for, and if you draw power from it, could it affect the other electronics?

adams79 commented 4 years ago

I’ve now swapped the Z-axis motor for a NEMA 17; however, I don't think this will solve the problem. I also have a spare SKR 1.4 Turbo (with TMC2209) that I bought for another printer, so I can try with that board too.

adams79 commented 4 years ago

Since the Malyan M200 by default does not have a fan in the electronics case, could it be that the drivers are overheating? In that case I would expect one driver to stop, rather than all of them.

Try to enable TMC driver monitoring and see if the driver reports an overtemp when it stalls. Do you still have serial communication when this happens or does the board fully lock up?

Which environment are you using to build the firmware? 256K? USB composite?

I have a v2, which has a case fan (that I’ve also replaced with a Noctua)... I tried printing without the side panel and the heat sinks on the drivers are cool.

minosg commented 4 years ago

From a similar thread investigating the same issue on another project.

https://github.com/KevinOConnor/klipper/issues/196

The correspondence with TMC attached there explains the lock-up issue:

I think the reason for disabling of the TMC2208 driver could be:

* a hard stop of the motor in stealthChop (step frequency goes from a higher value, e.g. > 0.5 RPS, to 0)

* an abrupt change of motor velocity (step frequency goes from a higher value to a low value within a single step).

When using stealthChop, please always make sure, that you use velocity ramping. A hard stop will cut away motor back-EMF at once. As stealthChop is a voltage based chopper, it cannot respond to this at once, like spreadCycle. The result is an overcurrent, and the motor driver goes to overcurrent switch off, until it becomes disabled / enabled again.

To resolve the problem, please use at least a tiny velocity ramping, when hard stopping the motor, e.g. within a few / a few ten microsteps.
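To make the "tiny velocity ramping" advice concrete, here is a minimal sketch of spreading a stop over a handful of microsteps. It is not Marlin's planner code: the helper name and its parameters are hypothetical, and it simply lowers the step rate linearly per step rather than modelling a true constant-acceleration profile.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of Marlin: returns the delay before each of the
// last `ramp_steps` step pulses (in microseconds) so that the step rate falls
// linearly from `start_rate_hz` to `end_rate_hz` instead of stopping at once.
std::vector<double> ramp_down_intervals_us(double start_rate_hz,
                                           double end_rate_hz,
                                           int ramp_steps) {
    assert(ramp_steps > 0);
    assert(end_rate_hz > 0.0 && start_rate_hz >= end_rate_hz);

    std::vector<double> intervals;
    intervals.reserve(static_cast<std::size_t>(ramp_steps));
    for (int i = 1; i <= ramp_steps; ++i) {
        // Fraction of the ramp completed after this step (1.0 on the last step).
        const double t = static_cast<double>(i) / ramp_steps;
        const double rate_hz = start_rate_hz - (start_rate_hz - end_rate_hz) * t;
        intervals.push_back(1e6 / rate_hz);  // period of one step at this rate
    }
    return intervals;
}
```

Calling `ramp_down_intervals_us(1600.0, 100.0, 10)` spreads the stop over ten microsteps, each step interval longer than the one before, so the driver never sees the step frequency drop to zero within a single step.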

To my understanding this issue could be addressed by:

Does marlin have a velocity ramping configuration parameter?

adams79 commented 4 years ago

Confirmed: the problem still persists after swapping the Z-axis motor for a standard NEMA 17. Since other SKR Mini users are not experiencing this, it should be related to some specific component of this printer (which is why I swapped the motor). I'm thinking of testing with the display completely disconnected (I use OctoPi), since @cooldudie2 is not sure that it is OK to draw current from the SWD header. Regarding the issue @minosg reported: as I understand it, if this happens to a single driver, only that driver should stop and not the whole board. Is this correct?

adams79 commented 4 years ago

@minosg additionally, note that at least in my case the steppers are still energized (you can check this by trying to move the head or the bed by hand), so the drivers should not have stopped. I have launched a print without the display and will report the results.

adams79 commented 4 years ago

I cannot be sure yet, but I have printed for about 6 hours with the display disconnected and no issue so far. I will keep it disconnected. If someone can try the same thing, it would be useful to have multiple test cases... (you will need to print from a PC or OctoPrint)

minosg commented 4 years ago

@adams79 Marlin's disabling of the steppers is configurable, and you can set it in the configuration files (see DEFAULT_STEPPER_DEACTIVE_TIME and DISABLE_X/Y).

The TMC driver lock-up is a totally different issue: it triggers a failsafe on the motor driver IC when the IC detects a current spike outside the expected thresholds. As an oversimplified explanation, when you invert the direction, the momentum of the motor generates an inverse current which, depending on phase, can add to the driving current and push it over the threshold. When that happens the driver chip locks up and you need to power cycle it to unlock it. TMC engineers, in the thread I posted, recommend either:

Following up my suggestion, I did some research and experiments.

In your conf_adv you have set up the motor currents as

  #if AXIS_DRIVER_TYPE_X(TMC26X)
    #define X_MAX_CURRENT     1000  // (mA)
    #define X_SENSE_RESISTOR    91  // (mOhms)
    #define X_MICROSTEPS        16  // Number of microsteps
  #endif

This could be an issue for a couple of reasons

Mind that most Enders run on 2 A rated motors at 24 V, which makes the current calculations different. This would explain why most Mini users see this issue but Ender users do not.

I have set X_MAX_CURRENT to a safe threshold of 500 mA and MINIMUM_STEPPER_POST_DIR_DELAY/MINIMUM_STEPPER_PRE_DIR_DELAY to a large arbitrary value of 1500, and it has not locked up since. I would recommend you do the same and see if that addresses the issue.
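A sketch of the direction-delay part of that experiment, assuming the two options are defined in Configuration_adv.h and are expressed in nanoseconds, as in Marlin 2.0.x. The value 1500 is only the arbitrary test value from the comment above, not a recommended setting; the current limit would be changed separately, in whichever current option matches the installed driver type.

```cpp
// Configuration_adv.h (sketch): lengthen the pause around DIR pin changes.
#define MINIMUM_STEPPER_POST_DIR_DELAY 1500  // (ns) wait after setting DIR, before the first step
#define MINIMUM_STEPPER_PRE_DIR_DELAY  1500  // (ns) wait after the last step, before changing DIR
```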

adams79 commented 4 years ago

Hi @minosg, you should check the config for the TMC2209: the current is set to 450 mA (380 mA for the X-axis motor). As for MINIMUM_STEPPER_POST_DIR_DELAY, this appears to be related only to the driver itself, so the problem should happen to all users of this board... However, please let us know whether you see any further freezes. Unfortunately I have to confirm that with the display connected and powered from a separate line I still experienced a freeze. The only case with no freeze for almost 10 hours has been with the display completely disconnected...

minosg commented 4 years ago

@adams79 apologies I was looking at the wrong section of the attached config.

As for the problem appearing for every user, that should not be the case. The BTT SKR Mini E3 board has the R26 and R27 resistors set to 0.11 Ω, which according to chapter 8 of the TMC2209 datasheet is for sensing a motor of 1.7 A and higher. While this setting is useful for sensing the end point break, it indicates that the Select Mini is using a considerably lower-power motor than the board was tested with. If that is operating at its threshold, a lot of bizarre things can happen.
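The 1.7 A figure can be sanity-checked from the sense resistor value. The sketch below assumes the commonly quoted Trinamic relation I_RMS = V_FS / (R_SENSE + 20 mΩ) / √2 with V_FS = 0.325 V at full current scale. Both constants are assumptions to verify against the datasheet, and the function name is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: full-scale RMS motor current (in amps) for a given sense
// resistor, assuming I_RMS = V_FS / (R_SENSE + 0.020) / sqrt(2).
// The 0.325 V full-scale voltage and the 20 mOhm internal resistance are
// assumed values; check them against the TMC2209 datasheet.
double tmc_full_scale_rms_amps(double r_sense_ohms, double v_fs_volts = 0.325) {
    assert(r_sense_ohms > 0.0);
    const double r_internal_ohms = 0.020;
    return v_fs_volts / (r_sense_ohms + r_internal_ohms) / std::sqrt(2.0);
}
```

For the 0.11 Ω resistors mentioned above, this gives roughly 1.77 A RMS, which is in line with the "1.7 A and higher" reading and far above the 380-450 mA the Select Mini motors are configured for.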

My testing shows that this delay does matter and makes the issue less frequent. You can also isolate the extruder motor as the suspect, because if you run the print dry, without filament, it should never freeze.

The other good question is why is the watchdog not kicking in and resetting the board?

adams79 commented 4 years ago

I’m just trying to swap the board with an skr 1.4 turbo that I got for another printer, it has 4 tmc 2209... I’m curious to check if the same problem will arise (however it will require some work to mount the board in the little case)...

minosg commented 4 years ago

To answer my own question and to offer a workaround for the freezing part: the reason is that the watchdog is not enabled in the HAL for this CPU.

https://github.com/MarlinFirmware/Marlin/issues/18226

If you uncomment this line, the board will reset when the crash occurs, which is safer than crashing with the heaters on.

As to why that stop happens, I am still investigating it since it has been reported for months, and there is no apparent solution.

What I have confirmed so far.

When it happens the board is lost. You lose UART debugging, SD logging, etc.

This means that most likely we are seeing an interrupt race condition, or disabled interrupts. But at the same time the heaters are holding the set temperature, so the PWM loop for the heaters is working, which makes a hard fault less likely.

adams79 commented 4 years ago

Hi @minosg, thank you for the update. In the meantime I've managed to mount the SKR 1.4 Turbo board and start printing; so far (about 4 hours) I've had no freeze. I was not able to connect the Malyan LCD to this board. I tried connecting it to the TFT serial port, but unfortunately on this port I see the standard console output and not the encoded output that should come from the Malyan ExtUI. Can you help?

adams79 commented 4 years ago

@minosg I can confirm that it is related to the extruder: when printing at high speed with a high layer height and fast retraction, the freeze occurs very often.

minosg commented 4 years ago

Hi @minosg, thank you for the update. In the meantime I've managed to mount the SKR 1.4 Turbo board and start printing; so far (about 4 hours) I've had no freeze. I was not able to connect the Malyan LCD to this board. I tried connecting it to the TFT serial port, but unfortunately on this port I see the standard console output and not the encoded output that should come from the Malyan ExtUI. Can you help?

That is a separate issue, but I would look at the serial declaration. The Malyan LCD serial is set in extui_malyan_lcd.cpp

You need to make sure the serial port is available and properly mapped to the appropriate pins, i.e. for the BTT SKR Mini you can find it in pins_BTT_SKR_MINI_E3.h

Last but not least, you need to make sure the host serial is properly selected in Configuration.h

minosg commented 4 years ago

Also, does the SKR Turbo have hardware or software serial? I have noticed that the SKR Mini 1.2 uses software serial to talk to the drivers, and I am not confident that the pulse timings are within acceptable thresholds.

But could a broken serial channel in the TMC driver comms break the firmware? I cannot see how.

adams79 commented 4 years ago

@minosg after more than 12 hours of printing on the SKR 1.4 board I have experienced none of the freezes I had with the Mini. Note that I use the same drivers (TMC2209) on UART, so it seems to me that the problem is not directly related to the drivers or the motors... However, I have not connected the display. Have you tried using the printer with the display detached? In my case I was able to print without freezes on the Mini this way...

taragor commented 4 years ago

I've just run into the issue, except for the Z skipping. I'm using the SKR Mini V2.0 (STM32F1, TMC2209) on an Ender 3. I have printed without any issues for probably 30 hours so far at 50mm/s, yet the printer freezes randomly with the heaters on as soon as I try printing at 80mm/s (I had one freeze after 1h and one after 5h, same G-code). However, I'm not sure that the issue comes from sudden direction changes, since both prints failed at curves/circles in walls (Cura wall speed: 40mm/s).

minosg commented 4 years ago

@taragor according to my experiments, it is more related to the acceleration than the maximum printing speed. You can trigger it quite consistently if you use any high retraction model ( ie Flexi Articulated Gecko Keychain ) and set:

As long as you have filament in the printer it will get frozen in the next 5 prints even when printing at 40 or 35mm/s speed.

Other things we have confirmed so far is that it still happens when you:

Basically every workaround on tickets relating to the same issue proposed in the last few months has been tested, and even though it makes it better, it will not fix this issue.

The question now is whether the issue we are facing is:

I am pretty much at a loss.

sawaguna commented 4 years ago

I have the same problem with my SKR 1.3. It's really random. It can work for days without issues, and then it can freeze during a print, with the heaters on.

No idea where the problem is. I had it happening with both SD and USB (Repetier-Server)

First time I had this problem, it was a file with 30% gyroid infill. Each time I was attempting to print it, it would freeze the printer (being via SD or USB). I just changed the infill type to rectangular, and it worked.

So I thought it was maybe a cache issue. But I really don't know, and it's probably not related

taragor commented 4 years ago

@minosg Wait, do you mean it will literally crash for the next 5 prints, regardless of the file?

As long as you have filament in the printer it will get frozen in the next 5 prints even when printing at 40 or 35mm/s speed.

If so, I think I witnessed that. After my first failed print I had some issues restarting it: I was printing live from OctoPrint and OctoPrint lost the connection during G29. It heats the bed and then does G28 followed by G29 (as specified in my start G-code); however, it stops sending temperature status and OctoPrint just times out. G29, however, completes normally. I didn't think much of it, since I had occasionally had that issue before, so I'm not even sure whether it's connected to this. This time it happened a few times in a row. I can't recall exactly how often, but since I tried power cycling both the printer and the Raspberry Pi, I guess it was 3-5 times.

minosg commented 4 years ago

@minosg Wait, do you mean it will literally crash for the next 5 prints, regardless of the file?

As long as you have filament in the printer it will get frozen in the next 5 prints even when printing at 40 or 35mm/s speed.

I believe you need a high-retraction model. Articulated models are perfect for that. The reason this issue has been lingering for over a year is that it is quite random and hard to reproduce.

What I meant above is that if you use that model, and those settings in prusaslicer it will make it more apparent.

My gecko gcode, will crash the firmware in maximum 5 prints, regardless of the setting combinations I described above.

But if you run it without filament in the extruder, it will never crash.

taragor commented 4 years ago

Do you know if it is possible to debug Marlin using JTAG? If so it might give a clue whether it's the m3 just halting or marlin waiting for some kind of event/data.

minosg commented 4 years ago

Yes, you can. The BTT SKR Mini uses an ST chip, which supports SWD debugging. The cheapest way to go about it is to buy an ST-Link, which is supported by PlatformIO.

It becomes a bit more complicated for Select Mini users, since they use that header to power the 3.3 V display, which will need to be powered externally during the debug session.

The other issue is that there is no DebugMonitor implemented for the STM32F1, as there is for the LPC1768, so you need to implement something similar in order to use the unwinder and unwmemaccess.

So it is possible, but this being an open source project, people have time constraints.

What we need to determine is whether this is a processor/HAL family related problem or whether it affects all of Marlin and is just hidden by the fact that other platforms have a working watchdog (so the board resets instead of being stuck with the heaters on).

stavinsky commented 4 years ago

I have two issues. First: it starts to halt during a print. I can see temperature changes and can send commands from the host, but no stepper is rotating. Second: it has started failing to boot. Sometimes I see the virtual serial port but it hangs after connecting; sometimes I don't see any USB device. Sometimes I can fix it only by reflashing the firmware with the same file. I use the 256K version of the board. My board is an SKR Mini 1.1. There was no problem like that for a month after I bought it. I have no display connected. All the drivers are 2130s from FYSETC. Also, without an SD card inserted it boots much more reliably.

UPDATE: OK, it looks like I found my problem, and it looks like it is hardware. I'm not sure, but I found out that my fans, which are connected to 12 V, are off. My power supply is good and I can measure 12 V on the input, but the whole 12 V bus has about 1.5 V. I also checked what happens if I just send M112: the fans work and the internal bus has 12 V as expected. So it looks like something shorted during the print. I will try to reproduce it, find which component heats up the most and check the 12 V bus again.

UPDATE 2: So in my case the positive 12 V pin was burned out. That is why I sometimes saw 1.5 V instead of 12 V :) Very, very good quality of BigTreeTech boards. :) I have resoldered it. We will see what changes.

taragor commented 4 years ago

@minosg I've just printed the Gecko keychain with no issues. Prior to that I printed https://www.thingiverse.com/thing:4427599 (which also has quite a lot of retractions) with the same settings. I had no issues there either. My settings were 50mm/s with the rest being the default Cura settings for the Ender 3. I'll try again with faster speeds tomorrow.

minosg commented 4 years ago

@taragor I think you need to override the max settings. Most slicers play it safe with the default profiles, which is why I'm using PrusaSlicer, which allows you to set the machine maximum settings at the beginning of a job.

To maximise extruder movements, also enable retraction per layer, Z-hop and linear advance.

taragor commented 4 years ago

@minosg Strange. My failed prints (the same G-code twice; one failed after ~1h, the other after ~5h) were this: https://www.thingiverse.com/thing:3757724 This model has relatively few retractions. I just set the print speed to 80mm/s; my other settings were just the Cura defaults.

taragor commented 4 years ago

I've had another two failed prints. Both of them failed while printing curved perimeters. Also I tried some more geckos with different settings and they all worked. Since all my fails were on curves I'll try to disable S-Curve acceleration or use junction deviation and see if that makes any difference.

adams79 commented 4 years ago

After many days of using the SKR 1.4 Turbo instead (with the same kind of TMC2209 drivers) I can confirm that it never freezes. Note that I'm using mostly the same configuration. I'm still not able to connect the display, so I finally swapped it for a TFT24. As I mentioned above, all my attempts with the LCD disconnected also completed on the E3 Mini, so I suggest trying it this way (you will need OctoPrint or a PC to send the print to the board).

minosg commented 4 years ago

Just an update on this issue, having tested for it over the last week.

When it happens, the bed and hotend PWM is working as intended, which means both the interrupts and the PWM are functioning. Also, the DIAG pin on each of the drivers appears to be low, and the INDEX pin is pulsing as expected.

Considering that the issue becomes more visible with high acceleration settings in M204, and under the assumption that it could not be the consumer's fault (TMCStepper), I started looking into the producer (planner.cpp).

With the Malyan_LCD disabled and printing using SD card commands (M21 & M24), I have yet to have a freeze in two consecutive days of printing the test file. This could also be the case for #18315

BTT have pushed a change like that on their display driver which could indicate a possible workaround https://github.com/bigtreetech/BIGTREETECH-TouchScreenFirmware/commit/5777f41b5f8c1a41410a6874614499c91ac78fa2

Could the people affected please share:

adams79 commented 4 years ago

@minosg In my case I was using the Malyan display when I experienced the issue. Unfortunately I've now swapped both the board and the display, so I don't know whether using the E3 Mini with the new display would solve the problem. It's possible, however, that high retraction/acceleration causes multiple updates to be sent to the display, and that this causes issues (a race condition on the serial port?).

minosg commented 4 years ago

You don't have the original board to try?

adams79 commented 4 years ago

No unfortunately I've returned it.

taragor commented 4 years ago

@minosg I've tested with the display disabled in Marlin and the TFT35 in BTT mode, so it was just running as a serial G-code terminal, like OctoPrint or Pronterface. I've just had another freeze with the following setup:

- TFT35 connected using the TFT connector (Serial 2 in Marlin)

- Raspberry Pi connected via USB (Serial -1), printing live from OctoPrint

- Display disabled in Marlin, ribbon cable to EXP3 on the screen disconnected

- Object sliced in Cura, speed 80mm/s, otherwise pretty much default settings

minosg commented 4 years ago

@taragor please disable the displays in firmware, not just disconnect them, and print using SD commands. You can do that with Pronterface. The issue I am seeing is serial interrupts firing into each other. Printing through OctoPrint or even enabling host keepalive will trigger it.
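A hedged sketch of what that test build could look like, assuming the stock Marlin 2.0.x option names in Configuration.h. Which display option is active depends on the printer, so treat the two display lines as examples only.

```cpp
// Configuration.h (sketch): remove serial traffic sources for the test build.
//#define MALYAN_LCD              // Select Mini display, disabled for the test
//#define CR10_STOCKDISPLAY       // Ender 3 stock display, disabled for the test
//#define HOST_KEEPALIVE_FEATURE  // no periodic "busy" messages to the host
#define SDSUPPORT                 // keep SD printing available (M21 / M24)
```

With a build like this, the print is started from the onboard SD card with M21 followed by M24, so the host connection stays idle while the steppers run.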

taragor commented 4 years ago

@minosg I have disabled #define CR10_STOCKDISPLAY and some other things that the sanity check required (e.g. M73). I'm currently printing with OctoPrint disconnected and only the TFT35 in BTT mode (as I understand it, that should be the same as using Pronterface; please correct me if I'm wrong here). I'm currently printing live from the TFT, using a USB drive in the TFT's USB slot, but I will try printing from the printer's onboard SD once that finishes or fails.

EDIT: just to clarify, are you suggesting starting the print through Pronterface (or OctoPrint's terminal, for that matter) using M24/27 and then disconnecting from the printer?

minosg commented 4 years ago

I think that printing from SD uses the M21 and M24 commands, but Pronterface has an SD button to automate it. And yes, that will make sure the only interrupts firing are the stepper ones. BTT mode is still a serial connection.

boelle commented 4 years ago

Anyone that has the board mentioned:

Please test the bugfix-2.0.x branch to see where it stands.

minosg commented 4 years ago

@boelle I have the BTT SKR Mini E3, which is the one mentioned. I also have a Cheetah board, which uses the same STM32F1 chipset, and I have ordered another one. All of them use this chipset with the TMC drivers in UART (serial) mode, and all exhibit the same issue.

I can confirm that the issue is still there on latest bugfix and the latest TMC_Stepper library ( v0.7 )

adams79 commented 4 years ago

I confirm that I'm seeing that on SKR E3 Mini with latest bug fix but not on SKR 1.4 (LPC chipset)

minosg commented 4 years ago

@adams79 are you using malyan_lcd on SKR1.4? What mode are you using the steppers on?

adams79 commented 4 years ago

@minosg no, I'm not able to use the Malyan LCD. I'm using a TFT24 connected in BOTH serial and LCD12864 emulation mode. I'm using all steppers in StealthChop; the drivers are TMC2209. I have printed for dozens of hours with no problem with this setup.