MarlinFirmware / Marlin

Marlin is an optimized firmware for RepRap 3D printers based on the Arduino platform. Many commercial 3D printers come with Marlin installed. Check with your vendor if you need source code for your specific machine.
https://marlinfw.org
GNU General Public License v3.0

[BUG] Thermal Runaway - But Marlin does not catch it #20749

Closed zenturacp closed 2 years ago

zenturacp commented 3 years ago

Bug Description

While printing, the temperature reporting for both bed and hotend suddenly stops updating. It just stays at the last recorded value: if that value is below the target, the heater stays on; if it is above the target, the heater is shut off. As a result, the hotend/bed heater is either permanently off or permanently on.

Thermal Runaway does not detect the issue because it keeps reporting the same value over and over again.
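To illustrate why a frozen reading can slip past this kind of protection, here is a minimal C++ sketch of a simplified runaway check. It is not Marlin's code: `RunawayCheck`, `trips_on_constant_reading`, and the 4 °C / 40 s values are illustrative assumptions. The check only trips when the reported temperature stays more than a hysteresis band below the target for a full period, so a reading stuck at 209.84 against a 210 target never trips it, whatever the real temperature is.

```cpp
#include <cassert>
#include <cstdint>

// Simplified model of a thermal-runaway check. NOT Marlin's implementation:
// the names and the thresholds used below are illustrative assumptions.
class RunawayCheck {
 public:
  RunawayCheck(double hysteresis_c, std::uint32_t period_ms)
      : hysteresis_c_(hysteresis_c), period_ms_(period_ms) {
    assert(hysteresis_c >= 0.0 && period_ms > 0);
  }

  // Returns true when the reading has stayed more than hysteresis_c below
  // the target for at least period_ms, i.e. when the heater should be killed.
  bool update(double reported_c, double target_c, std::uint32_t now_ms) {
    if (reported_c >= target_c - hysteresis_c_) {
      below_ = false;  // reading looks healthy, reset the timer
      return false;
    }
    if (!below_) {
      below_ = true;
      below_since_ms_ = now_ms;
    }
    return (now_ms - below_since_ms_) >= period_ms_;
  }

 private:
  double hysteresis_c_;
  std::uint32_t period_ms_;
  bool below_ = false;
  std::uint32_t below_since_ms_ = 0;
};

// Feeds the same reported value once per second and tells whether the check
// ever trips within `seconds`.
bool trips_on_constant_reading(double reported_c, double target_c,
                               std::uint32_t seconds) {
  RunawayCheck check(4.0, 40000);  // assumed: 4 degC band, 40 s period
  for (std::uint32_t s = 0; s <= seconds; ++s) {
    if (check.update(reported_c, target_c, s * 1000u)) return true;
  }
  return false;
}
```

In this model, a reading frozen at 209.84 with a target of 210 looks healthy forever, while a reading frozen far below the target (for example 150) would trip after the period elapses. A check of this shape can only react to the value it is given.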

After a reset it can't start because it says the hotend is above threshold, which it really was. The target was 210, and when I finally got it running it said 260/120. The target/status before the reboot was 210/60, and it just stayed there. Even if I set it to cool down, it does not shut off the heat.

I'm running this version now https://github.com/MarlinFirmware/Marlin/tree/cf1f8aff7781c221d76c671e94a88d6d851b2d4d

I'm not aware of any recent changes to the printer. The firmware that was on the board was from mid-December, and I updated it yesterday to the version referenced above.

It does print, and the problem doesn't happen every time. It could be a hardware issue, but then I really don't know why it just works again after a reboot.

It has happened 3 times now on the same model, sliced with different parameters.

Model: https://www.thingiverse.com/thing:2482299

CatsandwichBowl_V2.zip Sliced GCode

Configuration Files

Configuration.zip

Steps to Reproduce

1. I have a model.
2. I slice it with default settings in Cura for Ender 3 Pro.
3. Upload GCode to OctoPrint.
4. Print the model. After X hours of printing I see some deformation on the model.
5. When I stop the print, the actual temperature does not drop, and the report on the console is the same again and again:

Recv: T:209.84 /0.00 B:59.84 /0.00 @:0 B@:0
Recv: T:209.84 /0.00 B:59.84 /0.00 @:0 B@:0
Recv: T:209.84 /0.00 B:59.84 /0.00 @:0 B@:0
Recv: T:209.84 /0.00 B:59.84 /0.00 @:0 B@:0

After reboot / reset the temperature is actually:

Recv: T:237.50 /0.00 B:112.57 /0.00 @:0 B@:0
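On the host side, a run of byte-identical readings like the one in this log can be spotted automatically. The following is a minimal C++ sketch, assuming the report format shown above. `TempReport`, `parse_report`, and `StaleReportDetector` are hypothetical names, not part of Marlin or OctoPrint, and the run length that counts as suspicious is an assumption that would need tuning (a cold, idle printer can legitimately report the same value for a long time).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// One parsed temperature report, e.g.
// "T:209.84 /0.00 B:59.84 /0.00 @:0 B@:0".
struct TempReport {
  double hotend = 0, hotend_target = 0, bed = 0, bed_target = 0;
};

// Parses the report format seen in the log; returns false if the line does
// not contain a complete "T:.. /.. B:.. /.." group.
bool parse_report(const std::string& line, TempReport& out) {
  const std::size_t start = line.find("T:");
  if (start == std::string::npos) return false;
  return std::sscanf(line.c_str() + start, "T:%lf /%lf B:%lf /%lf",
                     &out.hotend, &out.hotend_target,
                     &out.bed, &out.bed_target) == 4;
}

// Counts consecutive reports whose hotend and bed readings are identical.
class StaleReportDetector {
 public:
  explicit StaleReportDetector(int limit) : limit_(limit) {
    assert(limit > 0);
  }

  // Returns true once `limit` consecutive reports carried the same readings.
  bool feed(const TempReport& r) {
    if (have_last_ && r.hotend == last_.hotend && r.bed == last_.bed) {
      ++run_;
    } else {
      run_ = 1;  // first report, or the readings changed
    }
    last_ = r;
    have_last_ = true;
    return run_ >= limit_;
  }

 private:
  int limit_;
  int run_ = 0;
  bool have_last_ = false;
  TempReport last_;
};
```

Fed with the four identical lines from the log and a limit of 4, the detector would flag the fourth report. A host could then warn the user or power the printer off externally, since in this failure mode the firmware itself still believes the readings.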

Expected behavior:

That the temperature keeps updating and shows the correct (actual) values as seen by Marlin.

Actual behavior:

After some hours of printing, the printer has "static" readings from both bed and nozzle. It seems like the loop is not updating the actual values, and it affects both bed and hotend at once.

Additional Information

Motherboard: SKR 1.4 Turbo
Display: BTT 3.5 V2
Printer: Ender 3 V2
Hotend: Microswiss all-metal hotend

IMG_3846: What I see on the model when it happens

Pre Status before Reset

Post status after reset (several times, because it just keeps raising an alarm stating that the hotend is above threshold)

ConsoleLog.zip: Here is the terminal output, where you can see that the motherboard sends temperature updates, but they are just the same values over and over again.

loetefix commented 3 years ago

Exactly the same problem here with:

Board: SKR 1.4 Turbo
TFT: TFT35E3V3
Hotend: Microswiss all-metal hotend
Printer: Tronxy XY-2 Pro

Yesterday the reported temperature froze at 81° for the bed and 234° for the hotend. In reality the bed was cold and the hotend was heating too much. The bed temperature was set to 80 and the hotend to 240. The printer was still printing (the printer itself had not frozen); only the temperature measurement was frozen.

ellensp commented 3 years ago

What firmware version is on the TFT35E3V3?

loetefix commented 3 years ago

Hi. I'll have to look when I am at home. It's still the original firmware that was on the TFT when I bought it. Could it be a TFT firmware problem?

Thanks

loetefix commented 3 years ago

Hi, It is V3.0.26 APR 11 2020

loetefix commented 3 years ago

Today I updated the TFT to the latest firmware and made some changes in Marlin as described in the config.ini of the TFT:

General options:
- M115_GEOMETRY_REPORT (in Configuration_adv.h)
- M114_DETAIL (in Configuration_adv.h)
- REPORT_FAN_CHANGE (in Configuration_adv.h)
- EMERGENCY_PARSER (in Configuration_adv.h)
- SERIAL_FLOAT_PRECISION 4 (in Configuration_adv.h)
- HOST_ACTION_COMMANDS (in Configuration_adv.h)

Options to support printing from onboard SD:
- SDSUPPORT (in Configuration.h)
- AUTO_REPORT_TEMPERATURES (in Configuration_adv.h)
- AUTO_REPORT_SD_STATUS (in Configuration_adv.h)
- LONG_FILENAME_HOST_SUPPORT (in Configuration_adv.h)
- SDCARD_CONNECTION ONBOARD (in Configuration_adv.h)

Some of these options I had not set before the update.

It seems to work pretty well so far.

2 prints without any problems.

zenturacp commented 3 years ago

what version firmware is on the TFT35E3V3 ?

I run the original from April.

But I really never use the display, or only use it in Marlin mode. I checked the OctoPrint terminal: the reported temperature values stayed constant, although reports were still arriving on the console.

MakerMeik commented 3 years ago

I have exactly the same problem here. SKR 1.4 Turbo with Marlin 2.0.7.2. Unfortunately, it happens to me only at longer intervals that the temperature indicator on the display freezes and the 3D printer heats up the Nozzle and the bed more and more to the point of smoke. Has anyone already found the reason for this? Has the TFT firmware fixed the problem? I run the printer almost exclusively via Octoprint, which is an Octopi on a Raspi connected to the board via USB. My guess is that there may be a problem in this regard.

thinkyhead commented 3 years ago

I have exactly the same problem here. SKR 1.4 Turbo with Marlin 2.0.7.2.

Please test the bugfix-2.0.x branch to see where it stands. If the problem has been resolved then we can close this issue. If the issue isn't resolved yet, then we should investigate further.

MakerMeik commented 3 years ago

OK, I have tried the bugfix branch. There still seems to be a fundamental problem. Even though I didn't run into the overheating problem now, after two or three more test runs I now had the case that the temperature was displayed with the desired 210°C after the usual warm-up phase, but the nozzle actually only had about 60°C after a while. This was only displayed correctly after a reset of the mainboard.

However, I believe that the temperature was reached initially for a short time, because the PLA started to flow. That the Nozzle temperature is obviously too low, I noticed only by the fact that the filament was suddenly no longer flowable. I now occasionally check the temperature with my multimeter thermistor when I have doubts.

Since it sometimes affects the bed and sometimes the nozzle, I would exclude hardware problems at the temperature sensors.

However, to be able to recreate the problem reasonably reliably, I sent the job via Octoprint again this time. I am not sure if the problem persists if I restrict myself exclusively to the internal SD card and cut the USB connection to the Raspi.

zenturacp commented 3 years ago

It's exactly what I see, though it can go either way, cooling down or heating up, depending on whether the last reading is above or below the desired temperature.

And it's the same for bed and nozzle.

It happens only on one model for me, but there it happens every time. I haven't seen it on other models, so it has to be some kind of software issue.

I tried printing the same model 5-6 times and it happened every time.

So I also suspect something that causes the thermal part of the firmware to freeze.

The actual temperature was far above what was expected on both bed and nozzle. But the display said 210/60, and OctoPrint and the reports in the terminal also said 210/60. After resetting 50 times (thermal protection) it was at 285 on the nozzle and 140 on the bed.

Thermal runaway does not stop the printer because the printer thinks everything is okay.

MakerMeik commented 3 years ago

I've had this problem on several models now. I switched to the SKR board only a few weeks ago and am therefore still running various optimization tests. The error occurred both with models that I had sliced myself in Cura (xyzCube) and with a model that I downloaded from a website (Link). I would therefore also exclude that it is due to any special slicer settings.

My suspicion goes as said in the direction of Octoprint or the serial interface. But I have no deeper understanding of how the Marlin code works, so this is amateurish guessing ;-)


zenturacp commented 3 years ago

In my opinion it's unlikely to be an OctoPrint issue, since OctoPrint has no way to make Marlin stop updating the measurement, and thermal runaway protection is totally bypassed when it happens.

I have read about other issues with SKR boards, but not exactly this one, because in those cases Marlin still reads the correct temperature.

zeleps commented 3 years ago

It happened to me twice as well, in two random occasions, while using octoprint.

The second time, I was preheating the printer with the temperature graph open. The temperature rise appeared to slow down abnormally as it approached the target temp (200°C), lingering around 185°C, then suddenly it jumped to 260°C+. The printer halted, and after reconnection it cooled off gradually, so the reading was probably accurate.

I suspect it has something to do with M155 temperature reporting (a buffering issue maybe?), but I haven't had the time yet to try to reproduce or debug it. I'll get back to it when I get the chance.

MakerMeik commented 3 years ago

In the meantime, I removed the USB cable from my Raspi and have since done quite a few prints via SD card. Some of them took over ten hours. The problem has not occurred since then. I will connect the Raspi again in the near future to see if the problem could be related to this. I can imagine that there is a connection with #21010.

MakerMeik commented 3 years ago

OK, I am now 99.9% sure: there is a relationship with OctoPrint or the serial communication. I have now printed at least 10 models exclusively from the SD card. During that time I had OctoPrint, or rather the USB cable to the Raspberry Pi, completely disconnected, and there were no temperature problems at all.

Today I needed a terminal to send some GCODES. In this context I reconnected the Raspi and set the Nozzle to the 210°C via Octoprint. This also worked as expected for the first few minutes, until it came back to the heating problem, where the display continued to show my set 210°C. I recognize the smell instantly by now and was able to quickly reset the board. Because once the filament is fried in the nozzle, a major cleaning or replacement action is usually required in most cases. I have ruined a Nozzle in this way in the meantime.

After the reset, the actual temperature of just under 300°C was displayed again as before, which then slowly normalized.

So this time I didn't even have a print running, but just set the temperature and played around with the following GCODES:

M92
G91
G1 E100 F50
M92 E100
M500

However, I suspect that the GCODES did not play a role in this, because initially everything ran as expected. Only after two or three runs the described problem occurred.

zenturacp commented 3 years ago

I can confirm I also used OctoPrint.. But it's only on certain prints.. Have had long prints working after this..

But exactly same here 210 on display and 300+ on nozzle

MakerMeik commented 3 years ago

Today I had the problem for the first time even without the OctoPi connected via USB. In the meantime surely 20 to 30 prints have run. I would stick to the idea that the USB serial connection makes the problem more likely, but it seems to occur with SD-card-only printing as well.

flat-jack commented 2 years ago

I have the same problem, also using a BigTreeTech TFT. I never had problems with the temperature before, even on 10-hour prints. The printer is running Marlin 2.0.9.2. But suddenly the problems occur at random. I also realized that my reset button isn't working correctly anymore. Has anyone tried changing the TFT to see if that fixes the problem? I am not using OctoPrint; I always print from SD card.

pillopaolo commented 2 years ago

I had exactly the same problem too, with a BigTreeTech SKR 1.3 + TFT35 E3 V3, while printing from SD card via the TFT. The temperature was frozen just below setpoint, with the heater always ON and the printer running normally. The material (PP) started to bubble and make noise: the temperature was in fact > 350 °C, as indicated by Marlin after restart.

No matter who is "interfering" with Marlin (OctoPrint, BTT TFT, etc.):

1) Marlin should not stop updating the temperature!
2) To cope with a temperature freeze (whether caused by Marlin or by the hardware), Marlin should have a kind of freeze check, i.e. if T is not moving (say +-0.1) for a while (say 30 secs), then KILL!
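The freeze check proposed here could look roughly like the following C++ sketch. It is only an illustration of the idea, not Marlin code: `FreezeCheck` and its parameters are hypothetical, the +-0.1 band and 30 s timeout are the example values from the comment above, and restricting the check to times when the heater is on is an added assumption so that an idle printer at room temperature is not killed.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Sketch of a "frozen reading" check (hypothetical; not part of Marlin).
class FreezeCheck {
 public:
  FreezeCheck(double band_c, std::uint32_t timeout_ms)
      : band_c_(band_c), timeout_ms_(timeout_ms) {
    assert(band_c >= 0.0 && timeout_ms > 0);
  }

  // Call periodically with the latest reading. Returns true when the reading
  // has stayed within +/- band_c of the value seen at the start of the
  // observation window for at least timeout_ms while the heater is on.
  bool update(double temp_c, bool heater_on, std::uint32_t now_ms) {
    if (!started_ || !heater_on || std::fabs(temp_c - ref_c_) > band_c_) {
      // First sample, heater off, or the reading moved: restart the window.
      ref_c_ = temp_c;
      ref_ms_ = now_ms;
      started_ = true;
      return false;
    }
    return (now_ms - ref_ms_) >= timeout_ms_;
  }

 private:
  double band_c_;
  std::uint32_t timeout_ms_;
  bool started_ = false;
  double ref_c_ = 0.0;
  std::uint32_t ref_ms_ = 0;
};
```

With `FreezeCheck check(0.1, 30000)`, a reading stuck at 209.84 while the heater is on would be flagged 30 seconds after the first sample. Whether real, healthy readings always move by more than 0.1 within 30 seconds is an open question; a well-tuned, stable system might need a narrower band or a longer timeout to avoid false kills.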

I never had problems with Marlin-2.0.7.2, downloaded Oct-2020. Problems started with Marlin-2.0.9.2, downloaded Oct-2021. By the way, the watchdog was properly enabled with "#define USE_WATCHDOG".

This is a serious FIRE HAZARD, I encourage the most skilled developers to spend some more time on this. Thanks

robbycandra commented 2 years ago

@pillopaolo, have you updated your firmware to the latest bugfix? What is your firmware version?

pillopaolo commented 2 years ago

I did not update because the problem is very difficult to replicate; it happened only once to me. I prefer to wait until somebody acknowledges the issue, troubleshoots it, finds a solution, and documents/comments the code accordingly, to be 100% sure the issue is really solved. We are talking about a FIRE HAZARD here! I cannot proceed by trial and error.

I have some knowledge of coding, happy to contribute with troubleshooting if you tell me what lines of code are suspected.

BEEPER_PIN: most likely not configured. BUT a beeper does not solve the problem...

pillopaolo commented 2 years ago

BEEPER_PIN is NOT defined in my case. It is defined when "HAS_WIRED_LCD is defined", which in turn is defined when "IS_ULTRA_LCD is defined", which is not my case. I defined REPRAP_DISCOUNT_FULL_GRAPHIC_SMART_CONTROLLER, needed when BTT TFT35 in Marlin mode.

robbycandra commented 2 years ago

@pillopaolo The FIRE HAZARD problem can be caused by Hardware too.

When the printer board turns off the heater, it cuts the GND (0 V) line. Now, let's look at the wires from the heater. Usually, near the heat block, there is a part that is slightly exposed, without insulation. If the heater's GND wire touches the heat block, then the heater will always be on, because the board only disconnects the heater from GND. In some printers, I add a MOSFET to move the cut-off to the +12/24 V side.

robbycandra commented 2 years ago

Because in some printers the printer body, including the nozzle heat block, may be connected to the GND line. I think this is a serious problem for 3D printers, but until now no one has talked about it.

pillopaolo commented 2 years ago

@robbycandra: I will check later the HW as you suggested

However, I see now that a more recent version of the code has been corrected as follows:

    inline void loud_kill(FSTR_P const lcd_msg, const heater_id_t heater_id) {
      marlin_state = MF_KILLED;
      thermalManager.disable_all_heaters();

While in the older version I have, the thermalManager.disable_all_heaters() command was WRONGLY put under "#if USE_BEEPER" as follows:

    inline void loud_kill(PGM_P const lcd_msg, const heater_id_t heater_id) {
      marlin_state = MF_KILLED;
      #if USE_BEEPER
        thermalManager.disable_all_heaters();

Therefore the heaters were NOT properly disabled if the beeper was not defined.

Correct?

pillopaolo commented 2 years ago

@pillopaolo The FIRE HAZARD problem can be caused by Hardware too.

When the printer board turns off the heater, it cuts the GND (0 V) line. Now, let's look at the wires from the heater. Usually, near the heat block, there is a part that is slightly exposed, without insulation. If the heater's GND wire touches the heat block, then the heater will always be on, because the board only disconnects the heater from GND. In some printers, I add a MOSFET to move the cut-off to the +12/24 V side.

If I understand well, the above only explains why heaters stay ON, it does not explain why Marlin stops reading the temperature and keeps displaying the old one... Or I miss something?

robbycandra commented 2 years ago

Well... actually, if the beeper is not used, disable_all_heaters() is called in kill(). I don't think the beeper is the cause.

disable_all_heaters() was moved to the top because Marlin now parks the nozzle when the printer goes into a thermal halt; the move only ensures that disable_all_heaters() is called first.

robbycandra commented 2 years ago

@pillopaolo , yea... I still don't understand about it.

zeleps commented 2 years ago

@pillopaolo what type of temp sensor do you use?

robbycandra commented 2 years ago

I forgot to say that the thermistor is connected to GND. If the heater GND is connected to the thermistor GND, then it will keep heating.

pillopaolo commented 2 years ago

When the incident occurred, everything (except the heater) was working well. This makes me think that kill() was not called (I assume kill() stops the motors too). GND is not the issue: 1) I checked, 2) the issue was gone as soon as I reset the extruder, 3) it does not explain why the T reading was frozen.

Could it be any issues with the T reading code, maybe simply not called or terminated/exited prematurely? Maybe something wrong with the related interrupt (if any).

What about implementing a T freeze check (in a different interrupt / part of the code), as suggested above?

pillopaolo commented 2 years ago

I forgot to say that the thermistor is connected to GND. If the heater GND is connected to the thermistor GND, then it will keep heating.

Standard 100K ohm NTC 3950, since years in 50+ printer/extruders.

Still does not explain why Temperature reading is frozen

robbycandra commented 2 years ago

When it comes to printers stopping or freezing, my guess leans more towards G-code reading. But this is only based on experience; I don't have any proof. And I have never experienced a printer heating up like this, only stopping or freezing.

zeleps commented 2 years ago

@pillopaolo I wanted to take a look at temperature.cpp, knowing the sensor type eliminates some possible code paths.

I had something similar happening to me a few months ago, but it hasn't occurred since then (although I exclusively use octoprint to print stuff). Can you reproduce the issue? If yes, it would be interesting if you could enable some debug logging and try it again.

pillopaolo commented 2 years ago

@pillopaolo I wanted to take a look at temperature.cpp, knowing the sensor type eliminates some possible code paths.

I had something similar happening to me a few months ago, but it hasn't occurred since then (although I exclusively use octoprint to print stuff). Can you reproduce the issue? If yes, it would be interesting if you could enable some debug logging and try it again.

Thermistor type = 1 = Standard 100K ohm NTC 3950, since years in 50+ printer/extruders.

Unfortunately I cannot reproduce it. It only happened once (and was pretty bad!). Then I read other people had the same issue. Temperature freeze (while other things are functioning) is something very peculiar that could help in troubleshooting.

zeleps commented 2 years ago

Do you remember if bed temp was updating properly when the problem occurred?

pillopaolo commented 2 years ago

Do you remember if bed temp was updating properly when the problem occurred?

Bed was cold, so pretty constant. I did not pay attention.

thisiskeithb commented 2 years ago

https://github.com/MarlinFirmware/Marlin/pull/23373 has been merged.

descipher commented 2 years ago

Does the runaway occur when bang bang is used instead of PID?

zeleps commented 2 years ago

No, @zenturacp's case (as well as mine) are PID setups (both for hotend and bed).

descipher commented 2 years ago

No, @zenturacp's case (as well as mine) are PID setups (both for hotend and bed).

Just to verify, we have no reported incidents when using Bang Bang? We only see it when using PID so far.

github-actions[bot] commented 2 years ago

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.