Thermal runaway audio-visual alarm

nophead commented 9 years ago

While answering a forum question I came across this line: https://github.com/MarlinFirmware/Marlin/blob/7866fa161f46e57fae1d9a82df4927de5e6dc6bb/Marlin/temperature.cpp#L1140

if (temperature >= (target_temperature - hysteresis_degc))
  //reset the timer

It seems to me it should be:

if (temperature >= (target_temperature - hysteresis_degc) && 
    temperature <= (target_temperature + hysteresis_degc))

alexborro commented 9 years ago

@nophead, the idea is to keep the timer counting IF the current temperature is below the target. I mean, if the system takes longer than "period_seconds" to make the temperature reach the target, probably something is wrong and the system should be halted.

In your proposal, the timer will increment even if the temperature is higher than hysteresis, which makes no sense to me. We are trying to catch a situation where the heater is ON for a long time without hitting the target. If the temperature is above target+hysteresis, the heater should be OFF - maybe just a misconfigured PID and presents no danger to the system.

If I misunderstood your proposal, let me know.

Cheers.

Alex.

nophead commented 9 years ago

If it is above the target for a long time then that implies the heater is stuck on, which is classic thermal runaway in my book.
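To make the two conditions under discussion concrete, here is a minimal sketch that contrasts them. The function names are hypothetical and this is not Marlin's actual code; it only assumes that a true result means "reset the timer", as in the snippet quoted at the top of the thread.

```cpp
// Hypothetical helpers, not Marlin's actual code: they only contrast the
// existing one-sided check with the proposed two-sided check.

// Existing check: the timer is reset whenever the reading is at or above
// (target - hysteresis), however far above the target it is.
bool timer_reset_one_sided(float temperature, float target, float hysteresis) {
  return temperature >= target - hysteresis;
}

// Proposed check: the timer is reset only while the reading stays inside
// the band [target - hysteresis, target + hysteresis].
bool timer_reset_two_sided(float temperature, float target, float hysteresis) {
  return temperature >= target - hysteresis &&
         temperature <= target + hysteresis;
}
```

With a 200C target and 4C hysteresis, a 230C reading keeps resetting the timer under the first check but lets it run under the second, which is the overheating case being argued about.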

nophead commented 9 years ago

Also, a reading that is too low could mean the thermistor has fallen off, but it could also mean the fan is cooling the hot end too much. In that case "thermal runaway" is a confusing error message.

alexborro commented 9 years ago

Chris, if the temperature is above the target and the system can still read the temperature, it will take measures to turn the heater off. In Thermal Runaway Protection we are trying to take care of some hardware issues, and the most common issue is a thermistor coming out of place. In that situation the measured temperature will be below the real one, so the heater will take longer to reach the target (or never reach it).

About the cooling fan, the user needs to calibrate the system correctly. In my experience, even a good cooling fan turned on at max power will not prevent a 40W heater from reaching the target in a few seconds.

Bear in mind this is not a feature to prevent 100% of the failure cases, just one more redundancy trying to prevent damage.

Cheers.

Alex.

galexander1 commented 9 years ago

@nophead @alexborro I think the question is whether any temperature sensors will provide a high reading on failure? I see a few different failure modes...

- Thermistor pops out of the heater block: it will read room temp (~20C) -- FAIL LOW
- If a thermistor disconnects (electrical open), the 4.7k pullup will pull it to 5V (or 3.3V or whatever), which will make it into analog2temp() as raw==0xffff or similar, and it will return the last value in the table, which is usually 0C -- FAIL LOW
- If the sensor shorts, I think it will see 0V, analog2temp() will see raw==0, and it will return the first entry in the table (typically high max) -- FAIL HIGH

That's with your typical NTC thermistor. Disconnecting seems more common than shorting, but shorting isn't impossible. And if you used a PTC thermistor (none supported yet?), the short and disconnect scenarios would be reversed. I don't know anything about thermocouples... And of course my analysis of NTC thermistors could be wrong.

So I think @nophead's suggestion is a good idea. Sustained high readings can indicate a thermistor failure that should shut down the robot.

It would be kind of neat if analog2temp() would set a specific failure value (0C/1000C?) instead of just using the first/last entries in the table, to make sure that off-scale readings don't look like max-scale readings.
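A minimal sketch of that idea, assuming a small made-up lookup table with ascending raw ADC counts. The table values, the sentinel temperatures, and the name `analog_to_temp_checked` are all hypothetical and chosen only for illustration; this is not Marlin's `analog2temp()`.

```cpp
#include <cstddef>

// Hypothetical table entry: raw ADC count and the temperature it maps to.
struct TableEntry { int raw; float celsius; };

// Made-up NTC-style table: low raw counts are hot, high raw counts are cold.
constexpr TableEntry kTable[] = {
  {  23, 300.0f},
  { 100, 200.0f},
  { 500, 100.0f},
  {1000,   0.0f},
};
constexpr std::size_t kTableSize = sizeof(kTable) / sizeof(kTable[0]);

// Hypothetical sentinels that cannot be confused with real table values.
constexpr float kOffScaleHigh = 1000.0f;  // raw below the table (possible short)
constexpr float kOffScaleLow  = -100.0f;  // raw above the table (possible open circuit)

float analog_to_temp_checked(int raw) {
  // Off-scale readings get a distinct sentinel instead of the end entries.
  if (raw < kTable[0].raw) return kOffScaleHigh;
  if (raw > kTable[kTableSize - 1].raw) return kOffScaleLow;
  // In-range readings are linearly interpolated between neighbouring entries.
  for (std::size_t i = 1; i < kTableSize; ++i) {
    if (raw <= kTable[i].raw) {
      const TableEntry& lo = kTable[i - 1];
      const TableEntry& hi = kTable[i];
      const float t = static_cast<float>(raw - lo.raw) /
                      static_cast<float>(hi.raw - lo.raw);
      return lo.celsius + t * (hi.celsius - lo.celsius);
    }
  }
  return kOffScaleLow;  // not reached: the range checks above cover all cases
}
```

A caller could then compare the result against the two sentinels to report "sensor shorted" or "sensor disconnected" separately from a genuine maximum or minimum reading.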

galexander1 commented 9 years ago

@alexborro said:

> Chris, if the temperature is above the target and the system can still read the temperature, it will take measures to turn the heater off.

Oh. Right. :) Context is everything! Please disregard my previous message.

I think off-scale high is harmless because if the firmware can turn off the heater, it already will. And if it can't, well, what are we gonna do?!

nophead commented 9 years ago

When I said "heater stuck on" I meant when the MOSFET is shorted due to an ESD strike punching through the gate oxide. Yes, the firmware will turn it off, but if the temperature is still rising when the power is off, that is thermal runaway and the user should be alerted even if it can't be shut down.

galexander1 commented 9 years ago

If I'm sitting there and that happens, I'll probably smell it long before I happen to look at the G-code console. :)

And if I'm not sitting there....bye bye hotend :(

nophead commented 9 years ago

The noticeable thing would be the machine stops printing before the hot end starts to smoke. If the PSU is controllable then it could be switched off.

You seem happy to have a "thermal runaway protection" that ignores the most classic interpretation of thermal runaway i.e. getting too hot, for the sake of not adding a second clause to an expression.

It seems wrong to me but as I don't use it myself I don't really care. I use beefy MOSFETs and since I started cementing in my thermistors I have never had a case of thermal runaway. I also don't use 40W heaters, so the chance of a fire is small enough for me to not worry about it.

galexander1 commented 9 years ago

Well from looking at how my robot is constructed, I can imagine the tape coming undone and the thermistor popping out. It also seems likely that the in-line connector in the thermistor wires will someday flake out and open (read low). When I built it, I feared those failures. I think experience has shown those to be common failure modes for all 3d printers. But the code that is there seems to handle either of those situations.

I don't know how common this kind of FET failure is. But if the FET does fail to a hard short, it seems like it will be heating from the instant I power up the robot, even if I don't set M104/M109 at all. At that point, it will be state==0 and that "if" won't even be evaluated.

Also, if you make the proposed change, and someone issues an M109 S200, and then issues an M104 S180, the robot will immediately go into shutdown, because the state machine does not anticipate decreasing temperature. So it is not quite so simple as adding the extra clause to the if -- we have to redesign (and re-test) the state machine as well. That's not hard, but we have to weigh added complexity against the probability of the failure mode.

If the failure mode was particularly common, I'd support some changes to deal with it, and I'd also rewire my robot so it can switch off its own PSU. But I think the failure mode is very very rare, and anyways I already own a PSU that can't switch off. :)

nophead commented 9 years ago

If you reduce the temperature setting it won't immediately error, because it waits for the specified time to elapse. It is just the same situation as if you increase the temperature setting now. It really needs to go back to state 1 when the set point changes and move to state 2 when it crosses the set point.
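The state handling described here can be sketched roughly as follows. This is an illustration under stated assumptions, not Marlin's state machine: the type and member names are hypothetical, state 0/1/2 are modelled as Idle/Approaching/Regulating, and the two-sided band check from the proposal is used while regulating.

```cpp
#include <cstdint>

// Hypothetical states: Idle ~ state 0, Approaching ~ state 1,
// Regulating ~ state 2, Tripped = protection has fired (latched).
enum class RunawayState { Idle, Approaching, Regulating, Tripped };

struct RunawayMonitor {
  RunawayState state = RunawayState::Idle;
  float last_target = 0.0f;
  bool started_below = true;
  std::uint32_t timer_start_ms = 0;

  // Call periodically with the current reading, set point and clock.
  RunawayState update(float temperature, float target, float hysteresis,
                      std::uint32_t period_ms, std::uint32_t now_ms) {
    if (state == RunawayState::Tripped) return state;  // stays latched
    if (target <= 0.0f) {                               // heater not in use
      state = RunawayState::Idle;
      last_target = 0.0f;
      return state;
    }
    // A new or changed set point always sends us back to Approaching.
    if (state == RunawayState::Idle || target != last_target) {
      last_target = target;
      started_below = temperature < target;
      state = RunawayState::Approaching;
    }
    if (state == RunawayState::Approaching) {
      // Move on once the reading crosses the set point from either side.
      const bool crossed = started_below ? temperature >= target
                                         : temperature <= target;
      if (crossed) {
        state = RunawayState::Regulating;
        timer_start_ms = now_ms;
      }
      return state;
    }
    // Regulating: the timer only runs while the reading is outside the band.
    const bool in_band = temperature >= target - hysteresis &&
                         temperature <= target + hysteresis;
    if (in_band) {
      timer_start_ms = now_ms;
    } else if (now_ms - timer_start_ms > period_ms) {
      state = RunawayState::Tripped;
    }
    return state;
  }
};
```

In this sketch, lowering the set point from 200 to 180 while the hot end is still at 200 puts the monitor back in Approaching rather than tripping it. Note that it has no time limit while Approaching, so a heater that never reaches the set point is not caught here.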

Most failures are caused by using tape or silicone to fix the thermistor. It should be cemented or screwed in. If the connector fails you should get a MINTEMP error and if it shorts you get MAXTEMP.

The most common type of failure with a MOSFET is to go short circuit through ESD damage, un-suppressed transients or getting too hot due to not being switched fully on. Looks like one here: http://forums.reprap.org/read.php?13,471647,471647#msg-471647. Note this is a cheap Chinese Melzi. The ones I supply have very sturdy MOSFETs and I have never known one to fail.

daid commented 9 years ago

Problem with this whole protection is that it doesn't really protect against all possible failure cases. Not even after the changes proposed by nophead.

If you want to do it right, you need to analyze every possible "single failure" scenario. And be sure to have protection against that.

One of the cases where all protection falls short is where the heater moves out of the hotend. This can happen in rare cases where the heater is not properly secured, but then poses a high risk, as this is only detected during heating up, not during actual printing.

galexander1 commented 9 years ago

@daid I agree real safety would require such a fault analysis.

I think the current code happens to catch a detached heater, though. If the heater resistor moves so far away from the heater block that it can't effectively heat the block, then the thermistor value will drop even if the heater is on full, and it will (eventually) trigger the runaway protection for that reason. So you just hope it didn't ignite anything while it was dangling before the protection triggered (apparently, default 40 seconds).

I'm not eager to perform the experiment, but I wonder if a typical heater resistor can ignite PLA/ABS (or a solid acrylic bed), or if it will just reduce it to molten slag. Autoignition is supposed to be around 380C for PLA, hard for me to know whether a naked resistor could reach that... Flames might spread but slag will just stain the top of my workbench.

Perfect safety is unachievable because there are a few substances which will ignite even within the regular operating range, especially if you are using high temperature filaments like nylon. If a volatile solvent spills on your running printer, you might be in big trouble and there's nothing the firmware can do...

daid commented 9 years ago

The 25W heater found in the UM2 and the 40W heater in the UMO can both ignite PLA. Could not manage to ignite ABS with the UM2 heater.

But I don't think it will trigger the protection implemented here, as it will stay stuck in state 1.

galexander1 commented 9 years ago

@daid Good point. I noticed that it would stay in state 1, but didn't consider it a problem because I am always present when I start a print, I am only worried about if something fails in the middle of the print.

I wonder if it is worth remedying though? It would be pretty easy to add a limit for how long it is allowed to stay in state 1. 120 seconds would certainly be enough for my printer to get up to temp..
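As a rough sketch of such a limit, assuming a millisecond tick counter that wraps around like a 32-bit unsigned value. The function name and parameters are hypothetical; 120 seconds is only the figure mentioned above, not a recommended default.

```cpp
#include <cstdint>

// Hypothetical helper: true once the heat-up phase (state 1) has lasted
// longer than limit_ms. Unsigned subtraction keeps the comparison correct
// even when the millisecond counter wraps around.
bool heatup_timed_out(std::uint32_t heatup_start_ms, std::uint32_t now_ms,
                      std::uint32_t limit_ms) {
  return now_ms - heatup_start_ms > limit_ms;
}
```

For example, `heatup_timed_out(start, now, 120000)` would become true once more than 120 seconds have passed since state 1 was entered.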

And thanks for the report about ignition. :)

alexborro commented 9 years ago

@nophead, your point is valid and good. A shorted MOSFET is not a common issue but it happens - actually I have had some in my life. This proposed change will warn about it and, if the board is controlling the PS, shut the system down - I usually have an SSR powering up my systems, so it's easy to shut the whole system down.

@daid, as I said before, this is one more redundancy trying to avoid danger. There is not a single 100% safe system in the whole world; otherwise we would have no more airplane crashes. No matter how many safety devices you have protecting a system, there will always be a way to crash. I just don't know why people blame protection devices/routines just because they cannot save the system in 100% of the cases. I'm happy saving just one house from burning down instead of spending years planning "The Great Safety Device" that never makes it off the paper.

@galexander1, we can add a time limit for the initial heating as well; that is pretty easy indeed. But people will mess around with it. They get confused by the way it is today; if we make it a little more complex, I wonder whether they will just turn it off. People, including me, watch at least the first layer and then leave the printer alone for hours, so it is not a big deal.

Cheers.

Alex.

daid commented 9 years ago

@alexborro You cannot have 100% safety. But in some industries (I come from traffic light systems & safety) you get damn close. What you do is a "cause and effect analysis" to see what happens if a single component fails. If the failure causes a critical problem (in this case, a burning printer) you should have some detection mechanism in place. Then, for all the single errors you do not detect, you check what happens if you combine those with other single errors that you don't detect. If a combination there also causes a critical problem, you once again add some form of protection.

However, due to all the hardware variations in RepRap land, this is difficult.

And, a burned down house is a critical failure. Failed print is a minor failure. (And there are a few reported cases of burned down printers, and a reported case of a burned down house which quite possibly was the printer)

alexborro commented 9 years ago

@daid, you have got my point. Many users assemble their own machines and will not spend more money on safety devices like smoke detectors, redundant thermometers, etc. On the other hand, software is free; they just enable a feature. Of course it cannot protect against all cases, but it's free and can protect against some. A friend of mine had a room in his house burn due to a thermistor coming out of place. Thank God only the things in the room got burned - the houses in my country are made of ceramic bricks, not wood like in the US. But it was considerable damage, and it motivated me to write this feature - it could avoid such pain.

If you guys have new ideas to improve it, let me know. Soon I will implement the change proposed by @nophead .

Cheers.

Alex.

daid commented 9 years ago

@alexborro The number of machines sold with Marlin on them (as full machines) dwarfs the number of self-built machines.

For the UM2, I look at the pidoutput and check if that's on full power for a long time. If it's on for a long time and I do not see a temperature increase, then something must be wrong. This catches a lot of cases.
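A rough sketch of that kind of check, under stated assumptions: the struct name, the parameters, and the window and minimum-rise thresholds are hypothetical, and this is not the UM2 firmware's actual code.

```cpp
#include <cstdint>

// Hypothetical watchdog: flags a fault when the heater output has been at
// full power for a whole window without the temperature rising enough.
struct FullPowerWatch {
  bool tracking = false;
  std::uint32_t start_ms = 0;
  float start_temp = 0.0f;

  // Returns true when a fault is detected.
  bool update(int pid_output, int pid_max, float temperature,
              std::uint32_t now_ms, std::uint32_t window_ms, float min_rise) {
    if (pid_output < pid_max) {   // not at full power: nothing to watch
      tracking = false;
      return false;
    }
    if (!tracking) {              // full power just started: open a window
      tracking = true;
      start_ms = now_ms;
      start_temp = temperature;
      return false;
    }
    if (now_ms - start_ms < window_ms) return false;  // window still running
    if (temperature - start_temp >= min_rise) {
      // Temperature rose as expected: start a fresh window from here.
      start_ms = now_ms;
      start_temp = temperature;
      return false;
    }
    return true;  // full power for the whole window with too little rise
  }
};
```

Because it looks at heater output rather than at the set point, a check like this can also fire during initial heat-up, which is the case described above as staying stuck in state 1.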

msutas commented 9 years ago

I prefer the thermal runaway to be as it is right now. My printers have a 750 watt bed heater and the thermistor is placed on the side of the bed. It overshoots the target temperature by 10-15 degrees in bangbang control. This means that, in order not to get a false thermal runaway alert for the bed, I need to set a wide gap for thermal runaway, which increases the risk by alerting late when the thermistor is loose.

The controller stops heating if the temperature is above the target. When there is a problem with the MOSFET, throwing a thermal runaway alert would not stop the heater, and there would be no benefit if the robot was left alone during the print. If the operator is frequently checking the print, the MOSFET fault should be noticed anyway.

If the change is agreed on, I believe it would be beneficial to have separate limits for upward and downward thermal runaway in the configuration.
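Separate limits could look something like the sketch below. The struct, its field names, and the example values are hypothetical; they are not existing Marlin configuration options.

```cpp
// Hypothetical asymmetric limits: how far below and how far above the
// target the reading may sit before the runaway timer is allowed to run.
struct RunawayLimits {
  float below_degc;  // tolerance under the target (loose thermistor case)
  float above_degc;  // tolerance over the target (overshoot / stuck heater)
};

// True while the reading is inside the asymmetric band around the target.
bool within_limits(float temperature, float target, const RunawayLimits& limits) {
  return temperature >= target - limits.below_degc &&
         temperature <= target + limits.above_degc;
}
```

A bed that overshoots by 10-15 degrees under bang-bang control could then use something like `RunawayLimits{4.0f, 20.0f}`, keeping the downward limit tight while tolerating the overshoot.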

nophead commented 9 years ago

I don't think it checks the bed for runaway, just the extruder heaters.

If you have bed that is not self limiting you should put a thermal cutout in series with it.

alexborro commented 9 years ago

@msutas Bear in mind there is also a period of time for which the temperature needs to be over the threshold. I usually set 40 seconds on my printers. I think it is quite difficult for your bed to stay 10ºC over the target for 40 seconds. Check it out: I usually set my bed to 110ºC for ABS, and if I turn it off, it drops to 100ºC within 20 seconds.

Cheers.

Alex.

galexander1 commented 9 years ago

FWIW, it does check the beds: THERMAL_RUNAWAY_PROTECTION_BED_PERIOD and THERMAL_RUNAWAY_PROTECTION_BED_HYSTERESIS.

thinkyhead commented 9 years ago

Who's for adding an audible "fire alarm" to the LCD code that will go off in cases of bad thermal runaway?

CONSULitAS commented 9 years ago

:+1: :8ball:

avluis commented 9 years ago

@thinkyhead Was just thinking about this, use the buzzer (piezo speaker, whatever the LCD has) as the alarm when thermal runaway protection has been triggered.

ntoff commented 9 years ago

> Who's for adding an audible "fire alarm" to the LCD code that will go off in cases of bad thermal runaway?

Why not just an alarm that goes off whenever there's any kind of error at all?

Also for the high temperature detection like the mosfet is stuck on, what good will just detecting that do? If the mosfet is stuck on then there will be no way for the printer to do anything other than alert the user through a message so detecting it would mean very little. The only situation I can think of would be to wire up an ATX PSU and have the RAMPS board able to cut power to it so it just shuts the printer down entirely.

thinkyhead commented 9 years ago

@ntoff Something like a "dead man's switch" …

clefranc commented 9 years ago

@thinkyhead Yes! Trigger the buzzer please... @ntoff Yes! Shutdown the PSU too...

http://wavs.unclebubby.com/wav/TREK/Computer/audesarm.wav

thinkyhead commented 9 years ago

For users lacking an LCD controller, perhaps we can blink the status LED on the electronics board in some attention-grabbing way.

avluis commented 9 years ago

@thinkyhead LED sounds good for those without an LCD, but in most cases I believe the electronics would be mounted on the frame, which for some makes the LED not too visible. How about, if the user has fans, blasting them to full then low and repeating - moving the X & Y axis 5 - 10mm in both directions and repeating, etc. Something that would really make someone PANIC! It will definitely get their attention and they will know right away something is wrong.

thinkyhead commented 9 years ago

Haha. We could use motor ringing vibrations to play a tune on one of the axis motors... The Imperial March from Star Wars, I suppose...

daid commented 9 years ago

Would you really want to risk shaking a machine apart and people reporting errors like "my machine is shaking, your code is broken!1!"?

thinkyhead commented 9 years ago

@daid Haha, no, truly! I should be more clear that I don't favor abusing the motors in this way. We should signal in all the appropriate ways, though - audible if possible, a message to the LCD, and a distinct error message to the Serial Out so the host can respond.

avluis commented 9 years ago

@thinkyhead Actually, can hosts play audio if requested via Serial? This could be the way to go for those with a computer system connected to their printer. I know Repetier will play audio where requested in the G-Code, but I know of no host that can play sound on demand from the printer - this could be a good way to have an alarm.

thinkyhead commented 9 years ago

@avluis I expect that, more likely, we will expand the Serial Protocol (and document it) to include a distinct message for a thermal shutdown situation, with a recommendation to alert users with an audible signal, and then hosts can choose the sound they prefer.

ntoff commented 9 years ago

@thinkyhead My RC car used to play a tune using the motor whenever it started up. You'd count the beeps and things. Surely some high pitched noises wouldn't be that bad? Or aren't the Atmel / driver chips fast enough for higher frequencies?

ZetaPhoenix commented 9 years ago

It can be done with an Arduino. http://youtu.be/R6ktnQ4h8NY

Changing direction on each pulse can allow a frequency to be generated without significant rotation.
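A minimal sketch of the alternating-direction idea, assuming one step pulse per period and a direction flip on every pulse. The types and function names are hypothetical, nothing here drives real hardware, and no claim is made about how the step rate maps to the pitch actually heard.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical pulse record: when to step and in which direction (+1 / -1).
struct StepPulse { std::uint32_t time_us; int direction; };

// Builds a schedule of step pulses at the given step rate, flipping the
// direction on every pulse so the motor buzzes back and forth in place.
std::vector<StepPulse> tone_pulses(std::uint32_t step_rate_hz,
                                   std::uint32_t duration_ms) {
  std::vector<StepPulse> pulses;
  if (step_rate_hz == 0) return pulses;
  const std::uint32_t period_us = 1000000u / step_rate_hz;
  if (period_us == 0) return pulses;  // rate too high for microsecond timing
  const std::uint64_t total_us = static_cast<std::uint64_t>(duration_ms) * 1000u;
  int direction = +1;
  for (std::uint64_t t = 0; t + period_us <= total_us; t += period_us) {
    pulses.push_back({static_cast<std::uint32_t>(t), direction});
    direction = -direction;
  }
  // Keep an even number of pulses so forward and backward steps cancel out.
  if (pulses.size() % 2 != 0) pulses.pop_back();
  return pulses;
}

// Net displacement in steps after playing the whole schedule.
int net_steps(const std::vector<StepPulse>& pulses) {
  int sum = 0;
  for (const StepPulse& p : pulses) sum += p.direction;
  return sum;
}
```

Because the schedule always holds an even number of alternating pulses, `net_steps` is zero and the axis never moves more than one step away from where it started.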

-Jon

thinkyhead commented 9 years ago

@ZetaPhoenix I guess we have to come up with a consensus on how much rotation would be acceptable, and which axis to favor. Users flash their own firmware and choose their own options, so presumably they will be informed about the potential for this to occur. And then, well, it would be "cute" to include this as a general feature - an alternative version of the "beep" command for units with no controller speaker.

ntoff commented 9 years ago

I was only thinking of it in terms of some kind of emergency alert thing, rather than a beeper replacement. I would think the Z axis motors would be the best choice since they're almost guaranteed to be screwed to the frame which would amplify the sounds they made wouldn't it? Plus if you were printing and watching it on a web cam and saw the Z axis had moved up 10mm away from the print before playing the alert sound, it would also be a visual indicator that something had gone wrong.

You mentioned blinking LEDs, but some printers are enclosed with the electronics not so visible, and I'm so used to seeing the LEDs blink because of PID that I'd probably just ignore a blinking LED, thinking it was just the PID doing its job.

thinkyhead commented 9 years ago

@ntoff Well, any blinking would be some obvious regular flashing pattern, 3-4 Hz.
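A regular pattern like that can be derived from a millisecond clock alone. The sketch below is only an illustration: the function name is hypothetical, and 4 Hz is simply one value from the range mentioned above.

```cpp
#include <cstdint>

// Hypothetical helper: returns whether the alarm LED should be lit at time
// now_ms for a 50% duty-cycle blink at blink_hz (on for the first half of
// each period, off for the second half).
bool alarm_led_on(std::uint32_t now_ms, std::uint32_t blink_hz = 4) {
  if (blink_hz == 0 || blink_hz > 500) return true;  // out of range: solid on
  const std::uint32_t period_ms = 1000u / blink_hz;
  return (now_ms % period_ms) < period_ms / 2;
}
```

Calling this from the main loop and writing the result to the LED pin gives a fixed-rate flash that does not depend on heater activity, which helps distinguish it from the PID-driven flicker mentioned earlier.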

boelle commented 9 years ago

I will close this one... there is no 100% way to get around this problem. Example:

I start a 24 hour or longer print (yes, I have done it a few times). Even if the LCD panel (which I don't have) starts to flash and blink and beep like hell, there is no way that will wake me up across the flat through 2 closed doors anyway.

The only "close to" 100% is to fit a thermal fuse that breaks the connection at, say, 247 degrees, where the PEEK would start to soften, or even lower as I rarely go above 190 anyway.

Another reason is that it will just add to the complexity, and what we have is enough for most cases.

MarlinFirmware / Marlin

Thermal runaway audio-visual alarm #1509