MarlinFirmware / Marlin

Marlin is an optimized firmware for RepRap 3D printers based on the Arduino platform. Many commercial 3D printers come with Marlin installed. Check with your vendor if you need source code for your specific machine.
https://marlinfw.org
GNU General Public License v3.0
16.09k stars 19.2k forks source link

Fatal: Marlin has no way of knowing if thermistor is disconnected #258

Closed tenaciousRas closed 10 years ago

tenaciousRas commented 11 years ago

Expected: if the thermistor is physically removed from the hotend (due to an accident) while printing, Sprinter should be able to recognize that even though power is being applied the temperature readings are not acting as they should because there's been a hardware failure. In other words, Sprinter has no way of avoiding or detecting this fatal scenario. Once this event occurs it is likely the hotend will be destroyed by overheating, or worse could occur to the user.

Actual: Sprinter will let you burn your hotend off the carriage, and if you're really neglectful it could become a fire hazard to you, your family, and/or possessions.

Since Marlin is based off Sprinter, both of these firmwares expose this critical issue. Please fix this or ask the community to submit a patch.

tenaciousRas commented 11 years ago

errr....looking at Marlin's architecture...there isn't a good way to fix this other than to hack in a timer-based feature. Is this assertion missing something?

ErikZalm commented 11 years ago

Submit a patch. Or make a hardware protection. (Thermostat in series with the heater)

There is something in that check the connection during startup. Not during printing.

ErikZalm commented 11 years ago

I can see if it is possible to check if the power is on max power for x min. But this could give false triggers during startup (Heating from low temperature to 200C). If x needs to be to high then the protection is not good enough.

For good protection a thermal model is needed. But who is going to make that for all different hotends? Maybe it can be extracted from the PID constants.

If this was easy I had it implemented a long time ago.

nophead commented 11 years ago

Easiest thing is to use a heater that doesn't have enough power to destroy anything. My heaters max out below 300C. If not you should have a thermal fuse in series. Firmware can't be used as a safety feature. It would not pass any safety regulations.

On 3 October 2012 20:04, ErikZalm notifications@github.com wrote:

I can see if it is possible to check if the power is on max power for x min. But this could give false triggers during startup (Heating from low temperature to 200C). If x needs to be to high then the protection is not good enough.

For good protection a thermal model is needed. But who is going to make that for all different hotends? Maybe it can be extracted from the PID constants.

If this was easy I had it implemented a long time ago.

— Reply to this email directly or view it on GitHubhttps://github.com/ErikZalm/Marlin/issues/258#issuecomment-9118133.

tenaciousRas commented 11 years ago

I really like that hardware idea. I just had a lengthy banter with Kliment about this. It took a while but we finally got into talking about the code implementation. Like nophead's indicating, there's a watch timer that's supposed to address the issue but apparently it doesn't work (false triggers) for some unspecified segment of the hardware population. I'm just glad you guys don't respond with "closed - not an issue". Makes a big difference now we at least have community review and the author can push back on users like me.

It seems to me there should be a way to parametrize the trigger so that, if it's giving me false triggers I can adjust it to eliminate that, and have the safety feature. That's what I was thinking of looking into for a patch. The hardware idea might be cooler I'm gonna bounce that off a couple people too.

tenaciousRas commented 11 years ago

Actually that might sound goofy b/c I think the config for the watch timer is already similar to what I just described. I know the feature is "train stops". I see the parameterization is already on the watch timeframe. The watch timer algorithm looks wonky to me. It's always checking the watch timer and it's checking against an increase of 1? Maybe I'm reading that incorrectly. I want a state variable that it sets so loop() just falls out of doing anything else. I want the train stopped until the operator resets it.

tenaciousRas commented 11 years ago

Using an interrupt (if any are left) might do the trick and I might leave it enabled if the train stops...or could disable it. I mean seriously this event is a shutdown event if it's properly understood in firmware. Just thinking aloud...

tenaciousRas commented 11 years ago

OK - the feature is indeed entirely broken. There's a typo in a conditional in setwatch thus 1) setWatch is broken. 2) setWatch is broken for multiple extruders. 3) manage_heaters handles the check for watchperiod -- it's broken for multiple extruders. 4) Marlin.c only checks on M104 or M109 -- then manage_heaters stops the check once the target is reached. Look...this lack of separation of concerns is throughout Marlin and Sprinter. Adding a PrinterState class would help much of it -- and probably facilitate more robust contributions from the public. I digress...my point on #4 is -- the feature needs to be running constantly if I turn it on -- and imo it should be on by default. I don't see what this talk about false triggers is...what is this nonsense? The default watchperiod is 20 seconds -- so what am I missing how are the false triggers being reported on this feature? Could it be...the typo?

tenaciousRas commented 11 years ago

The patch is being tested on printed parts. Yet...how can I submit the patch to you? Github doesn't seem to let me message you, or did they hide it? My fork is on Bitbucket and that's where I operate b/c free-of-charge managed private git repos are better than ones that cost money.

This feature will save $50-$100/pc hotends and possibly prevent one or two fire risks. Unfortunately, the HBP has no implementation for a watch timer. I didn't add a PrinterState class b/c it might hinder acceptance of the patch. There seems to be another bug in the timer comparison with a missing (long) cast.

The fix basically works like: "if the heater's on and X seconds have passed, then has the temperature gone up by Y? if yes set temp to 0; if no check again in X millis" where X and Y are config'ed in configuration_adv.h.

I put more thought into the thermostat concept...and I was wrong it's not so cool. The problem here is rate of measured temp. change in a time period - and expecting something about that measurement. A thermostat is not a component well suited to this task. Aren't many digital thermostats built around thermistors or digital resistors? Does the solution introduce more components that could fail? Does that solution still need a firmware backup?

How do I send this patch to you - or pull from Bitbucket?

tenaciousRas commented 11 years ago

it's at https://bitbucket.org/terawattindustries/marlin-bt/src/b55715c37ab8 - should be just my commit on that branch against the latest from the master branch here - lemme know otherwise

martinxyz commented 11 years ago

Why do you use a private repository? I'm interested to see the patch, but I first needed to create an account at bitbucket, then was told "access denied". Can't you just clone the project on gitup and push your patch to the public, or use some pastebin site and paste the output of "git format-patch" in plaintext? Anything that anyone can look at without registering anywhere...

tenaciousRas commented 11 years ago

OK my code works for a while then breaks. Gathering from observation the entire algo. needs to be smarter and rewritten. It's not that we want to say, "if time has elapsed the temp should be within this delta". We want to say, "if time has elapsed the rate of change of the temperature should have a certain characteristic". Specifically if the temp was falling when we applied power, the rate of decrease should slow, halt, then become negative (temp increase). If the temp was rising when we applied power we expect to see the rate of change be constant -- depending on the power applied. I'm going to need more insight into the PID implementation and more dirt on my hands.

tenaciousRas commented 11 years ago

@Martinxyz I'm referring to the fact that Bitbucket offers private repos for $0. Did you think I meant my patch was in a private repo? Github wants a monthly fee for the same thing. I think 90% of my repos on Bitbucket are public, but I like the 100% free concept...seems more open-source to me.

I'm rewriting the patch. No silly wabbit that won't work. And getting the 2nd derivative from data points on this here 2560 should be interesting.

nophead commented 11 years ago

Github is free for opensource projects

On 5 October 2012 15:18, Free notifications@github.com wrote:

@martinxyz https://github.com/martinxyz I'm referring to the fact that Bitbucket offers private repos for $0. Did you think I meant my patch was in a private repo? Github wants a monthly fee for the same thing. I think 90% of my repos on Bitbucket are public, but I like the 100% free concept...seems more open-source to me.

I'm rewriting the patch. No silly wabbit that won't work. And getting the 2nd derivative from data points on this here 2560 should be interesting.

— Reply to this email directly or view it on GitHubhttps://github.com/ErikZalm/Marlin/issues/258#issuecomment-9176971.

tenaciousRas commented 11 years ago

OK hijackers...bitbucket is too, do you work for Github?

It looks like http://compmath.files.wordpress.com/2010/12/slaliberte_freportfall10.pdf explains a good method to get a fourier approximation of a line. I'm looking at applying the algo in chapter 3. to a small set of data points that are the delta of temps between readings. Any better ideas?

tenaciousRas commented 11 years ago

b/c I'm skeptical that's feasible on a MEGA and drive 5 steppers...

martinxyz commented 11 years ago

@tenaciousRas I understand if you prefer Bitbucket. But I can't look at your patch. The link you have posted seems to point to a private place. Can you please make it public?

About thermocouples that slip out of the hot-end, I think what you need is a basic physical model of how the heater is coupled to the measurement point (assuming the worst sensible coupling). I will try to model my hot-end (it's a fun challenge) and tell you if I get to any conclusions. Fourier transform sounds too complex to me. This kind of problem is usually modelled with a leaky integrator or two, which is just a few additions and multiplications to implement.

martinxyz commented 11 years ago

Okay, according to my model (which I have fitted to actual power and temp output) my ultimaker hotend could reach about 650 degrees. Eek :-) Setting PID_MAX in Configuration.h to about 40% of the full value (temperature maxes out at 300 degrees) would be the simplest and most pragmatic way to prevent this. This would increase the initial heatup time from 2min to 5min. Of course, it's hot-end specific.

Failure during the initial heatup (from a cold start) seems the easiest to detect, we can make some sound assumptions here: if it does not reach 50 degrees after one minute of full-power, then the hot-end is not just under-powered but broken - stop trying. I think it would take at least 3 minutes to overheat the UM hot-end from a cold start.

Still thinking about a more fancy solution during printing, I think a power limit is probably the way to go (but dynamic, increasing the limit during setpoint changes). If needed, the most important model parameters are not hard to estimate: the heater power during initial heatup, and the isolation towards room temperature while holding a fixed temperature for a while (however that can change significantly when turning on/off a fan).

tenaciousRas commented 11 years ago

My bad private is the default on BB and thought I set that up right.

I agree FFT is probably out of scope for this MCU. I did at least a couple/few more hours of research after that post. There is certainly evidence for signal analysis of thermistor data to handle similar (or the same) problem in other machines. Unfortunately I couldn't find anything that was shared or open. My investigations led me to look at Thiel-Sen approximations and how to opimize them for 8-bit MCUs. The journal article I found that describes an algorithm to do this in O(n log n) contains some advanced mathematical jargon (about permutations) that I've forgotten how to dissect. I'd need some help to translate that into code, or it will take me extra time. I stumbled on another article that's even more specific about Thiel Sen approximations in fixed-memory spaces, with jargon that's just as advanced. It claimed that certain sets of Thiel Sen problems can be handled in O(n (log n)^0.5). Unfortunately I lost track of that paper...but either way it's a hill for me to climb to understand the description of the algorithm so that I could code one up!

tenaciousRas commented 11 years ago

Thanks for looking into the power limit ideas. Waiting longer for startup wouldn't be fun but I'd gladly trade. For the record though one of my biggest problems with the watch timer is that it stops once target is reached. Since the hotend is more likely to be dislodged while printing, only having this feature during heatup is like not having it at all.

You're right dealing with startup is much easier. Once target is reached - then my patch fails. For the record the typo I'vem mentioned should mean the feature never worked as expected.

You're saying once we get up to target -- just check to make sure we never fall too far back towards room temp, right? Methinks that's better than my idea!

tenaciousRas commented 11 years ago

First I like your proposal. My patch is at https://bitbucket.org/terawattindustries/marlin-bt/src/ce423fc66337 - on the extruder-watch-fix branch. I was able to adjust the WATCH_PERIOD and WATCH_PERIOD_HYSTERESIS to avoid a false trigger on startup, and get to the target temp repeatedly. Values like 10 or 20 seconds and 0.5 or 1 degree seemed to do it, though I had to play around to find those.

Your idea should work in most circumstances for this problem. One where it doesn't work is if the thermistor is detached yet close enough to stay the same temp or get slowly hotter as the hotend is driven. I have no idea how often that scenario occurs within the one reported here. Still, I like it b/c it's an improvement over the current situation. In order to add this type of solution we need to know that target_hotend[e] has been reached, and that it's time to watch for weird drops. There's no support for that type of thing now, but it's just a matter of adding a boolean for each extruder. I'm going to look into this today and tomorrow - thanks much!

nothinman commented 11 years ago

Kinda makes me wonder... if you're so paranoid about your thermistor failing... why can't you use two thermistors? They cost peanuts and there's plenty of ADC inputs left on the micro.

martinxyz commented 11 years ago

@tenaciousRas I didn't really suggest a solution yet. Just some more simulation insight: when printing at 220 degrees and the heater suddenly turns always-on, you have not much time to detect and fix the siutation. After 60 seconds my hot-end would reach 300 degrees, after another 30 seconds it would pass the melting point of PEEK.

The worst scenario would be when the thermocouple slips out just a little bit more and more while printing. It is almost impossible for the firmware to distinguish this from a fan that suddenly turns on and cools the hot-end.

If someone is interested, I put my simulation stuff online: https://github.com/martinxyz/um-hotend-sim . In the measured data, the fan turns on after 7 minutes. I was surprised that a little bit of air movement makes such a big impact. This pretty much makes it impossible to have a model good enough for failure detection.

I have also implemented a heater power limit (that still allows bursts of 30sec full-power) in the simulation. This could be implemented inside the control loop. The problem is, 30sec full-power is not enough for a fast initial heatup. My solution would be to add a little bit additional energy to the pool whenever the temperature is actually moving up towards the target temperature. Not really an elegant solution, because it needs two parameters set by the user - I was hoping for zero parameters.

martinxyz commented 11 years ago

@nothinman Just for the record, I'm not that worried, I think my thermistor is firmly screwed up. I'm not so sure about the wire connection, so I'm really glad to know that the firmware will most likely stop heating already if there is a wiring issue. Still, I prefer to know that the software wouldn't stupidly heat things up if there is something obviously wrong. But it seems to be more involved than I thought initially.

AndrewDiehl commented 11 years ago

How about a two part approach?

We use the "cold extrusion prevented' as the basis for determining a disconnected heat sensor during a print. If the temp drops below that, it's relatively safe to assume something is wrong with the sensor or heater and everything gets shut down.

For M109, M190, we have a "max time" setting before temp is achieved. ie: if your hot end usually takes 4 mins to heat up, and It's now been 6 minutes since M109 was called, shut everything down.

While this doesn't fix "partially loose" heating sensors, it certainly is better than what we currently have.

NeilRG commented 10 years ago

Hi. I am a noob to GitHub. I signed up just to connect with this discussion. To quote Steve Ciarcia of Circuit Cellar fame 'solder is my favorite programming language' . I've been thinking about a flexible low cost hardware solution for a thermal interlock that does not rely on software, I am researching US Sensors PPG102C which is rated to 500 deg. c the attractiveness of this part is that coupled to a window comparator you could place the sensor in a convenient place on your hot end and then set the trip point. The window comparator could drive a "crowbar" scr to blow a power supply fuse, or set a pin state on the controller.

tmilker commented 10 years ago

Thanks for your comment @NeilRG but it's not really relevant to an issue with Marlin.

NeilRG commented 10 years ago

I apologize for crossing the thread and posting off the root topic.It had not occurred to me at the time. I was trying to offer a solution based on my perception that implementing a solution in software given the available processing horsepower had been elusive so far.

Respectfully,

NeilRG

On Sun, Feb 9, 2014 at 6:09 PM, tmilker notifications@github.com wrote:

Thanks for your comment @NeilRG https://github.com/NeilRG but it's not really relevant to an issue with Marlin.

Reply to this email directly or view it on GitHubhttps://github.com/ErikZalm/Marlin/issues/258#issuecomment-34590648 .

github-actions[bot] commented 2 years ago

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.