MarlinFirmware / Marlin

Marlin is an optimized firmware for RepRap 3D printers based on the Arduino platform. Many commercial 3D printers come with Marlin installed. Check with your vendor if you need source code for your specific machine.
https://marlinfw.org
GNU General Public License v3.0

[BUG] "Heating Failed" after PID takes over #21661

Closed Thinkersbluff closed 2 years ago

Thinkersbluff commented 3 years ago

Did you test the latest bugfix-2.0.x code?

Yes, and the problem still exists.

Bug Description

This bug is present in the CR6-SE Community Firmware at Release 6, which incorporates Marlin bugfix2.0. I am running this particular version, compiled for my printer's hardware configuration: CF6.1-Pre2-btt-skr-cr6-with-stock-creality-tft-2021-04-18-22-12.zip

I reported this bug as issue #248 on the Community Firmware GitHub, but the CF6 developer has asked me to report it upstream, here.

Description: The nozzle temperature sometimes fails to reach the target value when heating. When this happens, the temperature may climb to within a few degrees of target, but then drops again, cycling around a center value approximately 10 degrees below target. Eventually, the system shows a "Heating Failed: failed to achieve target temperature within the allotted timeframe" message on the screen and "kills" the job, forcing the user to cycle power to recover.

Attachment: wtf.txt

Bug Timeline

Issues with PID not performing as well as in the past have been reported by several users since Release 6 of the Community Firmware.

Expected behavior

The nozzle should heat to the target temperature and stabilize. This is especially expected when the system has just completed a PID run (M303 E0 Sxxx U1) at the same target temperature with no problem, yet now cannot heat to that xxx temperature to print.

Actual behavior

When this problem occurs, the serial interface shows that the printer recognizes the correct target temperature, yet it stops short of heating to that value, cycling instead around a value about 10 deg C lower than the target.

NOTE: Users on the Creality CR6SE/MAX Official Facebook Group describe this same problem in other scenarios, so it is not specifically or uniquely an issue when heating to 235C or when running an esteps extrusion.

I have also been able to successfully achieve 230C when I could not achieve 235C and I have achieved 235C when the nozzle was already at 230C, so there are other parameters at play, here, that I have not yet isolated. The part cooling fan was off, the whole time.

In the final cycle of the second graph (see my comment below this post), the printer actually bumped-up from 230 to 235 just at the end, there. No idea why, as you can see it was settling at the lower value & I touched nothing.

Steps to Reproduce

I was able to reproduce this problem fairly consistently as follows:

  1. Connect Octoprint to the printer & monitor the serial interface and the Temperature plot to observe what happens
  2. Use the CF6 PID function to PID the nozzle at 235C
  3. Wait for the nozzle to cool to 145C (or set the temperature to 145C, to let it stabilize there.)
  4. Use the CF6 esteps function to run a calibration extrusion at 235C.
  5. The printer recognizes and reports on the serial interface that the target temperature is 235
  6. The printer stops short of 235 when heating, instead cooling again around 230C
  7. The printer then cycles around approx 225 +/-5 degrees, never trying to achieve 235, as seen in the attached logs.
  8. If left long enough (sorry, did not time it), the printer throws a Heating Failed alert and kills the process.
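The "Heating Failed" kill in step 8 is consistent with a heat-up watchdog that expects the temperature to keep rising while the heater is still well below target. The following is a minimal sketch of that idea, not Marlin's actual implementation: the `HeatWatch` type and its members are hypothetical, and the 20 second window and 2 degree rise are assumed example values of the kind Marlin configures through `WATCH_TEMP_PERIOD` and `WATCH_TEMP_INCREASE`.

```cpp
// Hypothetical, simplified heat-up watchdog (illustration only).
struct HeatWatch {
  float period_s;        // window length in seconds (assumed example: 20)
  float increase_c;      // rise required within each window (assumed example: 2)
  float window_start_s;  // time at which the current window opened
  float goal_c;          // temperature that must be reached before the window ends
  bool  armed;           // true while the heater still has to climb

  HeatWatch(float period, float increase)
      : period_s(period), increase_c(increase),
        window_start_s(0.0f), goal_c(0.0f), armed(false) {}

  // Open a new window starting from the current reading.
  void start(float now_s, float current_c, float target_c) {
    armed = current_c + increase_c < target_c;
    window_start_s = now_s;
    goal_c = current_c + increase_c;
  }

  // Returns true when heating should be declared failed.
  bool update(float now_s, float current_c, float target_c) {
    if (!armed) return false;
    if (current_c >= goal_c) {  // progress made: open the next window
      start(now_s, current_c, target_c);
      return false;
    }
    return (now_s - window_start_s) >= period_s;
  }
};
```

Under these assumptions, a nozzle that hovers around 225C with a 235C target never meets the next goal, so the watchdog trips once a full window passes without the required rise.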

Version of Marlin Firmware

Latest Bugfix2.0 merged into Community Firmware Release 6.1 Pre 2 on 18 April 2021

Printer model

Creality CR6-SE

Electronics

BTT SKR CR6 motherboard, stock hotend, stock cooling fans, stock TFT. Users with Creality 4.5.3 boards also report this issue.

Add-ons

None

Your Slicer

Cura

Host Software

OctoPrint

Thinkersbluff commented 3 years ago

I would recommend against using bang-bang heating for the bed because it tends to produce visible artifacts in the print.

This is very helpful advice. Thank you. Do you happen to have photos or a link that explains how to recognize this type of artifact? I often see folks asking what causes some types of print artifacts but I have never seen this particular response as a possibility.

ManuelMcLure commented 3 years ago

Well, the comments in Configuration.h seem to imply that 0-127 is correct:

// Incrementing this by 1 will double the software PWM frequency,
// affecting heaters, and the fan if FAN_SOFT_PWM is enabled.
// However, control resolution will be halved for each increment;
// at zero value, there are 128 effective control positions.
// :[0,1,2,3,4,5,6,7]
#define SOFT_PWM_SCALE 0

Note the "128 effective control positions" bit.

However, I find https://github.com/MarlinFirmware/Marlin/blob/0c4085da01433c230731828a45ee7a91ae11b794/Marlin/src/module/temperature.cpp#L3019 suspect, since most often SOFT_PWM_SCALE will be 0 and that means that pwm_mask will end up with a value of 1 << -1. C considers the results of << as undefined if either operand is negative.

EDIT: never mind - the minus sign is outside the _BV() call, so the shift count is never negative.
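To make the difference concrete, here is a small sketch of the two quantities discussed in this comment. It is an illustration under assumptions, not the code at the linked line: `bv()` is a stand-in for Marlin's `_BV()` macro, and `pwm_mask()` and `control_positions()` are hypothetical helper names.

```cpp
#include <cstdint>

// Stand-in for Marlin's _BV() bit-value macro: 1 shifted left by `bit`.
constexpr std::uint8_t bv(std::uint8_t bit) {
  return static_cast<std::uint8_t>(1u << bit);
}

// Mask with the minus OUTSIDE the bit-value call: (1 << scale) - 1.
// At scale 0 this is (1 << 0) - 1 = 0, so no negative shift ever happens.
constexpr std::uint8_t pwm_mask(std::uint8_t soft_pwm_scale) {
  return static_cast<std::uint8_t>(bv(soft_pwm_scale) - 1);
}

// Effective control positions as described by the Configuration.h comment:
// 128 at scale 0, halved for each increment.
constexpr std::uint8_t control_positions(std::uint8_t soft_pwm_scale) {
  return static_cast<std::uint8_t>(128u >> soft_pwm_scale);
}

static_assert(pwm_mask(0) == 0, "scale 0 yields an empty mask");
static_assert(pwm_mask(3) == 7, "scale 3 masks the low three bits");
static_assert(control_positions(0) == 128, "matches the comment in Configuration.h");
```

The problematic form would be a shift by `scale - 1`, which is undefined at scale 0; the form sketched here shifts by `scale` and subtracts afterwards.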

Sebazzz commented 3 years ago

For what it is worth, the PID values determined by the Marlin autotune for my E3D Hemera are:

M301 P51.5971 I9.3473 D71.2040

These values cause the temperature to oscillate up and down endlessly. However, the values below actually work and are stable:

M301 P34.9800 I3.8300 D79.9200

In both cases PID_FUNCTIONAL_RANGE is 25.
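For readers unfamiliar with PID_FUNCTIONAL_RANGE, the sketch below shows the general idea as I understand it: the PID terms are only used within that many degrees of the target, with full power below the band and heater off above it. This is a textbook-style illustration, not Marlin's temperature.cpp; `PidGains`, `PidState`, `pid_step()` and the `max_power` parameter are hypothetical names.

```cpp
#include <algorithm>

struct PidGains { float kp, ki, kd; };

struct PidState {
  float integral = 0.0f;   // accumulated error (degree-seconds)
  float last_temp = 0.0f;  // previous reading, used for the derivative term
};

// One control step. Returns heater power in [0, max_power].
float pid_step(const PidGains& g, PidState& s, float target, float current,
               float dt, float functional_range, float max_power) {
  const float error = target - current;
  float out;
  if (error > functional_range) {
    out = max_power;       // far below target: full power, PID bypassed
    s.integral = 0.0f;
  } else if (error < -functional_range) {
    out = 0.0f;            // far above target: heater off, PID bypassed
    s.integral = 0.0f;
  } else {
    s.integral += error * dt;
    // Derivative on the measurement, so a rising temperature reduces output.
    const float derivative = (current - s.last_temp) / dt;
    out = g.kp * error + g.ki * s.integral - g.kd * derivative;
    out = std::min(std::max(out, 0.0f), max_power);
  }
  s.last_temp = current;
  return out;
}
```

With a range of 25, a nozzle at 200C heading for 235C gets full power, and the gains from M301 only start to matter once it is within 25 degrees of the target.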

avolkov commented 3 years ago

@thinkyhead The command that saves to EEPROM that you gave fixes the issue -- M303 E-1 C8 S90 U

However, I don't think it is the same bug that Creality and BTT SKR 1.4 Turbo users were experiencing in the beginning of the thread.

I was experiencing the same issue with SKR 1.4 Turbo and Marlin 2.0.9.1, and I've tried fixes using ADC_LOWPASS_K_VALUE and PID_KI and PID_FUNCTIONAL_RANGE workaround settings. None of them worked.

I ran M501 and the values for bed Kp, Ki and Kd were all zero, even though I defined them in Configuration.h; here's a sample output --

echo:  M301 P20.08 I1.30 D77.33
echo:  M304 P0.00 I0.00 D0.00

I think the bug here is that the values should be defaulting from the values defined in Configuration.h and not zeroed out. Also it seems it is currently not possible to load existing BEDPID values, it is only possible to write them using M303.

It could also be that the SKR board doesn't honor the code that is supposed to load the values.

I had the same issue with:

I suspect this is what users keep filing when they realize they can't use values in Configuration.h when EEPROM is enabled -- https://github.com/MarlinFirmware/Marlin/issues/12468

I'm happy to open a ticket based on this.

It seems all the comments starting on 2021 Jun 21 are referring to EEPROM read issue in 2.0.9.1 rather than PIDTEMP bug with Creality/SKR

ManuelMcLure commented 3 years ago

So, just to make sure we're on the same page, M501 will load existing values from EEPROM - it will completely ignore any values set in the firmware configuration files. You need to use M502 to copy the firmware values into RAM and M500 to save them back to EEPROM. Only then will M501 be able to load them back properly. Apologies if this is something you're already aware of, but I didn't see any mention of M502 in your comment so I want to make absolutely sure that's not the cause of your issues.
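The relationship between the three commands can be modeled in a few lines. This is a toy model of the behavior described in this comment, not Marlin's settings code: the `Settings` fields and the `SettingsStore` class are hypothetical, and the model assumes the printer boots by loading RAM from EEPROM.

```cpp
// Toy model: three copies of the settings, as described in the comment above.
struct Settings { float bed_kp, bed_ki, bed_kd; };

class SettingsStore {
 public:
  // On boot, RAM is initialized from whatever the EEPROM already contains.
  SettingsStore(Settings firmware_defaults, Settings eeprom_contents)
      : firmware_(firmware_defaults),
        eeprom_(eeprom_contents),
        ram_(eeprom_contents) {}

  void m500() { eeprom_ = ram_; }    // M500: store RAM settings to EEPROM
  void m501() { ram_ = eeprom_; }    // M501: reload RAM from EEPROM
  void m502() { ram_ = firmware_; }  // M502: reset RAM to compiled-in defaults

  const Settings& active() const { return ram_; }  // what the printer uses

 private:
  Settings firmware_;  // values compiled in from the configuration files
  Settings eeprom_;    // persistent copy
  Settings ram_;       // runtime copy
};
```

In this model, flashing firmware with new bed PID defaults changes only `firmware_`. M501 keeps returning the old EEPROM values until M502 followed by M500 has been run.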

avolkov commented 3 years ago

@ManuelMcLure Thank you. I'm moving from Marlin 1.1.9, where I only used EEPROM sparingly, and I didn't realize I needed to use M502 to load values from firmware.

This seems to be a common misconception; maybe better wording in Configuration.h could alleviate the problem. In Marlin 2.0.9.1, the comment for M502 mentions resetting to "factory" defaults but doesn't mention that the defaults from Configuration.h also need to be loaded with M502.

/**
 * EEPROM
 *
 * Persistent storage to preserve configurable settings across reboots.
 *
 *   M500 - Store settings to EEPROM.
 *   M501 - Read settings from EEPROM. (i.e., Throw away unsaved changes)
 *   M502 - Revert settings to "factory" defaults. (Follow with M500 to init the EEPROM.)
 */

ManuelMcLure commented 3 years ago

Yeah, it's a bit confusing. I always try to explain that there are three levels of configuration in Marlin - RAM, EEPROM, and firmware. RAM is what the printer will use at runtime. EEPROM is used to initialize the RAM settings on printer boot or if you use M501. Firmware settings will not override EEPROM settings unless:

This is done because there's no easy way to detect whether a value was changed in the configuration files if the EEPROM layout didn't change. If you (for example) have updated your Z probe offset and stored it to EEPROM, and then load a new version of Marlin where you forgot to change the Z probe in the configuration, we don't want to override the EEPROM value and possibly cause a nozzle crash.

github-actions[bot] commented 2 years ago

This issue has had no activity in the last 60 days. Please add a reply if you want to keep this issue active, otherwise it will be automatically closed within 10 days.

bergie5737 commented 2 years ago

Hi, I've recently upgraded my firmware to the latest Marlin bugfix. I have a Wanhao i3 clone. In my case the bed was still on bang-bang control when my prints started to fail with "temperature error" on the display. I replaced the thermocouple on my hotend, as I thought that was the issue. I then enabled PID for the bed, and now my bed stops 10C below target. When I run a PID tune, the bed reaches the correct temperature. Any other way of heating stops 10C below target. I've increased the PID functional range to 50, and it got worse. To me it appears the heater won't come on unless within 50 degrees Celsius. :-) I am adding this because it seems the issue is not resolved. The hotend is always stable for me.

tombrazier commented 2 years ago

I might have some capacity to look into this bug. Is there anyone who is actively watching this and could do some testing? @Thinkersbluff maybe?

In the meantime, if anyone wants to experiment with an alternative to PID, I have submitted a PR for model predictive control and I would like to hear back from others how it works for them. #23751

Thinkersbluff commented 2 years ago

I might have some capacity to look into this bug. Is there anyone who is actively watching this and could do some testing? @Thinkersbluff maybe?

Yes, I am monitoring this actively and yes, I can make time to run specific tests. I am an anal-retentive retired Engineer with just enough knowledge and experience to follow instructions rigorously, but I am not a programmer and I have no access to an electronics lab or to exotic test equipment like oscilloscopes.

I have an Ender3 and a CR6-SE.
I have modified both printers to direct-drive, with all-metal hotends and 32-bit BTT motherboards. (SKR E3 Turbo on the Ender, SKR CR6 on the CR6). I use Octoprint on a Pi3b+ to remotely monitor and control each of the two printers.

I do use VSCode/Platformio to compile Marlin 2.x for the Ender, and I can do the same for the CR-6. The display only works if I use the Community Firmware on the CR-6, though, so some controlled experiments may only be possible in “headless mode” on that one. The Ender3 uses the original rotary knob/LCD controller.

I have the stock Type1 thermistors and aftermarket Eigweit 40W heater elements on both printers.

The CR-6 is currently fitted with a Trianglelabs DragonHF hotend, and I find the aftermarket heater is a marginal fit in that E3 V6 clone heater block. I had to crank the retention screw as tight as I could, to hold the element firmly and I cannot improve on that, right now. I added thermal paste to improve thermal coupling between heater element and heat block on both printers. The Ender thermistor is a glass bead type, with no thermal paste. The CR6 thermistor is a cartridge type, with thermal paste and a grub screw.

Both printers do seem to be working, with a small ripple on the extruder temperature but no evidence of this original issue on either machine, “unfortunately”. One part of any worthwhile experimentation may need to be figuring out how to destabilize the PID control again, before testing a “fix”?

I do have a PT1000 Type 47 thermistor available for the CR6 but not for the Ender. The BTT ADC on both printers does seem particularly vulnerable to EMI, and I got frustrated with its inability to stabilize nozzle temperature with that PT1000 thermistor installed so I rolled-back the mod.

How can I be of help?

tombrazier commented 2 years ago

Hmm. If the error no longer occurs, perhaps there is nothing to do. Are you using a different firmware to the one that generated the graphs above? If so an easy test would be to return to that firmware and see if the problem returns.

Thinkersbluff commented 2 years ago

an easy test would be to return to that firmware and see if the problem returns.

I understand. I have changed both hardware and firmware configurations, since last I reported this behaviour. Even going back to the previous firmware would not restore my printer to the actual condition it was in, when the above graphs were generated.

I originally posted my graphs and reports on the CR6 Community Firmware GitHub, when I saw several CR6 Facebook community members chatting about the issue and blaming it on the CF. We later concluded that the bug was here, upstream of the CF fork, and there certainly does seem to have been a series of bugs over the years, with similar sounding issues.

I can see the temperature readings in the Octoprint Terminal "jumping" up and down by a couple of degrees at a time, from one sample to the next. I imagine this "dithering" is largely ADC noise overlaid on the "actual" thermal reading, digitized with +/- one digit of uncertainty at the ADC resolution. I do not understand enough of the system design to know how many significant figures the firmware really has to work with, but there must come a point beyond which it is futile to try to derive more accurate readings from the available data...

If your PR is designed to improve the ability of Marlin to cope with "noisy" data, is there any value to a log of thermal readings from the terminal on my system? (i.e. do you have a simulation setup to compare the data at the output of various points in Marlin, with and without your PR?)

tombrazier commented 2 years ago

This is a really fraught area with so many variables. Different sources of error can easily be conflated and what might be a software issue on one machine could look similar to a hardware error on another.

Some of what you describe sounds like #22893. After a long conversation, that issue resulted in two PRs from me which have just been merged, #23871 and #23867. Both could have an effect on the apparent quantization of ADC values. One activates 12 bit ADC (which was supposed to be the norm for 32 bit ARM processors but owing to a subtlety was not) and the other allows 16 times oversampling for when 12 bit ADC is used. On the other hand, maybe the behaviour you mention has a quite different source!
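As a rough illustration of why ADC width and oversampling affect apparent quantization, the sketch below sums 16 raw 12-bit readings into one value. It is a generic example of the technique, not the code from either PR: `oversample()`, `lsb_fraction()` and the constant names are hypothetical.

```cpp
#include <cstdint>

constexpr unsigned kOversample = 16;  // 16x oversampling, as mentioned above
constexpr unsigned kAdcBits = 12;     // 12-bit ADC: raw codes 0..4095

// Sum 16 raw 12-bit readings. The result spans 0..65520 (16 * 4095), so
// noise that dithers between adjacent codes resolves to in-between values.
std::uint32_t oversample(const std::uint16_t (&raw)[kOversample]) {
  std::uint32_t sum = 0;
  for (std::uint16_t r : raw) sum += r;
  return sum;
}

// Size of one raw ADC count as a fraction of full scale.
constexpr double lsb_fraction(unsigned bits) {
  return 1.0 / static_cast<double>(1u << bits);
}

// A 10-bit count is four times coarser than a 12-bit count.
static_assert(lsb_fraction(10) == 4.0 * lsb_fraction(kAdcBits),
              "two extra bits quarter the step size");
```

For example, readings that alternate between codes 2047 and 2048 sum to 32760, which corresponds to an average of 2047.5, a level that no single 12-bit sample can represent.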

If I had someone who could replicate the 10 degree offset we might be able to work together to find its source.

My MPC PR should do better than PID with noisy data. However I do need real systems to test it against because I have already established that it does well against simulated hotends.

How does the CF work? Does it merge in upstream Marlin changes? Is it similar enough that upstream PRs could be merged easily?

Thinkersbluff commented 2 years ago

How does the CF work? Does it merge in upstream Marlin changes? Is it similar enough that upstream PRs could be merged easily?

I believe there is a PR being actively worked now, between @thinkyhead and @Sebazzz to merge the CF fork back into the mainstream. That was the goal of the CF project from the beginning. No idea how long it might take to complete that merge though… Although it may leave support for the stock TFT display on its own branch, most of the CF fork is still Marlin and there is an unreleased extui branch on the CF fork that @Sebazzz has been updating with Marlin PRs.

There may be other CF users who can still reproduce the problem. I believe at least one of the original Facebook “gang” commented on that issue and may still be monitoring it for updates. Maybe you can find a useful partner by recruiting on Community Firmware GitHub issue #248.

Thinkersbluff commented 2 years ago

Some of what you describe sounds like #22893.

Indeed, I believe I also had that problem when I briefly experimented with 2.0.9.2. I did not have the problem described here, but I could not get Filament Load/Unload to work because the firmware kept waiting for the nozzle temperature to stabilize. In the end, I swapped the PT1000 out, and the problematic behaviour “disappeared”.

github-actions[bot] commented 2 years ago

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.