MarlinFirmware / Marlin

Marlin is an optimized firmware for RepRap 3D printers based on the Arduino platform. Many commercial 3D printers come with Marlin installed. Check with your vendor if you need source code for your specific machine.
https://marlinfw.org
GNU General Public License v3.0
16.19k stars 19.22k forks source link

[BUG] Bad AutoTune, Poor Quality of PWM with known good PID #20463

Closed rflulling closed 3 years ago

rflulling commented 3 years ago

Bad Auto Tune and useless PWM/PID stability

I have taken the time to perform extended testing and gather an excessive amount of data tracing this issue back to just after Marlin Official 203. I hope this is useful to some one in tracing an identifying the cause. My fear is that this is really multiple issues, and this will end up being a royal pain for some one. E Below are the results of https://github.com/MarlinFirmware/Marlin/commit/8c0505395135bcffad426e5d557742838a9eceda and Official 2.0.7.2 and multiple roll backs all the way to 2.0.3.

Edit: 01/14/2021: Although there is evidence to support that there is more than one issue at work here. What I had not isolated at the time of generating these results was that this all started when I tried to enable PIDTEMPBED. Though it was mixed in with a few other features and tweaks so that it was anything new was over looked. As a result I am maybe a day or two away from posting revised notes to verify and validate all previous claims. Please note that further down I have built a second post with a very detailed break down with no less than 10 sets of results. Each folder contains a serial capture from each extruder, during Auto Tune, a warm up with the resulting auto tune values and an warm up using previous known good values. Also this section is about to get an over haul with updated notes and new folder with more notes in them as well.

https://github.com/MarlinFirmware/Marlin/commit/8c0505395135bcffad426e5d557742838a9eceda In version BugFix 2.0.7.2 (12/13/2020) Configuration 020008 https://github.com/MarlinFirmware/Marlin/commit/8c0505395135bcffad426e5d557742838a9eceda

a. Test AUTO TUNE: output cannot be used. b. Test with Auto tune generated PID from this version: FAIL c. Test with Previously Established known good value: FAIL

  1. With known good PID values, after warm up if the temp reaches peak, it will drop say from 200 to 185 before then ramping back up. There was no thermal event despite the massive fluctuation. Powered off E1. Allowed to cool under 100C, with fans running. This known good value was originally generated with a S245 just in case you want to know. ;M301 E1 P15.25 I0.82 D70.71
  2. Ran Auto Tune M303 E1 S240. Found and converted the output to command. Then restarted E1 at 240 After warm up if the temp reaches peak, it will drop say from 238 to 218 before then ramping back up. There was no thermal event despite the massive fluctuation. ;M301 E1 P7.146785 I0.203034 D62.891708

In version Official Release 2.0.7.2 (11/01/2020) Configuration 020007

a. Test AUTO TUNE: output cannot be used. b. Test with Auto tune generated PID from this version: FAIL c. Test with Previously Established known good value: FAIL

  1. after warm up if the temp reaches peak, it will drop say from 200 to 185 before then ramping back up. There will typically not be a thermal event under this condition, though it is useless to extrude.

  2. If I were to run M303 S1 the results would be drastically different from saved and typically produce a thermal event. *This is the first version I have tested that allows for two or more PID values in configuration. Auto Tune was used to validate changes to hardware and to reflect changes to PID in Marlin. It failed. ;M301 E0 P14.88 I0.87 D63.34 ; PID EXT0 KNOWN GOOD VALUE ;M301 E0 P10.44 I0.45 D60.98; PID EXT0 CORRUPTED PID

https://github.com/MarlinFirmware/Marlin/commit/3de1421d72b25003f253c76f3dc5a8d2d44accaa In version BugFix 2.0.5.3 (05/17/2020) Configuration 020006 https://github.com/MarlinFirmware/Marlin/commit/3de1421d72b25003f253c76f3dc5a8d2d44accaa

a. Test AUTO TUNE: output cannot be used. b. Test with Auto tune generated PID from this version: FAIL c. Test with Previously Established known good value: PASS

  1. after warm up the temp will go directly to the expected temp and over shoot slightly then drop 5C to 7C below the target and ramp up slowly, then stop at the temp and maintain solid without further deviation. Thermal Events during warm up will happen at random and did also in prior versions of Marlin. But it is not understood why. The Thermal events were nuisance but not show stopper. They were also assumed to be an end user user issue due to set up or configuration.

  2. If I were to run M303 S1 the results would be drastically different from saved and typically produce a thermal event. *This was not reported as a bug due to having loaded KNOWN good values as part of my slice files. M303 was not required in this time despite R&D on my extruders. Slice files contained entries for dual hot ends as this version of Marline did not yet support define 2 or more sets of PID values.

    • ;M301 E0 P14.88 I0.87 D63.34 ; PID EXT0 KNOWN GOOD VALUE
    • ;M301 E0 P10.44 I0.45 D60.98 ; PID EXT0 CORRUPTED PID

In version Official Release 2.0.5.3 (04/14/2020) Configuration 020005

a. Test AUTO TUNE: output cannot be used. b. Test with Auto tune generated PID from this version: FAIL c. Test with Previously Established known good value: FAIL

  1. after warm up the temp will get close to the peak then hover with a slight fluctuation just out side peak and within a min or two a thermal event will happen. I cannot start a file. -I am testing with this version for the point of this issue. However never used it due to one of the Dual Y axis refusing to run in the right direction. I switched to the Bug Fix version until 2.0.7.2

  2. If I were to run M303 S1 the results would be drastically different from saved and typically produce a thermal event. *This was not previously reported as a bug due to lack of use of this version.

    • ;M301 E0 P14.88 I0.87 D63.34 ; PID EXT0 KNOWN GOOD VALUE
    • ;M301 E0 P10.44 I0.45 D60.98 ; PID EXT0 CORRUPTED PID

https://github.com/MarlinFirmware/Marlin/commit/736521a3f1124a84a023e5ba1f030e09a95f16dc
In version Marlin BugFix 2.0.3 (03/01/2020) Configuration 020004

a. Test AUTO TUNE: PASS: output functional. b. Test with Auto tune generated PID from this version: PASS: slight fluctuation +-2c c. Test with Previously Established known good value: PASS: +-1c ;M301 E0 P14.88 I0.87 D63.34 ; PID EXT0 2020-06-07 KNOWN GOOD VALUES ;M301 E1 P15.25 I0.82 D70.71 ; PID EXT1 2020-06-07 KNOWN GOOD VALUES

  1. I am able to "Que" the Auto Tune Cycle again. (I had not noted when this walked off. Guess it was 2.0.5.3 that took it away. ) What I mean by this is that from the LCD, I can rapid set both heaters to 240, and walk off. Marlin manages the tuning and when I come back I only need to select store values from the menu. Though if the tuning is good, I will also back up the values to my slicer to insure against updates that wipe out a good tuning. Both just ran back to back, smooth, not one bump in the road. I noticed this could no longer be done, but not paid attention to when it happened.

  2. I ran auto tune and got the following ; M301 E0 P20.10 I1.34 D75.58 ; PID EXT0 2020-12-16 LOOKS PROMISING ; M301 E1 P19.66 I1.27 D76.27 ; PID EXT1 2020-12-16 LOOKS PROMISING

  3. After Warm up, E1 seems a little slower to catch up with generated values, but they hold and there are no thermal events. Official 2.0.3 from 03/01/2020 has a functional Auto tune!! Also supports previously generated PID!

Configuration Files

Required: Include a ZIP file containing Configuration.h and Configuration_adv.h.

Attached are the configuration files from all tested builds including the bug fix https://github.com/MarlinFirmware/Marlin/commit/8c0505395135bcffad426e5d557742838a9eceda.

Marlin BugFix 2072 Configuration 020008 https://github.com/MarlinFirmware/Marlin/commit/8c0505395135bcffad426e5d557742838a9eceda (12/13/2020)

BF2072.zip

Official Release 2072 (11/01/2020) Configuration 020007

AR2072.zip

Marlin BugFix 2053 Configuration 020006 https://github.com/MarlinFirmware/Marlin/commit/3de1421d72b25003f253c76f3dc5a8d2d44accaa (05/17/2020)

BF2053.zip

Official Release 2053 (04/14/2020) Configuration 020005

AR2053.zip

Marlin BugFix 2030 Configuration 020004 https://github.com/MarlinFirmware/Marlin/commit/736521a3f1124a84a023e5ba1f030e09a95f16dc (03/01/2020)

BF2030.zip

Steps to Reproduce

24V System, SKR PRO 1.1, 40W heater

1) Go to oldest known stable version of firmware where everything worked. In this case June 2020 or Configuration 0200060, I have named BugFix 2.0.5.3. https://github.com/MarlinFirmware/Marlin/commit/3de1421d72b25003f253c76f3dc5a8d2d44accaa EDIT: June is function with exceptions. -Oldest found is now Bug Fix 2.0.3 https://github.com/MarlinFirmware/Marlin/commit/736521a3f1124a84a023e5ba1f030e09a95f16dc as it is 100% functional in terms of Auto Tune and PID/PWM.

2) Save to SD card and load into SKR PRO 1.1 3) Reset SKR PRO to uptake Firmware.bin 4) From LCD initialize EEPROM. 5) From LCD restore defaults. 6) From PC connect to SKR PRO 1.1 over USB. 7) Run Auto Tune M303 C3 E1 S200 \n. 8) Grab values from Serial verbose. 9) write out M301 E1 Kp Ki Kd \n ;command string 10) Send PID command. 11) Store with M500. 12) Check with M503. 13) Send E0 command to set temp 200C. 14) Observe warmup. Capture graph. 15) Reset SKR PRO 1.1 16) each pass first pass with new firmware Start over at Line 6) load known good value M301 E1 P15.25 I0.82 D70.71 ; for line 7) then after completing both iterations, move on to next line. Repeat accordingly. 17) Repeat from 1) with Firmware.bin for Official 2.0.5.3 18) Repeat from 1) with Firmware.bin for Official 2.0.7.2 19) Repeat from 1) with Firmware.bin for December 2020 or Configuration 0200012, I have named BugFix 2.0.7.2

Expected behavior:

Run old PID values stored to EEPROM or loaded through gcode at file start, execute warm up and be stable, be ready to run. else Run M303, get current PID values, submit to marlin as string, save and have no thermal failure after warm up. Run Auto Tune from LCD, Run Store Settings from LCD, connect to printer from S3D, run warm up and start printing.

Actual behavior:

Old PID values are most stable, but not perfect. Each new version of the marlin the PID seem to become more unstable generating a slightly different behavior. Causes hardware to look like it is failing. Reverting firmware offers a temporary solution.

Marlin Official 2072 (12/13/2020) has a nearly 20c swing for extruder warm up, and generates strange PID values when using AutoTune. Where normal is K14 and abnormal is K7 or thermal failure..

Marlin BugFix https://github.com/MarlinFirmware/Marlin/commit/8c0505395135bcffad426e5d557742838a9eceda (11/01/2020) behaves exactly the same as Marlin Official 2.0.7.2

Marlin BugFix https://github.com/MarlinFirmware/Marlin/commit/3de1421d72b25003f253c76f3dc5a8d2d44accaa is stable if running known good PID, but AutoTune generates strange values.

Marlin Official 2053 (05/17/2020) is better but never gets to full temp even with known good PID, and generates strange PID values.

Marlin BugFix 2.0.3 (3/1/2020) passes the muster. This version is able to use old values, generate new values and loaded values are stable without causing thermal event or strange fluctuation. -Please note that I do not know what iteration of BugFix2.0x that is the source of what I refer to as BF203. That is because the files say that the it was edited and zipped at 03/01/2020 @ 10:05 CST. However the last entry on git is several hours prior or the following morning. So the closest is https://github.com/MarlinFirmware/Marlin/commit/736521a3f1124a84a023e5ba1f030e09a95f16dc

Additional Information

Installed HT-NTC100K sensor and documentation. 07/2019 Marlin thermal profiles 11 & 13 https://www.ebay.com/itm/3D-Printer-Parts-350-Degrees-HT-NTC100K-Thermistor-High-Temperature-Sensor/253718429751 https://www.keenovo.com/NTC-Thermistor-R-T-Table.pdf https://www.tme.eu/Document/f9d2f5e38227fc1c7d979e546ff51768/NTCM-100K-B3950.pdf Installed 24V Hot end. 07/2019 https://www.amazon.com/gp/product/B07NYPCBYF Installed 24V Power Supply. 12/2020 https://www.amazon.com/gp/product/B013ETVO12 Installed SKR PRO v1.1 Controller 10/2019 https://www.amazon.com/gp/product/B07T1RTN58

  1. Power supply replaced for this test with new. Listed above.

  2. Switched out saved previously loaded and stable Firmware.bin to test. Recompiled older builds to test, regardless of the saved Firmware.bin. Operated with known good values. Switched between known good versions of firmware and the known good PID to validate that extruders are not burned out. Sensors are working. Mosfets are stable. Checked archived builds Configuration files against current configuration files.

  3. Generated a full 32in video capture of running diff on Configuration files, then compile, then load and testing the firmware and the results according to tests laid out above. But video must be edited down before can be posted.

  4. Attached BF2053 (05/17/2020) image does not show a fail as much as it does a pass. This was captured to show a good warm up with that build and a known good PID value. Now if only that build could also generate a good PID value... EDIT also added from BF2030 (03/01/2020) showing a steady PID as sent back by Marlin with M155 S10.

Marlin BugFix 2.0.7.2 (Downloaded 12/13/2020) Configuration 020008 https://github.com/MarlinFirmware/Marlin/commit/8c0505395135bcffad426e5d557742838a9eceda ORIGINAL DATA

FAIL PWM BF2072

Official Release 2.0.7.2 (Downloaded 11/01/2020) Configuration 020007 ORIGINAL DATA FAIL PWM 2072

Marlin BugFix 2.0.5.3 (Downloaded 05/17/2020) Configuration 020006 https://github.com/MarlinFirmware/Marlin/commit/3de1421d72b25003f253c76f3dc5a8d2d44accaa

ORIGINAL DATA

FAIL PWM 2053 BF Official Release 2.0.5.3 (Downloaded 04/14/2020) Configuration 020005 ORIGINAL DATA

FAIL PWM 2053

Marlin BugFix 2.0.3.0 (Downloaded 03/01/202) Configuration 020004 ORIGINAL DATA Imperfect equal between Configurations and the bug was not yet found. BugFix2.0.3.0(736521a3f1124a84a023e5ba1f030e09a95f16dc) Full Setup // Everything needed enabled. // 01/17/2021 BugFix2.0.3.0(736521a3f1124a84a023e5ba1f030e09a95f16dc) Limited // uncomment #define PIDTEMPBED // not yet posted PASS PWM PID BF2030 -Please note that I do not know what iteration of BugFix2.0x that is the source of what I refer to as BF203. That is because the files say that the it was edited and zipped at 03/01/2020 @ 10:05 CST. However the last entry on git is several hours prior or the following morning. So the closest is https://github.com/MarlinFirmware/Marlin/commit/736521a3f1124a84a023e5ba1f030e09a95f16dc

EDIT: Videos are created at 32" full rate. Reduced to 480p and are still more than 73MB. If reduced to 10MB they will be useless for any kind of scientific viewing. Since GITHUB is not willing to host these files they are to be hosted from Google Drive. Links posted after 01/17/2021 now contain video samples.

thisiskeithb commented 3 years ago

Please download bugfix-2.0.x to test with the latest code and let us know if you're still having this issue.

rflulling commented 3 years ago

Please download bugfix-2.0.x to test with the latest code and let us know if you're still having this issue.

You are too quick to reply without reading. Been there done that. Tested several version of bug fix and it seems that lately I only operate on a version of bug fix.

  1. With known good PID values, after warm up if the temp reaches peak, it will drop say from 200 to 185 before then ramping back up. There was no thermal event despite the massive fluctuation. Powered off E1. Allowed to cool under 100C, with fans running. This known good value was originally generated with a S245 just in case you want to know. ;M301 E1 P15.25 I0.82 D70.71
  2. Ran Auto Tune M303 E1 S240. Found and converted the output to command. Then restarted E1 at 240 After warm up if the temp reaches peak, it will drop say from 238 to 218 before then ramping back up. There was no thermal event despite the massive fluctuation. ;M301 E1 P7.146785 I0.203034 D62.891708
rflulling commented 3 years ago

For those who are only looking to skim and not read.

Bug Fix 2053 Configuration 020006 was the last good build for me. It ran everything fine. I did not realize the Auto Tune was not working. I had been working with a known good value set saved into my Slicer Script that was sent fresh every time a fresh file was loaded. However, what I have since found out is that this and every version since is unable to AutoTune to anything functional. Later versions cant even hold a stable temp with a good PID value. Each later version that I have tasted after this point displays increasing instability with identical hot end configuration. The behavior is consistent and replicatable, just by switching firmware. The behavior is replicatable even after recompiling firmware for that archive version.

I have not yet determined how far back I will need to test to be able to run Auto Tune without issue. TBA/TBD.

thisiskeithb commented 3 years ago

You are too quick to reply without reading. Been there done that. Tested several version of bug fix and it seems that lately I only operate on a version of bug fix.

For those who are only looking to skim and not read.

I did read your post. There is no “BugFix 2.0.7.2”.

2.0.7.2 was released Oct. 29 and is just a static copy of bugfix, which is why I asked you to test with bugfix-2.0.x as that is updated daily.

Before your next reply, I’d suggest you re-read and understand Marlin's Code of Conduct. By filing an Issue, you are expected to comply with it, including treating everyone with respect: https://github.com/MarlinFirmware/Marlin/blob/master/.github/code_of_conduct.md

descipher commented 3 years ago

What voltage are you running on your heating elements? I could run a validation of your observation on bugfix-2.0.x to clear things up.

rflulling commented 3 years ago

@descipher

What voltage are you running on your heating elements? I could run a validation of your observation on bugfix-2.0.x to clear things up.

Good question. I am running a strict 24V system. I forgot to add this to the report and although it is mentioned in the product links. I will also add it to the report in case it is useful.

You are welcome to test the Config. Though I suspect this may be some what specific to hardware, as I do not see other complaints. So I do expect skepticism. I will answer any questions that I can.

rflulling commented 3 years ago

@thisiskeithb

I did read your post. There is no “BugFix 2.0.7.2”.

How do name the bug fix versions? I regard them by date, by configuration "020006" and the most recent Official Version (Alpha Release. ) Simply calling it 2.0.x is ambiguous and offers up no idea as to when it was last modified. How can I be more helpful in naming.

descipher commented 3 years ago

24V will amplify any issue. I don't think this would be too centric to hardware other than the rate in which the element could be heated. The element is also a variable e.g. wattage.

rflulling commented 3 years ago

in a version of BugFix from June, released after 2.0.5.3, there was no swing. Temp established was stable. Seems that the BugFix lacked some part of the issue that 2.0.5.3 had but was not reported as an issue. After updating to 2.0.7.2, it would not stabilize. Now The temp swings low as if the power is fully switched off and for way to long.

Perhaps what I am seeing are really two bugs. Perhaps I have incorrectly named them as one. Assuming the cause was one and the same.

descipher commented 3 years ago

There are 2 basic branches, the release branch 2.0.x which is what you would download normally and the bugfix branch which you would have to specify in a git clone command.

e.g. git clone --branch branchname remote-repo-url which in this case is bugfix-2.0.x

descipher commented 3 years ago

Is this a custom build?

rflulling commented 3 years ago

I am aware of the branches and GIT structure. The issue is ambiguous naming. If everything is called BugFix 2.0x and there are 365 days in the year and it it altered maybe several times a day, how do I refer to any one point so that the Devs can know what was changed.

descipher commented 3 years ago

I'm to new at this repo to guide you there, however I can help validate whether your observation is occurring on my test bed.

rflulling commented 3 years ago

Is this a custom build?

It is simplest to say yes.

Configuration is based on Foldgertech Ft-6. However his build shipped with A4980 drivers and MKS Gen 1.4 running Marlin 1.1.9. I have since upgraded to SKR PRO 1.1, ST820 drivers, TMC5160 drivers, etc. extruders are full custom.

A better answer is to say you will not find a functional copy of the Configuration files for this build in the samples.

descipher commented 3 years ago

I'm more concerned on a unique heating element, looks like it's common.

rflulling commented 3 years ago

I have been working with Marlin for the last 3 on 3 boards years at least, and before that a Duet Meastro. I am familiar with most of the ins and outs of the Firmware, to get it to do what I need, most of the time, and to help others. I have drafted help docs to help others get successful builds. Another user, used some of my work as a primer to write a 250 page book on configuring the SKR PRO...

But when it comes to PID or Auto Tune. I might as well be trying to dismantle an Automotive Transmission. I can tear down the engine blind folded in the dark, really. But the Transmission, I have to take it to the shop.

I am hopeful some one will note something I over looked. Or that this helps to identify an actual bug.

thisiskeithb commented 3 years ago

How do name the bug fix versions?

It's just bugfix-2.0.x, but referencing the specific commit would make it clear what you downloaded & when. Adding the tagged version number is confusing since that implies you are using an outdated/stale version and not the latest bugfix-2.0.x version.

Example: bugfix-2.0.x @ 1be16e3d8c44965f2e3f58b2845f9ac8ecdbc74b or simply 1be16e3d8c44965f2e3f58b2845f9ac8ecdbc74b.

descipher commented 3 years ago

ATMEGA2560 RAMPS 1.4 Standard Early Prusa hotend.

define DEFAULT_Kp 22.20

define DEFAULT_Ki 1.08

define DEFAULT_Kd 114.00

M303 E0 S180 Did one complete test on E0 (Single Extruder @ 12V 33W) target 180C PID calc = P20.80 I1.19 D90.70

M104 E0 S180 This resulted in a reheat temp ripple of 190,176,181,179,180,180,180. Not great but works.

Repeated same test @ 20V target 180C

00:55:43.915 > T:154.39 /0.00 @:127 00:55:45.931 > T:163.59 /0.00 @:127 00:55:47.946 > T:174.59 /0.00 @:127 00:55:49.963 > T:187.58 /0.00 @:0 00:55:51.978 > T:199.38 /0.00 @:0 00:55:53.996 > T:209.84 /0.00 @:0 00:55:54.150 > PID Autotune failed! Temperature too high

Massive overshoot.

define HEATER_0_MAXTEMP 275

Looks like there is a reasonable probability of a bug which is more apparent at faster heating rates. Suspect a scaling issue or possibly a low data sampling rate issue.

rflulling commented 3 years ago

@thisiskeithb This seems to be the one. No way to really be sure. Windows does not see 8c05053 included in any content. BugFix 2.0.7.2 (configuration 020008) 12/14/2020 12:51AM CST This seems to be the target. 8c05053 Windows does not see 8c05053 included in any content. All other commits on Dec 14th took place after the download.

Regardless the BugFix 8c05053 presents the same issue.

Going back in time: BugFix 2.0.5.3 seems to be 3de1421 However I cannot be certain as GIT does not retain outward visible TIME data on the entries past 24hrs.

Suggestion: A commit needs to be written to one of the files so that if over looked, you, or any other Developer can ask for it, or we can be required to INCLUDE said file with the Configuration files.

Suggestion: Include instructions on how to find and include the commit of the bugfix downloaded with a bug report.

rflulling commented 3 years ago

@descipher

M104 E0 S180 This resulted in a reheat temp ripple of 190,176,181,179,180,180,180. Not great but works.

Earlier versions, this kind of fluctuation would have caused a Thermal Event. I had to add extra time to stabilize to avoid the screaming in in the middle of the night. I work, others sleep. However once to temp there was hardly any ripple. Now there is 10-20C of swing.

Thank You. I am glad you were able to confirm that something is happening that is not quite normal. This temp swing.

descipher commented 3 years ago

Based on the current observations and a brief walk of the PID code I have to assert that system does not account for heater temperature rise times. I do not see any initial input parameters for voltage or heater wattage, thus the initial PID values must be set manually or tuned for any target hardware builds. More analysis is required but I would say the current initial default values may need to be less aggressive to work for a wider generic base. A MACRO could simplify tuning PID defaults by collecting target heating element specifications. The current state will most definitely not work correctly OOB on 24V systems. Even using a 12V 40W heater verses a 12V 30W heater results in highly variable PID values which the user needs to initially tune. In most cases the required expertise would consume significate time investment for an end user.

swilkens commented 3 years ago

On 2.0.7.2 release build.

Current settings

Overshoots by 1C, drops by 1C, oscillates a few times by +/- 1 C and then stabilizes at 190 C.

echo:; PID settings: echo: M301 P21.73 I1.54 D76.55

Result of tune (M303 C5 E0 S190)

Overshoots by 2C, drops by 4C, oscillates longer but does eventually stabilize at 190C.

Kp: 22.95 Ki: 1.85 Kd: 71.30

It's ballpark all right. Certainly not as stable as my original values though.

descipher commented 3 years ago

I have validated that the PID calculation code has not changed as far back as one year. What I did notice is that there are specific conditions that can negatively impact the results.

1) Extruder EXTRUDE_MINTEMP limit of 170C is not a good value for PID. If you are tuning to 180C it will always overshoot. This is because it leaves only 10 degrees for PID calculation range based on the min/max range.

2) The PID PID_FUNCTIONAL_RANGE defaults to 10, this default should be higher, especially for fast hot end heaters.

3) The smoothing param PID_K1 has significant impact to overshoot and initial stability, still poking at it so no recommendation.

descipher commented 3 years ago

PID_K1 results, The smoothing factor is highest as the value approaches 0.5 if you prefer to see a high level of smoothing try around 0.55 to 0.45. You can expect a slightly slower ramp up to the setpoint but a high level of stability. My best results are with PID_K1 0.52 PID_FUNCTIONAL_RANGE 16 EXTRUDE_MINTEMP 160 30W 12V heater, ATMEGA2560 RAMPS 1.4, 100k Thermistor

Seeing +1.5 -1.5 on the startup ripple, with stable follow on +- 0.20 fluctuations via PID_DEBUG output.

rflulling commented 3 years ago

How may I assist if at all? May I provide additional details or tests that may help?

Would you like Video of the Temperature plot over each of the builds as apposed to screen shots?

Obviously my self interest is to be able to operate on something more recent than https://github.com/MarlinFirmware/Marlin/commit/736521a3f1124a84a023e5ba1f030e09a95f16dc But also that this may aid the greater community, I am committed as well.

As noted in the OP... after several edits. https://github.com/MarlinFirmware/Marlin/commit/736521a3f1124a84a023e5ba1f030e09a95f16dc is how far back I had to go to find a version that generated stable Auto Tune results that could be used without Thermal Fault and could also use previously generated PID. Later versions could use older PID but lost the ability to generate PID with Auto Tube that would be functional. Each successive version the results become more unstable, eventually resulting in the near 20C swing seen on both 2.0.7.2 and https://github.com/MarlinFirmware/Marlin/commit/8c0505395135bcffad426e5d557742838a9eceda as the most recent BF test point.

I suppose it is also worth note: The parts in use and selected for this build are with plans to eventually operate in a range from 200C to 320C and possibly higher as permitted by the sensor itself. Worst case the Thermal Resistor will be switched out for a Thermal Transistor sensor. That I have been operating in the 180-245 range is mostly a consequence of the materials I use most while spending endless hours debugging hardware behavior. Just now finally able to reach extrusion over 12mm/s, and of course a motor is malfunctioning... Once the extruders are behaving I expect to enclose the machine and start working with higher temp materials. Thus 24V 40W and an operational range of 180-320c.

rflulling commented 3 years ago

I am seeing reports of others having PID issues. But I am not sure if any of it is the same issue or something else.

Seems some one took to reddit to claim that the BigTree Tech is to blame and that their boards need to be altered. "Why you should not buy Bigtreetech boards // Why your thermistors are not working correctly"

I don't know if this is all boards or just certain models. These issues are not being reported in Marlin groups. They are being posted to BigTreeTech and other related boards groups.

thisiskeithb commented 3 years ago

Seems some one took to reddit to claim that the BigTree Tech is to blame and that their boards need to be altered. "Why you should not buy Bigtreetech boards // Why your thermistors are not working correctly"

That post was a bit sensational and is not common to all BigTreeTech boards, but BigTreeTech has already fixed all new SKR E3 Turbo boards manufactured after the end of last month, so that is likely unrelated to your issue.

They replaced the inductors with 0 ohm resistors:

BigTreeTech's comment on Kersey Fabrications' E3 Turbo post:

I've asked BigTreeTech about the SKR Mini E3 1.2, but haven't heard back yet.

descipher commented 3 years ago

Not likely to be related. The writer has not ruled out other sources of noise for that issue and is simply wrong. It boggles my mind when I see posts like that. Such a claim must be clearly demonstrated as correct with data that represents the claim. Consider this, the 22uH inductor is a low pass filter and it will also have current saturation fluctuations, this is not a hardware fault. Its designed that way. If you use the wrong thermistor e.g. 4k range it will not be stable. You need to use a high value one such as a 100k thermistor.

rflulling commented 3 years ago

While going over the configuration files with a fine tooth comb for any deviations. Making all to sync with bf2.0.3 I also set out to find deviations in underlying code directly related to PWM, PID or Temperature.

Using WinMerge I am able to view 3 files line by line at the same time performing manual difference. I do not claim to understand what I have found. I do not claim to know it is relevant or not.

What I have found is between the three defined BugFix points, there was a solid amount of rework done: Temperature.h /Temperature.cpp, as is also reflected in Configuration.h as well as in many other related files.

Question: what is NAN and why is the value 3000.0f replaced with NAN?

Thermisters.h also has has had some work done: ADC resolution changed from "1024" to "10", it looks like to compensate for some code math, a multiple "_BV" I did not find what _BV was alias for. Assuming it is a variable or function that is generating a value of 102.4?

Testing Thermisters: Reading 24C at 75F room temp on profile 13, which is a 4092K versus 3950K. -However before any one gets excited, these thermisters are potted in thermal material, in a metal can, so response may actually be slightly faster than normal plus there is no solder to melt at high temps, the wires are spot welded I believe. The very least it's range is greater than normal 3950K. I have looked over the the files and I am not even sure the links provided by the vendors explain this thermister. But profile 13 has been stable otherwise and 11 seems unrelated other than the b value. Also this does not explain the progressive changes in the test results over several versions. Because although the files were edited, it does not seem obvious that the edits altered their function. -I claim no expertise in this market. All this transmission talk...

Since I cannot find smoking gun I will generate video. Would be nice to discover in the course of shooting the videos that there was an error in the configurations files. Though it would be a whopper to generate degenerative results over time. So next post will have three samples. Each is a Bugfix named above as they are effectively my operational systems... Also despite the posted configurations being fully functional otherwise. I have dialed them all back to be as close to the configuration of the OLDEST most stable version. Thus if it's not enabled in that version, it wont be for testing in any other.

Each video will follow: Compile, Install, Connection, Verify Version, EEPROM reset, Auto Tune, Store configuration, Test, over write with previous known good values, Store Configuration, Test. Video will show full warm up and cool down cycle. Logs of Serial Echo will be posted too.

rflulling commented 3 years ago

Update:

In the course of generating new more detailed videos for each revision and having ultra filtered each configuration file to match that of the last known good version. The behavior captured previously on video not posted, based on the initial post has, been subdued. The fluctuation eliminated in all but one instance, which is present tested from Official 2.0.7.2 on forward. Thusly, while none have directly accused me personally of having caused this bug, it is safe to argue now that the source is in the configuration files and thus the onus is on the shoulders of the user, my shoulders, to identify the source of the both bugs. The ones eliminated, and the one still being observed.

Just in case I will also test and recompile with those older configuration files test against the chance that the issue has been caused by a dependency now since updated.

If the final cause is entirely user error which has managed to escape observation. I will not waste any ones time with further logs and videos. I will post what the cause was and close the ticket. If however it is something greater and the videos are needed to help validate and identify the issue then they will be posted, but only if this case is true.

Test 1: Auto Tune: PASS, Test 2: Generated PID: PASS, Tests 3: Legacy PID values: FAIL

;M301 E0 P08.335569 I00.333957 D52.013954 ; PID EXT0 2021-01-03 22:30//bf2072 <-- ;M301 E1 P07.049786 I00.242761 D51.181446 ; PID EXT1 2021-01-03 22:30//bf2072 <--

Test 1: Auto Tune: PASS, Test 2: Generated PID: PASS, Tests 3: Legacy PID values: PASS ;M301 E0 P12.010000 I00.650000 D55.500000 ; PID EXT0 2021-01-03 18:10//bf2053 ;M301 E1 P13.230000 I00.670000 D65.080000 ; PID EXT1 2021-01-03 18:10//bf2053

Test 1: Auto Tune: PASS, Test 2: Generated PID: PASS, Tests 3: Legacy PID values: PASS ;M301 E0 P12.490000 I00.690000 D56.730000 ; PID EXT0 2021-01-03 16:10//bf2030 ;M301 E1 P13.480000 I00.710000 D63.880000 ; PID EXT1 2021-01-03 16:10//bf2030

Test 1: Auto Tune: PASS, Test 2: Generated PID: PASS, Tests 3: Legacy PID values: PASS ;M301 E0 P12.420000 I00.680000 D57.150000 ; PID EXT0 2021-01-03 12:45//bf2030 ;M301 E1 P13.550000 I00.690000 D66.130000 ; PID EXT1 2021-01-03 12:45//bf2030

Legacy Stable PID that have served well in many versions of Marlin. Likely generated the day I installed the new Thermisters, Heater Cores, Controller Board, and Brass Heat Blocks. ;M301 E0 P14.880000 I00.870000 D63.340000 ; PID EXT0 2020-06-07//LEGACY ;M301 E1 P15.250000 I00.820000 D70.710000 ; PID EXT1 2020-06-07//LEGACY

rflulling commented 3 years ago
  1. If possible I would like a Minimalist copy of the Configuration documents. One that has the fewest possible entries. There are some after all that can never be commented out they will cause a compile fail.

  2. Efforts to Filter and eliminate all suspects have not revealed a culprit. I must now eliminate everything and start turning lights on one at a time. Until I have confirmed no cross function errors.

  3. I have attempted to use the PIO project inspect. Now behaves as if code was added to Marlin to prevent the tool from producing results. If that is even possible.

  4. Observed distinct differences in Auto Tune as of Official 2.0.7.2, never mind the stability issues. I am sorry, there is no way the underlying code is really identical. Less there were SKR PRO specific changes made that would effect Auto Tune?

descipher commented 3 years ago

The code is not "Identical" however the PID mathematical function methods are exactly the same. There are features which can be enabled now that are certainly different however the configs you have sent up do not use those features thus the conditional do not include them at compile time. e.g. extruder fan cooling and material input cooling factors. Also there is now an indexed array of "heating elements and sensors" and Autotune parms are loaded from EEPROM if enabled or the CFG file.

The challenge here is how to divide and concur the significant number of variables between the point at which it went south for you.

I suspect the issue is in the most recent configuration defaults. It should be possible to overlay your older working configuration values into the most current level without breakage, I will look at it this week.

I do not have the STM32 to test with so I cannot mirror the compile you have. There are elements that are different that I cannot verify. e.g. ADC resolution, sampling capability and compute speed. But if it works with the lesser ATMEGA2560 then its a narrow band to isolate.

rflulling commented 3 years ago

I suspect the issue is in the most recent configuration defaults. It should be possible to overlay your older working configuration values into the most current level without breakage...

This is par for the course. Every update is a new folder. Every folder is compared to the previous. Once in a blue moon I miss something. I always recommend to other users, never ever let any tool auto merge your configurations. When an update happens, without a script to define what is really expired, who knows what may be over written.

Since working on this issue. I have checked and rechecked more than 6 configurations. Looking for any little thing that might have been over looked. I have also looked into the code hoping to see something. I have seen stuff but nothing that looks like a smoking gun, other than a few locations were variables and multiples were substituted for functions. I think these could be connected to the issue, but there is little if any way way for me to prove without knowing when they were merged into the nightly build.

I do not have the STM32 to test with so I cannot mirror the compile you have. There are elements that are different that I cannot verify. e.g. ADC resolution, sampling capability and compute speed. But if it works with the lesser ATMEGA2560 then its a narrow band to isolate.

So I thank you sincerely for lending a hand here. However I do not not expect an 8bit card to act the same. We are talking Casio calculator versus any Laptop. The ATmega is 16Mhz 8bit. The STM is 168mhz 32bit. I just looked and do have my old MKS Gen 1.4, and that runs on ATMEGA2560. Been well over two years since I compiled anything for that card.

My understanding PWM acts very different in 32 bit chips, and I think Marlin team wrote some custom work on the STM series chips to get each one working just right.

rflulling commented 3 years ago

Traced issue to configuration files. Continuing the hunt until I can identify the exact cause. I hope to provide greater context for those with a better understanding of the code.

Downloaded https://github.com/MarlinFirmware/Marlin/commit/55d1938977e310648602e1b23e9e22e8fd6838b5 to see if there may be a unseen fix implemented behind the scenes since I started this ticket. I think it should be obvious the answer is no, it was not fixed. However I used the process to create a cleaner than normal more generic config set up.

The results showed me several things. One, despite my having two hot ends, something not enabled caused M115 to show the same PID values for both extruders. Two like magic Auto Tune worked and legacy PID values worked. Edit: -identified dual independent extruders pid and is one of the functions being disabled again for this test.

I created a second copy of the same folder performed a clean function to wipe out any previously compiled data. Performed an update and intellisense update. Ran merge of the Configuration files I generated with Official 2.0.7.2. Ran Auto Tune, and it took way longer than normal. E0 looked like previous attempts and E1 was something new, now when P should be something like 13, and the bad builds generate an 8, this new one generated a 4, and that was after the second attempt with the first generating an over temp error. An interesting development is that the Legacy PID did work. Despite Auto Tune being unable to create functional values worth anything. In other builds it was reversed.

So now looking very close at the the twin sets of configuration files. Knowing that the issue, may be found in their differences, when all other conditions are equal. At the very least, I avoided configuring anything that I knew would alter PID behavior. Such as delays in warm up to compensate for slow heating. Both versions of the files are set the same.

Step1: Rolling back Three locations where there are differences in temp related function, such as the expanded Sensors, and the Dual Configuration for PID in Marlin Config. Step2: Roll back anything fan related, anything that might use the PWM part of the code. Step3: undetermined. I hope to find this in steps 1 or 2 and only need to focus on those functions.

rflulling commented 3 years ago

BugFix2.0.7.2(55d1938977e310648602e1b23e9e22e8fd6838b5)FULL.zip Caught at Step1 named previous comment. Nailing it to the wall! Below.

Define Test executed at each step: Test1: M502|M500|M303 C3 E0 S200| Capture serial log // PASS/FAIL Test1: M502|M500|M303 C3 E1 S200| Capture serial log // PASS/FAIL Test2: M301 E0 (generated in test 1 for E0) | Heat, hold for 30s and cooldown | Capture serial log // PASS/FAIL Test2: M301 E1 (generated in test 1 for E1) | Heat, hold for 30s and cooldown | Capture serial log // PASS/FAIL Test3: M502 | revert to default known good values | Heat, hold for 30s and cooldown | Capture serial log // PASS/FAIL

Define Step: With each step, I am enabling, or disabling a function, and possibly changing values. Changes are cumulative so 1F will contain all changes tested before it, unless an issue is found. Issues will be omitted from the cumulative sequence for later testing as a separate process. Step also represents a compile, and firmware upload for process of elimination.

BugFix2.0.7.2(55d1938977e310648602e1b23e9e22e8fd6838b5)BASIC.zip BASIC: State were the configuration is base bones, just enough to test with. -does not exhibit the bug as described.

BugFix2.0.7.2(55d1938977e310648602e1b23e9e22e8fd6838b5)FULL.zip FULL: Configuration is fully merged as I would any other build. -Bug is observed. Auto tune generates random values and sometimes fails. PID when given good values, never stabilizes.

BugFix2.0.7.2(55d1938977e310648602e1b23e9e22e8fd6838b5)STEP1a.zip Step1A: disable // Custom Thermistor 1000 parameters // PASS AutoTune 01/07/2021 -I don't use these, there is no reason to keep any of it enabled. -No issues found. Test1-3.

BugFix2.0.7.2(55d1938977e310648602e1b23e9e22e8fd6838b5)STEP1b.zip Step1B: enable //#define PID_PARAMS_PER_HOTEND // PASS AutoTune 01/08/2021 -No issues found. Test1-3.

BugFix2.0.7.2(55d1938977e310648602e1b23e9e22e8fd6838b5)TEST1c.zip Step1C: enable //#define PIDTEMPBED <<<-----Found One 01/08/2021 -Test1: T0/E0/H1: M303 C3 E0 S200: no issues noticed. -Test1: T1/E1/H2: M303 C3 E1 S200: Auto Tune on : Acting like a heated bed during auto tune. -Why is PIDTEMPBED effecting Extruder PID? -Behavior: sits much too low during initial testing rather than the nice wave it should be forming, then over shoots well over 238 when testing at 200C. -Result: "PID AutoTune Failed Temperature too high" -Wires have dedicated sockets, and are not crossed. Extruders and Heater cores work correct as long as PIDTEMPBED is commented out. -Ran AutoTune 3 times. Each time following these steps M997|M502|M500|M303 S3 E1 S200. Each time getting a different result. Kp 14, Kp7, and Temperature too high. This is weird. -Test2: SKIPPED. Since AutoTune output unstable. Testing Legacy PID, expecting pass. Logging data for comparison, then continuing. -Test3: Using Legacy good PID. T1/E1/H2: HALT! Thermal Run away. However both Extruder1 and Extruder2 were perfect twins, reading the exact same temperature.

with fail at Step1C -rolling back configuration to 1B and continuing with 1D to verify against any other gremlins. -PIDTEMPBED will be excluded from further testing. -However the sub functions under PIDTEMPBED will be tested at a time.

BugFix2.0.7.2(55d1938977e310648602e1b23e9e22e8fd6838b5)TEST1d.zip Step1D: enable //#define PID_EDIT_MENU // PASS AutoTune 01/09/2021 Step1D: enable //#define PID_AUTOTUNE_MENU // PASS AutoTune 01/09/2021 -resulted in very clean, very twin results. Never seen before.

BugFix2.0.7.2(55d1938977e310648602e1b23e9e22e8fd6838b5)STEP1e.zip Step1E: set to 20 from 4 //#define THERMAL_PROTECTION_HYSTERESIS 20 // PASS AutoTune 01/10/2021

Step2: break down nested functions of //#if ENABLED(PIDTEMPBED) -Here I will work through the optional defines and parameters to see if one of them is related to the issue, or if the issue lands fully on the base function itself. If making changes to and nested defines does not resolve this then Step3 will be to check through existing enabled functions for conflict. Starting with PID/PWM related. Changes here are not cumulative. They are being switched on and off one at a time. Reverted at the end of each step for the point of comparing results of switching sub functions to check for relation against observed bad behavior.

BugFix2.0.7.2(55d1938977e310648602e1b23e9e22e8fd6838b5)STEP2a.zip Step2a: disable //#define MAX_BED_POWER 255 --> 1 // FAIL Legacy PID 01/10/2021 -step1: AutoTune // Pass, Step2: Generated Test // PASS, Step3: Legacy PID // FAIL -Legacy PID on Extruder 2 does not stabilize.

BugFix2.0.7.2(55d1938977e310648602e1b23e9e22e8fd6838b5)STEP2b.zip Step2b: enable //#define SLOW_PWM_HEATERS // FAIL Auto Tune 01/10/2021 -With this function enabled, temp seems to over shoot, and spends more time above temp than below. K value generated is 8 versus an expected 12-14. (Sorry test1 logs only contain one pass of data per extruder. ) -E0 and E1 when tested several times, each produced a pass or fail about 50% of the time. K8 being a fail. K12-14 being a pass. (Sorry test1 logs only contain one pass of data per extruder. ) -I have not yet see an over temp error. -Legacy PID is functional but seem to be having a harder time that usual stabilizing. I can hear the fans change pitch as the PWM engages. The temp fluctuates +/- 2C. Though this is not as bad as other observations. I know what Stable looks like on Marlin and there is no fluctuation normally. When everything is working, it is rock solid. (See Tests3 logs.)

BugFix2.0.7.2(55d1938977e310648602e1b23e9e22e8fd6838b5)STEP2c.zip Step2c: disable //#define PID_FUNCTIONAL_RANGE 10 // FAIL Auto Tune 01/10/2021 -Running E0 and E1. E0 has a slight deviation and E1 a greater deviation. (See Test3 Logs.) -overall behavior inline with previous observations.


I highly recommend that any one looking over my notes, decompress the folders to one common location and use WinMerge to compare the files contents between edit points.

Within each log folder you should find the following:


Worst case scenario, //#define PIDTEMPBED, will remain commented out in my files. There is only so much time I can spare to the chasing of this bug.

rflulling commented 3 years ago

Side notes:

Something I have noticed but never filled a bug ticket. I never enable Volumetric. Yet it is always available in the LCD menu to be enabled anyway. Why not all of the functions like this? We dont need them, but they are there anyway. though if they aren't enabled in the configuration, should they not be fully disabled? Might be nice to have a check list. Where a function can be configured and is off by default but can be turned on if desired later. I think this is how Duet works, where its all on regardless. But must consume much more over head.

Also that M503 and M115 have such different report structures. M503 should be run without verbose enabled or the user gets a mess. M115 requires verbose or nothing shows up. I think i need to edit these functions in my own copies and change how they report so its uniform...

rflulling commented 3 years ago

@descipher found one, looking for more

descipher commented 3 years ago

@rflulling what was your result with SLOW_PWM_HEATERS enabled and disabled. FYI if you disable PID_FUNCTIONAL_RANGE it defaults to 10 anyway.

I don't like what I see in the autotune MENU functions. Will dig further.

rflulling commented 3 years ago

what was your result with SLOW_PWM_HEATERS enabled and disabled.

Just about to start into Step2 processes at the bottom of the comment, 3 back. I have been updating the details as I go along adding the date and time processed. Wrapping up Step1E now.

Step1C so far represents the only combination where the system malfunctions and having PIDTEMPBED commented out, all other testing is PASSING my tests tests so far. Edited: 01/14/2021 to add results of the tests. Step1C represents PIDTEMPBED enabled SLOW_PWM_HEATERS enabled. // FAIL 01/08/2021 -Thermal runaway on extruder 2 while testing with known good PID values. Step2B represents PIDTEMPBED enabled SLOW_PWM_HEATERS disabled. // FAIL 0/10/2021 -While the test was a pass over all, it failed due to the 2C wiggle.

I don't like what I see in the autotune MENU functions. Will dig further.

The auto Tune menu has not caused me issues before. Have had it enabled for most of the time I have been running Marlin 2 or later on the SKR PRO 1.1

rflulling commented 3 years ago

What seems to be the Implication is that:

When PID functions were merged, simplified and integrated. From my perspective, they all now run on the same code. And given the Matrix Structure used to support multiple hot ends. Some how now the new code causes conflicts, or will randomly execute a component from BED functions when Heating or Tuning Extruders.

The annoying part is that this seems to happen at random or is related to resets. I have not yet nailed down.

I can confirm that the issue is Observable when PIDTEMPBED is enabled. I am looking for other circumstances. But although some behavior is graphically observable through monitoring temp plots. It is clear there is another force at work since sometimes an Auto Tune will be spot on and other-times, fail or produce unexpected results.

While most of my testing has seen results in a pass fail. I am sickened to notice that in some tests the results pass, but a retest is a fail. Potentially bringing into questioning more than a weeks worth of effort when I thought it had been nailed down. So I will have to retest known good points for fault several more times to confirm this issue really is ONLY happening when I think it is. Else one could argue unstable hardware, which I have otherwise seen no other indication of. But for the scientific method it must happen. I suppose the next step is to go back to the original 3 commit points I have been working with and compare results.

My fear is that the EEPROM CHIP or whatever part of the MCU is storing data, will give out after being written to so many times. Given the way the SKR PRO copies it's firmware each time it is updated. I have yet to see low end storage chips that were happy being written to over and over again like this.

I feel that the greater Marlin Dev community is of the opinion that there is no bug, that action is not needed. It would also not be the first time I have had trouble proving a bug exists. If I recall it took some time to gain any support on a Z leveling issue as well. Where I could not disable a function that required leveling that was not using.

I am working hard to prove the issue, but it is exhausting to be an island. If I have to give up I hope all I have provided is useful to some one. Thank you to @descipher for his efforts.

qwewer0 commented 3 years ago
  1. -Oldest found is now Bug Fix 2.0.3 736521a as it is 100% functional in terms of Auto Tune and PID/PWM.

@rflulling Which one is correct, 2.0.3 or 736521a? Because 736521a is between 2.0.4.4 and 2.0.5.

7. Run Auto Tune M303 C3 E1 S200 \n.

Is there any particular reason why to test it with only 3 cycle?

rflulling commented 3 years ago

Thank you for these questions.

Which one is correct, 2.0.3 or 736521a? Because 736521a is between 2.0.4.4 and 2.0.5.

I will look again after work to confirm. Since this ticket I have started configuring the Version.h with data on the COMMIT point to insure that such information is not lost again. However I have previously added a date to each set of file when downloaded and am checking against that date as well as the date those files were created remotely, if it exists. From that I looked to determine when the folder was downloaded from GIT and what commit point it should have been. I am sorry if this does not line up. I will look again to make sure this is accurate as possible.

As it is, based on data generated on the comment I have been editing nightly from 4 days ago, I will need to go back and recheck the commit points in the OP, and revise those note accordingly. As they are outdated now.

Note there are dates included with each specified point above. These should match to the dates listed in my files as when they were downloaded or created remote prior to download.

Is there any particular reason why to test it with only 3 cycle? I copied this from a sample at some point or an echo from serial when running from LCD on screen AUTO config. I actually do not remember. Where. I have used it as a default since.

Should it run longer?

qwewer0 commented 3 years ago

Should it run longer?

What I heard is, that more than 5-7 won't improve on the result (I can't back it up), so implying from that, less than 5 maybe could produce "not good enough" results? Could you maybe test 3, 5 and 15 cycles on the version that works, to see if it produces noticeably different results? But one comment that I can add (unless someone can prove it wrong) is that the starting temperature for the PID autotune might alter the results, e.g. E-1 C7 S50, started from:

Not sure if those differences are big or small. And it could be that my case is connected to https://github.com/MarlinFirmware/Marlin/issues/20463#issuecomment-751392938 , as I did my testing on an SKR Mini E3 v1.2, but there has been no problem in the past.

rflulling commented 3 years ago

@qwewer0 I see your point. I cannot say if it is specific to the SKR PRO 1.1 or general to a larger group. I have my theory that its connected to the PID matrix code. I have been so far only able to confirm that the issue happens when PIDTEMPBED is enabled and does not happen when it is not. I am still working out if this is the fault of a sub function, but it does not seem to be so. The next step is to look back at steps that seemed to work right and check for fault. If they pass, I will look back at the older versions of the code as listed in the OP and see if switching PIDTEMPBED dependably switches the bug on and off as well.

Initial testing showed that as versions progressed the visible behavior worsened. Though I was not sure the cause.

More recent testing shows that the current version will produce inconsistent results, in AutoTune. One time being perfect, another slightly off, another being very off, even possible failing completely by going OverTemp too long.

If it were simply a case of letting AutoTune bake longer then I would not see values so all over the map. I would not see known good values also randomly misbehave in unexpected ways. Infact as Auto Tune runs, subtle changes between the instances should average out to produce a more stable code. This makes sense. But that wont fix this issue.

I have also seen at least one other complaining of a nearly identical issue. I advised the user to disable the PIDTEMPBED function and see his Extruders work Properly after. We can live without this function as BangBang will heat the bed up just fine. But without the calibration of PID, I am not sure the reported temp is accurate. Plus there is no reason what so ever that enabling PID on the heated bed should mess up the PID on the extruders. Especially when the heated bed isn't even in use.

I suspect the issue started when Devs started working on converting the PID values to a Matrix. But as of yet, I do not possess the skills to prove this. Working back words should reveal the truth.

If some one knows when that work started on building the PID matrix I would be happy to test various suggest commit points to confirm. Just link the points so I can download the code.

Could you maybe test 3, 5 and 15 cycles on the version that works, to see if it produces noticeably different results? But one comment that I can add (unless someone can prove it wrong) is that the starting temperature for the PID autotune might alter the results,

Versions that work produce a K value of 12-15. Typically closer to 15. I have a value generated for the Heated bed but cannot be sure it is accurate. Without realizing it. This all likely reared up the first time I tried to use PID for the heated bed and suddenly Extruders Heater wont stabilize. It has taken time to connect the dots. With this in mind, I am sorry that I cannot say what a heated bed PID should look like as I don't have that data. All of my sampling has been based on the Extruders.

Not sure if those differences are big or small. And it could be that my case is connected to #20463 (comment) , as I did my testing on an SKR Mini E3 v1.2, but there has been no problem in the past.

If you can compile an otherwise known good configuration and configuration_Adv but with PIDTEMPBED disabled. Then check to see if the heated bed still works as mine does using BangBang (if I am even saying this right). See if your Extruders also both generate results that look correct. If you have ever kept notes, please compare then from before when things works well for you.

rflulling commented 3 years ago

@qwewer0 The date modified stamped on the .gitnore, process-palette.json, and several other files that are never touched after download, is 03/01/2020 10:05 CST. I am still playing catch up after @thisiskeithb pointed out how to refer to the commit points. While the date was always added to my files after download, I understood nothing of the commit string. So I am time traveling a little. In some cases, nothing at all matches with the time stamp on the files.

This is as close as I can get. There are multiple points for the 03/01/2020 10:05, however the last commit I can download for that day has a time stamp of 7:20PM.

https://github.com/MarlinFirmware/Marlin/commits/bugfix-2.0.x?after=a26f2fb00b523ce0d91d22dc4e0b299f1fd62fc5+2070&branch=bugfix-2.0.x The closest I can get to matching the date and time is: 736521a3f1124a84a023e5ba1f030e09a95f16dc with a time stamp of 03/01/2020 7:20PM CST.

descipher commented 3 years ago

More observations:

See https://github.com/MarlinFirmware/Marlin/blob/2.0.x/Marlin/src/module/temperature.cpp#L2469

//4cycles 13:02:56.181 > #define DEFAULT_Kp 21.20 13:02:56.181 > #define DEFAULT_Ki 1.23 13:02:56.182 > #define DEFAULT_Kd 91.63

13:28:15.904 > #define DEFAULT_Kp 21.74 13:28:15.904 > #define DEFAULT_Ki 1.25 13:28:15.905 > #define DEFAULT_Kd 94.85

13:38:20.198 > #define DEFAULT_Kp 20.40 13:38:20.198 > #define DEFAULT_Ki 1.17 13:38:20.198 > #define DEFAULT_Kd 88.98

13:52:32.879 > #define DEFAULT_Kp 20.58 13:52:32.879 > #define DEFAULT_Ki 1.18 13:52:32.886 > #define DEFAULT_Kd 89.78

//8cycles 14:29:33.835 > #define DEFAULT_Kp 23.31 14:29:33.835 > #define DEFAULT_Ki 1.39 14:29:33.838 > #define DEFAULT_Kd 97.39

15:31:35.523 > #define DEFAULT_Kp 24.02 15:31:35.523 > #define DEFAULT_Ki 1.45 15:31:35.526 > #define DEFAULT_Kd 99.38

15:56:11.991 > #define DEFAULT_Kp 23.28 15:56:11.991 > #define DEFAULT_Ki 1.40 15:56:11.994 > #define DEFAULT_Kd 96.77

16:18:03.314 > #define DEFAULT_Kp 23.31 16:18:03.314 > #define DEFAULT_Ki 1.37 16:18:03.316 > #define DEFAULT_Kd 99.30

descipher commented 3 years ago

Observation: From https://github.com/MarlinFirmware/Marlin/blob/2.0.x/Marlin/src/module/temperature.cpp#L2773

PID requires a specific time base delta, averaging cannot be done in this case if the sample is on a 1000ms/16 time base and the reading is trending upward then the averaging obfuscates an accurate +- proportional correction.

The over sampling can be used to correct spike errors but should not be averaging the delta trend. While this will still work it is not optimized for accuracy. e.g. temperature trending +5 in 1000ms starts @ 100 ends at 105 the average = 102.5

descipher commented 3 years ago

Does anyone know if the intent of oversampling was to increase ADC resolution? If so we need to shift bits of the oversamples vs accumulate it as an average. The description is misleading. Scanning the code now to see what was done in the absence of an answer.

rflulling commented 3 years ago

@descipher your sample data, was that Extruder or Heated Bed?

descipher commented 3 years ago

Extruder E0

On Jan 12, 2021, at 10:00 PM, rflulling notifications@github.com wrote:

 @descipher your sample data, was that Extruder or Heated Bed?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or unsubscribe.