Fix the PID - Githubissues

Ralim commented 6 years ago

I'm submitting a ...
- [x] Feature request
Do you want to request a feature or report a bug?

We need to figure out PID constants for different tips and different voltages. This can then let us derive a relationship to automate changing these variables.

What is the current behavior? Bad performance on < 16V. Oscillations on small tips.
What is the expected behavior? Solid performance without needing to average / lie to the user.
What is the motivation / use case for changing the behavior?

To close : #274 #271 #116

What are you running:

All models, firmware 2.x +

Plan:

1. Implement a firmware that allows users to change the PID constants
2. Hope some people help with finding PID constants for different voltages and tips
3. Record all of this into a Google spreadsheet here
4. Find relationships
5. Test
6. Release

Current Stage

Need to do 1

JohnEdwa commented 6 years ago

Plan:

Find relationships

Don't we all :)

I should be able to have a go hunting for the PID values with the B2, BC2, D24 and C1 (if it ever gets here) tips as well as 12/16/19V at least.

LarsSimonsen commented 6 years ago

Would it be possible to implement an autotune function of sorts? For example at first startup, heat at full blast from room temp to 150 C. Measure the time from 50C to 100C, and the time from 100C to 150C.

The time from 50C to 100C should give us an idea of the heat capacity of the tip (although no more accurately than we can guesstimate the power, but hopefully better than nothing).

The difference between the 50C-100C time and the 100C-150C time can tell us the heat loss (transfer) per C above ambient.

Then at 150C abruptly cut the power, and measure the time delay before the temp stops rising.

I'm not proficient enough with PID to say exactly what to do with these measurements, but I feel they could be useful to someone who knows what they're doing. :)
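One possible way to turn those two timings into numbers is sketched below. It assumes a lumped model `C*dT/dt = P - k*(T - T_amb)` with constant heater power, and it approximates each 50 C window by its average rate and midpoint temperature. The function name, the struct, and the model itself are illustrative assumptions, not anything from the firmware.

```c
#include <assert.h>

/* Hypothetical result type for the lumped model C*dT/dt = P - k*(T - T_amb). */
typedef struct {
    double heat_capacity_j_per_c; /* C, joules per degree C */
    double loss_w_per_c;          /* k, watts lost per degree C above ambient */
} thermal_estimate;

/* t_50_100_s and t_100_150_s are the measured heating times in seconds,
 * power_w is the (assumed constant) heater power, ambient_c the room temp. */
thermal_estimate estimate_thermal(double t_50_100_s, double t_100_150_s,
                                  double power_w, double ambient_c)
{
    assert(t_50_100_s > 0.0 && t_100_150_s > 0.0);

    /* Average heating rate (C/s) over each 50 C window. */
    double rate1 = 50.0 / t_50_100_s;
    double rate2 = 50.0 / t_100_150_s;

    /* Mean temperature above ambient in each window (midpoints 75 C, 125 C). */
    double d1 = 75.0 - ambient_c;
    double d2 = 125.0 - ambient_c;

    /* Solve  C*rate1 + k*d1 = P  and  C*rate2 + k*d2 = P  by Cramer's rule. */
    double det = rate1 * d2 - rate2 * d1;
    assert(det != 0.0);

    thermal_estimate e;
    e.heat_capacity_j_per_c = power_w * (d2 - d1) / det;
    e.loss_w_per_c = power_w * (rate1 - rate2) / det;
    return e;
}
```

As noted in the comment above, the result is only as good as the power estimate, since both outputs scale linearly with `power_w`.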

Ralim commented 6 years ago

@LarsSimonsen We certainly could. But I'm not proficient enough in PID to get an autotuner working :grin: .

Honestly, it would be absolutely lovely to have and I would love to have it, but it's a lot more time and effort than I have at the moment to figure it out.

If anyone can point to a good paper on implementing this, I'm happy to look into it further :) Or, if anyone really wants to see this put into the firmware, you are welcome to send a pull request :grin:

Anyway... Here is a rough firmware which adds the PID constants to the advanced menu. Tuning is going to be a PITA, so don't feel you have to, but it would be appreciated. TS100A_PID.zip

Note that the P/I/D terms are backwards, so increasing these numbers decreases their influence. (They're used as dividers rather than multipliers to avoid floating point maths.)
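For anyone tuning, a minimal sketch of what "backwards" means in integer maths follows. `pid_term_div` is a made-up name for illustration and is not taken from the firmware source.

```c
#include <assert.h>
#include <stdint.h>

/* One PID term with a "backwards" gain: the tuning value is a divisor,
 * so a larger setting gives a smaller contribution. Integer-only maths. */
int32_t pid_term_div(int32_t input, int32_t divisor)
{
    assert(divisor > 0);
    return input / divisor;
}
```

With an input of 1000, a setting of 10 contributes 100 while a setting of 20 contributes only 50.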

Let me know if you have ideas on how to improve this :)

dwillmore commented 6 years ago

New user here, sorry if I missed something important.

How about adopting the PID tuning code from one of the 3D printer firmwares like Marlin?

LarsSimonsen commented 6 years ago

Interesting proposal. Link to Marlin's autotune code.

TsvetanMarinov commented 6 years ago

Hi Ralim,

First of all, thank you for the great work so far. Just a quick remark about the PID implementation. The way you have implemented the calculation for the integral part is:

1. Calculate the temperature difference between the set point and the actual tip temperature (the error).
2. Integer-divide the error by the integration coefficient.
3. Add the result to the sum of all previous results (integration).
4. Limit the result to avoid overflow.

I would like to propose the following calculation instead:

1. Calculate the temperature difference between the set point and the actual tip temperature (the error).
2. Add the error to the sum of all previous errors (integration).
3. Limit the sum to avoid overflow.
4. Integer-divide the sum by the integration coefficient.

The advantage is that the rounding error from the integer division is not integrated. The current implementation makes the integral part of the regulator ineffective at correcting small errors (smaller than the integration coefficient). I am observing cases where the soldering iron is not able to reach its target while soldering something with high thermal mass. This is probably an effect of the rounding. A regulator with an integral part should not behave like that.

Ralim commented 6 years ago

@dwillmore This has been brought up in the past, and it is a solid idea which I would welcome a pull request for. Keep in mind that most 3D printer implementations do not require as "good" a tune as the soldering iron does (soldering tips have a much smaller thermal mass). That is why I asked for a paper that could be adapted to a different situation, rather than just mimicking code.

@TsvetanMarinov This is a change I can implement for you. Note that there are around 33 counts per degree C, so any errors smaller than that also won't show to the user, as the PID runs on the raw output measurement (so it doesn't run in C or F per se).

It was written without being overly concerned about losing a small amount of data, since in the same process we are also going from a reading (~14000) down to a percentage for PWM [0-100]. I can implement the change and observe whether the PID performance changes.

dwillmore commented 6 years ago

Is it possible to get an output from the iron of the sensed temperature and the drive value? I'm looking at different PID coefficient setting systems and need to know the deadtime and the lag. I would assume the lag changes with voltage--unless compensated for by the PWM value.

Ralim commented 6 years ago

Turning on the advanced screen will show you the pwm output value. But the value displayed for the temperature is low pass filtered.

I can make you a different build that doesn't filter the displayed value if you would like?

dwillmore commented 6 years ago

Can it be sent out the USB port as serial data?

Ralim commented 6 years ago

Nope. :(

dwillmore commented 6 years ago

Crud. Okay, let me see if there are other ways to do this. The guidance I've seen so far is to get it to steady state, then put a step change on the commanded value and watch the drive value and the measured value. After that it's a matter of measuring the graph and plugging things into an equation.

I'd like to make sure the equation works for this application before committing it to code.
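The thread does not say which equation is meant. One common reading of this procedure is a first-order-plus-dead-time fit, where the dead time is taken as the point the reading first leaves the noise band and the lag as the further time needed to cover 63.2% of the total change. The sketch below assumes that reading, assumes the step is applied at sample 0, and assumes the last sample is at steady state. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    double dead_time_s;     /* -1 if the response never left the noise band */
    double time_constant_s; /* -1 if 63.2% of the change was never reached */
} step_fit;

/* temp[] holds readings taken every dt_s seconds, starting at the step. */
step_fit fit_step_response(const double *temp, size_t n, double dt_s,
                           double noise_band)
{
    assert(temp != NULL && n >= 2 && dt_s > 0.0 && noise_band >= 0.0);

    double start = temp[0];
    double sign = (temp[n - 1] >= start) ? 1.0 : -1.0;
    double span = (temp[n - 1] - start) * sign; /* total change, >= 0 */

    step_fit fit = { -1.0, -1.0 };
    size_t i_dead = n; /* n means "not found yet" */

    for (size_t i = 0; i < n; i++) {
        double moved = (temp[i] - start) * sign;
        if (i_dead == n && moved > noise_band) {
            i_dead = i;
            fit.dead_time_s = (double)i * dt_s;
        }
        if (i_dead != n && moved >= 0.632 * span) {
            fit.time_constant_s = (double)(i - i_dead) * dt_s;
            break;
        }
    }
    return fit;
}
```

Because the displayed temperature is low pass filtered, a fit like this on screen readings would also include the filter's own lag.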

JohnEdwa commented 6 years ago

@Ralim Just a note, the Github email reply doesn't work properly if you have a signature at the end of your message - it adds in the whole email thread and fails to hide it as a quote as you can see here: https://github.com/Ralim/ts100/issues/275#issuecomment-387922761

Ralim commented 6 years ago

@dwillmore There is absolutely no room left to be able to even support USB under my firmware :(

Also sorry for the email reply junk :/ I normally don't use the email reply because of this.

@JohnEdwa Yeah thanks. :/ Thought github might have handled it better by now 🤷‍♂️

JohnEdwa commented 6 years ago

I find it so funny you can add USB to a freaking ATTiny10 that has 1KB flash and 32 bytes of SRAM, but I guess no-one has really had a reason to try slimming the ST USB implementation

Ralim commented 6 years ago

Yeah well, I think it's one of those things where you can do it, but it's not worth the time and effort to implement.

I'm saying there is no room because no one has a usb core that will fit without changes, and I doubt anyone will want to invest the hours to get it working :)

I think it would be faster to change the code base to cm3 and dump the STM Hal and therefore gain some room for usb than try and slim the usb code.

If you could manage to make a useful usb stack in 1kb (say a cdc implementation) I would be damn impressed :D

dwillmore commented 6 years ago

I understand the constraints. Without some kind of telemetry, there's no good way to get the data necessary to do any kind of analysis or tuning, so that's a dead end.

What's the resource limitation that we're running into? SRAM? What amount of FLASH does the firmware use? From the size of the .hex file, I'm guessing just 64k? Are we cheating and using the full flash of these devices? (which is rumored to be 128K even though they're supposed to be 64k parts)

pix commented 6 years ago

Would it be possible to bit bang a serial interface on the USB pins?

Ralim commented 6 years ago

@dwillmore It's possible to log to flash, but it's a bit of a pest. We are not using more than 64k because on two of my irons writing past the end of the 64k works only intermittently, which makes sense as the chips are not tested past 64k. Since some units out there do not program correctly, I have not done it. Also, the bootloader will not program past the end of the 64k (not an issue for logging, but it is why the firmware size is constrained as well).

@pix It definitely would be. However, I'm out of the country for a while for work commitments, so I cannot implement this anytime soon :/

dhiltonp commented 6 years ago

Some random thoughts on the PID:

We have 2 big variables: thermal mass and available watts.

As implemented, PID tuning doesn't work well at varying voltages. This is because we calculate a PWM from the PID loop, not an amount of energy to add (which could then be converted to PWM based on input voltage).
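A rough sketch of that conversion, assuming a purely resistive tip so that full-duty power is V²/R. The helper name and the millivolt/milliohm units are made up for illustration, and the tip resistance would have to be a configured or assumed value since the iron cannot measure it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: PWM duty (0-100 %) needed to deliver power_mw,
 * given the supply voltage in mV and an ASSUMED tip resistance in mOhm. */
uint32_t duty_for_power(uint32_t power_mw, uint32_t supply_mv, uint32_t tip_mohm)
{
    assert(tip_mohm > 0 && supply_mv > 0 && supply_mv <= 65535);

    /* mV^2 / mOhm comes out directly in mW; fits in 32 bits up to 65535 mV. */
    uint32_t max_mw = (supply_mv * supply_mv) / tip_mohm;

    /* Requests at or above what the supply can deliver saturate at 100 %. */
    if (max_mw == 0 || power_mw >= max_mw)
        return 100;

    return (uint32_t)(((uint64_t)power_mw * 100u) / max_mw);
}
```

For example, with an assumed 8 ohm tip, a 9 W request maps to 50% duty at 12 V but only 12% at 24 V, which is the voltage dependence a PWM-only PID has to absorb in its constants.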

Also, the tip temperature is measured as far from PWM as possible (I think?). This introduces an irregular delay/noise in the measurement, as we're sampling at various points along a decaying curve. This noise is killing the D's ability to respond to changes in the system.

If we measure close to the PWM, the temp reading will be higher than the tip temp, but it can also be compensated for (#303). Will it be consistent in the face of changing thermal mass? I'm not sure, but it would be nice...

On a related note, would changing the thermal mass change the oscillation period? That would kill any sort of PID tuning...

Thoughts?

Ralim commented 6 years ago

@dhiltonp

I agree, though available watts could be interpreted as just voltage, as the tip resistance is fixed.

I agree. In the coming firmware (other branches: #303) a tip type selection is also implemented, which means that this could be compensated for using the tip thermal mass.

I would prefer to calculate energy in the PID rather than the PWM period, and would greatly appreciate a pull request if you have ideas on how to implement this without growing code size significantly.

That is not correct. The PWM that controls tip on/off and the ADC are coupled together in hardware, so there is a known timing event here (which is why the ADC result drives the PID update).

With all modern two terminal tips you cannot measure the tip temperature while heating the tip. This is because the tip has a thermocouple built into the tip, so you have to turn off the PWM drive, wait for the output to decay and then measure the tip temperature. There is a tradeoff here that the longer you wait for the output to stabilize then the longer the tip cannot be heating for (which affects the maximum power you can effectively apply to the tip of the iron, and thus slow heating).

At the moment both ADC1 and ADC2 are set up with the injected channels being triggered from the timer that is used as the master control of whether the tip is on or off (which is what the PWM out of the PID gets written to). This timer controls PWM on or off over the 0-100 range, then keeps counting to ~120, and it is set up so that at around ~110 it causes the ADCs to start taking the measurement.

This is made harder by a delay in the op-amp that reads the temperature from the thermocouple, which means the wait before sampling has to be longer than ideal.

Because of the coupling from this timer to the ADC and then to the PID, the PID runs in lock step with the PWM output and the ADC's and is constantly running slightly behind.

Rough order of events for a 50% duty cycle output:

1. TIM2 starts counting at 0, and turns on TIM3
2. TIM3 is rapidly putting out PWM at around 100kHz to the tip control via the blocking capacitor
3. TIM2 hits 50, which causes an interrupt to turn off TIM3
4. TIM2 hits ~110, which causes an internally routed event to the ADCs to start the injected sampling
5. ADC1 finishes injected sampling and fires an interrupt to the MCU, which gives a semaphore to unblock the PID task
6. TIM2 hits rollover at around 120, which causes it to load the next PID value into its registers and for this to start again

This means that we can ensure that the PID is running at a constant offset in terms of time (as TIM2 is fixed freq).
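The sequence above can be modelled as a simple function of the TIM2 count. This is only a host-side sketch using the approximate counts quoted in this comment (100, ~110, ~120). It does not touch real timer registers, and the names are invented for illustration.

```c
#include <assert.h>

/* Approximate TIM2 counts quoted in the thread; the real values are tuned. */
enum { PWM_RANGE = 100, ADC_TRIGGER = 110, ROLLOVER = 120 };

typedef enum { PHASE_HEAT, PHASE_SETTLE, PHASE_SAMPLE } tip_phase;

/* What the tip is doing at a given TIM2 count for a given PID duty (0-100). */
tip_phase phase_at(int count, int duty)
{
    assert(count >= 0 && count < ROLLOVER);
    assert(duty >= 0 && duty <= PWM_RANGE);

    if (count < duty)
        return PHASE_HEAT;   /* TIM3 is driving the tip */
    if (count < ADC_TRIGGER)
        return PHASE_SETTLE; /* drive off, waiting for the signal to decay */
    return PHASE_SAMPLE;     /* ADCs running, PID unblocked afterwards */
}

/* Largest share of one cycle that can be spent heating, in percent. */
int max_heating_percent(void)
{
    return 100 * PWM_RANGE / ROLLOVER;
}
```

With these counts the tip can be driven for at most about 83% of each cycle, which is the trade-off between settle time and available power.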

So the D term's ability to act is not being killed by the sampling point. It is more that some tips have a longer recovery time than others, and we are not tuning the ADC offset (in TIM2) directly at the moment, but using a value that works for the vast majority of tips. On the few tips it does not work for, they tend to over-read during heat-up (as PWM is at 100%), but as the tip approaches temperature the PID backs off and the readings stabilize, so most users do not even notice this.

This is also why a low pass filter is used on the tip temperature rather than an averaging operation, as averaging does not deal very well with the huge amount of noise coming in. In some cases the tip contributes a large amount of noise from its capacitance (and from how good a contact it is making in the handle), the op-amp is noisy, and the ADCs in the STM32 are also not overly amazing in terms of noise. So for each sample point (TIM2 PWM period) 8 samples are taken with the dual ADCs; these are summed and run through the low pass filter to produce a reading that has a relatively low rolloff point, but also removes almost all of the sample-to-sample ADC noise.

This is also why the filter is only 'clocked' through when new samples arrive in the PID loop, as there is nothing to gain from feeding it duplicate old samples from the ADCs, since they won't have updated their values.

The whole point of the above mess around is to make the ADC sampling time as short as possible to increase the available power to the tip to help recover and heat up faster.

For more information on how this is co-ordinated I did write this up somewhat on my blog : Over Here

Thermal mass completely changes the PID response: on my B2 tips it's quite stable, on the C2 tips (tiny) it oscillates with around a 2 second period.

EDIT: Also, TIM2 is a fairly slow timer (around 10Hz off memory). This is because the recovery period of the tip after heating is stopped is around 3-5 milliseconds off memory. I can get better values if needed, but I did spend around a month tuning these values for a reason :(

dhiltonp commented 6 years ago

Shoot.

(thanks for the feedback, btw)

I'm still chewing through your posts, but given your reaction to #354, another idea I had - that of taking back-to-back temp samples to get a heat decay curve - wouldn't work?

Ralim commented 6 years ago

Thank you though, it's nice to get fresh input :) It's a hard problem to solve :/

Yeah, let me know if it doesn't make sense though, happy to explain further :)

It could work, but the samples are taken really fast after each other, so not sure if it will work.

dhiltonp commented 6 years ago

What voltage were you running at when you did your PID tuning?

dhiltonp commented 6 years ago

PR #358 isn't complete, but should give us watts once we get some constants in.

Also, how frequently does the PID update run? If we track duration of each PID cycle and accumulate over some window, we could show output in watts instead of PWM %.

Knowing watts needed to change temperature should also tell us thermal mass!

dhiltonp commented 6 years ago

Never mind re PID update frequency - I just reread your post and saw it's at 10 Hz.

Looks like it's at 32 Hz (just for future reference)

Ralim commented 6 years ago

Let me know when you think you are "good" with the PR. I'll double check the 10Hz and get back to you, but I'm fairly sure it's around that.

Only issue with "Watts" is that the tip resistance isn't known, though they are generally within around 1 ohm of each other. Would be more intuitive either way.

dhiltonp commented 6 years ago

I've done some limited PID tuning, but without a thermocouple it's hard to know what's going on at the tip, not just by the element.

I've left the debug prints in place for additional tuning. I can strip them out and reset the PID_* variables if you'd rather take the code as is.

TS100_WATTS.zip (out of date, see below).

Code is in #358.

Ralim commented 6 years ago

In my limited testing so far, this looks to be a good improvement on my units here! Going to give it a bit more of a test and then merge this in. Will probably need some tweaks, but it's working well here on 5/6S cells. Will test others soon.

dhiltonp commented 6 years ago

I spent a fair amount of time doing further tuning, it's pretty good on my unit (BC2 tip).

P, I and D are defined based on physics models for heat so they're easier to reason about :)

I also added a parameter for thermal mass if we decide to add calibration for that.

The formula is pretty simple - see how long it takes to ramp up the temp by 100 °C at max watts. Thermal mass is in joules/°C, so the conversion is watts × time / 100.
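As a worked example of that conversion (the function name is made up, and heat lost to the air during the ramp is ignored, so the result is an upper estimate):

```c
#include <assert.h>

/* Thermal mass in J/C from the time taken to rise 100 C at a known power. */
double thermal_mass_j_per_c(double watts, double seconds_per_100c)
{
    assert(watts > 0.0 && seconds_per_100c > 0.0);
    return watts * seconds_per_100c / 100.0;
}
```

For instance, a tip that takes 5 seconds to climb 100 °C at 40 W works out to 40 × 5 / 100 = 2 J/°C.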

dhiltonp commented 6 years ago

Current version: TS100_WATTS.zip

Ralim commented 6 years ago

The BC2 tip tends to be fairly close to the majority of the tips in terms of thermal response.

I'm thinking about merging the newer temp calibration and gain support in from the branch, even though it needs some tuning to the cal methods. Could add to its calibration routines the thermal capacity calculations if you are up to it?

dhiltonp commented 6 years ago

I could, but before doing that I'd like some indication it'll be beneficial. In my testing, the thermal mass parameter was pretty forgiving.

If you have a head that's 4 times as massive, it'll overshoot by a few degrees then the power will stabilize fast.

If you have a head that's <70% as massive, the temp will stay within 1 °C, but the power output won't stabilize.

I'd rather tune for the smaller head and let the large one overshoot than add calibration.

Ralim commented 6 years ago

I'm going to close this for now, pending #358, as from the look of things that work by @dhiltonp is going to make this all work really well :)

Ralim / IronOS

Fix the PID #275
