Open rkonklex opened 8 years ago
That was definitely a bug. I commited a quick fix to my branch. Now temperature control is perfect!
Nice! Where is the carefully crafted fix for everybody?
That depends on whether my solution is good enough, and what more needs to be done. You're the best person do decide that, since I'm only beginning to know the code and how it's supposed to work :)
Have a look at the commit. Basically, I've added another counter (next_heat_time
) that clocks intervals of 250ms (i.e. every 25 calls of temp_sensor_tick()
), and then moved the heater_tick()
call into this clock.
It works, but there may be room for improvement. Right now, each sensor had a separate 'read temperature' clock, and my patch adds another one for its heater. I'm wondering if having just a single clock for all heater updates might be a better idea. The clock interval is fixed, so that would make sense.
On the other hand, having separate heater clocks for each sensor would allow us to vary the phase of the heater clock per sensor. Sensors have varying read intervals, and the heater interval is fixed. We could delay or hasten a heater clock tick a bit to make heater_tick()
operate on the very latest reading, while maintaining 250ms intervals within some tolerance (10%?).
You're the best person do decide that
Best decisions are based on evidence, performance measurements, code size, such stuff. And by no means I'm an expert in every detail of this firmware.
I see your code, still it "smells" like duplicate code. Doesn't it work to simply reduce PID parameters? Or to slow down temp_tick()? Also, there are these issue74 branches which try to improve the situation in this area, maybe you find a better answer there.
Best decisions are based on evidence, performance measurements, code size, such stuff. And by no means I'm an expert in every detail of this firmware.
In general, I agree, but in this case we'd have to first define the goal and figure out how to assess solutions against that goal. I want some discussion and a plan before I go changing the code. Maybe @Wurstnase will also have some input, since he's working on PID tuning.
Obviously, we want temperature control to work. And with my patch it does. I'd also like it to use as little resources as possible to do the job. And I'd like to do it in the least intrusive way possible, i.e. definitely not do a complete rewrite of the code.
First off, heater_tick()
expects to be called regularly in 250ms intervals. We could change that, (1A) factor in the time that has actually passed since the last call. The easy way would be to add an additional parameter to heater_tick()
- it would add two multiplications and a division per call. I could live with that. That would be just a different type of a quick fix - doesn't really change anything, just makes thermal management work. Also, updating thermal manager every 10ms would be murderous for relays, if anyone uses them for controlling heaters. And it annoys me, since I have a LED indicator on the hotbed, and it goes crazy :)
Or (1B) keep the 250ms intervals, but make sure heater_tick()
is being called accordingly. That's what my patch does. It's ok from my perspective - I can live with both hotend and hotbed thermal managers being updated 4 times a second.
Or (1C) have the intervals constant, but for each heater individually. I really like this idea, since this would allow us to update thermal manager more frequently for the hotend, and less frequently for the hotbed. The first has small thermal mass and is additionally cooled by the fan, so it reacts very quickly to changes in heater power. The second has large thermal mass and reacts slowly. Having the intervals constant, we could precalculate PID constants for each heater/interval during initialization.
heater_tick()
call is dependant on temp.c, since it needs a couple of parameters that are private to temp.c. More than that, heaters are actually a property of sensors. Now, at the moment we have a couple of possible temperature sensors, each with its own probing interval:
I'd say 10ms intervals for temp_sensor_tick()
are a must.
Because sensor and heater update intervals may differ, we need to decouple their clocks. Additionally, there's the EWMA and temp_residency checking, and reporting temperatures every 10ms by serial port is a killer for bandwidth (and Pronterface) and unnecessary noise. I don't really understand the theory of EWMA, i.e. does it also need to be called in fixed intervals? Should it be coupled to sensor reading, to heater update, or independent? Then, temp_residency is in the order of seconds, so checking it every 10ms is overkill. Once or twice a second would suffice. The same goes for reporting temperatures by serial port.
Having said all that, I propose the following:
temp_sensor_tick()
.temp_sensor_tick()
calls) for temp_residency checking and serial port reporting.next_read_time
).heater_tick()
on a 10ms clock, but...Interesting thing with the quarters.
EWMA is a moving average. Looks not that nice and could be replaced by shifting.
Clock-thing can be moved into clock. Currently we have:
I would move:
I wouldn't move heatbed and hotend in different places. My heatbed has 350W and with that power you need more update. Maybe others have a very small bed or whatelse. You can't make a prediction for that.
Good idea moving heater and temp. residency ticks to clocks! No need for additional counters that way :) I just created another branch. Already pushed the changes you proposed.
Next, I'll try and experiment with varying heater intervals. I'm just interrested to see what happens to temperature control quality when I decrease/increase the intervals. I'll make some charts :) Besides, I mean to make intervals configurable in the configtool for each heater. So you could enter a shorter interval for your 350W heatbed, while I'll keep a longer one for mine, which is probably ~120W :)
I just created another branch.
Thanks. I see so much additional code. Code is expensive. Less code to achieve the same goal is always better and so far I see no reason why temp_sensor_tick() needs to be called more often than heater_tick(). I also see no reason why changing an interval requires additional code.
Not that much new code - just a few lines.
As to why temp_sensor_tick() should be called on a different interval than heater_tick() I see a couple of reasons:
When you compile this with make -f Makefile-AVR all
you can see the difference of this code. This would make the decision easier, how good/bad this idea is. I don't see any big benefits to have different loops for thermal control. The performance winning should be minimal. So one config for all should be sufficient.
One thing a bit unpleasant about temperature readings is that it causes a whole lot of interrupts on AVRs. For every reading recognized by temp_sensor_tick(), about 25 interrupts happen. That's only 4% efficiency. Ideally, a reading with interrupt would happen only if actually needed.
Well, if EWMA was disabled (TEMP_EWMA = 1), then the code would simplify to
temp_sensors_runtime[i].last_read_temp = (uint16_t) temp;
Right now, EWMA is the only use at all for readings other than the one directly before heater clock tick. I suppose if you make 25 readings and smooth them out with EWMA between heater clock ticks, then the heater manager gets a better 'view' of temperature in the interval.
However, what if EWMA was updated only once per heater clock tick? Immediately before. Then we'd have no need for the other 24 readings in the 250ms interval, and we could optimize them away. That would be 100% efficiency. As a bonus, we'd have EWMA updating on a constant interval, which would make it more predictable and understandable in behavior (as opposed to a different interval for each sensor type, like it is right now).
Ewma could benefit from more readings, because it is a slow moving average. Make a simple excel table. Each cell has the same value. Only one is 10 or 100 away. Then you'll see what I've mean.
Yes, a moving average is a good thing. To my experience it doesn't matter at which frequency it happens, because noise is random. But it has to happen often enough to follow the actual signal.
Accordingly it could happen at the 2 ms second interval, at the 10 ms interval or even right within the interrupt with equivalent results. Doing only one averaging step each 250 ms likely follows the actual temperature too slowly. With 16 sample averaging, a change in temperature would take about 8 seconds ( ~ 30 samples) to fully reach the PID algorithm.
So 16 sample average with 10ms and controlling with 250ms could be ok for normal ADCs.
I've been out of things a while, but the EWMA and the PID units comments were my additions. One thing that is important in PID is getting the control loop time to be compatible with the physical process time constant. If it takes 30 seconds for the system to absorb a step change, then the system time constant, Tc, is about 10s, and the control loop should have a period of about 1/5th to 1/10th of that, or 1-2s. If you control much faster than 10x the system time constant, then autotune-estimated Ki and Kd terms should be scaled down and up proportionally. To see this, consider doubling the PID frequency to 20x Tc, and note that the integral accumulator will accumulate twice as fast, and the temperature change per cycle will halve. At a PID update of 100x the system time constant, the integral term will guarantee overshoot, and the derivative term will contribute mostly be zero (or pure noise).
I like the 250ms interval much more than the 10ms interval, and thought the PID system was using 250ms. 1s might be even better.
As for the EWMA, I wrote the EWMA filter as a workaround for very noisy sensor data, which I think I eventually solved with better filtering on the power supply--I think AREF moved much faster than my sensor. There's no reason for the hardware to give you 5+ bits of noise. I also think I tried the EWMA for giving a longer time-average than what would fit in the fast-sampling temperature history window for the straight moving average. With EWMA, you need only one memory value to keep a history/temperature estimate that could adjustably span from 10ms to multiple seconds, weighted more towards the present values.
Ok, to sum up...
next_read_time
in temp.c. It would allow us to effectively lower the PID frequency per-heater.Agreed?
We are sampling with 125kHz on AVRs. A good solution could be a EWMA there if needed. Also we could skip 9 of 10 samples to improve the speed. 12,5 kHz are also enough.
My idea on the STM is to start a new conversion just 10ms or whatelse before the heater/pid/temp-tick. So the values are not very old. Everything is just a small move. Maybe I'm lucky to find some time this week.
My idea on the STM is to start a new conversion just 10ms or whatelse before the heater/pid/temp-tick.
This would be awesome and saving a lot of interrupts on AVR, even when picking 25 samples for EWMA. :-) With two heaters we need 50 readings per 250 ms, but actually read 2560. That's 98% waste.
next_read_time
increments, you could calculate them during the autotune.You do need some deltaTemperature/time data somehow to apply a D term, so you probably need more than 10ms*TH_COUNT time difference to get a non-zero rate-of-change for the D part of the PID. If the circuitry filtering and the numerical filtering give you a perfectly smooth, noise-free and and accurate temperature reading, the deltaT in 10ms or 250ms is very likely to be below the resolution of the temperature data, making the D contribution mostly zero, with occasional impulses of size d_factor.
analog_read()
to async mode. Could change it so that analog_read_begin()
starts an ADC read, and the ADC ISR stops after the read, i.e. doesn't schedule the next read. That would allow us to schedule a read only when necessary, i.e. every 10ms for EWMA, and much less with EWMA disabled.pid_set_p()
/ pid_set_i()
/ pid_set_d()
should have well-known, fixed units. Currently, its qC/qs. We could change it to qC/s, but not necessarily. The values should then be adjusted for heater clock frequency, be it 1/1s, 1/250ms, or whatever we decide now or in the future.Regarding D term and TH_COUNT - first off, the history ring buffer is updated only each heater clock tick. So with a 1s clock, D term spans TH_COUNT*1s into the past. Is that good enough?
Or, we could use a very slowly moving EWMA for collecting the D term. It could replace the ring buffer and TH_COUNT.
On a frequency of 250ms the last temp as D-term should also be enough. Marlin uses a moving average for the D-term only. This could also be used for smoothing it.
Guys, I messed up. I accidentally force-pushed not only my heaters_intervals branch, but also master and experimental, which were out of date... @Traumflug, I think if you force-push them from your copy of the repo, things will go back to the way they were. I'm so sorry... won't happen again!
No problem, already restored.
Thanks! And sorry for the trouble.
Just pushed a bunch of commits to my branch. Heater control runs on the 250ms clock. Sensor update frequency depends on EWMA being enabled. If enabled, then sensors are read every 10ms; if not, then only just before the heater control procedure. This way, no sensor reads are being wasted.
Also, ADC conversions are started only one 10ms clock cycle before the results will be needed. ADC then reads a single value for each registered sensor, and then stops. No wasted interrupts :)
Thanks @phord for your observations. Made a few adjustments.
@nythil Thanks for being receptive to my comments.
I'm still a bit unsettled, though. I think this patch (fd84e04ece95250181f10d71722a0e211e279b9f) makes the change cleaner, but I have not tested it. Does it look ok to you?
Looks a bit overcomplicated. I took a look into other firmwares. For example, Marlin sets it's temperature only 5 times per second. So 4 times could be sufficient enough.
Should be much easier to start the read any 250ms. Take the old next_read_time and use it for a read_counter. Each counter (10ms) you read the temp. When read_counter == 1 you read the last temp and afterwards you set the heater. That's it. Now you could add later a oversample. You have time enough to have a oversample of 16. Or with normal division up to 25.
In the clock_250ms() you can reset the counter. And the next read/heater_set starts.
Wow! What a lively discussion! :)
@phord this patch is good stuff - I didn't notice that the temp_type_t
enum has all values defined, no matter if the code for the sensor type is actually included (#ifdef TEMP_THERMISTOR
etc.). This simplifies things.
As for timing analog_clock()
- it's tricky. I use a logic analyzer to trace it live, running on an actual ATmega. I will upload some traces/timings later today. Now, the problem is that analog_clock()
only initiates an ADC read - it doesn't wait for it. You need to read each channel (sensor) separately - that's why we do a couple of reads in a series. A single channel read takes 13.5 ADC clock cycles, and the ADC clock is set to 1/128 of the CPU clock (it's as slow as it can go). This sums up to 1728 CPU clock cycles to read a single sensor, worst-case. On a 16 MHz AVR that's 108 microseconds. That's upto 92 ADC reads within the 10ms interval.
This means two things. First, you need to initiate an ADC read at least one next_read_time
count before you want the result. Second, a single 10ms clock cycle is plenty of time to read all the channels, 16x oversampled even. If you're feeling paranoid, we could add a function to analog module to make temp_sensor_tick()
wait for the read to finish.
Also, note that for sensors other than TT_THERMISTOR and TT_AD595, next_read_time starts at 0, which means that your patch overflows their counter at the beginning.
The slow speed of the analog read is only to reduce the ISR. Now we want to start the ADC any 10ms before read. So this could be now much faster.
@Wurstnase, my comment above was directed at a discussion that's going in a couple of commit comments.
Ideally, I'd like the timeline to look as follows: 0ms: reset
...
240ms: analog_tick() - start ADC read 240ms + 108usec: ADC ISR - first channel reading ready 240ms + 216usec: ADC ISR - second channel reading ready ... 250ms: temp_sensor_tick() - process the values, convert ADC steps to temperatures 250ms: temp_heater_tick() - run PID algorithm
...
490ms: analog_tick() - start ADC read 490ms + 108usec: ADC ISR - first channel reading ready 490ms + 216usec: ADC ISR - second channel reading ready ... 500ms: temp_sensor_tick() - process the values, convert ADC steps to temperatures 500ms: temp_heater_tick() - run PID algorithm
...
740ms: analog_tick() - start ADC read 740ms + 108usec: ADC ISR - first channel reading ready 740ms + 216usec: ADC ISR - second channel reading ready ... 750ms: temp_sensor_tick() - process the values, convert ADC steps to temperatures 750ms: temp_heater_tick() - run PID algorithm
...
990ms: analog_tick() - start ADC read 990ms + 108usec: ADC ISR - first channel reading ready 990ms + 216usec: ADC ISR - second channel reading ready ... 1000ms: temp_sensor_tick() - process the values, convert ADC steps to temperatures 1000ms: temp_heater_tick() - run PID algorithm 1000ms: temp_residency_tick() - update temp. residency info
...
Should the ADC start at 240ms? Why not at 250ms and the heater tick 10ms later?
0ms reset counter 0ms ADC start for channel, like freerun 10ms any channel conversion should be ready 10ms sensor->convert 10ms heater->set ... 250ms reset counter 250ms ADC start 260ms ... etc. ... 1000ms temp_residency_tick(), we don't need that accurate values, so they can be 240ms old.
1000ms... ...
And if you want an oversampling
0ms reset counter 0ms start ADC 10ms ADC ready 10ms start ADC 20ms ADC ready 20ms start ADC ... 160ms ADC ready 160ms ADC/div whatelse 170ms heater->set ...
Now we're getting somewhere :)
Please note that we have other sensor types (TT_MAX6675, TT_MCP3008, TT_INTERCOM, etc.) that have different timing needs (250ms, 100ms, 250ms, respectively) than TT_THERMISTOR and TT_AD595.
We could however initiate sensor reads at the existing 250ms clock ticks, and then run heater->set when the reading is finished. You'd need no additional timers/counters for that, just re-purpose next_read_time
to something like read_finished_time
. And heater->set would still be called in 250ms intervals. Is that what you mean?
0ms: start ADC for TT_THERMISTOR 0ms: start reading of TT_MAX6675 0ms: start reading of TT_MCP3008
10ms: ADC ready for TT_THERMISTOR 10ms: sensor->convert for TT_THERMISTOR 10ms: heater->set for TT_THERMISTOR
100ms: reading ready for TT_MCP3008 100ms: sensor->convert for TT_MCP3008 100ms: heater->set for TT_MCP3008
250ms: reading ready for TT_MAX6675 250ms: sensor->convert for TT_MAX6675 250ms: heater->set for TT_MAX6675 250ms: start ADC for TT_THERMISTOR 250ms: start reading of TT_MCP3008 250ms: start reading of TT_MAX6675
260ms: ADC ready for TT_THERMISTOR 260ms: sensor->convert for TT_THERMISTOR 260ms: heater->set for TT_THERMISTOR
350ms: reading ready for TT_MCP3008 350ms: sensor->convert for TT_MCP3008 350ms: heater->set for TT_MCP3008
500ms: reading ready for TT_MAX6675 500ms: sensor->convert for TT_MAX6675 500ms: heater->set for TT_MAX6675 500ms: start ADC for TT_THERMISTOR 500ms: start reading of TT_MCP3008 500ms: start reading of TT_MAX6675
...
Like this, right.
But you don't need to wait for MCP, MAX and co. You only get 'new' values any 100/250ms.
So for example MAX: 0ms reset counter 10ms read->max 10ms heater->set
250ms reset counter 260ms read->max
You could read and set MAX also at 0ms/250ms etc. But I think it would make it more simple to have similar construction for them.
You'd need no additional timers/counters for that
And this is the point, I'd tried to tell you :+1:
And heater->set would still be called in 250ms intervals.
No, heater is called, when the temp is ready. Just behind the last read of a 250ms cycle.
0ms ADC start for channel, like freerun
On AVRs, there is no freerun for multiple channels. Mimicking one, like it currently happens, causes a lot of pointless interrupts.
One can write the ADC interrupt to go through all sensors once, stopping after that.
You're right. That's why I've written 'like'.
0ms reset counter 0ms start ADC ISR xxus all channels read -> stop ISR 10ms analog_read() 10ms convert 10ms heater->set
And heater->set would still be called in 250ms intervals.
No, heater is called, when the temp is ready. Just behind the last read of a 250ms cycle.
@Wurstnase, that's what I had in mind, I think. Heater->set is called only when the temp is ready, but the timings work out so that there's always 250ms between Heater->set calls for each sensor. Like in the comment above. With the exception that reading TT_MAX6675, TT_MCP3008, TT_INTERCOM are blocking, synchronous calls, so no need to wait for them like with AVR's ADC.
One can write the ADC interrupt to go through all sensors once, stopping after that.
@Traumflug, that's what my patch does :)
Currently untested, but this is how I think: https://github.com/Traumflug/Teacup_Firmware/tree/heater_intervals-2
Is there any reason we can't just standardize on 250ms temp conversion intervals so we can avoid all the individual timers?
000 ms init 250 ms ADC kick if needed 260 ms Temperature conversion 260 ms heater tick
We might need some special handler for EMWA smoothing, but that shouldn't be too hard. But now it's starting to sound like what we have already. 😀
The EWMA is just a small step more. Set the counter to 16, start the next conversion just before the heater.
The first step is, to get rid of the endless ADC-ISR.
And heater->set would still be called in 250ms intervals. No, heater is called, when the temp is ready. Just behind the last read of a 250ms cycle.
It's important that heater_tick is called at precise, fixed intervals because it runs the pid loop. If someone has their PID values well tuned and then some change comes along which causes the 250 to become 300, for example, their PID will need retuning but it won't be obvious to the user. It will only be suddenly "wrong".
Maybe it's ok if we always call it after the conversion so long as the conversion is occurring regularly at 250ms intervals (20, 270, 520, 770, 1020, etc.). But in that case it should probably be stated officially in temp.c that the 250ms interval is necessary and integral to the PID loop so someone doesn't change it inadvertently in the future.
Your points are 100% correct :+1: 250ms intervals should be enough for any heater. I'm just starting to solder/layout some mosfets for my prototype to test the heater, PID and anything. Hopefully it is ready this week.
And heater->set would still be called in 250ms intervals. No, heater is called, when the temp is ready. Just behind the last read of a 250ms cycle.
Just for the record. I mean, that the interval is 250ms, but just behind the last reading and not on an extra timer.
TT_THERMISTOR for example 0ms reset counter 0ms start ADC 10ms convert/read etc 10ms set heater 250ms reset counter/start adc 260ms convert/read 260ms set heater
100% the same for MAX, MCP etc. just without start ADC, because they don't have it.
How about this: 68881b030b601b28476e075e1a426d98e6256710
Sorry about all the refactoring noise. I intended to save that for later, but it got mixed in.
@phord Looks like pretty clean! :+1: It could be also extended for EWMA in an active state for thermistors and other faster analog values.
@phord this is great! Solves all of our problems in this lengthy discussion :) And @Wurstnase is right - could be easily extended for EWMA!
I left EWMA in place, but it runs only in 250ms intervals instead of fast-as-possible. But it should be easy to put it back into "free-running" EWMA just by always setting to 'active=1' when active=0, and then using 'active' as our interval timer for those probes which need to rest between probes.
Like this: 0d45312c1f78cb5c4f79c8174f915b1982571497
Hi! I see you've all been very active :) How are things? I see we have handled both the EWMA and non-EWMA case. Readings are being collected only when necessary. ADC interrupts are down to a minimum. PID control is called regularly every 250ms. I say we have achieved our initial goals.
We could tweak it a little bit more. On the continuous mode the thermistors are only read any second cycle, 20ms. We could read the ADC and restart it just behind to have a 10ms cycle again.
Anyhow, I rebased the commits on a local branch for my arm-port. I'm really close to finish my protoboard to test it on a real printer.
I say we have achieved our initial goals.
Looks like you're looking forward to move this bunch of commits to experimental. Excellent!
A few things which help to have a useful history later:
git rebase -i master
is very helpful. If you fear to mess up during squashing, make a backup branch (git branch heaters_intervals_backup
).git checkout heaters_intervals
tools/git-step-rebase experimental
git rebase --continue
, then step-rebase again.git checkout heaters_intervals
tools/git-regtest experimental
git checkout heaters_intervals
) and use git rebase -i experimental
to fix the commit in question.So far I looked at these commits only briefly; what I saw looks pretty good :-)
Wanting to tune temperature control in my printer, I looked through PID code in search of units and magnitudes of the parameters. I found something misleading. Perhaps a bug. In code you can see:
So the basic units are qC, which I'm guessing is a quarter-Celsius (because temperatures are in 14.2 fixed point) and qs, which is probably supposed to mean a quarter-second.
The thing is, for thermistors the temperature is read in every temp_sensor_tick() call, i.e. every 10ms (because the
next_read_time
counter is set to zero for thermistors). Then, after every temperature reading, heater_tick() is called, which performs the PID algorithm. So PID is called 100 times a second, not 4 times. This would make the basic unit of time for PID a one-hundreth of a second, not a quarter of a second.Is that a bug, or outdated comments in code? I'm leaning towards a bug, because executing the PID step 100 times a second is a major overkill for a 3D printer :)