ScratMan / HASmartThermostat

Smart Thermostat with PID controller for HomeAssistant

pid_i stuck at 100 after a blackout #146

Closed Chupaka closed 1 week ago

Chupaka commented 1 year ago

Describe the bug

Yesterday there was a blackout, so the air temperature in my house dropped quickly. It looks like when electricity was restored some time later, my thermometer updated the thermostat with a big error and a big dt, so pid_i became very high and was clamped down to 100. After that it never decreased. I suspect this "protection" https://github.com/ScratMan/HASmartThermostat/blob/89b6800139894c5854f2c261698920fe0daed375/custom_components/smart_thermostat/pid_controller/__init__.py#L196-L199 stopped the PID controller from updating pid_i. At the same time, pid_p was not enough to compensate pid_e, so control_output stayed at 100 and my boiler ran at maximum power until I noticed and cleared the integral part of the PID controller.

So the question is, do we really need that condition self._out_min < self._last_output < self._out_max to integrate? Especially when both pid_i and control_output are limited to [_out_min; _out_max] range.
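For readers following along, here is a minimal sketch of the kind of conditional integration being questioned. It is not the repository's code: the function name and default limits are assumptions for illustration, and only the condition out_min < last_output < out_max and the clamping of the integral come from the discussion.

```python
def integrate_conditionally(integral, error, ki, dt, last_output,
                            out_min=0.0, out_max=100.0):
    """Accumulate the integral only while the previous output was unsaturated."""
    if out_min < last_output < out_max:
        integral += ki * error * dt
        # The integral itself is also limited to the output range.
        integral = max(out_min, min(out_max, integral))
    return integral


# Previous output inside the range: the integral is updated.
updated = integrate_conditionally(20.0, 1.0, 0.02, 60.0, last_output=50.0)

# Previous output pinned at the maximum: the integral is frozen,
# even though the negative error would otherwise reduce it.
frozen = integrate_conditionally(100.0, -1.0, 0.02, 60.0, last_output=100.0)

print(updated, frozen)
```

The second call shows the concern raised here: once the output sits at the limit, the integral cannot move in either direction until the other terms pull the output back inside the range.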

Chupaka commented 1 year ago

Okay, we do need a limiting condition, and I'm preparing a PR to do this better :)

ScratMan commented 1 year ago

The limitation is there to avoid wind-up of the integral, but you hit a condition that could bypass the protection and create a wind-up. I need to improve the anti-wind-up to detect whether the last datapoint is valid or not. Did your HA instance stop during the blackout and restart afterwards? Or is it on an uninterruptible power supply?

Chupaka commented 1 year ago

Nope, my HA is in cloud, it was online during that blackout.

Did you check my PR #151? It allows my modes of operation while still protecting from wind-up (I believe so). In this particular case, it would allow pid_i to decrease and un-saturate (does this word exist?..) the output.
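One common way to get the behaviour described (the integral may still shrink while the output is saturated) is to block only the updates that push further into saturation. The sketch below illustrates that general technique; it is not necessarily what PR #151 implements, and the function name and limits are hypothetical.

```python
def integrate_directional(integral, error, ki, dt, last_output,
                          out_min=0.0, out_max=100.0):
    """Freeze the integral only when the update would deepen saturation."""
    delta = ki * error * dt
    saturated_high = last_output >= out_max
    saturated_low = last_output <= out_min
    if (saturated_high and delta > 0) or (saturated_low and delta < 0):
        # This update would wind the integral up further: skip it.
        return integral
    # Otherwise integrate and keep the integral inside the output range.
    return max(out_min, min(out_max, integral + delta))


# Output saturated high, positive error: still protected from wind-up.
still_frozen = integrate_directional(100.0, 1.0, 0.02, 60.0, last_output=100.0)

# Output saturated high, negative error: the integral is allowed to unwind.
unwinding = integrate_directional(100.0, -1.0, 0.02, 60.0, last_output=100.0)

print(still_frozen, unwinding)
```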

ScratMan commented 1 year ago

OK. Your issue is not due to the anti-wind-up, but to the fact that your thermostat was still trying to maintain the temperature while the heating system was down and HA didn't know about it. Changing the anti-wind-up mechanism won't help in that case; we need to implement some checks that the hardware is still alive and responsive.

Chupaka commented 1 year ago

your thermostat was still trying to maintain the temperature

keep_alive is set to zero and all sensors were unavailable, so there should not be any reason to compute the PID. Was it trying? Or was it just waiting?

E.g. we have a stable situation with target temp = 20, current temp = 20, pid_i = 20, Ki = 0.02, and electricity is gone for 2 hours. When electricity is back, we have something like current temp = 19, dt = 7200, and this line https://github.com/ScratMan/HASmartThermostat/blob/1c1b124d57a77bdd043b3d5d85dd4134b5ea6265/custom_components/smart_thermostat/pid_controller/__init__.py#L200 makes pid_i become... 20 + 0.02 * 1 * 7200 = 164? And we're stuck forever? Stuck due to anti-wind-up, because now pid_i can't decrease at all. Did I miss something?..

ScratMan commented 1 year ago

It's not stuck. On the next line, pid_i is clamped to the 0-100% range. So pid_i = 100% and the output becomes 100%, the heater is forced ON and the temperature will start to increase (normal behaviour, as the current temp is below the set point).
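The arithmetic of that single post-blackout update can be checked with a few lines. The numbers are the ones from the example above; the variable names are only for illustration.

```python
OUT_MIN, OUT_MAX = 0.0, 100.0

pid_i_before = 20.0   # integral before the blackout
ki = 0.02
error = 20.0 - 19.0   # target temp minus current temp
dt = 7200.0           # two hours, in seconds

# One integration step over the whole gap.
raw_integral = pid_i_before + ki * error * dt

# The integral is then clamped to the output range.
pid_i_after = max(OUT_MIN, min(OUT_MAX, raw_integral))

print(raw_integral, pid_i_after)
```

The unclamped value is 164, and the clamp brings it back to 100, which is the stored pid_i after the update.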

Then, on the next computation, the last output is 100%, so there will be no update of pid_i, but p and d are still evaluated. As the temperature is increasing, pid_d will become negative, and as soon as the temperature reaches the set point pid_p will become <= 0. The PID output will then decrease progressively, enabling the integral again, and since the error is negative, pid_i will decrease. So the system is not stuck, but it may certainly overshoot a lot. In the end, it just needs some time to settle back to normal after an unexpected input has generated a glitch.

The problem there is that the pid_i is summed up while the system is recovering from a fault. So I need to add a way to detect the faulty condition of the hardware to make the recovery smoother by limiting the overshoot.
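As one possible shape for such a check, the sketch below skips integration when the gap since the previous sample is implausibly long. This is purely illustrative: the threshold MAX_SAMPLE_INTERVAL, the function name, and the idea of using dt as the fault indicator are all assumptions, not something the component is stated to do.

```python
MAX_SAMPLE_INTERVAL = 600.0  # hypothetical threshold, in seconds


def integrate_with_gap_guard(integral, error, ki, dt,
                             out_min=0.0, out_max=100.0,
                             max_dt=MAX_SAMPLE_INTERVAL):
    """Skip integration when the sample gap suggests a recovery from a fault."""
    if dt > max_dt:
        # The previous datapoint is too old to be trusted: keep the integral.
        return integral
    return max(out_min, min(out_max, integral + ki * error * dt))


# Regular one-minute sample: integrated as usual.
normal = integrate_with_gap_guard(20.0, 1.0, 0.02, 60.0)

# First sample after a two-hour outage: ignored by the integrator.
after_blackout = integrate_with_gap_guard(20.0, 1.0, 0.02, 7200.0)

print(normal, after_blackout)
```

With this guard the two-hour example leaves pid_i at 20 instead of jumping to the limit, which would reduce the overshoot during recovery.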

Chupaka commented 1 year ago

Some time later I enabled sensors for tracking pid_{p,i,d,e}, so I have exact numbers right before I reset the integral part.

pid_p = -8.9, pid_i = 100, pid_d = -0.6 (it's underfloor heating, very inert), pid_e = 17.9

So, pid_p and pid_d were not enough to compensate pid_e.
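A quick check of those numbers, assuming that control_output is the clamped sum of the four terms and that pid_e is the outdoor-compensation term (neither is stated explicitly in this thread):

```python
OUT_MIN, OUT_MAX = 0.0, 100.0

pid_p, pid_i, pid_d, pid_e = -8.9, 100.0, -0.6, 17.9

# Assumed: the output is the sum of all four terms, limited to the range.
raw_output = pid_p + pid_i + pid_d + pid_e
control_output = max(OUT_MIN, min(OUT_MAX, raw_output))

# The integral is only updated while the last output is strictly inside the range.
integral_enabled = OUT_MIN < control_output < OUT_MAX

# How much further p and d would have to drop before the output leaves the limit.
shortfall = raw_output - OUT_MAX

print(raw_output, control_output, integral_enabled, shortfall)
```

Under these assumptions the raw sum is about 108.4, so the output stays pinned at 100 and the integral stays frozen until pid_p + pid_d falls by roughly another 8.4.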

ScratMan commented 1 year ago

With underfloor heating, you shouldn't use ke at all. Instead, connect an outdoor sensor to the boiler or heat pump and use its own power regulation system based on the outdoor temperature.

ScratMan commented 1 year ago

Some time later I enabled sensors for tracking pid_{p,i,d,e}, so I have exact numbers right before I reset the integral part.

pid_p = -8.9, pid_i = 100, pid_d = -0.6 (it's underfloor heating, very inert), pid_e = 17.9

So, pid_p and pid_d were not enough to compensate pid_e.

Would you have the full data set with gains and temperatures?