Blackymas / NSPanel_HA_Blueprint

This allows you to configure your complete NSPanel via Blueprint with UI and without changing anything in the code
1.4k stars 253 forks source link

`Enhancement` Smoothen the temperature #1918

Closed grigi closed 6 months ago

grigi commented 6 months ago

Enhancement Summary

Filter the temperature reading to make it smoother to make it useful for automations

Detailed Description

The internal temp sensor is noisy, consider smoothening it by adding a sliding_window_moving_average filter. The current reading looks like this: image

If you could e.g. add a simple filter to id: temp_nspanel it could help make that smoother. e.g.:

    filters:
     - sliding_window_moving_average:
       window_size: 5
       send_every: 1
      - lambda: |-
          return x + temperature_correction->state;

Additional Context

Mostly the variation is minute to minute, so I don't think a large window size would be needed. If you really want faster response times you could set window_size to 2 and half the read interval to 30s, but I don't need the temp meter to be that responsive, just a little smoother.

edwardtfn commented 6 months ago

I have no problem about implementing this, however we probably should do it configurable by users, as in some cases this might not be desired. Please take a look in the comment from @andythomas on https://github.com/Blackymas/NSPanel_HA_Blueprint/discussions/1857#discussioncomment-872433. Perhaps we can have here some discussion around averaging.

edwardtfn commented 6 months ago

Reducing the interval between ADC measurements may be an option to compensate averaging responsiveness.

grigi commented 6 months ago

I read that entire issue now, Highly noisy reading like I have there is one of the things that make a simple bang-bang algorithm for floor heating unreliable. But I see wanting to reduce latency.

So multiple samples a minute could give us the best of both worlds?

It looks like the thermistor is always powered, so taking more readings won't result in any extra self-heating above what's already happening.

edwardtfn commented 6 months ago

This can be relevant also: https://github.com/esphome/esphome/pull/6330

edwardtfn commented 6 months ago

So multiple samples a minute could give us the best of both worlds?

I believe so.

It looks like the thermistor is always powered, so taking more readings won't result in any extra self-heating above what's already happening.

That's right. Disabling sleep may have a much bigger impact, as the CPU and display will increase temp.

grigi commented 6 months ago

Yes, Marlin (the 3D printing firmware) takes 16 samples in a row, every second. And this gives it reliable readings to about 0.05°C on most thermistors.

Until that lands, doing a reading every 20 seconds and emitting them once a minute will probably help significantly.

edwardtfn commented 6 months ago

My concern is then on the CPU needed for these additional readings and averaging. I've seen messages related to ADC taking longer than expected, so this will probably happen more often.

grigi commented 6 months ago

That's right. Disabling sleep may have a much bigger impact, as the CPU and display will increase temp.

I, um tried to measure it, and temp readings get raised by about 0.4°C when doing the screen firmware update, but then drops to ~ 0.1°C elevated with screensaver for me only. Much less than I expected.

grigi commented 6 months ago

My concern is then on the CPU needed for these additional readings and averaging. I've seen messages related to ADC taking longer than expected, so this will probably happen more often.

You could minimise CPU load increase by doing the rolling window over the ADC reading only, and not the post-processing steps?

Surprising that reading an ADC value is "slow" as it shouldn't block, it's not a bit-banging interface where you might have to get timing right with something external like the DHT sensors.

edwardtfn commented 6 months ago

Until that lands, doing a reading every 20 seconds and emitting them once a minute will probably help significantly.

We don't have to wait. We can try it now... just add this customization to your panel's yaml and please let us know the results:

sensor:
  - id: !extend ntc_source
    samples: 60

external_components:
  - source: github://pr#6330
    components:
      - adc
edwardtfn commented 6 months ago

I've tried that... still have to wait more to say anything about the improvements, but I can say the ADC related warnings aren't there.

grigi commented 6 months ago

Thanks, I tried it as well, I can say that the temperature result is definitely less scattered. Deviation is about ⅓ of before. Though it still isn't "smooth". Deviation is spread across ~0.2C now instead of the prior ~0.7C

Still not as clean as the Sonoff sensors, but those may have aggressive smoothing applied. (but not as aggressive as the Aqara devices which seem to lag by over 10 minutes compared to the Sonoff ones)

A very definite improvement :+1:

grigi commented 6 months ago

FYI: I reduced the samples from 60 to 8 to see what it does, and there isn't any noticeable difference that I can see.

edwardtfn commented 6 months ago

What if you try a much bigger number? Let's say 10000 samples?

grigi commented 6 months ago

value must be at most 255. So 255 it is

grigi commented 6 months ago

I honestly can't differentiate between 8 and 255 samples. But either looks notably better than 1.

Maybe the ±0.2°C deviation is "accurate" :man_shrugging:

edwardtfn commented 6 months ago

And what about adding the filter you suggested?

Your customization should be like this:

sensor:
  - id: !extend ntc_source
    samples: 255

  - id: !extend temp_nspanel
    filters:
      - sliding_window_moving_average:
          window_size: 5
          send_every: 1
      - lambda: |-
          return x + temperature_correction->state;

external_components:
  - source: github://pr#6330
    components:
      - adc
grigi commented 6 months ago

I tried this (on purpose to keep the lag low):

sensor:
  - id: !extend ntc_source
    samples: 8
    update_interval: 30s
    filters:
      - sliding_window_moving_average:
          window_size: 4
          send_every: 2

And it reduced the spread to about 0.1°C.

I'm trying a more agressive config now.

edwardtfn commented 6 months ago

I've tried samples: 255 but then I get the Component adc took a long time for an operation quite often.

grigi commented 6 months ago

It's completely smooth with this: (255,10/2,30s)

sensor:
  - id: !extend ntc_source
    samples: 255
    update_interval: 30s
    filters:
      - sliding_window_moving_average:
          window_size: 10
          send_every: 2

But, yeah, that delays response times by ~2 minutes and is a bit overkill ito resource usage.

I also tried a few other configurations to reduce the delay, but I basically settled again on what I said in my previous message. (8,4/2,30s)

(8,2/1,60s) → doesn't help (8,3/3,20s) → doesn't help So I'm thinking that just the multi-sampling fixes up the inaccuracies, and anything above that is smoothing.

I feel we should at least do: (8,1/1,60s)

sensor:
  - id: !extend ntc_source
    samples: 8

As that improves the situation but doesn't introduce any delay, and this above if you want it smooth.

I'm leaving it on the intermediate (8,4/2,30s) from the previous message for now as that seems to hit the balance of smoothness and delay for possibly most people, and still only updates the result once every 60s.

edwardtfn commented 6 months ago

It's completely smooth with this: (255,10/2,30s)

sensor:
  - id: !extend ntc_source
    samples: 255
    update_interval: 30s
    filters:
      - sliding_window_moving_average:
          window_size: 10
          send_every: 2

But, yeah, that delays response times by ~2 minutes and is a bit overkill ito resource usage.

I also tried a few other configurations to reduce the delay, but I basically settled again on what I said in my previous message. (8,4/2,30s)

(8,2/1,60s) → doesn't help (8,3/3,20s) → doesn't help So I'm thinking that just the multi-sampling fixes up the inaccuracies, and anything above that is smoothing.

I feel we should at least do: (8,1/1,60s)

sensor:
  - id: !extend ntc_source
    samples: 8

As that improves the situation but doesn't introduce any delay, and this above if you want it smooth.

I'm leaving it on the intermediate (8,4/2,30s) from the previous message for now as that seems to hit the balance of smoothness and delay for possibly most people, and still only updates the result once every 60s.

You mean 8 samples, 30s update interval, window size 4 and send every 2, right?

grigi commented 6 months ago

Yes, it was getting too much to type out so I lazily shortened it.

I'm planning to leave it like that tonight so I can post a picture comparing the different configurations.

grigi commented 6 months ago

image

grigi commented 6 months ago

I'm doing a few more experiments, trying to see what is it that makes it "smooth".

grigi commented 6 months ago

(8,9/3,20s): image

(8,12/12,5s): image

So, many samples seem quite important. Once you're at ~10 samples in a moving window, things look good.

So now the question is, how often are you prepared to sample...

grigi commented 6 months ago

(8,8/4,15s) image

8 samples doesn't really help. It needs to be >10 to be smooth.

I'm doing a (12,12/3,20s) test now overnight. It should be pretty much perfectly smooth. And have an average delay of 1½min.

edwardtfn commented 6 months ago

What is your goal? I mean, what is your threshold to consider it good?

grigi commented 6 months ago

I want the impossible. it should be smooth, but have no artificial delay added on. But now I'm just trying to optimise for as smooth as possible with acceptably small delay.

It does depend on what is considered a good enough delay. And how many samples/minute is fine for low cpu.

I kinda feel the optimal is either (8,1/1,60s) (just multisampling) or (8,9/3,20s) multisampling and smoothing of 9 samples spread over 3 minutes. (average delay of 1 minute)

I feel that more samples/minute is probably a bit too high load on the panel.

So choose between either of those two I think.

edwardtfn commented 6 months ago

I will add this sampling to the next release, so hopefully we can get feedback from more people. I'm planning to use samples: 12.

I wouldn't be so much afraid of reducing collection interval and increasing samples/average. We have critical memory limitation, but I don't think CPU would be an issue.

grigi commented 6 months ago

Thanks samples: 12 will be a notable improvement already.

andythomas commented 6 months ago

First, I would also vouch against an averaging window, because it would delay the response. Then, I tried to 'simulate' the expected results for samples: 12 and came to the following results. First, I plotted my living room temperature:

Then, I made a histogram, indicating that most values are between 20.9°C and 21.5°C, i.e. within +/-0.3°C. The Gaussian fit is depicted in red.

Finally, I 'simulated' a histogram assuming a textbook noise reduction of 1/sqrt(12):

I would say that this would suffice. I will compare that to real measurements once I get to it. However, I would expect diminishing returns at some point and would like to stay within the 0.1°C granularity range for the temperature measurements and control, i.e. I would not sample much more than 12 readings.

edwardtfn commented 6 months ago

I would also vouch against an averaging window, because it would delay the response.

We probably can compensate it by reducing the measurement interval. We are currently collecting temperature every minute, and I believe we can collect more without a significant impact on CPU, so maybe collect every 15s (or maybe even smaller interval) and doing some averaging in a way that it keeps the reporting every minute.

grigi commented 6 months ago

The multisampling doesn't smoothen out the graph, it reduces inaccuracies of the ESP32's ADC. I did do tests and there was no noticeable difference between 8 and 255 samples.

Note that these samples all happen in a tight loop, and the purpose is just to improve the ADC accuracy.

It helps a lot, as it is. But doesn't eliminate all the noise. It can filter out noise in the MHz range I suppose. Likely some other part of the thermistor system introduces noise at a lower frequency.

So ALSO doing multiple spread out sample intervals helps more. Doing a measurement every 5 seconds and then averaging over the 12 samples for each minute gave a really smooth graph. It also should introduce virtually no real delay.

andythomas commented 6 months ago

Something strange is going on when using ADC multisampling. I added

sensor:
  - id: !extend ntc_source
    samples: 8

external_components:
  - source: github://pr#6330
    components:
      - adc

to my yaml and wanted to do some data analysis. However, there was way less data available than anticipated. I looked into it and saw 'missing' data. It was there before, but not as obvious, so I did not notice. Please have a look at the following data, I added the multisampling at 8.10pm.

HA does not collect data every minute, but there are some gaps. If I use the multisampling with 8 samples, these gaps increase in frequency leading to gaps of more than 1 minute. Interestingly, when the panel was updated HA collects 'unavailable,' but the other data simply is not there...

First, could someone confirm this issue? @grigi Do you also have fewer datapoints? It would also look smoother... @edwardtfn Edward, if you could explain to me how the data is reported to HA, I will look deeper into it.

While it might be caused by the lack of datapoints, the data also looks too regular and there are very few measurements at .1 compared to .2 and .0 But this could also indicate a weak bit in the ADC.

edwardtfn commented 6 months ago

A new value is sent only when it is different than the last one sent, so this is probably the reason you see these gaps and this is probably the reason you have more gaps when sampling, as the values should be more stable and therefore will repeat more.

You can force the sensor to report all values with this:

sensor:
  - id: !extend temp_nspanel
    force_update: true
andythomas commented 6 months ago

I updated to v4.3.1 yesterday and tested some more. I added:

sensor:
  - id: !extend ntc_source
    samples: 2
  - id: !extend temp_nspanel
    force_update: true

to my bedroom panel and

sensor:
  - id: !extend temp_nspanel
    force_update: true

to my living room panel. My understanding is that this leads to 12 samples for the living room and 2 for the bedroom. However, the improvement is the same and was further the same for 4 samples as well. No samples led to 0.2°C width in the Gaussian fit and all the others lead to 0.04 [sic!]. It is not really noise causing the fluctuations, but something else. Please see the following histogram data to illustrate. Top is living room (12 samples) and bottom is bedroom (2 samples):

EDIT: uploaded correct plot

It is even more obvious in the raw data:

It rarely takes 21.1°C as a value and does not scatter around the average value, but rather only jumps up from time to time. While I can only guess the reason for this (weak bit in ADC, nonlinearity in the temperature calculation, internal warming of the chip when doing stuff, decimal cutoff in integer operations...) I am not sure anyone but me is interested in this ;)

My bottom line would be: Sampling improves the accuracy independent [sic!] of the value (2, 4 or 12). I could test

So ALSO doing multiple spread out sample intervals helps more. Doing a measurement every 5 seconds and then averaging over the 12 samples for each minute gave a really smooth graph. It also should introduce virtually no real delay.

next with the floor heating.

andythomas commented 6 months ago

I apologize, but I cannot let it go. The temperature increase is (with one exception) equidistantly showing every 15/16 minutes. I am almost convinced that

internal warming of the chip when doing stuff

is causing it. @edwardtfn Does the 15min interval ring a bell with any of the internal operations of the panel? Could also be aliasing of a different interval and the 1min temperature measurements. I would probably test

sensor:
  - id: !extend ntc_source
    update_interval: 13s
  - id: !extend temp_nspanel
    force_update: true

Unfortunately, the sun only allows reliable measurements at night...

grigi commented 6 months ago

Interesting findings. I'm in the process of packing to move, so will probably only be able to spend time on this in a few weeks again.

edwardtfn commented 6 months ago

@edwardtfn Does the 15min interval ring a bell with any of the internal operations of the panel?

We don't have any routine in the panel other than the time update every minute (which also updated the Wi-Fi icon).

But some things in the sensor are updated even when the display is sleeping, so depending on what happens on your Home Assistant entities, it could drive to some load to the panel.

And I'm sure the CPU and screen activity will affect the temperature measurement. In other discussion this was already reported.

edwardtfn commented 6 months ago

I will close this just to make my backlog easier to manage, but let's keep the discussion here. The system is currently using 12 samples, but I'm totally open to revert that back or to implement a different approach. Just don't wanna to handle specific needs on the main code. We have customizations for that. 😉

andythomas commented 6 months ago

I tried Nickolas' (@grigi) suggestion the make a moving average filter measuring several times a minute, but only reporting the value once per minute. This would address my concern, because it would be as fast as before but supposedly more accurate and, therefore, smoother. I chose 6 times, because we probably cannot overcome the occasional warm up leading to the 0.2°C spikes. I want to keep the 0.1°C accuracy and need at least 5 measurements for that (0.2/5 = 0.04 < 0.05). 6 should be on the safe side.

sensor:
  - id: !extend ntc_source
#    samples: 1
    update_interval: 10s
    filters:
    - sliding_window_moving_average:
        window_size: 6
        send_every: 6
  - id: !extend temp_nspanel
    accuracy_decimals: 2
    force_update: true

That worked perfectly. It is amazing, how accurate the measurement can be done with such a simple device. Just for the testing, I set accuracy_decimals: 2 and force_update: true to allow a better/easier data analysis. Please look at the raw data:

It is possible to track the slow temperature cooldown without heating during the night. If I correct for this decrease and only look the the scattering of the data, I get the following:

Textbook behavior only limited by the resolution :) Finally, I checked for Edward's concern about the blocking of the device.

Component adc took a long time for an operation

I get this every other temperature measurement, but I also get it with vanilla v4.3.1 so my changes did not increase that. I will finally use

sensor:
  - id: !extend ntc_source
    # samples: 12 already used in v4.3.1
    update_interval: 10s
    filters:
    - sliding_window_moving_average:
        window_size: 6
        send_every: 6

@edwardtfn It could be used at the default setting, I cannot see any disadvantage other than the possible readjustment of the heater setpoints from (e.g. in my case from cold_tolerance: "0.3" and hot_tolerance: "0.1" to values such as cold_tolerance: "0.2" and hot_tolerance: "0.1") If you want to change the main code and you think it is taking load from the panel, I could test for samples: 4 instead of samples: 12

edwardtfn commented 6 months ago

I think we are fine with this Component adc took a long time for an operation. I don't have it with v4.3.2d1 (which have no changes on the ADC compared to v4.3.1), but ESPHome is using a deprecated ADC driver and they will probably change that soon when migrating to IDF v5, so I won't look at that for now.

If you have some time to try samples: 4, I think it would be nice, as in some messages above was mentioned this has no impact to the results. Or even samples: 1 (which means no sampling as before).

andythomas commented 6 months ago

I will try to summarize my results. The following code is added to the panels' yaml:

sensor:
  - id: !extend ntc_source
    samples: A
    update_interval: B
    filters:
    - sliding_window_moving_average:
        window_size: C
        send_every: C

Width denotes the width of the Gaussian fit as depicted in previous posts.

A B C width (°C) comments
v.4.3.0 1 60s 1 0.2 starting point
v.4.3.1 12 60s 1 <=0.04 0.2°C spikes every 15min -> bad for control loops
(1) 12 10s 6 <=0.04 reference, perfect for accuracy_decimals: 1
(2) 1 10s 6 0.05 slightly worse than (1), pr#6330 not required
(3) 4 10s 6 <=0.04 as good as (1), but uses less resources
(4) 1 5s 12 <=0.04 as good as (1), slightly more resources than (3), pr#6330 not required
edwardtfn commented 6 months ago

I have a couple of side questions here:

  1. Should we start using accuracy_decimals: 2? Will that add any real value? Plots will probably look a bit better, but other than this, any value added? The cost may be on the network usage.
  2. Should we send more frequently than 1min? This is just about playing with send_every and if the value isn't changing, it won't affect the network usage as it isn't sent, but it will be much faster to react to a window opened or some other event.

Any thoughts?

grigi commented 6 months ago

No, accuracy of 0.1C is perfectly good. I think that was used as a way to accurately calculate the deviation on his system.

On Sat, 23 Mar 2024, 16:34 Edward Firmo, @.***> wrote:

I have a couple of side questions here:

  1. Should we start using accuracy_decimals: 2? Will that add any real value? Plots will probably look a bit better, but other than this, any value added? The cost may be on the network usage.
  2. Should we send more frequently than 1min? This is just about playing with send_every and if the value isn't changing, it won't affect the network usage as it isn't sent, but it will be much faster to react to a window opened or some other event.

Any thoughts?

— Reply to this email directly, view it on GitHub https://github.com/Blackymas/NSPanel_HA_Blueprint/issues/1918#issuecomment-2016541380, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAJ7T2AMEUV4UK5H2RDZBRLYZWVKHAVCNFSM6AAAAABEXSNH3GVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAMJWGU2DCMZYGA . You are receiving this because you were mentioned.Message ID: @.***>

andythomas commented 6 months ago
  1. I do not think there is added value using accuracy_decimals: 2. As Nickolas already pointed out, I only did it for the data analysis, i.e. to be able to plot the histograms and I would switch to accuracy_decimals: 1 during normal use. I do not think we can really measure temperature differences and certainly not temperature with that accuracy using the panel's thermistor. Two decimals would suggest a significance that is not there. In addition, I believe the HA thermostat card always uses 1 decimal anyway. Finally, accuracy_decimals: 2 would require the Gaussian width to be <=0.004 and much more averaging.
  2. This is up to you Edward :D I cannot estimate the acceptable load on the panel's ESP32. However, for the same measurement accuracy we would have to change update_interval: B and keep window_size: C and send_every: C the same.
edwardtfn commented 6 months ago

However, for the same measurement accuracy we would have to change update_interval: B and keep window_size: C and send_every: C the same.

Are you sure about that? It shoukd be able to do averaging and still having a send_every < window_size.

andythomas commented 6 months ago

hmm, I am not sure for this use case. I wanted the response to a step function also be a step function. But now thinking about it, it does not make much sense for a temperature measurement. If T was 20°C and 19.8°C shortly after, it was certainly 19.9°C in between these measurements.

andythomas commented 6 months ago

I updated the table to its final (?!) form. I think (3) is the best solution and the most robust against, e.g., variations of the ADC quality between panels. If the samples do not make it into ESPHome, (4) is as good but probably a little bit less robust and consumes slightly more resources. I will advertise the PR and share the data here. And I do think send_every can pretty much take any value deemed useful in line with @edwardtfn comment and as opposed to what I said before.

andythomas commented 6 months ago

In a quick test, I compared for 2h each

sensor:
  - id: !extend ntc_source
    samples: 4
    update_interval: 10s
    filters:
    - sliding_window_moving_average:
        window_size: 6
        send_every: 6

with the same but send_every: 1 and both without force_update: true. The 1 led to only 10% more values being reported, so it might be worthwhile to be

much faster to react to a window opened