astrandb / miele

A modern integration for Miele devices in Home Assistant
MIT License
204 stars 30 forks source link

Absolute time sensors unstable #107

Closed mundschenk-at closed 2 years ago

mundschenk-at commented 2 years ago

The problem

The the value calculation for recently added absolute time sensors (started_at and finish_at) is unstable and oscillates by a minute, spamming the logbook.

Version of miele integration having the issue?

miele-0.0.22

Version of Home Assistant Core having the issue?

core-2022.6.2

Anything in the logs that might be useful for us?

No response

Additional information

No response

mundschenk-at commented 2 years ago

Here's an example for the stateElapsedTimeAbs sensor:

image

And another one for stateRemainingTimeAbs:

image

Is there any way to make them behave less erratically, @astrandb? Especially the stateElapsedTimeAbs sensor right now is pretty useless (because unlike the finish_at sensor should not be able to change at all, once a program has been started. I realize this is due to the elapsed time not being consistently updated in the API, but I don't think the current implementation is useful.

Maybe some kind of average function or simply ignoring differences less than twice the resolution (i.e. 120 seconds) from the previous value? That wouldn't help for the stateElapsedTimeAbs but maybe that's not really salvageable with the API.

astrandb commented 2 years ago

It should be possible to "debounce" stateRemainingTimeAbs one way or another. I am more concerned about stateElapsedTimeAbs that seems to be climbing by one minute for each minute. It is impossible for me to trace what is going on without access to a device that reports these values. Can to try to catch the values from Elapsed time and Started at and see how the vary.

mundschenk-at commented 2 years ago

It should be possible to "debounce" stateRemainingTimeAbs one way or another. I am more concerned about stateElapsedTimeAbs that seems to be climbing by one minute for each minute. It is impossible for me to trace what is going on without access to a device that reports these values. Can to try to catch the values from Elapsed time and Started at and see how the vary.

I've added the entities to my logging automation. Sometimes the start time oscillates like the finish time, sometimes it increases steadily (which might well be an API bug).

mundschenk-at commented 2 years ago

It should be possible to "debounce" stateRemainingTimeAbs one way or another. I am more concerned about stateElapsedTimeAbs that seems to be climbing by one minute for each minute. It is impossible for me to trace what is going on without access to a device that reports these values. Can to try to catch the values from Elapsed time and Started at and see how the vary.

So here's "Started at":

image

Here are some snippets from the diagnostics around those oscillations (from the last three times in the screenshot, though I'm not quite sure about the first one, since this is a manual process, it might have been a bit earlier):

      "state|elapsedTime|0": 3,
      "state|elapsedTime|1": 10,
      "state|elapsedTime|0": 3,
      "state|elapsedTime|1": 12,
      "state|elapsedTime|0": 3,
      "state|elapsedTime|1": 13,

I think mostly this is an issue of the clocks not being perfectly synchronized (and also the API might sometimes not be updated in time).

astrandb commented 2 years ago

Thank you.

I agree that it can be unsynced clocks in API and HA respectively that causes this jitter. The reported elapsed time should be monotonous increasing, but you never know what is happening behind the scenes in the API. The remaining time could vary both up and down during normal operation, this happens all the time in my machines from Siemens.

One suggestion is to skip updating Started at and Finish at if the new calculated value is 1 minute earlier than the previously stored value.

What do you think?

mundschenk-at commented 2 years ago

Sounds good!

mundschenk-at commented 2 years ago

I am more concerned about stateElapsedTimeAbs that seems to be climbing by one minute for each minute.

I now know when and why this is happening: Once the device status reaches finished (but not off), the elapsed time (correctly) stops increasing. This (incorrectly) results in stateElapsedTimeAbs going up every minute.

astrandb commented 2 years ago

Good catch!

mundschenk-at commented 2 years ago

You are probably busy with life (& holidays), but is there something I can do to help fix this? My python's is probably to rusty for a straight-up PR, but if you point me in the right direction, I might come up with something.

astrandb commented 2 years ago

Yes, I have been spending my spare time on other things than sofware development the last couple of weeks. But now I am back at the keyboard... I don't think there is a need for a Python expert to fix this. It is more of finding out a logic when to update the absolute time sensors and when to leave them untouched. I will have a look.

astrandb commented 2 years ago

I am digging into this issue now. I have started with remaining time/Finish at where -1 minute changes are filtered out. I cannot test it properly without a real device so I need your help. I expect a number of iterations until we are happy. It would be easiest for me if you can pick up sensor.py from the TimeJitter branch and just overwrite your current version in custom_components/miele. I can also prepare a branch release but it would clutter my parallell work on the vacuum cleaner support and it is a bit cumbersome for the quick iterations.

mundschenk-at commented 2 years ago

Sure, I can do that. However, I'll be away for a few days, so if I should quickly test something, it would be best to do it today. Is the branch ready for the first iteration?

astrandb commented 2 years ago

Yes, the fix for Finish at is there.

astrandb commented 2 years ago

On a second thought, I may have chosen wrong approach in this first attempt. From your description it is the absolute time that jitters. Is the remaining time stable in the sense that it is monotonously decreasing?

mundschenk-at commented 2 years ago

Yeah, that didn't work right:

image

remaining time seems to be stable, yes. The curve is not totally smooth, but there don't appear to be any increases.

image
astrandb commented 2 years ago

Thank you. It is interesting to see that the Finish time changes with 2 minutes in 4 seconds. There might be an internal race condition involved. I will prepare a new version with another approach and extended logging of the internals. It will be ready for you when you are back.

mundschenk-at commented 2 years ago

Do you have a new version for me to test?

astrandb commented 2 years ago

Yes, I reverted the changes and added an extended debug log that I hope will shed some light of what is happening. Please fetch sensor.py, set logger to debug and run a program with your dishwasher and have a look at the log.

mundschenk-at commented 2 years ago

@astrandb Unfortunately, the logging does not seem to be working. I see the sensor intialization (I added the key to be able to see if my file was active), but not the timer logging.

mundschenk-at commented 2 years ago

Hi @astrandb Could you please look at that logging function? It's not recording anything for me.

astrandb commented 2 years ago

I am sorry that I have been away from my dev keyboard for a while. I will look into this issue tonight or tomorrow.

astrandb commented 2 years ago

The logging works as intended in my test system. Lines like these appear in the full log:

2022-09-19 10:43:11.441 DEBUG (MainThread) [custom_components.miele.sensor] Key:  stateRemainingTimeAbs | Dev: 1223007 | Mins: 15 | Now: 2022-09-19 12:43:11.441109+02:00 | State: 12:58
2022-09-19 10:43:11.441 DEBUG (MainThread) [custom_components.miele.sensor] Key:  stateElapsedTimeAbs | Dev: 1223007 | Mins: 59 | Now: 2022-09-19 12:43:11.441399+02:00 | State: 13:42

Please double check:

mundschenk-at commented 2 years ago

The logging works as intended in my test system.

Weird. They did not for me (only the sensor initialization was logged, even though I was definitely using the modified sensor.py.

Please double check:

  • Enable debug logging for the miele component in configuration.yaml
logger:
    default: info
    logs:
        ...
        custom_components.miele: debug
  • Download sensor.py from branch TimeJitter at github
  • Overwrite sensor.py in custom_components/miele/

I did that again just now.

  • Restart HA

I did that as well.

  • Run your dishwasher

I'll start it after HA has finished it's restart.

Edit: Huh, it's now working. Not sure what the problem was two weeks ago. The earlier HA version is the only thing that comes to my mind.

mundschenk-at commented 2 years ago

Log for a very short rinsing run:

2022-09-19 21:06:13.583 DEBUG (MainThread) [custom_components.miele.sensor] Key: stateRemainingTimeAbs | Dev: <deviceid> | Mins: 7 | Now: 2022-09-19 21:06:13.582978+02:00 | State: 21:13
2022-09-19 21:06:13.588 DEBUG (MainThread) [custom_components.miele.sensor] Key: stateRemainingTimeAbs | Dev: <deviceid> | Mins: 7 | Now: 2022-09-19 21:06:13.588809+02:00 | State: 21:13
2022-09-19 21:06:14.977 DEBUG (MainThread) [custom_components.miele.sensor] Key: stateRemainingTimeAbs | Dev: <deviceid> | Mins: 7 | Now: 2022-09-19 21:06:14.977722+02:00 | State: 21:13
2022-09-19 21:06:14.986 DEBUG (MainThread) [custom_components.miele.sensor] Key: stateRemainingTimeAbs | Dev: <deviceid> | Mins: 7 | Now: 2022-09-19 21:06:14.986182+02:00 | State: 21:13
2022-09-19 21:07:14.163 DEBUG (MainThread) [custom_components.miele.sensor] Key: stateRemainingTimeAbs | Dev: <deviceid> | Mins: 7 | Now: 2022-09-19 21:07:14.163322+02:00 | State: 21:14
2022-09-19 21:07:21.600 DEBUG (MainThread) [custom_components.miele.sensor] Key: stateRemainingTimeAbs | Dev: <deviceid> | Mins: 7 | Now: 2022-09-19 21:07:21.600309+02:00 | State: 21:14
2022-09-19 21:07:21.610 DEBUG (MainThread) [custom_components.miele.sensor] Key: stateRemainingTimeAbs | Dev: <deviceid> | Mins: 7 | Now: 2022-09-19 21:07:21.610089+02:00 | State: 21:14
2022-09-19 21:07:42.674 DEBUG (MainThread) [custom_components.miele.sensor] Key: stateRemainingTimeAbs | Dev: <deviceid> | Mins: 6 | Now: 2022-09-19 21:07:42.674749+02:00 | State: 21:13
2022-09-19 21:07:42.680 DEBUG (MainThread) [custom_components.miele.sensor] Key: stateRemainingTimeAbs | Dev: <deviceid> | Mins: 6 | Now: 2022-09-19 21:07:42.680816+02:00 | State: 21:13
2022-09-19 21:08:21.592 DEBUG (MainThread) [custom_components.miele.sensor] Key: stateRemainingTimeAbs | Dev: <deviceid> | Mins: 6 | Now: 2022-09-19 21:08:21.592149+02:00 | State: 21:14
2022-09-19 21:08:21.618 DEBUG (MainThread) [custom_components.miele.sensor] Key: stateRemainingTimeAbs | Dev: <deviceid> | Mins: 6 | Now: 2022-09-19 21:08:21.618733+02:00 | State: 21:14
2022-09-19 21:09:21.152 DEBUG (MainThread) [custom_components.miele.sensor] Key: stateRemainingTimeAbs | Dev: <deviceid> | Mins: 6 | Now: 2022-09-19 21:09:21.152526+02:00 | State: 21:15
2022-09-19 21:09:21.605 DEBUG (MainThread) [custom_components.miele.sensor] Key: stateRemainingTimeAbs | Dev: <deviceid> | Mins: 6 | Now: 2022-09-19 21:09:21.605776+02:00 | State: 21:15
2022-09-19 21:09:21.614 DEBUG (MainThread) [custom_components.miele.sensor] Key: stateRemainingTimeAbs | Dev: <deviceid> | Mins: 6 | Now: 2022-09-19 21:09:21.613992+02:00 | State: 21:15
2022-09-19 21:09:50.089 DEBUG (MainThread) [custom_components.miele.sensor] Key: stateRemainingTimeAbs | Dev: <deviceid> | Mins: 5 | Now: 2022-09-19 21:09:50.088917+02:00 | State: 21:14
2022-09-19 21:09:50.101 DEBUG (MainThread) [custom_components.miele.sensor] Key: stateRemainingTimeAbs | Dev: <deviceid> | Mins: 5 | Now: 2022-09-19 21:09:50.100943+02:00 | State: 21:14
2022-09-19 21:10:21.597 DEBUG (MainThread) [custom_components.miele.sensor] Key: stateRemainingTimeAbs | Dev: <deviceid> | Mins: 5 | Now: 2022-09-19 21:10:21.596805+02:00 | State: 21:15
2022-09-19 21:10:21.607 DEBUG (MainThread) [custom_components.miele.sensor] Key: stateRemainingTimeAbs | Dev: <deviceid> | Mins: 5 | Now: 2022-09-19 21:10:21.607450+02:00 | State: 21:15
2022-09-19 21:10:47.217 DEBUG (MainThread) [custom_components.miele.sensor] Key: stateRemainingTimeAbs | Dev: <deviceid> | Mins: 4 | Now: 2022-09-19 21:10:47.217273+02:00 | State: 21:14
2022-09-19 21:10:47.226 DEBUG (MainThread) [custom_components.miele.sensor] Key: stateRemainingTimeAbs | Dev: <deviceid> | Mins: 4 | Now: 2022-09-19 21:10:47.226280+02:00 | State: 21:14
2022-09-19 21:11:21.589 DEBUG (MainThread) [custom_components.miele.sensor] Key: stateRemainingTimeAbs | Dev: <deviceid> | Mins: 4 | Now: 2022-09-19 21:11:21.589648+02:00 | State: 21:15
2022-09-19 21:11:21.611 DEBUG (MainThread) [custom_components.miele.sensor] Key: stateRemainingTimeAbs | Dev: <deviceid> | Mins: 4 | Now: 2022-09-19 21:11:21.611861+02:00 | State: 21:15
2022-09-19 21:11:47.291 DEBUG (MainThread) [custom_components.miele.sensor] Key: stateRemainingTimeAbs | Dev: <deviceid> | Mins: 3 | Now: 2022-09-19 21:11:47.291176+02:00 | State: 21:14
2022-09-19 21:11:47.302 DEBUG (MainThread) [custom_components.miele.sensor] Key: stateRemainingTimeAbs | Dev: <deviceid> | Mins: 3 | Now: 2022-09-19 21:11:47.302062+02:00 | State: 21:14
2022-09-19 21:12:21.581 DEBUG (MainThread) [custom_components.miele.sensor] Key: stateRemainingTimeAbs | Dev: <deviceid> | Mins: 3 | Now: 2022-09-19 21:12:21.580867+02:00 | State: 21:15
2022-09-19 21:12:21.590 DEBUG (MainThread) [custom_components.miele.sensor] Key: stateRemainingTimeAbs | Dev: <deviceid> | Mins: 3 | Now: 2022-09-19 21:12:21.590762+02:00 | State: 21:15
2022-09-19 21:13:21.240 DEBUG (MainThread) [custom_components.miele.sensor] Key: stateRemainingTimeAbs | Dev: <deviceid> | Mins: 3 | Now: 2022-09-19 21:13:21.240368+02:00 | State: 21:16
2022-09-19 21:13:23.567 DEBUG (MainThread) [custom_components.miele.sensor] Key: stateRemainingTimeAbs | Dev: <deviceid> | Mins: 3 | Now: 2022-09-19 21:13:23.567091+02:00 | State: 21:16
2022-09-19 21:13:23.575 DEBUG (MainThread) [custom_components.miele.sensor] Key: stateRemainingTimeAbs | Dev: <deviceid> | Mins: 3 | Now: 2022-09-19 21:13:23.575335+02:00 | State: 21:16
2022-09-19 21:13:54.756 DEBUG (MainThread) [custom_components.miele.sensor] Key: stateRemainingTimeAbs | Dev: <deviceid> | Mins: 2 | Now: 2022-09-19 21:13:54.756729+02:00 | State: 21:15
2022-09-19 21:13:54.760 DEBUG (MainThread) [custom_components.miele.sensor] Key: stateRemainingTimeAbs | Dev: <deviceid> | Mins: 2 | Now: 2022-09-19 21:13:54.760098+02:00 | State: 21:15
2022-09-19 21:14:21.588 DEBUG (MainThread) [custom_components.miele.sensor] Key: stateRemainingTimeAbs | Dev: <deviceid> | Mins: 2 | Now: 2022-09-19 21:14:21.587968+02:00 | State: 21:16
2022-09-19 21:14:21.597 DEBUG (MainThread) [custom_components.miele.sensor] Key: stateRemainingTimeAbs | Dev: <deviceid> | Mins: 2 | Now: 2022-09-19 21:14:21.596881+02:00 | State: 21:16
2022-09-19 21:14:53.980 DEBUG (MainThread) [custom_components.miele.sensor] Key: stateRemainingTimeAbs | Dev: <deviceid> | Mins: 1 | Now: 2022-09-19 21:14:53.980422+02:00 | State: 21:15
2022-09-19 21:14:53.991 DEBUG (MainThread) [custom_components.miele.sensor] Key: stateRemainingTimeAbs | Dev: <deviceid> | Mins: 1 | Now: 2022-09-19 21:14:53.991869+02:00 | State: 21:15
2022-09-19 21:15:21.606 DEBUG (MainThread) [custom_components.miele.sensor] Key: stateRemainingTimeAbs | Dev: <deviceid> | Mins: 1 | Now: 2022-09-19 21:15:21.606022+02:00 | State: 21:16
2022-09-19 21:15:21.616 DEBUG (MainThread) [custom_components.miele.sensor] Key: stateRemainingTimeAbs | Dev: <deviceid> | Mins: 1 | Now: 2022-09-19 21:15:21.616080+02:00 | State: 21:16

The dishwasher hanged to finished at 21:15:54 according to the HA logbook.

astrandb commented 2 years ago

Thank you. This is most helpful with regard to the root cause of the apparent instability of the abs time sensor. The API report of time left is not decreasing by 1 minute by every 60 s.

Minutes left     Duration
7 min             68 s 
6 min            128 s
5 min             31 s
4 min             60 s
3 min            127 s
2 min             59 s

So there is really nothing wrong with the code. It is no surprise that the estimated time left can go up and down during a wash cycle. But it is somewhat annoying to get a log entry everytime the time changes by one minute. I am open to suggestions how to "debounce" the sensor to reduce the spamming of logs.

mundschenk-at commented 2 years ago

Not sure about actual debouncing, but I believe there is the concept of "significant change" in HA logbook handling? Maybe set that to 5 or even 10 minutes?

astrandb commented 2 years ago

I looked at significant_change some time ago but I could not find any examples of practical use in integrations. To my knowledge this feature is only used internally in HA components such as light, sensor etc. I will give it a try and hope that an added async_check_significant_change does not interfere with the internal funcions.

astrandb commented 2 years ago

I have reached out to the core developer chat on Discord and there seems not to be any built-in feature that supports our use case. However, I have found a workaround that excludes these state changes from the logbook. If the absolute time sensors are assigned a unit_of_measurement (e.g. an empty string instead of None) they are treated as rapidly changing numeric values and thus excluded from logbook. I have pushed this fix to branch TimeJitter for you to evaluate.

The issue below remains to be fixed. It is not spamming the log anymore but the timestamp should not be updated when the program is finished.

I am more concerned about stateElapsedTimeAbs that seems to be climbing by one minute for each minute.

I now know when and why this is happening: Once the device status reaches finished (but not off), the elapsed time (correctly) stops increasing. This (incorrectly) results in stateElapsedTimeAbs going up every minute.

mundschenk-at commented 2 years ago

Thank you, I'll try out the new version with tonight's dishwasher run.

mundschenk-at commented 2 years ago

The workaround is looking good for me. As for stateElapsedTimeAbs, I'd suggest stopping increasing stateElapsedTimeAbs once the device status changes to finished. I can try to create a PR on the weekend, but the usual caveats apply.

astrandb commented 2 years ago

Time to try again... I have now prepared a new sensor.py in TimeJitter branch. It stops updating the Started at when status is program_ended (7) Please test and report back. I hope we can close this issue soon.

mundschenk-at commented 2 years ago

There seems to be an issue with the sensor.py implementation:

Logger: homeassistant.components.sensor
Source: custom_components/miele/sensor.py:736 
Integration: Sensor (documentation, issues) 
First occurred: 17:11:32 (4 occurrences) 
Last logged: 18:04:51

Error adding entities for domain sensor with platform miele
Error while setting up miele platform for sensor
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/helpers/entity_platform.py", line 428, in async_add_entities
    await asyncio.gather(*tasks)
  File "/usr/src/homeassistant/homeassistant/helpers/entity_platform.py", line 691, in _async_add_entity
    await entity.add_to_platform_finish()
  File "/usr/src/homeassistant/homeassistant/helpers/entity.py", line 777, in add_to_platform_finish
    self.async_write_ha_state()
  File "/usr/src/homeassistant/homeassistant/helpers/entity.py", line 533, in async_write_ha_state
    self._async_write_ha_state()
  File "/usr/src/homeassistant/homeassistant/helpers/entity.py", line 571, in _async_write_ha_state
    state = self._stringify_state(available)
  File "/usr/src/homeassistant/homeassistant/helpers/entity.py", line 539, in _stringify_state
    if (state := self.state) is None:
  File "/usr/src/homeassistant/homeassistant/components/sensor/__init__.py", line 404, in state
    value = self.native_value
  File "/config/custom_components/miele/sensor.py", line 736, in native_value
    if self.coordinator.data[self._ent]["stateStatus"] == 7:
KeyError: 'stateStatus'
astrandb commented 2 years ago

Typical problem when I test my code with my fake data. I will have a look again tomorrow night or on Monday.

astrandb commented 2 years ago

Corrected the key on line 736. Please test again.

mundschenk-at commented 2 years ago

Seems to work fine with this. Since they now don't appear in the logbook, I could not really check if the sensor stopped updating at the right time (I mean I could, but I missed the window this time). I think it did, but the History view for the absolute time sensors is hard to read (it tracks some low number that does not appear to be a timestamp and certainly not a wall clock time).

astrandb commented 2 years ago

At last… 😃 The sensor should stop on the next data update after that status changes to program_ended. I think it is as good as it gets. You can end up with Unknown if HA is restarted when status id program_ended. I think we have to live with that.

ArnoutVos commented 1 year ago

Hi, this is still a problem with the latest integration version, home assistant core 2023.4.6, Miele WSI 863

Screenshot_20230428-104142_Home_Assistant