AdyRock / com.switchbot

GNU General Public License v3.0

Hub sensor regularly doesn't update #127

Open rudolfterp opened 3 months ago

rudolfterp commented 3 months ago

My Hub 2 sensor (temp, humidity, light) regularly has old values in Homey compared to Switchbot own app.

This was yesterday evening:

Homey: Screenshot_20240711-230738c

Switchbot: Screenshot_20240711-231233

As you can see, there were several changes in temp and humidity in the last 4 hours (after the vertical line), but these values weren't sent to Homey.

The last update (4 hours ago) was via webhook: (My timezone is UTC+2)

* 2024-07-11T17:02:33.501Z
* Got a webhook message! {
  "eventType": "changeReport",
  "eventVersion": "1",
  "context": {
    "deviceType": "WoHub2",
    "deviceMac": "FAAD9871D28C",
    "temperature": 23.3,
    "humidity": 60,
    "lightLevel": 2,
    "scale": "CELSIUS",
    "timeOfSample": 1720717352730
  }
}
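For reference, `timeOfSample` in the payload above looks like a Unix timestamp in milliseconds, so it can be compared directly with the log time. A minimal Python sketch under that assumption (the helper name `sample_time` is made up for illustration and is not part of the Homey app):

```python
import json
from datetime import datetime, timedelta, timezone

# Payload copied from the log entry above.
message = json.loads("""
{
  "eventType": "changeReport",
  "eventVersion": "1",
  "context": {
    "deviceType": "WoHub2",
    "deviceMac": "FAAD9871D28C",
    "temperature": 23.3,
    "humidity": 60,
    "lightLevel": 2,
    "scale": "CELSIUS",
    "timeOfSample": 1720717352730
  }
}
""")

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def sample_time(msg: dict) -> datetime:
    """Convert the millisecond timeOfSample field to an aware UTC datetime."""
    return EPOCH + timedelta(milliseconds=msg["context"]["timeOfSample"])


sampled = sample_time(message)
# Time at which the app logged the webhook (from the log line above).
logged = datetime(2024, 7, 11, 17, 2, 33, 501000, tzinfo=timezone.utc)
delay = (logged - sampled).total_seconds()

print(sampled.isoformat(timespec="milliseconds"))  # 2024-07-11T17:02:32.730+00:00
print(delay)  # 0.771
```

So this particular message arrived less than a second after the sample was taken; the problem is the messages that never arrive, not the delay of the ones that do.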

It seems there were no updates via regular API polling?

When I use 'Get status' I get the correct values (different from the ones shown on the Homey device): Schermafbeelding 2024-07-11 231450

Only after I restarted the SwitchBot Homey app were the correct values finally pushed to the Hub 2 device in Homey.

I have 4 Switchbot devices:

Apart from the updating problem, everything works fine. I use the hubs to control two IR-devices, via homey flows. This works perfectly.

The update problem seems to be limited to Hub 2, but I'm not entirely sure: the other sensors are in places where temp and/or humidity change a lot more. These devices get a lot of webhook updates, several per hour.

It seems that webhook updates are only sent for medium and large changes, and the Hub 2 sensor is in a very temp/humidity stable area. But these small changes over time (hours) are exactly what's important. I have some flows that notify me when certain threshold values are reached, and these are not working properly now.

rudolfterp commented 3 months ago

Another thing I noticed was the log after restarting yesterday:

* 2024-07-11T21:17:51.119Z
* ****** App has initialised. ******

* 2024-07-11T21:17:51.241Z
* server listening 0.0.0.0:1234

* 2024-07-11T21:17:51.747Z
* server got: Are you there SwitchBot? from 192.168.***

* 2024-07-11T21:17:52.214Z
* SwitchBot webhook already registered

* 2024-07-11T21:17:52.325Z
* Polling hub: 21 API calls today

* 2024-07-11T21:17:54.697Z
* Homey Webhook registered for devices {
  "$keys": [
    "FA87251C841D",
    "CCEE0545DD8F",
    "FAAD9871D28C"
  ]
}

It says '21 API calls today' (yesterday whole day). Seems pretty low, but I don't know what is counted. 21 seems to be the number of requests sent from my own flows to control the IR-devices.

AdyRock commented 3 months ago

Most updates come via a webhook, so the app only polls devices that don't use them or have some capabilities that are not updated via a webhook.

Occasionally, the webhook system does run slowly, but I'm not sure if it's Athom or SwitchBot. My bet is SwitchBot.

rudolfterp commented 3 months ago

Thx for the reply. I didn't know webhook-enabled devices were only updated through webhooks; I thought it was a combination (regular update via API, larger changes via webhooks).

Is there no way to also update via the API? I know the calls are limited, but in this case there are a ton of calls left. You could do something like an API update when there has been no webhook update for some time, say 15 minutes. Even half an hour would be a big improvement in case the webhooks fail. Or let the user enable API update calls for certain (or all) devices (for example off by default, to accommodate users with many non-webhook devices).
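The "poll only when the webhook has gone quiet" idea can be sketched roughly as follows. This is an illustration in Python, not the app's code: `DeviceState`, `devices_to_poll`, and the `calls_left` cap are hypothetical names, and the 15-minute threshold is just the figure suggested above.

```python
from dataclasses import dataclass
from typing import List, Optional

STALE_AFTER_S = 15 * 60  # 15 minutes, as suggested above (an assumption, not an app setting)


@dataclass
class DeviceState:
    device_id: str
    last_update_s: float  # time of the last webhook or poll result, in seconds


def devices_to_poll(
    states: List[DeviceState],
    now_s: float,
    stale_after_s: float = STALE_AFTER_S,
    calls_left: Optional[int] = None,
) -> List[str]:
    """Return ids of devices with no update for at least stale_after_s seconds.

    The stalest devices come first. If calls_left is given, the result is
    truncated so the fallback never uses more than the remaining API budget.
    """
    stale = [s for s in states if now_s - s.last_update_s >= stale_after_s]
    stale.sort(key=lambda s: s.last_update_s)
    ids = [s.device_id for s in stale]
    return ids if calls_left is None else ids[: max(calls_left, 0)]


# Example with made-up timestamps (seconds) and the device ids from the log.
states = [
    DeviceState("FA87251C841D", last_update_s=1000.0),
    DeviceState("CCEE0545DD8F", last_update_s=1900.0),
    DeviceState("FAAD9871D28C", last_update_s=100.0),
]
print(devices_to_poll(states, now_s=2000.0))
# ['FAAD9871D28C', 'FA87251C841D']
```

Devices that keep receiving webhooks never show up in the result, so a fallback like this only spends API calls on devices that have actually gone quiet.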

By the way, during those 4 hours without webhook updates for Hub 2, other devices were still updated very regularly.

This error situation occurs almost every day. Even if it's SwitchBot's fault, I highly doubt they see it as an issue, because the API still works in this case.

Is there anything I can do in my situation? A workaround or something?

AdyRock commented 3 months ago

The webhooks should (and normally do) publish updates as soon as the change happens, but the key is only when a change occurs. It's possible the event only occurs if the temperature changes by 1 degree or humidity changes by 1%. That reduces the load on both Homey and the servers. Polling is very expensive in terms of processing power, which is why SwitchBot limit the number of API calls.

RASK18 commented 2 months ago

I have exactly the same problem with my Hub 2!! I understand that the bug could be Athom's or SwitchBot's, but it would be great to have a workaround while they fix it. Updating data via the API is not efficient but seems to be the most reliable way, so I think there should be an option to use it.

I propose to have a 'then-card' that allows updating the status manually in a flow: https://github.com/AdyRock/com.switchbot/issues/134 At least for Hub2, which is the one that is giving us problems

AdyRock commented 2 months ago

There is a limit on the number of API calls and polling can easily cause your account to be locked. Some devices have to be polled as the information is not returned via the Web hook. So, even though you might not have any of those, polling has to be managed for those that do. Throwing in the ability to poll devices that shouldn't need it causes a big complication with that management. I'm not saying it's impossible, but I have to ensure the app doesn't break for others.

RASK18 commented 2 months ago

And what alternative do you offer? Because our Hub2 are not updating at the moment... For example, today I closed the shutters completely because it was raining and the light sensor kept marking 70% for 30 minutes, and of course, my flows didn't run and the lights didn't turn on during that time, I was in the dark for 30 minutes hoping that it would work soon 😓

I don't see much of a problem with allowing users to manage it manually if they want, in the end those who have it working won't use it, and with a name like "not recommended", "for developers only", "might block your account"... you scare away the basic users. Only those who already have problems will use it, like us

I have tried to program a solution with HomeyScript but there is no way because it does not have the crypto js library for request auth header 😞
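For anyone attempting the same outside HomeyScript: the request signature is an HMAC-SHA256 over the token, a millisecond timestamp, and a nonce, base64-encoded. The sketch below follows the scheme as I understand it from SwitchBot's public API v1.1 documentation, so treat the details as an assumption and check them against the current docs. It is written in Python only to make the steps explicit; `build_auth_headers` is a made-up helper name.

```python
import base64
import hashlib
import hmac
import time
import uuid
from typing import Dict, Optional


def build_auth_headers(
    token: str,
    secret: str,
    t_ms: Optional[int] = None,
    nonce: Optional[str] = None,
) -> Dict[str, str]:
    """Build signed request headers (assumed SwitchBot API v1.1 scheme).

    sign = base64(HMAC-SHA256(key=secret, msg=token + t + nonce))
    """
    t = str(t_ms if t_ms is not None else int(time.time() * 1000))
    nonce = nonce if nonce is not None else str(uuid.uuid4())
    string_to_sign = f"{token}{t}{nonce}".encode("utf-8")
    digest = hmac.new(secret.encode("utf-8"), string_to_sign, hashlib.sha256).digest()
    return {
        "Authorization": token,
        "sign": base64.b64encode(digest).decode("ascii"),
        "t": t,
        "nonce": nonce,
    }


# Placeholder credentials; real ones come from the SwitchBot app.
headers = build_auth_headers("my-token", "my-secret")
print(sorted(headers))  # ['Authorization', 'nonce', 'sign', 't']
```

The only cryptographic primitive needed is HMAC-SHA256, so any environment that exposes it (not necessarily the crypto-js library) should be able to produce the same header.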

PS: What do you think, @rudolfterp?

rudolfterp commented 2 months ago

I agree that something could be done at the Homey side, but I also understand that Adrian can't simply implement a feature that has a chance of making it worse. But maybe we can think of something.

But I also filed this issue with switchbot, and this was their response:

Thank you for bringing this to our attention. We have recently observed a continuous increase in the load on our open API service, with some users exhibiting unusually high-frequency calls. This has affected the stability of our service.

Please rest assured that we are aware of the issue and have already planned a solution. We expect to have a fix in place by the end of this quarter.

Even though it looks like they have a solution planned (by the end of September?), I'm not confident that it will completely fix the problem, or that the problem won't come back in the future. So I am still looking for a reliable way to fix this.

In the meantime I made an automated flow to check whether the values in Homey differ from the API:

I have 3 sensors:

Result: the flow logged errors 2-4 times a day, only for Hub 2 and the outdoor sensor, split about equally between the two. (This flow certainly doesn't capture all errors because of the check interval, so the actual error count could be much higher.)

So not only Hub 2 is affected, but the (older) indoor sensor doesn't seem to be affected at all.

If it is only a performance problem, I would think all devices are affected. But it could also be the result of some (attempt of) load-balancing or a certain fixed order in which webhook-messages are created for different devices.

Side note: I use the outdoor sensor in the bathroom to control a fan when someone is taking a shower, and it never fails to switch on the fan within a minute. Within this minute there are multiple updates in the SwitchBot app (up to 10), because of the rapid change in humidity and temp. So in practice there is always at least 1 webhook message coming through. The core of the problem is probably that a certain percentage of webhook updates are not coming through.

Edit: I'm aware that my current flow to detect update errors can produce false positives, because a value could have changed in the meantime and then changed back. But I compared the majority of errors with the values in the SwitchBot app, and they were all real errors: the value in Homey was in all cases at least 15 minutes behind SwitchBot (and on multiple occasions I saw multiple updates in SwitchBot and no updates at all in Homey).

I'm sure there are better flow designs to think of, but this one is relatively simple and sufficient. My goal is not to detect all errors, but to track whether the errors are increasing or decreasing.

AdyRock commented 2 months ago

Abe from Athom has just put me in contact with the Director of Products at SwitchBot as my previous contact has left. Hopefully this will result in a better integration and more support. I have been considering a possible solution by monitoring the time since the last update and polling if it has been too long. I just need to work out how long that should be based on other devices that also require polling.
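One way to reason about "how long that should be" is to work backwards from the daily API budget. The sketch below is only back-of-the-envelope arithmetic, not the app's logic: the function name, the reserved-call figure, and the device count are hypothetical, and the 10,000 calls/day limit is an assumption to verify against SwitchBot's current API documentation.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def min_poll_interval_s(daily_limit: int, reserved_calls: int, polled_devices: int) -> int:
    """Smallest per-device polling interval (whole seconds) that fits the budget.

    daily_limit     - total API calls allowed per day (assumed value)
    reserved_calls  - calls kept back for commands, e.g. IR control from flows
    polled_devices  - devices polled on every cycle, one API call each
    """
    if polled_devices <= 0:
        raise ValueError("polled_devices must be positive")
    budget = daily_limit - reserved_calls
    if budget <= 0:
        raise ValueError("no API calls left for polling")
    # Ceiling division, so the rounded interval never exceeds the budget.
    return -(-SECONDS_PER_DAY * polled_devices // budget)


# Hypothetical numbers: 10,000 calls/day, 1,000 reserved, 5 polled devices.
print(min_poll_interval_s(10_000, 1_000, 5))  # 48
```

With numbers like these, a 15 or 30 minute fallback for stale webhook devices would sit far below the budget; the harder part is accounts with many devices that must always be polled.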