Pigotka / ha-cc-jablotron-cloud

HACS custom component for jablotron cloud integration
GNU General Public License v3.0

Alarm entities often unavailable, need to reload integration to recover #15

Open ra-kal opened 10 months ago

ra-kal commented 10 months ago

Hi,

the Jablotron integration very often does not show the alarm state. Sometimes it recovers, but sometimes it stays disconnected. If the integration is reloaded, the alarm entities immediately show the current state again.

Logger: custom_components.jablotron_cloud
Source: helpers/update_coordinator.py:296
Integration: Jablotron Cloud ([documentation](https://github.com/Pigotka/ha-cc-jablotron-cloud), [issues](https://github.com/Pigotka/ha-cc-jablotron-cloud/issues))
First occurred: 15 November 2023 at 14:49:27 (56 occurrences)
Last logged: 08:26:27

Timeout fetching Jablotron data

Logger: custom_components.jablotron_cloud.alarm_control_panel
Source: custom_components/jablotron_cloud/alarm_control_panel.py:205
Integration: Jablotron Cloud ([documentation](https://github.com/Pigotka/ha-cc-jablotron-cloud), [issues](https://github.com/Pigotka/ha-cc-jablotron-cloud/issues))
First occurred: 04:45:37 (118 occurrences)
Last logged: 08:14:31

States data not found

Logger: custom_components.jablotron_cloud
Source: custom_components/jablotron_cloud/__init__.py:103
Integration: Jablotron Cloud ([documentation](https://github.com/Pigotka/ha-cc-jablotron-cloud), [issues](https://github.com/Pigotka/ha-cc-jablotron-cloud/issues))
First occurred: 04:40:27 (2 occurrences)
Last logged: 04:40:58

Failed to get services!

Pigotka commented 10 months ago

@ra-kal hi, can you confirm the issue still persists in the last few days?

zoezoevp commented 10 months ago

> @ra-kal hi, can you confirm the issue still persists in the last few days?

It still persists from my side. Entities become unavailable. I have written an automation to reload the integration if they don't recover automatically.

Pigotka commented 10 months ago

Do you have the latest release, 0.5.4?

zoezoevp commented 10 months ago

Yes, 0.5.4 is installed.


ra-kal commented 10 months ago

> @ra-kal hi, can you confirm the issue still persists in the last few days?

I've not needed to reload the integration since the latest version. But the alarm still shows as unavailable, usually multiple times a day. It would be nice if the connection to the Jablotron server were much more stable.

Pigotka commented 10 months ago

Keeping this one open. I will try to replicate automatically what an HA integration reload does when entities become unavailable. But a bit later, when I find some free time.

jimsaw1 commented 9 months ago

@Pigotka Thank you very much for the integration, it's great.

Unfortunately, I have the same issue with unavailable entities very often. It works after reloading the integration. Version: 0.5.4. It would be great if you could investigate it.

I used @zoezoevp's idea to reload the integration automatically via an automation. Thank you for this. But there is still some downtime when reloading, approximately 30 seconds. This is a problem because I have an automation that arms and disarms based on phone location. Any delay can cause me to set off the alarm when I get home.

Koky05 commented 9 months ago

I had the same problem with another integration, so I modified my HA automation to check the status of the entity, reload the integration if needed, wait through check cycles until I get the requested status, and change the status at the end.

But yes, I have had the unavailable status 21 times since yesterday, mostly in the morning between 4 am and 10 am.

Pigotka commented 5 months ago

Alright, I have finally started working on this issue. I have a new version deployed locally and will see how it holds up over the weekend.

Pigotka commented 4 months ago

Unfortunately, the integration still needs to be reloaded because the underlying package gets stuck. I've messaged @fdegier to see if we can resolve it somehow. WIP

fdegier commented 4 months ago

> Unfortunately, the integration still needs to be reloaded because the underlying package gets stuck. I've messaged @fdegier to see if we can resolve it somehow. WIP

Which message? Can you describe the issue?

Pigotka commented 4 months ago

I'm sorry guys, but the testing is not going well and the recent version still has some major issues that need to be fixed before the next release. Stay tuned.

iFlyinq commented 4 months ago

> I'm sorry guys, but the testing is not going well and the recent version still has some major issues that need to be fixed before the next release. Stay tuned.

No problem at all! We appreciate all your help, don't blame anyone and for sure not yourself ;)

fdegier commented 3 months ago

@Pigotka can you upgrade to https://github.com/fdegier/JablotronPy/releases/tag/0.6.2? It adds a retry mechanism which should circumvent the 408 error, and it hopefully also logs the error when it persists.
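
As a rough illustration of the retry idea (this is not the actual JablotronPy 0.6.2 code), a minimal Python sketch could look like the following. The `request` callable, its `(status, body)` return shape, and the default retry count and delay are all assumptions made for the example.

```python
import time
from typing import Callable, Tuple


def call_with_retry(
    request: Callable[[], Tuple[int, dict]],  # hypothetical: performs one API call
    retries: int = 3,                          # assumed default, not from the library
    delay_s: float = 1.0,                      # assumed default, not from the library
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, dict]:
    """Call request() and repeat it while it reports HTTP 408 (request timeout)."""
    status, body = request()
    for _ in range(retries):
        if status != 408:
            break  # success or a different error: stop retrying
        sleep(delay_s)
        status, body = request()
    # If every attempt timed out, the last 408 is returned so the caller can log it.
    return status, body
```

Passing `sleep` as a parameter keeps the sketch easy to exercise without real waiting.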

RGB666 commented 3 months ago

> I had the same problem with another integration, so I modified my HA automation to check the status of the entity, reload the integration if needed, wait through check cycles until I get the requested status, and change the status at the end.
>
> But yes, I have had the unavailable status 21 times since yesterday, mostly in the morning between 4 am and 10 am.

How do you check and reload the Jablotron integration?

Pigotka commented 3 months ago

> How do you check and reload the Jablotron integration?

@RGB666 please don't; we already have a fix for it in the works, and it will be out this week.

Koky05 commented 3 months ago

@RGB666 I have an automation like this, but you need to find the config entry_id from your own setup:

alias: Jablotron refresh
description: ""
trigger:
  - platform: state
    entity_id:
      - alarm_control_panel.koval_peter_rd_nocny_rezim
    to: unavailable
condition: []
action:
  - service: homeassistant.reload_config_entry
    data:
      entry_id: 9618223e5f45e997e7c0d50601023666
  - delay:
      hours: 0
      minutes: 0
      seconds: 30
      milliseconds: 0
mode: single

Just today (17.6.2024, 8:51) I already have two unavailable statuses.

Pigotka commented 3 months ago

New release addressing the problem. My integration has run uninterrupted for a week. Please test it and report back whether we can close this issue.

Pigotka commented 3 months ago

Btw @fdegier, I believe your changes from 0.6.2 won't matter: my integration was already retrying requests many times, but once a request fails it keeps failing. I believe the issue is that the session gets invalidated, so you have to recreate it and log in again.
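
To illustrate the recovery idea described above (recreate the session and log in again once a call fails), here is a minimal Python sketch. It is not the integration's or JablotronPy's code; `ReloginOnFailure`, `login`, and `operation` are hypothetical names, and treating any exception as a stale session is a simplifying assumption.

```python
from typing import Callable, Generic, TypeVar

C = TypeVar("C")  # client/session type
T = TypeVar("T")  # result type of one API call


class ReloginOnFailure(Generic[C]):
    """Keeps one client and replaces it with a fresh login after a failed call."""

    def __init__(self, login: Callable[[], C]) -> None:
        self._login = login      # hypothetical factory returning a logged-in client
        self._client = login()

    def call(self, operation: Callable[[C], T]) -> T:
        try:
            return operation(self._client)
        except Exception:
            # Assumption: the failure means the session is stale.
            # Discard it, log in again, and try exactly once more.
            self._client = self._login()
            return operation(self._client)
```

If the second attempt also fails, the exception propagates, so a caller such as an update coordinator can still mark the entities unavailable until the next cycle.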

Koky05 commented 3 months ago

@Pigotka I use the same code as was published in the last release (copied directly from the pull requests) and still get the unavailable status 2 to 4 times per day.

Pigotka commented 3 months ago

@Koky05 Right, those are the places where the integration would previously stop, but now it will recover at the next update cycle, usually within 30s. Eventually, @fdegier will fix even that small downtime ;)

Koky05 commented 3 months ago

Not always, even my automation did not succeed: image

fdegier commented 3 months ago

> @Koky05 Right, those are the places where the integration would previously stop, but now it will recover at the next update cycle, usually within 30s. Eventually, @fdegier will fix even that small downtime ;)

I welcome contributions ;)

To set your expectations:

  1. I use neither this plugin nor PyJablotron, as I use Homebridge, which can run for weeks without encountering the same issue. We do however run into it, and I am trying to debug why this happens.
  2. Jablotron does not have a public API; all of the work is based on an internal API they use for the app.
  3. The Jablotron API is flaky: it gives random errors, such as "method not supported" on something we use all the time without any issue.

Right now we only set the cookies based on a 401 error, I think we would need to extend that in order to prevent this issue from occurring. https://github.com/fdegier/JablotronPy/blob/255fd4d7fbadbf46da4dc919394cbb393330dc9d/jablotronpy/jablotronpy.py#L68

avano commented 3 months ago

@Pigotka with the latest version I only get:

2024-06-17 12:12:37.407 DEBUG (MainThread) [custom_components.jablotron_cloud] Preparing Jablotron data update coordinator
2024-06-17 12:12:39.433 DEBUG (MainThread) [custom_components.jablotron_cloud] Failed to get gates data for service <service id>
2024-06-17 12:12:39.433 DEBUG (MainThread) [custom_components.jablotron_cloud] Finished fetching Jablotron Cloud data in 2.025 seconds (success: False)

It fails on this line

Calling get_programmable_gates manually using the jablotronpy library returns {'service-states': {'last-event-time': '2024-06-17T12:08:29+0200', 'service-name': '<name>'}, 'states': []} (after adding print(data) there), and it then raises the exception here

I don't know what the programmable gate is in the context of jablotron, but in the jablotron UI I only have 1 section and 1 keypad - previously I used the section to arm/disarm the alarm using this integration

Pigotka commented 3 months ago

@avano hmm, indeed it is something I added in the latest release, but I do it because I get an UnexpectedResponse exception from JablotronPy. @fdegier do you think it is correct to throw when the response is valid but the user doesn't have any gates? Anyway, please report it as a separate issue, as it is not part of this bug.

fdegier commented 3 months ago

> @avano hmm, indeed it is something I added in the latest release, but I do it because I get an UnexpectedResponse exception from JablotronPy. @fdegier do you think it is correct to throw when the response is valid but the user doesn't have any gates? Anyway, please report it as a separate issue, as it is not part of this bug.

The design principle was to either return data or an UnexpectedResponse as we are returning the entire data from Jablotron instead of parsing the PG's

We could change it so it returns only a list of programmableGates or an empty list. But IIRC you also needed the states in order to map the cloud component ID?
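
A minimal sketch of the "return an empty list" option, assuming the response shape shown in avano's output above (a dict with a 'states' list). `gate_states` is a hypothetical helper, not a JablotronPy function, and raising `ValueError` stands in for whatever exception the library would actually use.

```python
from typing import Any, Dict, List


def gate_states(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the entries under 'states'; an empty list means 'no gates', not an error."""
    if "states" not in response:
        # Only a structurally wrong response is treated as unexpected.
        raise ValueError("response has no 'states' key")
    return list(response["states"])


# Response copied from the comment above: a valid reply for a user without gates.
sample = {
    "service-states": {
        "last-event-time": "2024-06-17T12:08:29+0200",
        "service-name": "<name>",
    },
    "states": [],
}
print(gate_states(sample))
```

With this split, a caller can skip creating gate entities when the list is empty and still fail loudly on a malformed response.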

fdegier commented 3 months ago

Version 0.6.3 fixes the bug with re-setting cookies.

RGB666 commented 3 months ago

> > How do you check and reload the Jablotron integration?
>
> @RGB666 please don't; we already have a fix for it in the works, and it will be out this week.

What is the expected release date?

Pigotka commented 3 months ago

It's already included in the 0.6.2 release. But we now have two versions of the safety net, and I think only mine is being used. I can try to disable it locally to see.

RGB666 commented 3 months ago

Ok

RGB666 commented 3 months ago

still many reloads per day needed

Pigotka commented 3 months ago

@RGB666 can you elaborate? When and how quickly do you reload? Integration now recovers automatically so there should be no reason to reload anymore.

RGB666 commented 3 months ago

When I am at home, the alarm is automatically disarmed at 6 o'clock in the morning (scheduled in the Jablotron calendar; the calendar rule is blocked by a PG when I am away).
I have made an automation: 30 seconds after 6 o'clock, HA checks whether the alarm is disarmed. If the alarm is not disarmed, HA reboots.

RGB666 commented 3 months ago

[Attached images: 002193, 002194, 002195]

Pigotka commented 3 months ago

@RGB666 oh, from all this I'm not sure how it can even be related to Jablotron. 30s is not enough time to get the new alarm state. The integration is polling with a 30s interval and a 120s timeout, so you should check after more like 2 minutes. In a normal situation, it will be almost instant. But even if it is not disarmed, I see no reason for restarting HA; this is crazy. HA should never restart.

You can always try disarming it again but never restart.
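
If a script needs to confirm a state change, a minimal Python sketch of waiting through a few polling cycles instead of restarting might look like this. `wait_for_state` and `read_state` are hypothetical names; the 150 s default simply adds the 30 s interval and the 120 s timeout mentioned above, which is an assumption about the worst case rather than a documented guarantee.

```python
import time
from typing import Callable


def wait_for_state(
    read_state: Callable[[], str],   # hypothetical: returns the current alarm state
    expected: str,
    timeout_s: float = 150.0,        # assumption: 30 s poll interval + 120 s timeout
    interval_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Re-check the state every interval_s until it matches or timeout_s elapses."""
    deadline = clock() + timeout_s
    while True:
        if read_state() == expected:
            return True
        if clock() >= deadline:
            return False  # caller can retry the command instead of restarting
        sleep(interval_s)
```

A `False` result is a signal to send the disarm command again or raise a notification, not to restart anything.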

RGB666 commented 3 months ago

I will change the automation and check after 5 minutes. But I see no other way to program the automation. Whether the alarm is armed or disarmed depends on whether I am away or at home. Only by rebooting HA can I check the real alarm status.

Pigotka commented 3 months ago

@RGB666 I do not understand how restarting HA helps with the status. It looks like this without any restart: image

RGB666 commented 3 months ago

I changed the automation to check at 6:10, so 10 minutes after the alarm is disarmed. But it is still the same: this morning the status still did not change. After those 10 minutes HA restarted, and the alarm status changed accordingly. As many of my automations depend on the alarm status, it is crucial for me to have the correct alarm status all the time.

RGB666 commented 3 months ago

image

Pigotka commented 3 months ago

@RGB666 so first of all, it has nothing to do with this bug; ideally create a new issue and move the discussion there. I would like to see debug logs. You can enable debug logging directly on the integration itself. Then you can either wait for 6 AM again or just turn the alarm on and off manually in the house to see how the integration detects the arming status change. I'm quite sure the problem is in your HA configuration rather than the integration itself.

RGB666 commented 3 months ago

Sure, I will. Thank you in advance for helping to debug this problem. I appreciate your time and effort!

RGB666 commented 3 months ago

I disabled the reboot automation yesterday, so there were no reboots today. Here is what happened this morning:

• At 6:00 AM, the Jablotron system was automatically disarmed as scheduled in the Jablotron calendar.
• At 7:19 AM, I checked the alarm status using the HA mobile app (iOS). The status was incorrect, showing as "armed away" (see attached image).
• Using the Jablotron mobile app, I armed the alarm, waited some seconds, and disarmed the alarm again.
• I checked the alarm status in the HA mobile app (iOS) once more, and this time the status was correct: "disarmed" (see attached image).

Upon reviewing the logfile, I noticed something unusual:

• At 7:19:38, the alarm status changed to "arming", but 5 seconds earlier, at 7:19:33, the alarm status changed to "disarmed".

Clarification PG10: PG10 follows the alarm status as programmed in the Jablotron software. PG10 is high when the system is armed and low when it is disarmed. Also the status of PG10 did not change at 6:00.

I made a logfile, but I am not allowed to upload the file to this thread.

[Attached images: IMG_4308, IMG_4306, IMG_4307, 002198]

RGB666 commented 3 months ago

> @RGB666 so first of all, it has nothing to do with this bug; ideally create a new issue and move the discussion there. I would like to see debug logs. You can enable debug logging directly on the integration itself. Then you can either wait for 6 AM again or just turn the alarm on and off manually in the house to see how the integration detects the arming status change. I'm quite sure the problem is in your HA configuration rather than the integration itself.

I made a logfile, but I am not allowed to upload the file to this thread.