Breina / PowerTagGateway

HomeAssistant integration for EcoStruxure gateways; SmartLink, PowerTag Link and Panel Servers.
MIT License

Crash of HA when this integration is active #14

Closed usky73 closed 4 months ago

usky73 commented 8 months ago

Hi, I have tested this integration for a while without issue, using a PowerTag Link and one PowerTag (single-phase A9MEM1560) connected through a powerline (CPL) plug. The plug died, but I had a Panel Server PAS600 that is connected over Wi-Fi, which is why I made this change.

I removed the PowerTag Link and replaced it with the PAS600 in the integration. I renamed the sensor in the PAS600 to keep the same sensor name as before.

Info: autodiscovery did not work; I had to enter the IP manually.

I am able to receive the data without issue... for a few hours. After that, the sensor stops responding and I cannot access the Home Assistant web page. Data from the other HA sensors is still captured by HA and sent to InfluxDB, but not the PowerTag data. The only solution was to restart the HA container.

PAS600 version: 001.008.000; HA: 2023.11.3

I have tested the link to the PAS600 with Uptime Kuma and it shows full availability, no disconnections. Can you help me? What else do you need from me to find the issue?

Breina commented 8 months ago

> Info: autodiscovery did not work; I had to enter the IP manually.

Yeah this is known. After a certain period, the PAS stops responding to discovery requests. It will also not show up in Windows' Network tab. So this is a PAS issue, not the integration's.

> Can you help me? What else do you need from me to find the issue?

I'm thinking that the PowerTag itself sometimes disconnects. I've noticed as well that my PAS has a more difficult time maintaining connections than the PowerTag Link.

There's some diagnostic information that we're logging that might provide some insights.

You can find these when opening the device of your A9MEM1560 in HA.

image A tag with good connectivity

image A tag with bad connectivity

The LQI, packet error rate and RSSI each have entities for the gateway's interpretation as well as for the PowerTag itself, which will become unavailable intermittently. Here's an example of my second worst PowerTag (my worst doesn't even connect once):

image

image

It's not distance or interference; I've played around with that. It's the PowerTag that has issues. I'm afraid that you'll have to go through Schneider Support and hope they'll replace it. I think I'll have to do this myself as well.

usky73 commented 8 months ago

Thanks for taking the time to answer. All the quality parameters have been OK since the beginning (2 days ago), with no errors. Only 2 days because I deleted the integration and installed it again. It has been working fine for 2 days, but HA crashed 5 times last night (the watchdog restarted it).

Also, I don't know if I was clear, but the main issue is the crash of the HA interface, which is not happening right now. I am wondering why the integration caused HA to crash.

Breina commented 8 months ago

Sorry I missed that! I'm curious why that happens as well, that's not an intended mechanic.

Can you briefly disable the watchdog and then grab the logs before they get overwritten?

Also what version of HA are you on, and what version of the integration?

usky73 commented 8 months ago

I will look into the watchdog, I need to refresh my memory! How do I get the HA logs before a crash?

HA: 2023.11.3. For the integration, where do I find the version?

Breina commented 8 months ago

I don't know how your watchdog works, I don't use one myself.

> How do I get the HA logs before a crash?

Can you check your config folder if there's a file named like home-assistant.log.1? That's going to be the logs of a previous run.
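Since the rotated log is replaced on the next restart, one option is to copy the log files aside as soon as HA comes back up. A minimal sketch in Python, assuming the config folder is reachable from wherever the script runs; the function name `snapshot_logs` and the backup folder are made up for illustration and are not part of HA or this integration:

```python
import shutil
import time
from pathlib import Path
from typing import List


def snapshot_logs(config_dir: str, backup_dir: str) -> List[Path]:
    """Copy every file named home-assistant.log* into a timestamped folder."""
    source = Path(config_dir)
    target = Path(backup_dir) / time.strftime("%Y%m%d-%H%M%S")
    copied = []
    for log_file in sorted(source.glob("home-assistant.log*")):
        if log_file.is_file():
            # Create the folder lazily so an empty run leaves nothing behind.
            target.mkdir(parents=True, exist_ok=True)
            copied.append(Path(shutil.copy2(log_file, target / log_file.name)))
    return copied
```

For example, `snapshot_logs("/config", "/config/log_backups")` would copy the current and previous logs, assuming `/config` is where your container mounts the config folder (adjust both paths to your setup).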

> For the integration, where do I find the version?

If you open the integration in HACS, you can find it in the top left:

image

usky73 commented 8 months ago

I now have the latest version of HA (2024.1) and of this integration. Same issue, even worse than before the update: Home Assistant restarts by itself but is not working, and I had to restart the container. The log file you mentioned does not contain any errors or warnings from before the crash, but the file home-assistant.log.fault has a lot of lines. Here are the ones at the top of the file:

```
Fatal Python error: Segmentation fault

Thread 0x531f2e60 (most recent call first):
  File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 81 in _worker
  File "/usr/src/homeassistant/homeassistant/components/recorder/executor.py", line 17 in _worker_with_shutdown_hook
  File "/usr/local/lib/python3.10/threading.py", line 953 in run
  File "/usr/local/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/usr/local/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x53305e60 (most recent call first):
  File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 81 in _worker
  File "/usr/src/homeassistant/homeassistant/components/recorder/executor.py", line 17 in _worker_with_shutdown_hook
  File "/usr/local/lib/python3.10/threading.py", line 953 in run
  File "/usr/local/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/usr/local/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x55019e60 (most recent call first):
  File "/usr/src/homeassistant/homeassistant/util/async_.py", line 160 in protected_loop_func
  File "/usr/local/lib/python3.10/site-packages/pychromecast/socket_client.py", line 427 in initialize_connection
  File "/usr/local/lib/python3.10/site-packages/pychromecast/socket_client.py", line 696 in _check_connection
  File "/usr/local/lib/python3.10/site-packages/pychromecast/socket_client.py", line 563 in run_once
  File "/usr/local/lib/python3.10/site-packages/pychromecast/socket_client.py", line 540 in run
  File "/usr/local/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/usr/local/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x5820ce60 (most recent call first):
  File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 81 in _worker
  File "/usr/src/homeassistant/homeassistant/components/recorder/executor.py", line 17 in _worker_with_shutdown_hook
  File "/usr/local/lib/python3.10/threading.py", line 953 in run
  File "/usr/local/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/usr/local/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x57340e60 (most recent call first):
  File "/usr/local/lib/python3.10/site-packages/scapy/supersocket.py", line 264 in select
  File "/usr/local/lib/python3.10/site-packages/scapy/sendrecv.py", line 1219 in _run
  File "/usr/local/lib/python3.10/threading.py", line 953 in run
  File "/usr/local/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/usr/local/lib/python3.10/threading.py", line 973 in _bootstrap
```

Breina commented 8 months ago

Segmentation fault! :o

Any chance you can give your device more memory?

If you're running on VirtualBox, it may help to update it: https://community.home-assistant.io/t/home-assistant-keeps-crashing-restarting-often/466895/8?u=breina

usky73 commented 8 months ago

It is on Docker; the Raspberry Pi server has 8 GB of memory and uses only 2 GB of it. I have tested many times: this happens only when this integration is enabled and running.

Breina commented 8 months ago

Maybe it's because this integration produces a lot of data. 3 of the 5 threads are pushing data into your recorder. Have you integrated a time series database by any chance?
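Sorting the threads by component can also be done mechanically rather than by eye. The sketch below, assuming the dump follows the `Thread 0x... (most recent call first):` / `File "...", line N in func` layout quoted earlier in the thread, labels each thread by the first known path fragment found in its stack. The names `label_threads`, `summarize` and the `LABELS` mapping are illustrative, not part of HA:

```python
import re
from collections import Counter
from typing import Dict

# Header and frame patterns of the dump format quoted in this thread.
THREAD_RE = re.compile(r"(?:Current thread|Thread) (0x[0-9a-fA-F]+)")
FRAME_RE = re.compile(r'File "([^"]+)", line \d+ in \S+')

# Hypothetical path-fragment -> label mapping; extend it for your own setup.
LABELS = {
    "components/recorder": "recorder",
    "pychromecast": "pychromecast",
    "scapy": "scapy",
}


def label_threads(dump: str, labels: Dict[str, str] = LABELS) -> Dict[str, str]:
    """Map each thread id in the dump to the first label found in its stack."""
    headers = list(THREAD_RE.finditer(dump))
    result = {}
    for i, header in enumerate(headers):
        # A thread's frames run from its header up to the next header.
        end = headers[i + 1].start() if i + 1 < len(headers) else len(dump)
        paths = FRAME_RE.findall(dump[header.end():end])
        result[header.group(1)] = next(
            (name for fragment, name in labels.items()
             if any(fragment in path for path in paths)),
            "other",
        )
    return result


def summarize(dump: str) -> Counter:
    """Count threads per label."""
    return Counter(label_threads(dump).values())
```

Run against the five threads quoted earlier, this should label three as recorder and one each as pychromecast and scapy. Keep in mind that a label only says which component owns a thread, not that the thread caused the crash.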

usky73 commented 8 months ago

If you mean InfluxDB, yes.

Breina commented 8 months ago

Can you check whether InfluxDB is running out of memory? Perhaps upgrade it if possible?

Try disabling it temporarily to verify it's coming from there.

usky73 commented 7 months ago

I will run a test tomorrow, but I am recording the free memory in InfluxDB, and I have always had more than 5 GB of memory free. I don't think it is linked to all the apps running.

usky73 commented 7 months ago

I have run some tests. If I shut down the InfluxDB container, no issue. If I start InfluxDB but do not report the sensors connected to the PAS600 to InfluxDB, no issue. If I enable the reporting, the server crashes (no more SSH login) 10 h later.

I have this issue only with this integration, so something looks wrong somewhere... Any idea?

Breina commented 7 months ago

From what I can see in your logs, it's the recorder that's crashing. I'm also using Influx to gather data from the integration, but am not seeing any issues. I really think that the data we're recording pushes Influx over the edge and makes it fail due to resource constraints.

Can you monitor its resources leading up to the crash?
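One low-tech way to do this on a Linux host is to append a memory reading to a file at a fixed interval, so the last lines before the crash survive a reboot. A rough sketch, assuming `/proc/meminfo` is available (it is on Linux, including a Raspberry Pi); the function names and the CSV layout are invented for this example:

```python
import csv
import time
from pathlib import Path
from typing import Dict, Optional


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value} (mostly kB)."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def log_memory(csv_path: str, interval_s: float = 60.0,
               samples: Optional[int] = None,
               meminfo_path: str = "/proc/meminfo") -> None:
    """Append timestamped MemAvailable and SwapFree readings to a CSV file."""
    taken = 0
    while samples is None or taken < samples:
        info = parse_meminfo(Path(meminfo_path).read_text())
        # Reopen the file for every sample so each row is written out
        # before a possible crash.
        with open(csv_path, "a", newline="") as handle:
            csv.writer(handle).writerow([
                time.strftime("%Y-%m-%d %H:%M:%S"),
                info.get("MemAvailable"),
                info.get("SwapFree"),
            ])
        taken += 1
        if samples is None or taken < samples:
            time.sleep(interval_s)
```

Calling `log_memory("/tmp/mem.csv")` loops until interrupted. This measures the whole host rather than a single container; for per-container figures, `docker stats` is the usual tool.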

usky73 commented 7 months ago

If you tell me how? Monitor the records in InfluxDB?

Breina commented 7 months ago

Are you seeing logs like this?

https://community.home-assistant.io/t/influxdb-crash-every-2-days/196725

Check the logs of Influx itself, might be something there as well.

usky73 commented 7 months ago

No logs like this. It has been running for 3 days without issues, but I have made a few updates. Panel Server: set the correct time and date, although it is not yet right as this feature is not working well for the moment. InfluxDB: I had moved the server to a new one and some processes were still running with the old IP, generating errors. This is now solved.

Anyway, I don't understand why it became critical when I added these 2 new values to the InfluxDB recorder in HA... I will keep you informed, and thanks for your support; you pointed me in the right direction with InfluxDB.

Breina commented 4 months ago

Going to close this for now. Please re-open if you see this issue again.