davidrapan / ha-solarman

⚡ Solarman Stick Logger integration for 🏠 Home Assistant
MIT License

Unable to get data after Deye inverter goes offline #203

Open bobybob69 opened 1 month ago

bobybob69 commented 1 month ago

Describe the bug: When the Deye M200G4 inverter goes offline, it becomes impossible to get live data. All entities go offline, and the integration needs to be reloaded.

Attach the debug log: Will be uploaded soon.

To Reproduce: After the night, when the first sunlight appears and the panels start producing energy, nothing happens in the HA app, although everything is live in the SOLARMAN app.

Expected behavior: Data should come back online even after the inverter has been offline overnight.

Screenshots

image

Energy dashboard not showing anything for today

image

Production from the SOLARMAN app for the same day as the Energy dashboard

image

Entities are offline except one, "Total Production 4"; I don't know why.

Metadata: Version: v24.10.04

CrazyUs3r commented 1 month ago

Duplicate of #87

davidrapan commented 1 month ago

@bobybob69 described a situation where the device in HA doesn't come back up in the morning, so it's not a duplicate.

I'm waiting for the logs. 😉

bobybob69 commented 1 month ago

Hey @davidrapan and @CrazyUs3r, thanks for checking, hope you're good!

Yesterday I opened my HA dashboard to check energy production and didn't get any data (as shown in the screenshot). I tried rebooting the instance, but it didn't change anything. I didn't get the opportunity to download the log as I was on my iPhone.

Today I opened it again and I can see my inverter is still offline. I'll check around 12 whether it's still offline, but I'm sure it will be.

I was not sure about the similarity with #87, but if you mention it, perhaps it is?

Here are the logs:

home-assistant_solarman_2024-10-09T03-53-16.385Z.log

davidrapan commented 1 month ago

Is this with debug logging enabled from the previous day until the morning?

bobybob69 commented 1 month ago

Hi @davidrapan, it should be, but since you ask, does that mean it's not the case?

Here are the pictures of my HA ENERGY dashboard just now

image

We can see the entities are offline

image

What would be the best way to help you troubleshoot?

davidrapan commented 1 month ago

I really have no idea what is happening... I will need a moment to think about it. 😉

davidrapan commented 1 month ago

Hi @bobybob69, did you, for example, try hitting the Reload button under the three-dots menu in the list of Solarman devices?

bobybob69 commented 1 month ago

Hi @davidrapan, yes, I tried, and the logs are below.

As it's nighttime, the inverters aren't producing anything. But they also can't be reached, which is strange, and that's the issue that occurred. Is it expected for the inverters to be unreachable when there is no production?

I'll try again tomorrow morning so that the log records the changes.

Thanks for helping, mate.

davidrapan commented 1 month ago

Yes, microinverters turn off when there is no sunlight.

bobybob69 commented 1 month ago

Hi @davidrapan, you good?

Below are the logs and what I'm seeing on the entities. I don't know why they all turn unavailable. Can renaming the entities cause an issue? As for the inverters turning off and back on but not coming back in HA, I have no idea why it happens.

Any thoughts?

Thanks mate!

home-assistant_solarman_2024-10-13T10-12-41.204Z.log

Capture d’écran 2024-10-13 à 12 14 09

davidrapan commented 1 month ago

Did you try that Reload button when it gets into this state?

bobybob69 commented 1 month ago

Hi @davidrapan, yes, I clicked the Reload button but it stayed unavailable. I pressed the button again just now and it went back online with data. I'll also check whether, after they go offline tonight and come back on the next day, the data comes back normally. I've activated the logs and will share them with you tomorrow.

bobybob69 commented 1 month ago

Hey @davidrapan,

Here are the logs for 2 days of operation. Right now, here's what I'm seeing: error messages everywhere.

If I click Reload, it works as expected again.

home-assistant_solarman_2024-10-14T16-01-30.348Z.log

Capture d’écran 2024-10-14 à 18 02 43

After clicking the Reload button:

Capture d’écran 2024-10-14 à 18 03 38 Capture d’écran 2024-10-14 à 18 04 27

davidrapan commented 1 month ago

This behavior is honestly really weird and I can't think of anything we could try to reveal what's going on... :-/

bobybob69 commented 1 month ago

Hi @davidrapan, just to let you know, it happened again this morning: the inverters went offline in the integration. It's strange, because I was using Stephane Joubert's integration before and didn't have these issues. What could cause it to happen? I re-enabled the logs and will share them later. Yesterday they were offline because there was no production, and when they started producing again, the integration showed the error.

Capture d’écran 2024-10-17 à 08 57 16

What information would you need to get better context and understand what may be causing the issue? Are the logs alone enough?

Thanks and have a great day.

davidrapan commented 1 month ago

Hello @githubDante, do you maybe have any idea (cause I'm out of them) of what could be wrong here?

githubDante commented 1 month ago

This is really bad:

OSError: [Errno 24] No file descriptors available

It's an indication of a file descriptor (FD) leak somewhere. The question is what is causing it. It could be this integration, but it could also be something else HA-related (e.g. other modules).
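
For context, here is a minimal sketch of how leaked descriptors end in this error, assuming a Linux or other Unix host where Python's `resource` module is available. It is an illustration only and is not taken from ha-solarman, pysolarmanV5, or Home Assistant; the function name `open_until_exhausted` and the limit of 64 are made up for the demo. It temporarily lowers the soft descriptor limit, opens `/dev/null` repeatedly without closing it until the OS refuses, and then cleans up.

```python
import errno
import os
import resource


def open_until_exhausted(soft_limit: int = 64) -> tuple[int, int]:
    """Leak descriptors under a lowered soft limit; return (opened, errno)."""
    old_soft, old_hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Never ask for more than the hard limit allows.
    if old_hard != resource.RLIM_INFINITY:
        soft_limit = min(soft_limit, old_hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, old_hard))

    leaked: list[int] = []
    err = 0
    try:
        # soft_limit + 1 opens cannot all succeed, so the loop is bounded.
        for _ in range(soft_limit + 1):
            leaked.append(os.open(os.devnull, os.O_RDONLY))  # never closed: the "leak"
    except OSError as exc:
        err = exc.errno  # EMFILE, which is errno 24 on Linux
    finally:
        # Clean up so the demo does not damage the calling process.
        for fd in leaked:
            os.close(fd)
        resource.setrlimit(resource.RLIMIT_NOFILE, (old_soft, old_hard))
    return len(leaked), err


if __name__ == "__main__":
    opened, err = open_until_exhausted()
    print(f"opened {opened} descriptors, then errno {err} ({errno.errorcode.get(err)})")
```

Once a process is in this state, every new socket, file, or pipe fails the same way, regardless of which component caused the leak. The wording of the message depends on the C library: as far as I know, "No file descriptors available" is how musl (used by Alpine-based images) phrases EMFILE, while glibc prints "Too many open files".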

What happens at night when these microinverters are offline? Retries until a successful connection, or something else?

davidrapan commented 1 month ago

Oh, I did not notice that OSError... That truly is bad.

> What happens at night when these microinverters are offline? Retries until a successful connection, or something else?

Yes. Retries.

There are quite a few users with microinverters that also go offline during the night but who do not experience this issue.

githubDante commented 1 month ago

Maybe they don't have so many inverters. There are at least 3 here.

@bobybob69, can you provide a log for the interval between e.g. 18:00 and 07:00, or an extended log for 24 hours or more?

davidrapan commented 1 month ago

> Maybe they don't have so many inverters. There are at least 3 here.

Yeah that's true though.

davidrapan commented 1 month ago

Isn't there some way we could easily reuse sockets?

githubDante commented 1 month ago

No, they must be released. The good news is that the issue is not caused by the integration/pysolarmanV5; it must be something else in @bobybob69's installation that leaks FDs (not necessarily network-related).
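
To illustrate "they must be released": a minimal sketch of a retry loop that closes its socket after every failed attempt, so an unreachable inverter costs no descriptors overnight. This is not the pysolarmanV5 or ha-solarman code; `connect_with_retries` and its parameters are hypothetical names chosen for the example.

```python
import socket
import time


def connect_with_retries(host, port, attempts=3, delay=1.0, timeout=2.0):
    """Return a connected socket, or None once all attempts have failed."""
    for attempt in range(1, attempts + 1):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.settimeout(timeout)
        try:
            sock.connect((host, port))
            return sock  # success: the caller now owns the socket and must close it
        except OSError:
            # Timeout, refused, unreachable...: release the descriptor before retrying.
            sock.close()
            if attempt < attempts:
                time.sleep(delay)
    return None
```

If the `sock.close()` in the `except` branch were missing, each failed attempt could keep one descriptor open until garbage collection, and a long night of retries against several offline devices could add up.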

How do I know that the issue is elsewhere? I ran HA in a container with several fake inverters in it, pointed at the addresses of different running hosts (one is connected to a web server on port 80 😄 and it's very noisy), and then monitored the connections and their states.

@bobybob69 what's the output of this command:

ls /proc/`ps xalf | grep hass | grep -v grep | awk '{print $3}'`/fd | wc -l 

How many of them are network connections?

lsof -i -a -np `ps xalf | grep hass | grep -v grep | awk '{print $3}'` | grep TCP

or with ss:


ss -ntp | grep `ps xalf | grep hass | grep -v grep | awk '{print $3}'`
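
On systems where `ps xalf`, `lsof`, or `ss` are missing or behave differently, roughly the same numbers can be read straight from `/proc`. A minimal sketch, assuming Linux and permission to read the target process's `/proc/<pid>/fd` directory; `fd_summary` is a hypothetical helper, not part of any tool mentioned in this thread.

```python
import os


def fd_summary(pid: int) -> dict:
    """Count open descriptors of a process and how many of them are sockets."""
    fd_dir = f"/proc/{pid}/fd"
    total = sockets = 0
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # descriptor was closed between listing and reading
        total += 1
        if target.startswith("socket:"):
            sockets += 1  # includes Unix-domain sockets, not only TCP
    return {"total": total, "sockets": sockets}


if __name__ == "__main__":
    # Inspect the current process as a harmless default.
    print(fd_summary(os.getpid()))
```

Running it a few times over several hours against the HA process should show whether `total` keeps growing, which is the signature of a leak.
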
davidrapan commented 1 month ago

> No, they must be released. The good news is that the issue is not caused by the integration/pysolarmanV5; it must be something else in @bobybob69's installation that leaks FDs (not necessarily network-related).

I also ran a test with one real and three fake inverters and came to the same conclusion...

bobybob69 commented 3 weeks ago

> This is really bad:
>
> OSError: [Errno 24] No file descriptors available
>
> It's an indication of a file descriptor (FD) leak somewhere. The question is what is causing it. It could be this integration, but it could also be something else HA-related (e.g. other modules).
>
> What happens at night when these microinverters are offline? Retries until a successful connection, or something else?

Hi @githubDante, tonight I was trying something to integrate my smart meter and I had to restart the HA instance. When it was back on, I went to Solarman and the inverters were offline (as there is no production).

Below is the log after reloading the Solarman instance for each device. Sorry, I didn't notice your message earlier.

home-assistant_solarman_2024-11-02T21-31-55.556Z.log

Here's how the Solarman integration currently looks for my inverters and the smart meter I'm trying to integrate in #187 with @davidrapan:

solarman error status inverter

bobybob69 commented 3 weeks ago

> No, they must be released. The good news is that the issue is not caused by the integration/pysolarmanV5; it must be something else in @bobybob69's installation that leaks FDs (not necessarily network-related).
>
> How do I know that the issue is elsewhere? I ran HA in a container with several fake inverters in it, pointed at the addresses of different running hosts (one is connected to a web server on port 80 😄 and it's very noisy), and then monitored the connections and their states.
>
> @bobybob69 what's the output of this command:
>
> ls /proc/`ps xalf | grep hass | grep -v grep | awk '{print $3}'`/fd | wc -l
>
> How many of them are network connections?
>
> lsof -i -a -np `ps xalf | grep hass | grep -v grep | awk '{print $3}'` | grep TCP
>
> or with ss:
>
> ss -ntp | grep `ps xalf | grep hass | grep -v grep | awk '{print $3}'`

@githubDante, I tried from the Terminal menu of Home Assistant, and the results are below (all seem to fail except the second one, where I press Enter but nothing happens).

line comand result

Is there anything I can do to help troubleshoot? Thanks.

githubDante commented 2 weeks ago

Hi,

The name of the main process is not hass in your installation. Try to identify it and use it in the grep command:

ls /proc/`ps xalf | grep <process name> | grep -v grep | awk '{print $3}'`/fd | wc -l 

and

lsof -i -a -np `ps xalf | grep <process name> | grep -v grep | awk '{print $3}'` | grep TCP

If you know the PID you can use it directly:

ls /proc/<PID>/fd | wc -l

and

lsof -i -a -np <PID> | grep TCP

bobybob69 commented 2 weeks ago

Hey @githubDante, any tips on how I can find out which process I should look at? And the same for the PID? Sorry, I'm not familiar with this, but I'm ready to learn!

Please find below the logs from after solar production started. All the inverters came back online, but I had to manually reload the configuration of each inverter's integration entry.

Thanks for your help, mate!

home-assistant_solarman_2024-11-03T08-43-57.388Z.log

davidrapan commented 2 weeks ago

How is your HA installed?

BTW, are you from the future? Cause your latest post says "bobybob69 commented in 30 minutes"! 😆

githubDante commented 2 weeks ago

You can use ps xalf to list all processes, or manually scan /proc/*/comm and /proc/*/cmdline with ls -l and cat to find it. Since this is a small system (using busybox), you should not have many processes, especially python3.12-related ones.
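
The manual scan of `/proc/*/comm` and `/proc/*/cmdline` can also be scripted. A minimal sketch, assuming Linux; `find_processes` is a hypothetical helper, and the search term `python3` in the usage line is only a guess at what the HA process might be called on a given installation.

```python
import os


def find_processes(needle: str) -> list[tuple[int, str]]:
    """Return (pid, command line) for processes whose comm or cmdline contains needle."""
    matches = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(f"/proc/{entry}/comm", encoding="utf-8", errors="replace") as fh:
                comm = fh.read().strip()
            with open(f"/proc/{entry}/cmdline", "rb") as fh:
                # Arguments are NUL-separated; join them with spaces for display.
                raw = fh.read().replace(b"\0", b" ")
                cmdline = raw.decode("utf-8", errors="replace").strip()
        except OSError:
            continue  # process exited or is not readable
        if needle in comm or needle in cmdline:
            matches.append((int(entry), cmdline or comm))
    return matches


if __name__ == "__main__":
    for pid, cmd in find_processes("python3"):
        print(pid, cmd)
```

The PID printed for the HA process can then be used directly in `ls /proc/<PID>/fd | wc -l`.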

The last log shows something which is definitely related to the FD leak issue. The ics_calendar extension is behaving rather funky. It starts here:

2024-11-02 22:24:02.142 ERROR (SyncWorker_4) [custom_components.ics_calendar.calendar] Schedule Apple Loris: Failed to open url...

continues with:

(error count: 4 - this error is ratelimited)

and then its connection limit is getting exhausted:

2024-11-02 22:24:08.916 WARNING (SyncWorker_4) [urllib3.connectionpool] Connection pool is full, discarding connection: p139-caldav.icloud.com. Connection pool size: 10

Another limit is reached here:

2024-11-02 22:24:53.112 WARNING (MainThread) [homeassistant.components.homekit] Cannot add climate.clim_mael as this would exceed the 150 device limit. Consider using the filter option

The OS errors OSError: [Errno 24] No file descriptors available start 20-30 minutes later while the ics_calendar still tries to open that URL.

The tests performed by me & @davidrapan on a clean install show no issues with this integration, so the root of the issue must be in another extension. Try to disable them one by one and you should find the culprit.
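
The kind of log reading described above can be partly automated. A minimal sketch that counts WARNING and ERROR lines per logger and finds the timestamp of the first entry mentioning Errno 24; it assumes the line layout seen in the excerpts quoted above (timestamp, level, thread in parentheses, logger in brackets), and the helper names are made up for this example.

```python
import re
from collections import Counter

# Layout assumed from the quoted excerpts: "<date> <time> LEVEL (thread) [logger] message"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<level>[A-Z]+) \((?P<thread>[^)]*)\) \[(?P<logger>[^\]]+)\] (?P<msg>.*)$"
)

SAMPLE = """\
2024-11-02 22:24:02.142 ERROR (SyncWorker_4) [custom_components.ics_calendar.calendar] Schedule Apple Loris: Failed to open url...
2024-11-02 22:24:08.916 WARNING (SyncWorker_4) [urllib3.connectionpool] Connection pool is full, discarding connection: p139-caldav.icloud.com. Connection pool size: 10
2024-11-02 22:24:53.112 WARNING (MainThread) [homeassistant.components.homekit] Cannot add climate.clim_mael as this would exceed the 150 device limit. Consider using the filter option
"""


def count_by_logger(lines, levels=("WARNING", "ERROR")) -> Counter:
    """Count log entries per logger name, restricted to the given levels."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m and m.group("level") in levels:
            counts[m.group("logger")] += 1
    return counts


def first_errno24(lines):
    """Timestamp of the entry in which '[Errno 24]' first appears, or None."""
    last_ts = None
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m:
            last_ts = m.group("ts")  # remember the entry that traceback lines belong to
        if "[Errno 24]" in line:
            return last_ts
    return None


if __name__ == "__main__":
    print(count_by_logger(SAMPLE.splitlines()).most_common())
```

For a real log, pass an open file object instead of `SAMPLE.splitlines()`. The loggers that dominate the counts shortly before the first Errno 24 are reasonable candidates to disable first, though a noisy logger is only a hint and not proof of the leak's source.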

bobybob69 commented 2 weeks ago

Hey @githubDante, I just checked in the terminal. A few months ago I replaced the RPi, and at the same time I did a clean install and restored a backup from my previous HA installation. Can this cause an issue?

Please find below the results for ps xalf:

ps xalf results

Below is the output of the command busybox --list:

busybox 1 busybox 2 busybox 3 busybox 4 busybox 5 busybox 6

Does this help?

Are you suggesting I do a clean install of HA and reinstall the modules one by one? Maybe that could solve the issue?

Thanks for your help.

githubDante commented 2 weeks ago

> Does this help?

No.

> Are you suggesting I do a clean install of HA and reinstall the modules one by one? Maybe that could solve the issue?

I'm not familiar with HA/HA OS at all, but yes, start from scratch or disable/uninstall the modules/integrations that you do not use. Does this ics_calendar even work for you?

davidrapan commented 2 weeks ago

I don't understand why you don't just try removing devices from these other integrations running there (or even removing the integrations completely). It doesn't look like they even work, so... it's really a no-brainer.

bobybob69 commented 2 weeks ago

Hey guys, @githubDante, yes, the ics calendar works, but if it needs to be deleted, that won't be a problem.

What should I do to help, then? Where should the commands be executed from? I'm sorry I'm not as efficient as you'd like 😅

@davidrapan, which devices should I remove that you think are set up incorrectly? I'm not sure I understand.

davidrapan commented 2 weeks ago

According to the log, ics_calendar, for example, has or is causing some issues.

We told you: try disabling integrations (from HACS) one by one until the problem with solarman disappears...

Something in your HA is exhausting resources and thus causing issues, which results in solarman not working... 😉