Coroutine took too long. ERROR: No data returned from HASS entity sensor.solmod_house_load

swests commented 1 month ago

Describe the bug

I've been seeing an issue where, during the evaluation loop and at startup, the integration fails to run with the error: ERROR: No data returned from HASS entity sensor.solmod_house_load. I can work around it by reducing the "Load History Days" parameter until it works. This has been getting progressively worse: I used to load 28 days, and now I'm down to around 14. Sometimes it 'blips' and then recovers on a subsequent evaluation loop.

Looking at the logs, I see:

To Reproduce

Steps to reproduce the behavior: set the number of load history days to a high value.


Versions

HAOS as a VM on a ProxMox host
HA Core: 2024.9.3
Supervisor: 2024.09.1
Operating System: 13.1
Frontend: 20240909.1
MariaDB running as a container on the same ProxMox host, DB connection via network port

Additional context

Doing some investigation, I isolated the SQL query that's causing the timeout:

SELECT anon_1.metadata_id, anon_1.state, anon_1.last_updated_ts, anon_1.attributes
FROM (
  SELECT anon_2.metadata_id AS metadata_id, anon_2.state AS state, anon_2.last_updated_ts AS last_updated_ts, anon_2.attributes AS attributes
  FROM (
    SELECT states.metadata_id AS metadata_id, states.state AS state, 0 AS last_updated_ts,
      CASE WHEN (state_attributes.shared_attrs IS NULL) THEN states.attributes ELSE state_attributes.shared_attrs END AS attributes
    FROM states
    LEFT OUTER JOIN state_attributes ON states.attributes_id = state_attributes.attributes_id
    WHERE states.last_updated_ts < 1726054357.0e0 AND states.metadata_id = 1104
    ORDER BY states.last_updated_ts DESC
    LIMIT 1
  ) AS anon_2
  UNION ALL
  SELECT anon_3.metadata_id AS metadata_id, anon_3.state AS state, anon_3.last_updated_ts AS last_updated_ts, anon_3.attributes AS attributes
  FROM (
    SELECT states.metadata_id AS metadata_id, states.state AS state, states.last_updated_ts AS last_updated_ts,
      CASE WHEN (state_attributes.shared_attrs IS NULL) THEN states.attributes ELSE state_attributes.shared_attrs END AS attributes
    FROM states
    LEFT OUTER JOIN state_attributes ON states.attributes_id = state_attributes.attributes_id
    WHERE (states.last_changed_ts = states.last_updated_ts OR states.last_changed_ts IS NULL)
      AND states.metadata_id IN (1104)
      AND states.last_updated_ts > 1726054357.0e0
      AND states.last_updated_ts < 1727695957.0e0
  ) AS anon_3
) AS anon_1
ORDER BY anon_1.metadata_id, anon_1.last_updated_ts

When I run this in the MariaDB CLI, the query takes 9 seconds and returns 63,250 rows. Clearly, network transfer and ingestion time on top of that cause the issue. Can the timeout be increased? Or can this call be moved outside of the async processing?
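One other option that might be worth exploring, purely as a sketch: split the history range into smaller windows so that each individual request returns fewer rows and has a better chance of finishing inside the timeout. The names below (history_windows, fetch_in_chunks, and the fetch callable) are hypothetical and not part of pv_opt or AppDaemon; fetch stands in for whatever call retrieves history between two timestamps, and whether the real get_history can be driven this way is an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Iterator


def history_windows(
    start: datetime, end: datetime, max_days: int = 7
) -> Iterator[tuple[datetime, datetime]]:
    """Yield consecutive (window_start, window_end) pairs covering start..end."""
    if max_days < 1:
        raise ValueError("max_days must be at least 1")
    step = timedelta(days=max_days)
    cursor = start
    while cursor < end:
        # The last window is clipped so it never runs past `end`.
        window_end = min(cursor + step, end)
        yield cursor, window_end
        cursor = window_end


def fetch_in_chunks(
    fetch: Callable[[datetime, datetime], list],
    start: datetime,
    end: datetime,
    max_days: int = 7,
) -> list:
    """Call the (hypothetical) `fetch` once per window and concatenate the rows."""
    rows: list = []
    for window_start, window_end in history_windows(start, end, max_days):
        rows.extend(fetch(window_start, window_end))
    return rows


if __name__ == "__main__":
    # Stand-in fetch that just reports which window it was asked for.
    def fake_fetch(s: datetime, e: datetime) -> list:
        return [f"{s:%Y-%m-%d} to {e:%Y-%m-%d}"]

    first_day = datetime(2024, 9, 1, tzinfo=timezone.utc)
    for line in fetch_in_chunks(fake_fetch, first_day, first_day + timedelta(days=28)):
        print(line)
```

With 28 days and 7-day windows this makes four smaller requests instead of one large one. A real implementation would also need to decide how to handle rows that fall exactly on a window boundary.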

I see there are some similar issues: https://community.home-assistant.io/t/appdaemon-crashing-when-get-history-takes-too-long/727963

and saw this (but not sure it's related) https://stackoverflow.com/questions/65605639/coroutine-callback-when-running-job-is-longer-than-10-seconds

stevebuk1 commented 1 month ago

Note that issue #269, raised yesterday, appears to be the same issue. Excerpt from their Pv_opt.log:

12:59:25 INFO: Solcast forecast loaded OK
12:59:25 INFO:
12:59:25 INFO: Getting expected consumption data for 2024-09-29 00:00:00+0000 to 2024-10-01 00:00:00+0000:
12:59:35 WARNING: Coroutine (<coroutine object Hass.get_history at 0xffff94322440>) took too long (10 seconds), cancelling the task...
12:59:46 WARNING: Coroutine (<coroutine object Hass.get_history at 0xffff94321640>) took too long (10 seconds), cancelling the task...
12:59:57 WARNING: Coroutine (<coroutine object Hass.get_history at 0xffff94322440>) took too long (10 seconds), cancelling the task...
13:00:08 WARNING: Coroutine (<coroutine object Hass.get_history at 0xffff94322340>) took too long (10 seconds), cancelling the task...
13:00:19 WARNING: Coroutine (<coroutine object Hass.get_history at 0xffff94322440>) took too long (10 seconds), cancelling the task...
13:00:20 ERROR: No data returned from HASS entity sensor.solis_house_load
13:00:20 INFO: Getting consumption in W from: sensor.solis_house_load

stevebuk1 commented 1 month ago

In issue #141, Francis suggested the following:

You might be able to fix this by adding: thread_duration_warning_threshold: 00 to your appdaemon.yaml. See here

Can you try this to see if it solves the issue?

swests commented 1 month ago

Just tried adding this and restarted AppDaemon. Made no change. It was already 45 anyway...

swests commented 1 week ago

Been prodding at this for a while. In my setup MariaDB is in a container on the same ProxMox host as the HA VM, so the query timeout can only be resolved by either extending the coroutine timeout or making the query faster. It seems you can’t configure the coroutine timeout (that would need @fboundy to make code changes), so I focused on DB performance.

Seems MariaDB defaults to an in-memory cache of 10MB. I just increased innodb_buffer_pool_size to 1GB in the server config and boom, it works. I can now retrieve 28 days of consumption history again.
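For anyone wanting to try the same change, the setting would look roughly like this in a MariaDB server config file. The section name and the file it lives in are assumptions and vary by install; only the 1GB value comes from my setup.

```ini
[mysqld]
# InnoDB's in-memory cache for table and index data.
# 1G is the value that worked here; size it to the memory available to the container.
innodb_buffer_pool_size = 1G
```

MariaDB needs a restart to pick up a change made in the config file.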

fboundy commented 1 week ago

Thanks for this. I’m not actually sure there’s anything I could do about this as I’m reliant on the appdaemon code for retrieving history from HASS entities.

fboundy commented 1 week ago

Added some text under "Known Issues" in the Readme for the next patch. It looks like this setting is not available for the MariaDB add-on in HA, so it may be specific to standalone Docker/container installs.

fboundy / pv_opt

Coroutine took too long. ERROR: No data returned from HASS entity sensor.solmod_house_load #270