bramstroker / homeassistant-powercalc

Custom component to calculate estimated power consumption of lights and other appliances
MIT License
897 stars 250 forks source link

Prevent blocking I/O in eventloop #2266

Closed jpbede closed 1 month ago

jpbede commented 1 month ago

This PR tries to fix https://github.com/bramstroker/homeassistant-powercalc/issues/2264 with wrapping the open calls in a function executing them in the executor thread.

I tested this on my production system and saw no breakage

bramstroker commented 1 month ago

@jpbede Nice! I was working on same last hour. Only had remote.py done yet, so great you have handled all the other cases. Had 1 roadblock / failing test. Maybe in your PR it is fixed. When it's not fixed I'll elaborate about the exact cause / possible solution. Have sent bdraco a private message on discord about it. I'll have a look at your PR and compare with my changes.

bramstroker commented 1 month ago

Your PR is failing exactly on the same test as me.

The problem is in: https://github.com/bramstroker/homeassistant-powercalc/blob/master/custom_components/powercalc/__init__.py#L318

I have sent following to bdraco about it:

============== For setup of YAML sensors certain sensors needs to be setup before other sensors. There is an include functionality in powercalc, and for that to work correctly first all the "non include" sensors must be setup. I didn't have problems with it before, but now I moved open() to non blocking I/O I have timing issue.

To solve the issue I tried to create two buckets of tasks and use gather to resolve all the coroutines. As follows:

    sensors: list = domain_config.get(CONF_SENSORS, [])
    primary_tasks = []
    secondary_tasks = []

    for sensor_config in sensors:
        sensor_config.update({DISCOVERY_TYPE: PowercalcDiscoveryType.USER_YAML})
        task = create_eager_task(async_load_platform(
            hass,
            Platform.SENSOR,
            DOMAIN,
            sensor_config,
            config,
        ))

        if CONF_INCLUDE in sensor_config:
            secondary_tasks.append(task)
        else:
            primary_tasks.append(task)

    await asyncio.gather(*(task for task in priority_tasks))
    await asyncio.gather(*(task for task in secondary_tasks))

I expect the primary tasks to be executed first and than all the secondary tasks, however with this approach still the secondary task is executed first.

jpbede commented 1 month ago

You can probably do something like this:

    async def _done_cb(_: asyncio.Task) -> None:
        await asyncio.gather(*(task for task in secondary_tasks))

    task = asyncio.gather(*(task for task in primary_tasks))
    task.add_done_callback(_done_cb)
    await task

This should ensure that the secondary_tasks gets executed after the primary_tasks are done. Added this to the PR.

bramstroker commented 1 month ago

You can probably do something like this:

    async def _done_cb(_: asyncio.Task) -> None:
        await asyncio.gather(*(task for task in secondary_tasks))

    task = asyncio.gather(*(task for task in primary_tasks))
    task.add_done_callback(_done_cb)
    await task

This should ensure that the secondary_tasks gets executed after the primary_tasks are done. Added this to the PR.

Seems a logical solution. However tried locally, but than the test is running forever / hanging. Not sure why yet.

edit All tests are timing out now in github action as well, so this is not the solution unfortunately

bramstroker commented 1 month ago

Could you revert the last change with that priority and fix the mypy issues?

I'm struggling to find a solution for that one failing test. Have to go for groceries now, will have another look later.

jpbede commented 1 month ago

Could you try this, I can't get the tests fully working on my machine:

    primary_sensors = []
    secondary_sensors = []

    for sensor_config in sensors:
        sensor_config.update({DISCOVERY_TYPE: PowercalcDiscoveryType.USER_YAML})

        if CONF_INCLUDE in sensor_config:
            secondary_sensors.append(sensor_config)
        else:
            primary_sensors.append(sensor_config)

    def _done_cb(_: asyncio.Task) -> None:
        """Load secondary sensors after primary sensors."""
        for sensor_config in secondary_sensors:
            hass.async_create_task(async_load_platform(
                hass,
                Platform.SENSOR,
                DOMAIN,
                sensor_config,
                config,
            ))

    task = asyncio.gather(*(
        hass.async_create_task(async_load_platform(
            hass,
            Platform.SENSOR,
            DOMAIN,
            sensor_config,
            config,
        ))
        for sensor_config in primary_sensors
    ))
    task.add_done_callback(_done_cb)
    await task
bramstroker commented 1 month ago

Could you try this, I can't get the tests fully working on my machine:

Thanks. Unfortunately same issue with this approach.

Tests can be run with following. Assuming you have poetry installed.

poetry install --no-root
poetry run pytest tests/

The test start failing after this call has been added:

json_data = await self.hass.async_add_executor_job(_load_json)

https://github.com/bramstroker/homeassistant-powercalc/pull/2266/files#diff-8c729fbc6439e154cbe0ed99180490688b31cd76a3763ca69379dc4a932b01adR116

When I keep that sync the test passes.

I have put in a debugger and whenever I pass this line it directly starts excecuting a "secondary_task"

Basically the scenario is as follows.

Primary tasks:

TaskA (setup sensor A)

TaskB (setup sensor B)

Secondary tasks:

TaskC (setup sensor with include functionality)

TaskC should only ever be starting to execute after TaskB is fully finished, but it seems it starting halfway the execution of TaskB, at the place where hass.async_add_executor_job.

jpbede commented 1 month ago

I think I may have found a solution. At least the tests no longer fail locally.

bramstroker commented 1 month ago

I think I may have found a solution. At least the tests no longer fail locally.

Awesome, you rock. It's kind of a hacky solution, but I think it's the best we can do now. Might need some comments in code about why it's done like this.

I also looked into it for a few hours today, but couldn't get it to work. Had some discussions with bdraco on discord and he said I could have a look into creating futures, and passing them in discovery data, and later call set_result. But not sure how yet, and will make it overly complex.

jpbede commented 1 month ago

I also had these failed tests locally, which were caused by the lack of areas. I fixed them by replacing the area_reg with the area_registry fixture.

bramstroker commented 1 month ago

Unfortunately more tests are breaking now, so this is probably not the direction we must head.

I can share you some communication I did with bdraco, maybe you have an idea how to implement that approach? Do you have discord?

bramstroker commented 1 month ago

I also had these failed tests locally, which were caused by the lack of areas. I fixed them by replacing the area_reg with the area_registry fixture.

Ok nice! All tests are working than after that change?

jpbede commented 1 month ago

I hope so. I do have mixed results on my machine, sometimes they fail. Let's see what the CI says. I need to move the event firing to the run_powercalc_setup so it is done for all tests. Additionally I needed to set the states test_nested_conditions before running the setup, otherwise the condition helper would complain

bramstroker commented 1 month ago

You have done it ;-). Test passed!!

jpbede commented 1 month ago

All green 🥳

jpbede commented 1 month ago

I would give this a quick shoot on my prod. system.

bramstroker commented 1 month ago

Walked through the PR once more. Everything looks fine. Thanks a lot for all your efforts on this one! Seemed like a relatively small task but at the end it bit us.

bramstroker commented 1 month ago

I would give this a quick shoot on my prod. system.

Ah sure, let me know if you find any regressions.

I'll wait with the release.

jpbede commented 1 month ago

Thanks a lot for all your efforts on this one! Seemed like a relatively small task but at the end it bit us.

You're welcome! Always happy to help 🙂

Ah sure, let me know if you find any regressions.

Works like a charm, but I do only have sensors configured via UI not YAML.

bramstroker commented 1 month ago

I also did a quick check in my test instance. No issues. Have confidence in the extensive test suite.