AppDaemon / appdaemon

:page_facing_up: Python Apps for Home Automation
Other
853 stars 418 forks source link

Avoid copying states in get_state() if read-only access is needed only #467

Closed bob1de closed 5 years ago

bob1de commented 5 years ago

Hi,

I think that deep-copying the whole HA state adds considerable overhead to get_state(), especially when called for a whole domain or even complete HA state.

It turns out that in most cases, you just need read-only access to the states. I see the reason why they're deep-copied at the moment, and this should probably stay the default behaviour, but I think there should be a way to retrieve states without this overhead when you really know you won't modify it. I, for instance, do sometimes query the whole state to search for entities with specific characteristics and know that I won't ever write to it.

I'd suggest a new keyword argument for get_state(), maybe named copy with a default of True. I can make a PR for this, if desired. I could even greatly simplify the get_state() code simultaneously.

What do you think? I can even file the PR for you to have a look at it directly, because that's no big deal to implement.

Best regards Robert

ReneTode commented 5 years ago

what do you mean when you say deep-copying?

isnt get_state just always only to read data?

and another question is: do you look at the code from 3.0.2 or to the code from the latest dev? because the latest dev is quite different from the stable version when it comes to data handling.

AD isnt just a platform to connect to HASS, but a platform that can retrieve data from any other platform if the right plugin is created (openhab, mqtt, smartthings, etc.)

acockburn commented 5 years ago

Deep copying is making a copy of whatever it is rather than just returning a reference. Returning a reference is quicker but has the disadvantage that an app could change internal AppDaemon state if it wasn't careful, which is why deep copy was used. @efficiosoft - have you seen any real world examples of this causing a performance problem? I'm curious because I haven't.

I'm not opposed to this the way you framed it - backward compatibility for existing users and a deliberate contract with the user to accept the danger, so yes I would like to take a look at a PR, thanks.

bob1de commented 5 years ago

@acockburn Hmm, I haven't taken measurements yet, was just curious about it. My AD setup runs on a recent Core i3, so performance will probably not be affected very much. But as many users are running on Raspberry Pis, it might be different there. And it of course heavily depends on the number of entities you have.

In Schedy, which I authored, I have a function that can filter entities based on criteria that attributes must fulfill. This usually reads in the whole state to perform comparisons on it. There, copying is clearly not needed, and as the function can potentially be called quite often, depending on the users schedules, I was looking for ways to improve performance as much as possible.

BTW, when this is supposed to run with different plugins, are states then mapped to the HA format somehow? I ask because the current get_state() implementation in state.py is used for all plugins and assumes entries like "state" and "attributes".

bob1de commented 5 years ago

@ReneTode Yes, talking about the dev branch.

bob1de commented 5 years ago

@acockburn What I do have noted is that get_state() usually takes multiple msec to return, even on my relatively beefy machine. Probably that's because of the copying, but I don't know for sure.

acockburn commented 5 years ago

@efficiosoft - BTW, when this is supposed to run with different plugins, are states then mapped to the HA format somehow? - yes, I'll document plugin development at some stage but one of a plugin's duties is to map whatever is an appropriate notion of state to something similar to HASSs structure. This is done for consistency and because HASS had a fairly open and robust notion of state, plus HASS wins by being the first plugin developed :) Even MQTT doesn't actually have state at this point although it is an ongoing discussion to add it to allow HADashboard support for pure MQTT deployments. Future plugins will doubtless have state that will also be mapped to be similar. The only real requirements are that an entity has a domain and a name, and that it has an attribute called state, any sub attributes live in a sub-dictionary called attributes. I'm more than willing to extend this if it seems useful but that is where we are right now. As you are probably aware, I recently added a bunch of tracking for the admin interface and found it convenient to make them first class entities, it worked out well.

At some stage too I'll be taking a look at performance overall for AD - some recent benchmarks got the scheduler running at about 120 calls per second on my hardware but that includes the overhead of the admin interface which I am sure adds some latency and a quick profile did show a lot of time being eaten up by deepcopies, so I think your concerns are justified.

Also, I am aware of your work on schedy and happy to support it in any way with AD changes, enhancements etc. that make sense :)

I'll merge the associated PR as part of the runup to the new 4.0b1, and I'll close this issue now.

bob1de commented 5 years ago

Sounds all really good. I think the way HA stores its states is well suited as a common notion for all plugins to follow.

Thanks for recording the profiles, don't have to do it myself then. :)

AppDaemon is a really powerful idea. I talked about this with @ReneTode some weeks ago and think it can both be a great framework for developers to develop reusable apps to be distributed via PyPi and a tool for writing small, specialized apps as the average user quickly. I'll try to do my best to bring this idea forward and help with ideas and pull requests whenever possible. I'm just a bit scared of the tremendous complexity the code gets over time. It's become really huge and twisted, IMHO. Is this something you're concerned about as well, or is it just my biased feeling? Don't get this wrong, I'm just a Python developer since quite some years and try to keep things as minimalist and lucid as possible.

acockburn commented 5 years ago

Thanks and appreciate your input. I am a veteran programmer of probably 40 years, but this is my first python project so I am learning as I go along in that respect. I am trying to stay on top of things and keep it as simple as possible - you no doubt noticed the refactor I did recently - trying to keep it maintainable, and as the new plugin structure beds in sorting out some of the discrepancies and built in assumptions.

Long term my vision is that AppDaemon will be used for everything from home automation to industrial processing to .... who knows what? I have revamped the scheduler to remove the 1s resolution limitation and am looking at performance as well for more potential real time applications.

One of the goals of the admin interface is to add a full editing and configuration suite for apps, as well as to add the ability to package and exchange apps via github and/or pypi. Another reason for the restructure and cleanup was to pave the way for async apps, so everything that I have been changing has been to a plan and to support eventual goals, even if that is not visible from just watching commits into dev, which I am sure you are doing :)

We do have a discord channel that you are welcome to join for more realtime in-depth discussion, or if at any time written exchange works better for you feel free to PM me on the forum.

bob1de commented 5 years ago

First, thanks for keeping this cool project running. I do appreciate your work a lot.

Yes, due to its interpreted and dynamically typed nature, Python is quite different to some other languages in that you usually can do things pretty concise and safe 50% of lines and complexity, which pays off when you look at it a week later. ;D

What I was talking about is actually the code itself, not features and interfaces, and the refactoring surely helped a lot to keep it maintainable. I just sometimes feel lost in the deeps of nested functions and conditions without docstrings, but that's the view from outside I'm sure.

To be honest, I haven't looked into the admin interface at all, yet. The resource I consider most when working with it is the adapi, as that's the interface I have to interact with.

Never used Discord before, that's voice-based, right? But sounds like a great platform for such discussions. Just registered with them and would be loving to join.

acockburn commented 5 years ago

Discord does have voice but it is primarily a chat based interface the way we use it - we have channels for both AppDaemon and HADashboard, and its a great way to ask and answer questions and suggest enhancements.

Here is a link to our server:

https://discord.gg/pHsjADY

Hope to see you soon!