CPU performance issues with appdaemon 4.x

gibman commented 4 years ago

Add-on: AppDaemon 4 Python Apps and Dashboard using AppDaemon 4.x for Home Assistant

Add-on version: 0.3.0 You are running the latest version of this add-on. System: Ubuntu 20.04.1 LTS (amd64 / qemux86-64) Home Assistant Core: 0.114.2 Home Assistant Supervisor: 235

Been running appdaemon for a few years on a rpi3 b+ model. as I extended my appdaemon setup with even more apps and functionality at some point I started to experience lagging PIR sensors / lights. I then took it upon myself to install glances in order to look at CPU usage.

The culprit was the "appdaemon process". I have 50 production apps where appdaemon CPU usage sits at around ~43-55% My next step was to transplant my hass setup to a more powerful x86 box, in this case an older mini PC (amd e-450 dual core @ 1650mhz) + SSD drive.

Big improvement. The lag was gone.

But the CPU usage is still high for appdaemon. No surprise here :)

I decided to play around with a few things in order to nail down what's exactly causing this.

So I uninstalled appdaemon, removed the appdaemon folder. Then I reinstalled appdaemon along with the helloworld class. and nothing more- Now using the default installed appdaemon.yaml file as well.

1 x helloworld vanilla setup: cpu usage is here ~2-5% apps.yaml: helloworld: class: helloworld module: helloworld

40 x helloworld apps setup: cpu usage is here ~9-13%

apps.yaml: helloworld: class: helloworld module: helloworld

helloworld1:
  class: helloworld
  module: helloworld

helloworld2:
  class: helloworld
  module: helloworld

helloworld3:
  class: helloworld
  module: helloworld

helloworld4:
  class: helloworld
  module: helloworld

helloworld5:
  class: helloworld
  module: helloworld

helloworld6:
  class: helloworld
  module: helloworld

helloworld7:
  class: helloworld
  module: helloworld

helloworld8:
  class: helloworld
  module: helloworld

helloworld9:
  class: helloworld
  module: helloworld

helloworld10:
  class: helloworld
  module: helloworld

helloworld11:
  class: helloworld
  module: helloworld

helloworld12:
  class: helloworld1
  module: helloworld1

helloworld13:
  class: helloworld1
  module: helloworld1

helloworld14:
  class: helloworld
  module: helloworld

helloworld15:
  class: helloworld
  module: helloworld

helloworld16:
  class: helloworld
  module: helloworld

helloworld17:
  class: helloworld
  module: helloworld

helloworld18:
  class: helloworld
  module: helloworld

helloworld19:
  class: helloworld
  module: helloworld

worldhello:
  class: helloworld
  module: helloworld

worldhello1:
  class: helloworld
  module: helloworld

worldhello2:
  class: helloworld
  module: helloworld

worldhello3:
  class: helloworld
  module: helloworld

worldhello4:
  class: helloworld
  module: helloworld

worldhello5:
  class: helloworld
  module: helloworld

worldhello6:
  class: helloworld
  module: helloworld

worldhello7:
  class: helloworld
  module: helloworld

worldhello8:
  class: helloworld
  module: helloworld

worldhello9:
  class: helloworld
  module: helloworld

worldhello10:
  class: helloworld
  module: helloworld

worldhello11:
  class: helloworld
  module: helloworld

worldhello12:
  class: helloworld
  module: helloworld

worldhello13:
  class: helloworld
  module: helloworld

worldhello14:
  class: helloworld
  module: helloworld

worldhello15:
  class: helloworld
  module: helloworld

worldhello16:
  class: helloworld
  module: helloworld

worldhello17:
  class: helloworld
  module: helloworld

worldhello18:
  class: helloworld
  module: helloworld

worldhello19:
  class: helloworld
  module: helloworld

now when going back to the intial 1 app config like this:

apps.yaml: helloworld: class: helloworld module: helloworld

cpu usage is still around 9-13%, I have to restart AD in order to achieve a cpu usage of around 2-5% the logs do indicate that the 39% apps are indeed removed It seems like something here isn't cleaned up correctly. Anyways, restarting appdaemon manually fixes this.

Now introducing my 50 realworld apps into the setup and appdaemon CPU usage settles at around ~43-55% I did try to look at the "AppDaemon Administration". I see which apps have what callbacks and so forth. It's a bit hard to troubleshoot this. I've attached the static HTML for the admin interface, if you want to browse the results I got.

I also tried to add one app at a time, restarting appdaemon each time. The CPU footprint seems to increase as I add more apps. Can't seem to find one specific app. It seems like the more apps, the more the CPU is hogged.

What can I do from here ? Does each callback come with a small cpu penalty ? I mean, should these be kept at a minimum. This is really the heart of appdaemon, so reducing thus would probably be tough. Should I refactor my apps for the async-await pattern ? what this have any benefit ?

ReneTode commented 4 years ago

appdaemon checks if apps (the files from the apps) have changed every second. the more apps you add, the more files and code gets checked. thats the only logical conclusion for extra CPU, for an app that has no returning callbacks

allthough its way more in your case then i expect, because i run over 600 apps on my system. with 2x HA, 5x AD and mosquito, i hardly get up to 30% CPU. (but i got a beebox)

i also suspect that some python stuff isnt cleaned correctly, or it takes a long time, when i look at memory use.

what you at least can do is check out the option for production mode. and make sure you got AD 4.0.5 because earlier version had a problem with callback entities that were not correctly removed.

gibman commented 4 years ago

I think an AMD e-450 is around the same as a beebox CPU wise (except 2 cores vs 4 cores).

I tried setting production_mode: True .. It did help in bringing down the CPU usage to ~20-42% (mostly 30%)

I'm on version 0.3.0 which should have AD 4.0.5, so all good there atleast.

Anything else I can do to further improve this ?

ReneTode commented 4 years ago

it all depends on your apps.

if you got an app that creates or reads a sensor every second, then get that to 10 or 30 seconds. if you got apps that use a lot of requests, then bring that down as much as reasonable.

i cant say what is causing your CPU usage without studying your apps. if i would try to find the problem (if it would be a problem) then i would start by installing a second AD instance. then move 1 app there, see how much CPU its takes, moved it back, do another, etc. untill i know which apps uses up most CPU and then see if there is a way to improve those.

ReneTode commented 4 years ago

but what i also noticed is that sometimes its not even enough to restart AD and HA, and i need to restart my complete ubuntu device, because stuff keeps hanging in memory/ cpu use. and i suspect thats some kind of python imperfection.

ConsoleCriminal commented 4 years ago

Do you have the MQTT module enabled and lots of updates there? I had the same problem. What actually helped was setting client_topics: None in appdaemon.yaml. (My mqtt apps subscribe to the individual topics they require themselves). By default, appdaemon seems to subscribe to "#", which seems to cause that appdaemon needs to filter tons of irrelevant messages....

gibman commented 4 years ago

Do you have the MQTT module enabled and lots of updates there? I had the same problem. What actually helped was setting client_topics: None in appdaemon.yaml. (My mqtt apps subscribe to the individual topics they require themselves). By default, appdaemon seems to subscribe to "#", which seems to cause that appdaemon needs to filter tons of irrelevant messages....

good advice ! although I'm already running with a small subset of client_topics defined in the appdaemon.yaml file :)

regards gibman

ReneTode commented 4 years ago

I had the same problem. What actually helped was setting client_topics: None in appdaemon.yaml. (My mqtt apps subscribe to the individual topics they require themselves). By default, appdaemon seems to subscribe to "#", which seems to cause that appdaemon needs to filter tons of irrelevant messages....

you dont need to set it to None. just specifying what you want is the same as subscribing in the apps.

and it also depends on what other stuff you do with MQTT. for me there are no irrelevant mqtt messages in AD, because everything in MQTT is used in AD.

bbrendon commented 3 years ago

@gibman For what AD does it is a bit of a CPU hog. No idea why. Most of the hogging was done after upgrading from Appdaemon 3 to 4. I find the CPU usage tolerable. I think it uses about 10-15% on my i3. It definitely feels "heavy" for what it does.

ReneTode commented 3 years ago

enable productionmode and AD will use way less CPU (which is only logical)

sandervandegeijn commented 3 years ago

The combination of not subscribing to all mqtt topics and production mode more than halved my cpu usage. That being said, would an implementation like pypi or jython be an option to further boost performance?

ReneTode commented 3 years ago

That being said, would an implementation like pypi or jython be an option to further boost performance?

i really dont see why jython would make AD perform better. AD is written in python not in javascript.

and i dont even know how you would want pypi implemented. as far as i know pypi is for distrubiting packages.

sandervandegeijn commented 3 years ago

I'm sorry, I meant pypy, different animal. Jython isn't written in javascript, it's a python runtime written in java.

sandervandegeijn commented 3 years ago

Those suffering from performance issues, you can use pypy with appdaemon it seems. Been running for an hour now and I'm seeing some improvement. It takes some time to settle down, probably because the jit compiler has to have time to compile the most heavy parts. It is heavier on memory though. Check out the code from github and use this Dockerfile:

FROM pypy:3

# API Port
EXPOSE 5050

# Mountpoints for configuration & certificates
VOLUME /conf
VOLUME /certs

# Copy appdaemon into image
WORKDIR /usr/src/app
COPY . .

# Install timezone data
RUN apt-get update && apt-get install -y tzdata

# Install dependencies
RUN apt-get install -y build-essential libffi-dev libssl-dev musl-dev cargo \
    && pip install --no-cache-dir .

# Install additional packages
RUN apt-get install curl

# Start script
RUN chmod +x /usr/src/app/dockerStart.sh
ENTRYPOINT ["./dockerStart.sh"]

Build:

docker build -t myappdaemon .

Run:

docker run -d --name appdaemon -v /opt/appdaemon/conf:/conf --restart always myappdaemon

ReneTode commented 3 years ago

Those suffering from performance issues

go and check if your apps are written in the way AD is intended, and if you run stuff that is keeping AD busy.

because normally AD doesnt have performance issues, not even up to 500 or 600 well written apps. 1 run 7 AD instances with over 750 apps on 1 device. and my CPU isnt even at 50%

ReneTode commented 3 years ago

and i just did a lookup from pypy, but a general use from pypy isnt wanted, because then it would be impossible to run AD in a python venv. off course everyone can use it at his own wishes.

haraldhammarsving commented 1 year ago

Hi, I have just moved from an old HA (0.86?) and AD 2.X to the most resent HA and AD 4.2.2. I have dashboard and some automation scripts (reports 74) running in AD, same things as before in old installation. I have made some corrections due to changes in syntax, but nothing else has really changed. I am running HA and AD in separate python venv:s on a Raspberry Pi 3B. I am seeing a constant 15-20% cpu usage for AD and say 2% for HA. I do not recall seeing that high cpu usage for AD on my old installation. I have tried setting "production_mode: true", but no real change can be seen. I have some weather data from YR, some local termometers and solar panel data on the display. The scripts, I believe, should in general be just waiting for triggers, controlling lights at sundown or homecoming. Is this really expected ? I now also see an alarming high usage of memory for AD. I have restarted AD very resently ,but I will do a complete reboot to see if this is a repeating behaiviour,

bbrendon commented 1 year ago

Hi @haraldhammarsving

Is this really expected?

Yes, I would say so. Same thing happened to me. Mine uses about 19-26 % according to top. I have brought this up a few times with devs but the wheel doesn't squeak loud enough for this to get attention. I would agree something is wrong somewhere.

haraldhammarsving commented 1 year ago

Ok, it happend again AD suddenly starts to eat memory too, and soon hangs the entire Raspberry. Upgrading to this new version starts to feel like it created more problems than it solved. Any tips or tricks to stop this from happening ?

Justihar commented 1 year ago

To be honest, while there is probably a lot of things that can be done to make AD more efficient, I don't think it'll happen anytime soon and will likely be a long and incremental process. A Raspberry Pi is pretty anemic as is so you may want to consider running it on more robust hardware. I've moved my HA and AD server (which also runs various databases, zigbee2mqtt, zwavejs2mqtt, etc) twice now but mostly due to all the various apps and not just AD. The Raspberry Pi is a good place to get started but let's be real, it is not a good piece of hardware for production unless you have some elaborate setup with them in a cluster.

haraldhammarsving commented 1 year ago

The Raspberry Pi is a good place to get started but let's be real, it is not a good piece of hardware for production unless you have some elaborate setup with them in a cluster.

I have been running the same automations, running stable for over 2 years on the same hardware. What I am not understanding is why an update from AD 2.x to AD 4.2.2 (and no other change to scripts or dashboards) should have a much higher load on the system. AD process in idle ( most of the time just in listen_state mode) has been really low running 2.X version. Why should it really be so much higher for version 4.2.2 ?

The memory leaks I experienced might (hopefully) not be happening all the time ( I have too small statistic here ), but here some other hardware will not help. Eventually all memory will still be "eaten".

haraldhammarsving commented 1 year ago

Still a little high on CPU usage, I think, but the second problem with the memory getting "eaten", is suddenly gone without any change to the setup. I don't understand this, but great! Last week, it ran less than an hour before it had taken 50% of the memory and everything got frozen. Now it has been running with approx 6% memory alloc for almost a week.

ReneTode commented 1 year ago

if your memory problem returns, shutdown the admin interface. that took away the problem for me.

bbrendon commented 1 year ago

@acockburn I don't think this was ever resolved. I would still consider appdaemon a CPU hog. It uses the most CPU on my system. 4x more than hass, 10x more than mysql.

acockburn commented 1 year ago

OK, I closed this mainly because the AUI has been rewritten, and I was looking at the latest entries in a mass issue closing session.

However, "AD is a CPU hog" is not really actionable, not least because the OP was running 50 apps on a Raspberry PI which I know from experience is not the most performant of machines, plus, there could be anything in those 50 apps causing CPU usage, literally anything, so I have no idea where to start.

If an empty AD running no apps takes a lot of CPU that's a start, however that doesn't;t seem to be an issue. The post itself says that as you add more apps the CPU usage goes up ... well I would expect it to, apps aren't free, each one will add overhead to AD as it has ongoing checks for every thread, and additional work keeping the AUI up to date, and that's before we even get started on what the app itself is actually doing. In fact it may well have been the introduction of the AUI in 4.0 that bumped the CPU up as you observed.

So what can I do with this? @bbrendon as a long time user and moderator in our discord channel, I respect your judgement and will look at this again if you can give me something to get me started. For what it's worth, on my system I am running 30 apps and my ad cpu usage is 2 - 5%, hass is in the 1 - 2 % range. Does that prove anything ? Not really. About all I could really do is take a look at general performance and try and optimize everything, maybe I'll take a look at that at some point, but without anything concrete to start with I am at a loss for any "fix" for this. As you can see from my numbers above, it isn't a problem for me, I am not getting a huge number of reports about this, certainly not enough to see any pattern.

Perhaps you could start by letting me know what hardware you are running on,. and how many apps y have and the kind of things they are doing? Also which parts of AS yo are using - AUI, HTTP, Dashbboards?

bbrendon commented 1 year ago

@acockburn Yes. I'll agree this is not really actionable at this time because there is not enough data, the wheel isn't squeaking enough, and although a few people noticed something, it's not like it's serious.

Even though my experience is primitive in programming, I would expect the time to improve or fix the CPU usage would be considerable.

As for what I use which may give some idea. I only use the python apps. I don't use dashboard or GUI. I do have the GUI enabled but I think I've accessed it maybe three time in many years.

The CPU usage is less noticeable on fast computers. I had an older i3 computer and I remember AD using 20% ish. Over this weekend I upgraded to a i3-8100. AD doesn't seem to go above 6% now.

I'll try to spend some time collecting something more useful in the future.

sandervandegeijn commented 1 year ago

Are you already on python 3.11? Should be a lot faster than the older pythons :)

acockburn commented 1 year ago

Are you already on python 3.11? Should be a lot faster than the older pythons :)

True - we are adding support for 3.11 in the next release, although in truth that's just a statement of support since it already works well and I am running an developing on it.

@bbrendon - thanks for the info, I will keep this in mind moving forwards and look for optimizations, and potential ways to improve the code. I did do some performance profiling in the early days and the code is still in AD for that so maybe I can dust that off as well, I'll have. quick look.

Regarding your setup - that does help zero it down a little. The AUI added some overhead for sure and if people aren't using it maybe we can be smarter about not using the cycles.

gibman commented 1 year ago

I've had some recurring issues of appdaemon losing the connection to HASS. This was evident in the logs.

2023-04-02 06:15:06.280458 WARNING HASS: Disconnected from Home Assistant, retrying in 30 seconds 2023-04-02 06:15:11.076261 WARNING HASS: Attempt to call Home Assistant while disconnected: call_plugin_service 2023-04-02 06:15:36.631166 INFO HASS: Connected to Home Assistant 2023.3.6 2023-04-02 06:15:36.709308 INFO HASS: Evaluating startup conditions 2023-04-02 06:15:36.716660 INFO HASS: Startup condition met: hass state=RUNNING 2023-04-02 06:15:36.717370 INFO HASS: All startup conditions met 2023-04-02 06:15:36.979300 INFO AppDaemon: Processing restart for HASS

then all apps would re-init. then again 'Disconnected from Home Assistant'..

It would never get out of this loop.

Normally I could fix it by making sure production mode was turned on. Obviously saving CPU cycles. But suddenly I couldn't.

Ok back to basics.

I started with a simple helloworld.py and everything would work. I then added one .py module at a time.

Slowly but surely the CPU utiliz. % would rise as I added more modules. At some point the appdaemon restart loop re-occurred.

I found no culprit module. It seemed more like a case of appdaemon had a hard time keeping the connection to HASS when adding a lot of modules/app instances. If appdaemon gets too busy there is a risk you will see this problem.

I currently have 424 app instances and 64 modules (py files).

I solved it here locally. So what I found was that I had been running with intel_pstate max_perf set to 0. Essentially running myc NUC @800mhz instead of 2.8ghz This was done in order to save on watt. I thought My NUC was way overkill running at max or dynamic p-states wo why not save some watts. wrong!!

snippet from my configuration.yaml command_on: "echo \"100\" | tee /sys/devices/system/cpu/intel_pstate/max_perf_pct" command_off: "echo \"0\" | tee /sys/devices/system/cpu/intel_pstate/max_perf_pct"

I suspect the problem of seeing the "appdaemon losing the HASS connection then restarting all over again" loop issue could be experienced on less capable hardware as well.

a raspberry pi with many appdaemon apps could probably produce this issue.

So if you see this issue, try and upgrade your hardware or look into what cpu power saving features you have enabled and try and disable those.

One could argue that appdaemon should not cause this problem on even less capable hardware.

acockburn commented 1 year ago

Thanks for the thorough writeup @gibman.

If there is an issue here to fix, we can gather more information on the disconnections by running in DEBUG mode to capture the actual reason for the disconnect - I would be interested in seeing that.

It is definitely the case that the more Apps you add the more CPU will be used - AD has additional overhead per app, not even counting the CPU for whatever the App itself is doing which is obviously outside of AD's control. 424 apps is an impressive number :) I suspect you are hitting some kind of system limit that was also increased when you adjusted your CPU.

We will be looking at a more efficient HASS plugin at some point, and maybe that will also help.

gibman commented 1 year ago

@acockburn Thanks for replying!

I should be able to repro it easily if you want me to. Although, how do I start appdaemon in DEBUG mode ?

and what files / logs do you want me to attach ?

Looking forward to the upcoming version.

acockburn commented 1 year ago

Just start AD with "-D DEBUG" added to the command line. Then you should see much more output in the regular log, including exceptions describing the root cause for the disconnection.

bbrendon commented 1 year ago

Just ran across this. I wonder if something like this can be done in appdaemon? https://www.home-assistant.io/integrations/profiler

gibman commented 1 year ago

No idea if this is the correct way to get the proper debug logs.

I used this config here for appdaemon: system_packages: [] python_packages: [] init_commands: [] log_level: debug

I got this "WARNING HASS: Attempt to call Home Assistant while disconnected: set_plugin_state" in the appdaemon.log file:

2023-08-15 00:02:37.999925 DEBUG AppDaemon: sending event locally 2023-08-15 00:02:38.000239 DEBUG livingroom_presence_hold: set state: binary_sensor.livingroom_presence_holding, {'state': 'on', 'attributes': {'created_by': '/config/appdaemon/apps/pir_hold_async.py', 'last_changed': '2023-08-15T00:02:37', 'device_class': 'occupancy', 'friendly_name': 'Stue tilstedesensorer (1 min)'}} from livingroom_presence_hold 2023-08-15 00:02:38.000425 DEBUG AppDaemon: set_state(): binary_sensor.livingroom_presence_holding, {'state': 'on', 'attributes': {'created_by': '/config/appdaemon/apps/pir_hold_async.py', 'last_changed': '2023-08-15T00:02:37', 'device_class': 'occupancy', 'friendly_name': 'Stue tilstedesensorer (1 min)'}} 2023-08-15 00:02:38.000636 DEBUG AppDaemon: parse_state: binary_sensor.livingroom_presence_holding, {'state': 'on', 'attributes': {'created_by': '/config/appdaemon/apps/pir_hold_async.py', 'last_changed': '2023-08-15T00:02:37', 'device_class': 'occupancy', 'friendly_name': 'Stue tilstedesensorer (1 min)'}} 2023-08-15 00:02:38.000867 DEBUG AppDaemon: Old state: {'entity_id': 'binary_sensor.livingroom_presence_holding', 'state': 'on', 'attributes': {'created_by': '/config/appdaemon/apps/pir_hold_async.py', 'last_changed': '2023-08-15T00:02:37', 'device_class': 'occupancy', 'friendly_name': 'Stue tilstedesensorer (1 min)'}, 'last_changed': '2023-08-15T00:02:37+02:00', 'last_updated': '2023-08-14T21:59:36.298359+00:00', 'context': {'id': '01H7V0ZHKAP4R5QDA9A6X7KRJY', 'parent_id': None, 'user_id': '6516935a1fb9462f973aa20154f5ab18'}} 2023-08-15 00:02:38.001057 DEBUG AppDaemon: New state: {'entity_id': 'binary_sensor.livingroom_presence_holding', 'state': 'on', 'attributes': {'created_by': '/config/appdaemon/apps/pir_hold_async.py', 'last_changed': '2023-08-15T00:02:37', 'device_class': 'occupancy', 'friendly_name': 'Stue tilstedesensorer (1 min)'}, 'last_changed': '2023-08-15T00:02:38+02:00', 'last_updated': '2023-08-14T21:59:36.298359+00:00', 'context': {'id': '01H7V0ZHKAP4R5QDA9A6X7KRJY', 'parent_id': None, 'user_id': '6516935a1fb9462f973aa20154f5ab18'}} 2023-08-15 00:02:38.001203 DEBUG AppDaemon: sending event to plugin 2023-08-15 00:02:38.001341 WARNING HASS: Attempt to call Home Assistant while disconnected: set_plugin_state 2023-08-15 00:02:38.001563 DEBUG AppDaemon: Event type:state_changed: 2023-08-15 00:02:38.001698 DEBUG AppDaemon: {'entity_id': 'sensor.threads_current_busy', 'new_state': {'entity_id': 'sensor.threads_current_busy', 'state': 1, 'last_changed': '2023-08-15T00:02:37+02:00', 'attributes': {}}, 'old_state': {'entity_id': 'sensor.threads_current_busy', 'state': 2, 'last_changed': '2023-08-15T00:02:37+02:00', 'attributes': {}}} 2023-08-15 00:02:38.001837 DEBUG AppDaemon: process_event_callbacks() admin {'event_type': 'state_changed', 'data': {'entity_id': 'sensor.threads_current_busy', 'new_state': {'entity_id': 'sensor.threads_current_busy', 'state': 1, 'last_changed': '2023-08-15T00:02:37+02:00', 'attributes': {}}, 'old_state': {'entity_id': 'sensor.threads_current_busy', 'state': 2, 'last_changed': '2023-08-15T00:02:37+02:00', 'attributes': {}}}}

If I have to start appdaemon in some other way in order to get more details then I need to know how. I use the hass OS on a x86 box and I dont know how to tweak the docker setup for appdaemon if such is needed.

AppDaemon / appdaemon

CPU performance issues with appdaemon 4.x #1030

Add-on: AppDaemon 4 Python Apps and Dashboard using AppDaemon 4.x for Home Assistant