AppDaemon / appdaemon

:page_facing_up: Python Apps for Home Automation
Other
843 stars 418 forks source link

Missing information about app in "Excessive time spent in utility loop" warning #834

Closed wsw70 closed 3 years ago

wsw70 commented 4 years ago

I see here and there Excessive time spent in utility loop warnings:

2020-01-05 13:51:51.275486 WARNING AppDaemon: Excessive time spent in utility loop: 979.0ms

Depending on the app, this may or may not be a problem:

The problem: there is no context in the warning, one does not know which app is affected.

EDIT: I found a setting which seems to be useful (check_app_updates_profile) but I still do not see the app which is affected, though more info is output:

2020-01-05 20:16:36.055782 WARNING AppDaemon: Excessive time spent in utility loop: 903.0ms
2020-01-05 20:16:36.056037 INFO AppDaemon: Profile information for Utility Loop
2020-01-05 20:16:36.056143 INFO AppDaemon:          683 function calls (679 primitive calls) in 0.893 seconds
   Ordered by: cumulative time
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       17    0.564    0.033    0.564    0.033 {built-in method io.open}
     10/6    0.000    0.000    0.143    0.024 /usr/local/lib/python3.6/os.py:277(walk)
        1    0.000    0.000    0.122    0.122 /usr/local/lib/python3.6/site-packages/appdaemon/appdaemon.py:1706(check_config)
        1    0.000    0.000    0.122    0.122 /usr/local/lib/python3.6/site-packages/appdaemon/appdaemon.py:1637(check_later_app_configs)
       27    0.000    0.000    0.119    0.004 /usr/local/lib/python3.6/genericpath.py:53(getmtime)
       28    0.119    0.004    0.119    0.004 {built-in method posix.stat}
       66    0.117    0.002    0.117    0.002 {built-in method builtins.next}
       17    0.066    0.004    0.066    0.004 {method 'close' of '_io.TextIOWrapper' objects}
        4    0.015    0.004    0.015    0.004 {built-in method posix.scandir}
        2    0.000    0.000    0.010    0.005 /usr/local/lib/python3.6/posixpath.py:168(islink)
        2    0.010    0.005    0.010    0.005 {built-in method posix.lstat}
       46    0.000    0.000    0.000    0.000 /usr/local/lib/python3.6/posixpath.py:75(join)
       17    0.000    0.000    0.000    0.000 /usr/local/lib/python3.6/_bootlocale.py:23(getpreferredencoding)
       46    0.000    0.000    0.000    0.000 /usr/local/lib/python3.6/posixpath.py:41(_get_sep)
       17    0.000    0.000    0.000    0.000 {built-in method _locale.nl_langinfo}
       17    0.000    0.000    0.000    0.000 /usr/local/lib/python3.6/codecs.py:308(__init__)
        1    0.000    0.000    0.000    0.000 /usr/local/lib/python3.6/genericpath.py:27(isfile)
       46    0.000    0.000    0.000    0.000 {built-in method builtins.isinstance}
       17    0.000    0.000    0.000    0.000 /usr/local/lib/python3.6/codecs.py:259(__init__)
       62    0.000    0.000    0.000    0.000 {method 'is_dir' of 'posix.DirEntry' objects}
       50    0.000    0.000    0.000    0.000 {built-in method posix.fspath}
       89    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
       46    0.000    0.000    0.000    0.000 {method 'startswith' of 'str' objects}
       46    0.000    0.000    0.000    0.000 {method 'endswith' of 'str' objects}
        2    0.000    0.000    0.000    0.000 /usr/local/lib/python3.6/site-packages/appdaemon/appdaemon.py:1648(<listcomp>)
        2    0.000    0.000    0.000    0.000 /usr/local/lib/python3.6/site-packages/appdaemon/appdaemon.py:1915(<listcomp>)
        2    0.000    0.000    0.000    0.000 {built-in method _stat.S_ISLNK}
        1    0.000    0.000    0.000    0.000 /usr/local/lib/python3.6/site-packages/appdaemon/appdaemon.py:1836(process_filters)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
ReneTode commented 4 years ago

the warning doesnt give more info about the app, because its not just 1 particular app that makes that causes it.

AD has a main loop. in that loop all kind of stuff happens. for example is checked if apps are updated, changed, added or deleted. at the moment that you reinitialise an app that has a lot of depending apps, you can imagine that python needs more time to get through that loop.

when everything in total that is done in the loop is taking longer then expected you get that warning. so that can be caused by a lot of different things. in AD 4 they happen less and the message is a bit more split out.

still they happen. and in most cases they are just telling you that you need to optimise your apps.

edit: and remember that even when 100 apps all take 0.01 second at the same time, you also will get over the 1 second threshhold. so you cant tell if its 1 app, or 50 that used up the time.

acockburn commented 4 years ago

What Rene said. In addition, If any particular apps are spending excessive time in a particular callback, AD4.0 will log that for you.

JuliusLedoux commented 4 years ago

What it is really interesting to me, is that I just have a vanilla installation, not a single application, just the Hello Word one and I still get this warning.

ReneTode commented 4 years ago

@JuliusLedoux i had the same problem years ago. RPI with SD card??

if so then probably your SD card is not allright anymore.

JuliusLedoux commented 4 years ago

@ReneTode I am using a fresh installation on Docker (Ubuntu - AMD64).

ReneTode commented 4 years ago

I understand. but i am pretty sure that its not the install that is bugging you, but your hardware.

to check if that is the case, then start AD in production mode. it wont check the apps dir every second in production mode, and if you dont have the warnings in that case, then you know you got an IO problem.

other programs that also use IO heavy can cause trouble too. for example, HA with a very big history DB can put a big strain on your IO.

when the IO is to busy, AD cant check the files within a second and that causes the warnings.

katonage commented 3 years ago

I also have problem with this. I use rflink covers with somfy rts. They seem a bit laggy from ha too. Is it possible that quarrying state of these causes the warning? Alternatively l subscribed to two events for 4 covers, maybe these event handling checks cause too much internal filtering?

ReneTode commented 3 years ago

it all depends on what part of the loop the time is spend. in most cases it is an I/O problem, but i can only tell if i see the actual warnings.

katonage commented 3 years ago

Thank you, for dealing with the issue! Log from the AppDaemon terminal: (bottom-up)

2021-01-12 21:39:32.029107 INFO AppDaemon: thread-1 done 2021-01-12 21:39:32.019954 INFO AppDaemon: thread-1 calling scheduler callback gentle_set_cover_state_async() in cover_control 2021-01-12 21:39:27.029425 INFO AppDaemon: thread-1 done 2021-01-12 21:39:27.020076 INFO AppDaemon: thread-1 calling scheduler callback gentle_set_cover_state_async() in cover_control 2021-01-12 21:39:25.080158 WARNING AppDaemon: Excessive time spent in utility loop: 2406.0ms, 14.0ms in check_app_updates(), 2392.0ms in other 2021-01-12 21:39:22.022364 INFO AppDaemon: thread-1 done 2021-01-12 21:39:22.014614 INFO AppDaemon: thread-1 calling scheduler callback gentle_set_cover_state_async() in cover_control 2021-01-12 21:39:17.068070 INFO AppDaemon: thread-1 done 2021-01-12 21:39:17.061413 INFO AppDaemon: thread-1 calling scheduler callback gentle_set_cover_state_async() in cover_control 2021-01-12 21:39:17.050040 INFO AppDaemon: thread-1 done 2021-01-12 21:39:17.020610 INFO AppDaemon: thread-1 calling scheduler callback cover_auto_behavior() in cover_control 2021-01-12 21:38:32.030348 INFO AppDaemon: thread-1 done 2021-01-12 21:38:32.020005 INFO AppDaemon: thread-1 calling scheduler callback gentle_set_cover_state_async() in cover_control 2021-01-12 21:38:27.028584 INFO AppDaemon: thread-1 done 2021-01-12 21:38:27.021099 INFO AppDaemon: thread-1 calling scheduler callback gentle_set_cover_state_async() in cover_control 2021-01-12 21:38:22.023868 INFO AppDaemon: thread-1 done 2021-01-12 21:38:22.016583 INFO AppDaemon: thread-1 calling scheduler callback gentle_set_cover_state_async() in cover_control 2021-01-12 21:38:17.078590 INFO AppDaemon: thread-1 done 2021-01-12 21:38:17.071088 INFO AppDaemon: thread-1 calling scheduler callback gentle_set_cover_state_async() in cover_control 2021-01-12 21:38:17.056494 INFO AppDaemon: thread-1 done 2021-01-12 21:38:17.021694 INFO AppDaemon: thread-1 calling scheduler callback cover_auto_behavior() in cover_control 2021-01-12 21:37:40.225123 WARNING AppDaemon: Excessive time spent in utility loop: 8039.0ms, 16.0ms in check_app_updates(), 8023.0ms in other 2021-01-12 21:37:32.027229 INFO AppDaemon: thread-1 done 2021-01-12 21:37:32.018882 INFO AppDaemon: thread-1 calling scheduler callback gentle_set_cover_state_async() in cover_control 2021-01-12 21:37:27.027656 INFO AppDaemon: thread-1 done 2021-01-12 21:37:27.018628 INFO AppDaemon: thread-1 calling scheduler callback gentle_set_cover_state_async() in cover_control 2021-01-12 21:37:26.046477 WARNING AppDaemon: Excessive time spent in utility loop: 3601.0ms, 16.0ms in check_app_updates(), 3585.0ms in other 2021-01-12 21:37:22.026041 INFO AppDaemon: thread-1 done 2021-01-12 21:37:22.016921 INFO AppDaemon: thread-1 calling scheduler callback gentle_set_cover_state_async() in cover_control 2021-01-12 21:37:17.070087 INFO AppDaemon: thread-1 done 2021-01-12 21:37:17.063613 INFO AppDaemon: thread-1 calling scheduler callback gentle_set_cover_state_async() in cover_control 2021-01-12 21:37:17.051861 INFO AppDaemon: thread-1 done 2021-01-12 21:37:17.023025 INFO AppDaemon: thread-1 calling scheduler callback cover_auto_behavior() in cover_control

here cover_auto_behavior() is called every minute. This calls gentle_set_cover_state_async() with .run_in for all for covers (with 5 sec delays) these check that covers are in the desired state and exit shortly. Still, in irregular times I get the warning although no intensive calculations are ongoing. No tampering of threading or pinning has been done.

secrets: /config/secrets.yaml appdaemon: latitude: 47.4979 longitude: 19.0402 elevation: 2 time_zone: Europe/Budapest api_port: 5050 log_thread_actions: 1 plugins: HASS: type: hass

namespaces: StorageNamespace: writeback: hybrid

http: url: http://127.0.0.1:5050 admin: api: hadashboard:

ReneTode commented 3 years ago

i see. you use async to call services that take to much time. async runs in the same loop as the AD core. to avoid this trouble, write the app in sync if thats in any way possible.

if you dont then all other apps that you want to write in the future will be effected by this app.

Mithras commented 3 years ago

@ReneTode I'm confused why would a service call block the thread. Aren't service calls async?

katonage commented 3 years ago

I'm not actually using async calls. I tried workarounding the issue making these async, but ran into issues - due to my lack of understanding here - so I returned to standard methods, the "async" remained only in the name of the function.

Just guessing: One point others might not use I querry state of the covers in a 'shadow' entity in a custom namespace. I access it with get_state.

I find interesting in this issue the fact, that the warning is seemingly trigggered between the calls.

I use a standard RPI4 hardware with an SD card and RFlink-somfy and a few shelly.

I also tried run_in_executor, but .log() seemed not working there and it is not documented properly, so finally i avoided that too.

ReneTode commented 3 years ago

please share your app, so that i can see what you are doing. i suspect that you dont set proper timeouts.

ReneTode commented 3 years ago

@ReneTode I'm confused why would a service call block the thread. Aren't service calls async?

services can take up a longer time. i see a lot of people having trouble with service call that take up to long in sync. are they async? i dont know. they return a value in the same CB, and not in a CB you can set, like in async. there is a lot of sync and async mixture inside AD, which for me makes it impossible to say if something is async or not. if a function works both in a sync and an async app, what would you call it? sync or async? i would say sync, and for async there should be a different function. but in AD there are sync and async functions with the same name. so then you need to start digging in the AD code to see if the function is correct for both ways. and im not knowledged enough for that.

remember, im not a real python dev either. i know a lot about AD (way more then about python) but i never took the time to really get deep into AD code (certainly not after the second refactoring) most of my experience/knowledge comes from supporting for over 5 years and a logical brain and a lot of discussions with Andrew and Odia in those years.

katonage commented 3 years ago

cover_control.py.txt

Thats a lenghty file, but shortly: I have 4 covers. I set up a class to handle them in 4 instances (line20). These listen to two events each to follow state (line123) (these are rarely fired). The cover_auto_behavior() is called minutely, this finds out desired cover state (line34). With the desired state, calls all covers gentle_set_cover_state() 4 times (line77) These gentle_set_cover_state() functions rethrow a gentle_set_cover_state_async() (tchnically not an async function) a few seconds later (line211) just to jitter call to the covers. Howevwer all ends up in nothing as gentle_set_cover_state_async() returns shortly (line228) because covers are already at their desired state. All these happens in a few millisecs according to the log above. Still I get the warnings between the calls to my user code.

Mithras commented 3 years ago

if a function works both in a sync and an async app, what would you call it? sync or async?

I would call it a hack. I call something async only if instead of doing Thread.Wait() the thing is capable of registering continuation/callback of some sort. In ideal world IO should NEVER be sync. It's async by design. There is no valid reason to waste a thread (by doing thread.wait() or something of the sort) while waiting for http response or whatever else IO. That thread could do something useful instead.

ReneTode commented 3 years ago

ok @Mithras lets agree to disagree. i dont think we will come anywhere in this discussion. your clearly so pro async that you want everything in async, and i have a hard time talking to people that blindly look in 1 direction.

@katonage i think you made something very complicated to do a few simple actions. and you dont use app structures. i love to talk to you how you could simplefy and optimise this app, so that your AD has less trouble. but not here. for that i am available on discord.

but the most important thing is that you use AD functions outside an app. the class SomfyHandler isnt an app, because an app 1) always has yaml 2) always has an initialise

using AD functions outside apps might work, but is in no way supported. because the functions were not build for that.

the other thing i noticed is that you probably could replace your run_minutely by a few run_at_sunset/sunrise with enough offset. and that would make that you wont stress AD that much.

i (and others) are here https://discord.gg/pHsjADY all the time, and discussing about better/easier ways, is easier there.

Mithras commented 3 years ago

@ReneTode It's ok to disagree. It's not about pro async though. It's about not wasting threads. Async is just a tool that makes it possible.

There is no valid reason to waste a thread (by doing thread.wait() or something of the sort) while waiting for http response or whatever else IO.

You can argue with "but it's more newbies friendly" or "but the code look simpler" but it doesn't change the fact that blocking threads in order to wait for IO is sub-optimal at best.

Mithras commented 3 years ago

you want everything in async

This is actually so very true. And this is the main problem with async. Nothing good usually happens from mixing sync and async so yes, usually everything has to be async in order to avoid weird bugs/deadlocks in random places. There are languages out there with pretty much 100% async BCL (Base Class Library) but Python will probably take years before that's the case. It'll eventually get there though. It's inevitable.

ReneTode commented 3 years ago

but it doesn't change the fact that blocking threads in order to wait for IO is sub-optimal at best.

only when you come to the point where you dont have enough threads. (which isnt the case in 99.99% of all cases)

Nothing good usually happens from mixing sync and async so yes, usually everything has to be async in order to avoid weird bugs/deadlocks in random places. python was good when there was no async (if it wouldnt have been it wouldnt have become so popular) so i guess i agree with you. adding async to python was a bad idea ;) and that caused that async was also added to AD.

sync programming did work even in the earliest days. now we have devices that are 100.000 times better and faster. i really dont see the need to optimise that, because we rarely use our devices even near the max.

Mithras commented 3 years ago

only when you come to the point where you dont have enough threads. (which isnt the case in 99.99% of all cases)

It's the same as saying it's ok to waste memory as long as there is enough. If everybody thinks this way, you'll run run out memory pretty fast. Each thread adds some overhead. You need 1Mb RAM for stack and you waste CPU time switching contexts between threads (which probably also leads to cache misses). The whole reason behind async is to address that.

sync programming did work even in the earliest days. now we have devices that are 100.000 times better and faster. i really dont see the need to optimise that, because we rarely use our devices even near the max.

Indeed. Languages came a long way since then. Today you can be fully async with almost no extra cost. In early days it would explode you codebase size/complexity by orders of magnitude so it wasn't really an option back then.

ReneTode commented 3 years ago

It's the same as saying it's ok to waste memory as long as there is enough. If everybody thinks this way, you'll run run out memory pretty fast. Each thread adds some overhead. You need 1Mb RAM for stack and you waste CPU time switching contexts between threads (which probably also leads to cache misses). The whole reason behind async is to address that.

you forget about the goal. the goal is an EASY to use program that most users only run on an environment where memory and threads are really not an issue. forcing everyone to use async is like telling everyone that they can only sing in the shower when they can sing all notes correctly. (even though noone hears it) no matter how you look at it async is never EASY! because it asks a way of thinking that is not linear(logical). if async was that easy there wouldnt be so much flawd libs out there.

i dont say that AD should be for the same level of noobs as HA directs itself to, but it should be understandable for starting programmers.

so yeah, its ok to waste memory on user level. only experienced users should be the ones that have concern about it. and yeah its ok to waste CPU time on user level, as long as you dont know how to manage that and yeah its ok to waste threads as long as there is no problem. at the moment that you run into trouble, you probably already learned how to optimise, or at least enough so that you have the ability to learn.

you will never convince me that async is the only good way. because people who think that only 1 tool is the best tool are blind and deaf. and its almost impossible to communicate with those people. so we better stop this (off topic) discussion

katonage commented 3 years ago

@katonage i think you made something very complicated to do a few simple actions. and you dont use app structures. i love to talk to you how you could simplefy and optimise this app, so that your AD has less trouble. but not here. for that i am available on discord.

Thank you, I'll contact you there, I'm open for suggestions.

Looking your conversation with Mithras: is it somewhere decided, what niche of users AD should serve, and which of the concurrent approaches are favoured by the big thinkers? For example, I'm OK with automations and blueprints, but im not interested in them, these simple stuff can be done usually with the manufacturer's stuff. Im interested in a flexible python coding. On the other hand Im not the tough guy to dig into the source, and code a custom integration for my cover control. AD is for the middle-techies like me. In such a landcape it makes sense to argue if the niche aimed needs or can handle async stuff or not? If AD should be more integrated into the HA core or be some versatile general purpose toolset? If it will be an official addon or even integration one day?

ReneTode commented 3 years ago

is it somewhere decided, what niche of users AD should serve

AD is created for people who want to create simple automations with python code instead of with the simplistic yaml options that HA offered over 5 years ago.

I'm OK with automations and blueprints, but im not interested in them, these simple stuff can be done usually with the manufacturer's stuff

sorry but i dont have a clue which manufactorers you are thinking of.

If AD should be more integrated into the HA core or be some versatile general purpose toolset? If it will be an official addon or even integration one day?

at the start AD was written just to work with HA. and at some point there was the plan to make AD closer to HA. but AD evolved into a program that can be used on its own, completely without HA. it also can work with other programs that are like HA if they have an api. all that is needed in that case is another plugin. there is an official addon (written by and maintained by frenck, independent from AD) AD most likely will never be an intergration (there might be a better plugin at some point)

AD is very versatile. it can be used in a lot of different ways. lots of "pythonic" ways of doing were added over the years. and thats also why async apps were added. because people wanted it in.

the original description still isnt changed though, because thats what AD still is. https://appdaemon.readthedocs.io/en/latest/index.html

i hope that tells you more about the background from AD.

Mithras commented 3 years ago

@katonage brought a good point. HA already has a non-dev friendly mechanism to write automations. Why is AD targeting the same audience? IMO, somebody who is not willing to spend a couple evenings getting the gist of async, should just stick with the native HA automations.

wsw70 commented 3 years ago

Just chiming in with a few comments

@ReneTode

AD is very versatile. it can be used in a lot of different ways.

I use AD not only as an automation engine, but also as some kind of "python code executor" for tasks unrelated to AD (but related with general home automation). So the fact that it is a versatile tool is a big plus.

@Mithras

IMO, somebody who is not willing to spend a couple evenings getting the gist of async, should just stick with the native HA automations.

I am an amateur developer for 20 years (I work in IT but code for my pleasure, and kernel development in Linux in the 90's to make my computer work with that new OS).

Among others, I absolutely love the async nature of JavaScript in the browser and I am an ardent fan of Vue.js (and its asynchronous way of thinking).

I also appreciate very, very much bus-driven development, I use MQTT (or RabbitMQ) for so many programs (besides home automation) that the asynchronous nature of the events grew to an advantage and not a drawback.

I use threading in Python for years for various programs. I like and understand it at a reasonable level. I never managed to get Python's new async libraries right. I spent literally hours trying to find the right tutorial and somehow that part never managed to find a free cell in my brain.

Forcing AD users to embrace the async part would be horrible. Not only there is an extensive state of mind to get when doing async code (this requires real commitment), and the libraries are, at lest for some population of 1, not understandable.

This would remove a large population from using AD.

EDIT: I now realized that it is me that actually opened the issue. The issue itself is solved, the discussion is worthwhile but beyond the scope of my question. I let @acockburn close it when needed because I do not want to hamper a discussion (and yet, this is not the topic of my question)

katonage commented 3 years ago

Digging into my actual problem... If you check my debug lines, I got: "Excessive time spent in utility loop: 3601.0ms, 16.0ms in check_app_updates(), 3585.0ms in other" Which means my user code is running proper fast, still the utility loop is too heavy. My "specialty" is that I use a User-Defined Namespace. (In fact I love that persistent variable feature, so that a complex app can restore its normal operation seamlessly after a change/reload.) Checking the source I found that the utility loop stores the (writeback: hybrid) namespaces, so I set writeback to performance, and it seems the utility loop warnings are gone. Yey! Maybe some IO overhead slows down saving stochastically, giving the annoying warning often. I have only 6 entities whose state changes only a few times a day, and an RPI4+SDcard standard, so it shouldn't be a problem, but still, it seems writeback hybrid is problematic in its current state on this system. To restore data safety (mimic hybrid), I set up a workaround: a separate few line app executing self.save_namespace( namespace="... in every few minutes.

At the end I have 2 suggestions: 1) Improve the utility loop warning to display time spent in the namespace saving separately. (few lines of extra code only) 2) Improve namespace saving to check in CPU if anything has changed since the last storage operation, and access disk only if it is necessary. Since the utility loop is usually 1sec, entities within this timeframe rarely change, so most of the diskIO could be omitted.

Do you think I should report these feature request?

ReneTode commented 3 years ago

glad you found out that it is I/O up untill now if people had I/O problems that resulted in extra time spend in check_app_updates(), because that uses I/O all the time. thats why i didnt think of that. if you think its important that some features are added, you can always ask, and indeed its better to make a new issue for that. i dont know how easy it would be to add namespace, and how worthwhile it is, thats not up to me anymore to decide. for most people it wouldnt bring extra info, but it would bring extra logging (and questions for people that are unfamiliar with namespaces)

improvement for namespace saving is a valid point in my eyes. its a recent addition from AD, so it probably could be made better.

@Mithras

@katonage brought a good point. HA already has a non-dev friendly mechanism to write automations. Why is AD targeting the same audience?

AD is certainly not targetting the same audience. HA uses YAML and AD uses python. and if you think that AD is non-dev friendly please go create a dev-friendly environment. or check out pyscript, its async as you like https://community.home-assistant.io/t/pyscript-new-integration-for-easy-and-powerful-python-scripting/215967

but please stop trying to tell that AD should be something else then it is. because thousands of people are very happy with AD as it is!!

Mithras commented 3 years ago

but please stop trying to tell that AD should be something else then it is. because thousands of people are very happy with AD as it is!!

Millions of people were happy riding horses not too long ago. AD already supports async well enough for me. If you think it's better to delay going async until you are forced by the industry, do just that.

wsw70 commented 3 years ago

OK I think we're done here with the technical aspects