mikemorency opened this 2 weeks ago
Attention: Patch coverage is 52.10084%, with 57 lines in your changes missing coverage. Please review.
Project coverage is 28.25%. Comparing base (013afa4) to head (81f441c). Report is 24 commits behind head on main.
@mariolenz I'm still working on this (albeit locally) and I think I'm at a bit of a stopping point. There are two options for caching in my mind, although neither is great IMO:
1. Use the turbo server from cloud.common. This is a relatively complex caching solution that stores the module file and python imports on the remote host so later tasks can re-use them. Testing with the guest_info module, I saw a 2 second decrease in the module runtime after the first task. We can also cache function results, similar to option 2.
2. Use shelf to cache method results. The cache would persist for X seconds, and subsequent calls to a cached function would be much, much faster.

For option 1, the added complexity is considerable and errors can be downright cryptic. vmware_rest uses the turbo server, so we do already depend on it, but there are a few open issues (including one asking to disable the functionality). IME 2 seconds sounds nice, but in the context of a module like guest_info that takes 30+ seconds to run, it's not really all that great. Overall, I am wary of the errors and complexity of that solution.
For option 2, the main downside is how limited we would be in what can be cached. For a cache result to persist from task to task, the result needs to be encoded and decoded. pyVmomi objects are complex, and it seems like many cannot be decoded back into objects. I envision the main use case would be caching VM IDs, so modules can quickly look up the VM info using the ID. We may be able to cache VM object results, which would be much more useful, but I haven't tried it yet.
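To illustrate option 2, here is a minimal sketch of the idea using Python's shelve module; the cache path, TTL value, and lookup callable are all hypothetical:

```python
import shelve
import time

CACHE_PATH = "/tmp/vmware_task_cache"  # hypothetical location
TTL = 300  # seconds a cached result stays valid

def cached_vm_id(vm_name, lookup):
    """Return a VM ID from the cache, refreshing via lookup() after TTL expires."""
    # shelve pickles values to disk, which is why only results that can be
    # encoded and decoded (strings, dicts, ...) survive from task to task.
    with shelve.open(CACHE_PATH) as db:
        entry = db.get(vm_name)
        if entry and time.time() - entry["ts"] < TTL:
            return entry["id"]
        vm_id = lookup(vm_name)  # the expensive vCenter query
        db[vm_name] = {"id": vm_id, "ts": time.time()}
        return vm_id
```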
I know that's a lot, hopefully it's clear-ish. I just wanted to give an update and an opportunity to raise any questions.
> For option 2, the main downside is how limited we would be in what can be cached. For a cache result to persist from task to task, the result needs to be encoded and decoded.
I don't know enough about how ansible-core executes a playbook. Is there a new process for every task? Or is it just one process running all the tasks? Or, if you run for example 5 tasks in parallel, are there 5 processes ("workers") that run the tasks?
If several tasks are run in one process, how about just caching the session in an in-memory data structure like a dict, with (host, user, pw) as the key and the session id as the value?
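Something like this minimal sketch, where create_session is a hypothetical stand-in for the real vCenter login:

```python
# In-memory session cache; this only helps if tasks actually share a process.
_sessions: dict[tuple[str, str, str], str] = {}

def create_session(host: str, user: str, pw: str) -> str:
    # Hypothetical stand-in for the real vCenter authentication call.
    return f"session-for-{user}@{host}"

def get_session_id(host: str, user: str, pw: str) -> str:
    key = (host, user, pw)
    if key not in _sessions:
        _sessions[key] = create_session(host, user, pw)
    return _sessions[key]
```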
BTW thanks a lot for working on this!
Each task is a separate process, which is why option 2 is so disappointing :/ The turbo server (option 1) works by keeping a persistent socket open on the remote host and running tasks through that (or at least, that's my understanding of it at this point). Since the session is persistent, we can use an in-memory cache like you describe, which is nice.
My most recent pushes have been experimenting with using the functools cache decorator with option 1. I think there's a lot of potential there. Caching a function result is easy, there's hardly any custom code required to support it, and the cache functionality is intuitive. By caching some get methods, I lowered the repeated guest_info runtime from 30s to 1s. Of course, there are situations where a "stale" cache can cause issues, but I have options to clear the cache both via python code and via an ansible task, so it should be manageable.
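The caching pattern itself is just the standard library; here is a minimal sketch, where the lookup body is a hypothetical stand-in for the real pyVmomi query:

```python
import time
from functools import cache

@cache
def get_vm_id(hostname: str, vm_name: str) -> str:
    # Hypothetical stand-in for an expensive pyVmomi search. Under the
    # turbo server the process persists between tasks, so this memoized
    # result is reused instead of re-querying vCenter every time.
    time.sleep(2)  # simulate the slow lookup
    return f"vm-{vm_name}"

# A cache-clearing code path (python or a dedicated ansible task) can
# drop stale entries explicitly:
# get_vm_id.cache_clear()
```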
I think that adding the turbo server as an optional/experimental feature is a good idea. That would enable users to save the 2 seconds per task and re-use their authenticated sessions. Plus, better error handling/docs for the turbo server would help with vmware_rest anyway. We can add in function caching where it makes sense and is relatively safe, but that's probably better as a "phase 2" thing. If that makes sense to you (and Danielle, I'll talk with her this week), I'll close this and open a new PR with better documentation.
For the record, someone I've talked to about this told me:
> there have been a number of partially-failed attempts in this area that have actually shipped as part of core over the years; most have been removed (accelerate, turbo). It's not hard at all to get a persistent reusable module state process to hang around; the hard part is getting it to play nicely/safely. Common problems with previous attempts:
> - process lifetime management (control-node-only is much easier, hard-tied to control process lifetime, preventing process clog, multiple overlapping jobs, different concurrent versions of core/content)
> - robust pool key selection (eg, host/port/credential/? - making sure you don't pick up the wrong one)
> - secure access to the cached process (esp where become is a factor)
Sounds complicated. Especially since we need inter-process communication.
Yeah, it definitely is complicated. The turbo server in cloud.common is only used by maybe 3 projects? And only vmware_rest has it as mandatory, or even enabled by default, as far as I know.
All three of those "problems" are still relevant for the turbo server, although I think the first one is less so. On the one hand, there are a lot of situations where this could cause problems. On the other hand, all of the problems can be avoided if the user knows what to expect. It's a tough call. If vmware_rest wasn't already dependent on it, I would vote to pass.
SUMMARY
Testing the turbo server to see if tests are executed faster. If it works, it should cache sessions in between API calls to vCenter. Based on https://github.com/ansible-collections/kubernetes.core/blob/c8a9326306e65c0edf945fb3e99a67937cbe9375/plugins/modules/k8s_cp.py#L143
Related to https://forum.ansible.com/t/10551
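For reference, the opt-in pattern used by kubernetes.core looks roughly like the sketch below. This is reconstructed from memory rather than copied from the linked file, so treat the exact import path, environment variable, and collection_name attribute as assumptions to verify against the source:

```python
import os

# Hypothetical reconstruction of an optional turbo-mode import: fall back
# to the plain AnsibleModule unless turbo mode is requested and
# cloud.common is installed.
if os.environ.get("ENABLE_TURBO_MODE"):
    try:
        from ansible_collections.cloud.common.plugins.module_utils.turbo.module import (
            AnsibleTurboModule as AnsibleModule,
        )

        AnsibleModule.collection_name = "kubernetes.core"
    except ImportError:
        from ansible.module_utils.basic import AnsibleModule
else:
    from ansible.module_utils.basic import AnsibleModule
```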