ansible-collections / vmware.vmware

Ansible VMware Collection
GNU General Public License v3.0

test turbo server #78

Open mikemorency opened 2 weeks ago

mikemorency commented 2 weeks ago
SUMMARY

Testing the turbo server to see if tests are executed faster. If it works, it should cache sessions in between API calls to vCenter. Based on https://github.com/ansible-collections/kubernetes.core/blob/c8a9326306e65c0edf945fb3e99a67937cbe9375/plugins/modules/k8s_cp.py#L143

Related to https://forum.ansible.com/t/10551

codecov[bot] commented 2 weeks ago

Codecov Report

Attention: Patch coverage is 52.10084% with 57 lines in your changes missing coverage. Please review.

Project coverage is 28.25%. Comparing base (013afa4) to head (81f441c). Report is 24 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| plugins/module_utils/_vmware_ansible_module.py | 37.25% | 31 Missing and 1 partial :warning: |
| plugins/module_utils/_vmware_facts.py | 40.00% | 9 Missing :warning: |
| plugins/module_utils/_vmware.py | 53.84% | 6 Missing :warning: |
| plugins/module_utils/_vmware_rest_client.py | 55.55% | 4 Missing :warning: |
| plugins/module_utils/_vmware_tasks.py | 33.33% | 2 Missing :warning: |
| plugins/modules/clear_cache.py | 83.33% | 2 Missing :warning: |
| plugins/modules/folder_template_from_vm.py | 33.33% | 2 Missing :warning: |
Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main      #78      +/-   ##
==========================================
+ Coverage   26.09%   28.25%   +2.15%
==========================================
  Files          19       24       +5
  Lines        1667     1961     +294
  Branches      331      381      +50
==========================================
+ Hits          435      554     +119
- Misses       1232     1406     +174
- Partials        0        1       +1
```

| [Flag](https://app.codecov.io/gh/ansible-collections/vmware.vmware/pull/78/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=ansible-collections) | Coverage Δ | |
|---|---|---|
| [sanity](https://app.codecov.io/gh/ansible-collections/vmware.vmware/pull/78/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=ansible-collections) | `28.25% <52.10%> (+2.15%)` | :arrow_up: |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=ansible-collections#carryforward-flags-in-the-pull-request-comment) to find out more.


mariolenz commented 2 weeks ago

Is this related to the forum topic *Is there any way to re-use authentications or otherwise do efficient loops in the community.vmware module?*

mikemorency commented 1 week ago

@mariolenz I'm still working on this (albeit locally) and I think I'm at a bit of a stopping point. There are two options for caching in my mind, although neither is great, IMO:

  1. Use the "turbo server" in cloud.common. This is a relatively complex caching solution that stores the module file and python imports on the remote host so later tasks can re-use them. Testing with the guest_info module, I saw a 2 second decrease in the module runtime after the first task. We can also cache function results, similar to option 2.
  2. Use something like shelf to cache method results. The cache would persist for X seconds and subsequent calls to a cached function would be much. much faster.
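
For context on option 2, here is a minimal sketch of what a shelve-based cache could look like; the file path, TTL value, and `lookup_vm_id` function are illustrative stand-ins, not actual collection code:

```python
import functools
import shelve
import time

# Hypothetical values -- the real path and TTL would be configurable.
CACHE_PATH = "/tmp/vmware_ansible_cache"
TTL_SECONDS = 300


def cached(func):
    """Cache a function's results in a shelve file, keyed by its arguments.

    Entries expire after TTL_SECONDS. Results must survive pickling, which
    is exactly the limitation discussed below for pyVmomi objects.
    """
    @functools.wraps(func)
    def wrapper(*args):
        key = "%s:%r" % (func.__name__, args)
        with shelve.open(CACHE_PATH) as db:
            entry = db.get(key)
            if entry is not None and time.time() - entry["at"] < TTL_SECONDS:
                return entry["value"]
            value = func(*args)
            db[key] = {"at": time.time(), "value": value}
        return value
    return wrapper


@cached
def lookup_vm_id(vcenter_host, vm_name):
    # Placeholder for an expensive vCenter search by name; the real version
    # would query the API and return a plain, picklable MoID string.
    return "vm-1234"
```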

For option 1, the added complexity is considerable and errors can be downright cryptic. vmware_rest uses the turbo server, so we already depend on it, but there are a few open issues (including one asking to disable the functionality). IME, 2 seconds sounds nice, but in the context of a module like guest_info that takes 30+ seconds to run, it's not really all that great. Overall I am wary of the errors and complexity of the solution.

For option 2, the main downside is how limited we would be in what can be cached. For a cached result to persist from task to task, the result needs to be encoded and decoded. pyVmomi objects are complex, and it seems like many cannot be decoded back into objects. I envision the main use case would be caching VM IDs, so modules can quickly look up the VM info using the ID (see the sketch below). We may be able to cache VM object results, which would be much more useful, but I haven't tried it yet.
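
Caching the plain MoID string avoids the pickling problem, because a managed-object reference can be rebuilt from the ID on the next task. A sketch, assuming an already-connected pyVmomi `ServiceInstance`; the `vim.VirtualMachine(moid, stub)` pattern is a common pyVmomi idiom, not this collection's code:

```python
from pyVmomi import vim


def vm_from_cached_id(service_instance, moid):
    """Rebuild a VirtualMachine reference from a cached MoID string.

    Only the ID was cached; property reads on the returned object still go
    to vCenter, so the data itself is never stale.
    """
    return vim.VirtualMachine(moid, service_instance._stub)
```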

I know that's a lot; hopefully it's clear-ish. I just wanted to give an update and an opportunity to raise any questions.

mariolenz commented 1 week ago

For option 2, the main downside is how limited we would be in what can be cached. For a cached result to persist from task to task, the result needs to be encoded and decoded.

I don't know enough about how ansible-core executes a playbook. Is there a new process for every task? Or is it just one process running all the tasks? Or, if you run, for example, 5 tasks in parallel, are there 5 processes ("workers") that run the tasks?

If several tasks are run in a process, how about just caching the session in an in-memory data structure like a dict? With (host, user, pw) as the key and session id as the value?
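
A sketch of that suggestion, assuming the process really does survive across tasks (the next reply explains why it doesn't for regular modules); `login_to_vcenter` is a stand-in for the real authentication call:

```python
from typing import Dict, Tuple


def login_to_vcenter(host: str, user: str, pw: str) -> str:
    # Stand-in for the real login; returns a new session id.
    return "session-for-%s" % host


# Process-wide cache, keyed exactly as suggested above.
_SESSIONS: Dict[Tuple[str, str, str], str] = {}


def get_session_id(host: str, user: str, pw: str) -> str:
    """Log in once per (host, user, pw) and reuse the session id afterwards."""
    key = (host, user, pw)
    if key not in _SESSIONS:
        _SESSIONS[key] = login_to_vcenter(host, user, pw)
    return _SESSIONS[key]
```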

BTW thanks a lot for working on this!

mikemorency commented 1 week ago

Each task is a separate process, which is why option 2 is so disappointing :/ The turbo server (option 1) works by keeping a persistent socket open on the remote host and running tasks through that (or at least, that's my understanding of it at this point). Since the session is persistent, we can use an in-memory cache like you describe, which is nice.
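
As a toy illustration of that mechanism (not cloud.common's actual implementation): a long-lived process on the remote host holds state that outlives the short-lived module invocations connecting to it:

```python
import json
import socketserver

SESSIONS = {}  # survives across requests because the server process persists


class Handler(socketserver.StreamRequestHandler):
    def handle(self):
        request = json.loads(self.rfile.readline())
        host = request["host"]
        if host not in SESSIONS:
            # Login happens once; later requests reuse the cached session.
            SESSIONS[host] = "session-for-%s" % host
        self.wfile.write(json.dumps({"session": SESSIONS[host]}).encode() + b"\n")


if __name__ == "__main__":
    # Each module invocation would connect to this socket instead of
    # re-importing and re-authenticating from scratch.
    with socketserver.UnixStreamServer("/tmp/toy_turbo.sock", Handler) as server:
        server.serve_forever()
```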

My most recent pushes have been experimenting with the functools cache decorator on top of option 1. I think there's a lot of potential there. Caching a function result is easy, there's hardly any custom code required to support it, and the cache functionality is intuitive. By caching some get methods, I lowered the repeated guest_info runtime from 30s to 1s. Of course, there are situations where a "stale" cache can cause issues, but I have options to clear the cache both via Python code and via an Ansible task, so it should be manageable.
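
Roughly the pattern being described, using the standard-library decorator; the function name and return value here are placeholders, not the collection's real API:

```python
import functools


@functools.cache
def get_vm_properties(vcenter_host, vm_name):
    """Expensive vCenter lookup, cached for the life of the turbo server process."""
    # Placeholder; the real version would query pyVmomi here.
    return {"name": vm_name, "power_state": "poweredOn"}


# First call hits vCenter; repeats inside the same long-lived process are instant.
get_vm_properties("vcenter.example.com", "db-01")
get_vm_properties("vcenter.example.com", "db-01")

# Escape hatch for a stale cache, callable from Python or via a clear-cache task.
get_vm_properties.cache_clear()
```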

I think that adding the turbo server as an optional/experimental feature is a good idea. That would enable users to save the 2 seconds per task and re-use their authenticated sessions. Plus, better error handling/docs for the turbo server would help with vmware_rest anyway. We can add in function caching where it makes sense and is relatively safe, but that's probably better as a "phase 2" thing. If that makes sense to you (and Danielle; I'll talk with her this week), I'll close this and open a new PR with better documentation.

mariolenz commented 1 week ago

For the record, someone I've talked to about this told me:

there have been a number of partially-failed attempts in this area that have actually shipped as part of core over the years; most have been removed (accelerate, turbo). It's not hard at all to get a persistent reusable module state process to hang around; the hard part is getting it to play nicely/safely. Common problems with previous attempts:

  • process lifetime management (control-node-only is much easier, hard-tied to control process lifetime, preventing process clog, multiple overlapping jobs, different concurrent versions of core/content)
  • robust pool key selection (eg, host/port/credential/?- making sure you don't pick up the wrong one)
  • secure access to the cached process (esp where become is a factor)

Sounds complicated. Especially since we need inter-process communication.
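
To make the "robust pool key selection" bullet concrete: the key has to capture everything that distinguishes one cached session from another, something like this illustrative sketch:

```python
import hashlib


def pool_key(host: str, port: int, username: str, password: str) -> str:
    """Build a cache key so one (host, port, credential) combination can
    never pick up another's session; the password is hashed, not stored."""
    cred = hashlib.sha256(("%s:%s" % (username, password)).encode()).hexdigest()
    return "%s:%d:%s" % (host, port, cred)
```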

mikemorency commented 1 week ago

Yeah, it definitely is complicated. The turbo server in cloud.common is only used by maybe 3 projects? And only vmware_rest has it as mandatory or even enabled by default, as far as I know.

All three of those "problems" are still relevant for the turbo server, although I think the first one is less so. On the one hand, there are a lot of situations where this could cause problems. On the other hand, all of the problems can be avoided if the user knows what to expect. It's a tough call. If vmware_rest wasn't already dependent on it, I would vote to pass.