Closed electrofelix closed 1 year ago
@ffrank Is there anything else needed for this?
@ffrank is there anything further needed for this?
Hi @electrofelix thanks for the patch. I'll try and smoke test this asap so it can get merged.
It should probably get an automated test as well.
Sorry for the huge delay!
Finally took a good look at this. I believe the setting is correct. Before merging, I need to
This API looks scary. I tried with Puppet::Face[:catalog, "0.0"].set_terminus :rest
(which should be the equivalent to passing --terminus
to a call to puppet catalog find
), and that seems to do the trick as well, without going so deep into internals.
@electrofelix I'm a little puzzled by the cache_class
bit. This one does not seem to be exposed in quite the same way. Can you elaborate on what it does and why it's needed?
@ffrank mostly I was just looking at the catalog download code at https://github.com/puppetlabs/puppet/blob/61e123e37b136a4f1606f0bd6dce669611496272/lib/puppet/face/catalog.rb#L134-L139 and based on seeing what it used (as it was working locally), made a giant assumption that replicating it should result in working in my environment as well without really digging to deep into why it worked.
Looking closer at https://github.com/puppetlabs/puppet/blob/master/lib/puppet/indirector/indirection.rb#L46 I'm thinking the main reason the catalog download explicitly sets it not to cache, is that you don't want the local cached version from the last run updated by something that didn't actually apply the catalog. It's not clear if you don't set it what actually happens.
https://github.com/puppetlabs/puppet/commit/dc0088f96f80292f545395eb3084dd37f7883ee9 is the change that added it to the catalog download command but unfortunately I'm not seeing any explanation of why that call is needed.
Just reading around the code (without actually testing anything), I'd guess that during the call to find
, the line at https://github.com/puppetlabs/puppet/blob/85d18afe4f0b1ec64000175632eb27f989f68013/lib/puppet/indirector/indirection.rb#L229 results in https://github.com/puppetlabs/puppet/blob/85d18afe4f0b1ec64000175632eb27f989f68013/lib/puppet/indirector/rest.rb#L185-L198 being called, and I'm not seeing that being particularly useful in the case of downloading the catalog?
So after some more experiments, it does feel as if the master is indeed not contacted while a valid catalog is still in the cache.
Thanks for the pointers, will try to figure out what the cleanest way forward is. Pretty sure you were on the right track!
@ffrank ah, I missed the line https://github.com/puppetlabs/puppet/blob/85d18afe4f0b1ec64000175632eb27f989f68013/lib/puppet/indirector/indirection.rb#L215 does a cache look up first.
Using the local cache to lookup the catalog is probably a handy optimization when running on nodes that are running puppet regularly. However I think would want to ensure saving to cache is disabled at https://github.com/puppetlabs/puppet/blob/85d18afe4f0b1ec64000175632eb27f989f68013/lib/puppet/indirector/indirection.rb#L226 as should the cache be expired and the request went to the master, not sure it's a good idea to save the result locally.
Yes I think my previous statement was ill informed and plain wrong.
Did some spelunking this morning, found this gem in puppet.rb
:
def self.run_mode
# This sucks (the existence of this method); there are a lot of places in our code that branch based the value of
# "run mode", but there used to be some really confusing code paths that made it almost impossible to determine
# when during the lifecycle of a puppet application run the value would be set properly. A lot of the lifecycle
# stuff has been cleaned up now, but it still seems frightening that we rely so heavily on this value.
#
# I'd like to see about getting rid of the concept of "run_mode" entirely, but there are just too many places in
# the code that call this method at the moment... so I've settled for isolating it inside of the Settings class
# (rather than using a global variable, as we did previously...). Would be good to revisit this at some point.
#
# --cprice 2012-03-16
The "run mode" informs things like default termini. Will look into ways to maybe make "agent" our default run mode if possible :/
So the facts
face does the following in one of its actions:
when_invoked do |options|
# Use `agent` sections settings for certificates, Puppet Server URL,
# etc. instead of `user` section settings.
Puppet.settings.preferred_run_mode = :agent
I've tried it, but the info
action on the catalog
face keeps indicating the compiler
terminus regardless.
I suppose this isn't a viable way forward after all then.
I will run some more tests. Current theory is, the catalog download
action just wants to make sure that the catalog will indeed be downloaded from the master, and not taken from the cache (it is supposed to write to the cache after all).
If this is true, then we should not do this, because we would like to rely on the cache according to whatever is the configuration wrt. that (i.e., let Puppet and/or the user decide).
Goofed around with Puppet 5.5 a little.
From what I can tell, all the caching stuff wrt the catalog is largely non-functional in our use case.
The puppet agent
proper will write a cached catalog and respects the --use_cached_catalog
option.
puppet catalog find
always compiles, unless --terminus
is used. In which case it ignores the cache, and the --use_cached_catalog
setting.
Not sure about the code you found in the catalog download
action, but I'm inclined to disregard the caching bit for the time being.
One thing I would like you to add: This currently breaks the ability to use puppet mgmtgraph ... --manifest
and puppet mgmtgraph ... --code
, which we use from mgmt
. I've successfully smoke-tested the following: https://github.com/ffrank/puppet-mgmtgraph/tree/use-rest
If you could include something like this here (and drop the cache thingy), I feel that this is merge-able. I will try and come up with some tests in the meantime.
This is a very essential fix, thanks for your research and engagement! It will go a very long way towards asking for support statistics from Puppet users out there.
(written before your last comment - but slow in posting so might be invalid)
I guess I was just aiming for the most conservative result assuming that changing the local cache should only be done if absolutely sure it was correct.
So this should read from the cache unless the user wants the latest and to ignore the cache (another CLI option?), but should we write to the cache after retrieval? What's the impact of that?
One thing I would like you to add: This currently breaks the ability to use
puppet mgmtgraph ... --manifest
andpuppet mgmtgraph ... --code
, which we use frommgmt
. I've successfully smoke-tested the following: https://github.com/ffrank/puppet-mgmtgraph/tree/use-rest
That's not good, I guess I wasn't thinking of this code getting called to have it's output be used by another tool. That explains how important it is to have the caching work correctly as if you have a few of these calls in a run, you don't want to by-pass the cache on each execution.
2 questions to help clarify my understanding.
On the limiting this to only if --code
and --manifest
are not set, I can't see how this would work within my local puppet env if I was to try and run mgmt and have it call out to puppet mgmtgraph
. Or did your above comments mean that at the start of the run the catalog will have been retrieved to the local cache by how puppet calls mgmt, which is why it doesn't need to use the rest
terminus?
Next question: Is it always valid to have the cache written to? I'm wondering if that is correct when running the command interactively? If not, maybe need to consider passing an option to the find
call to disable writing to the cache, otherwise the thoughts/questions below are irrelevant.
Looking at https://github.com/puppetlabs/puppet/blob/85d18afe4f0b1ec64000175632eb27f989f68013/lib/puppet/indirector/indirection.rb#L211-L212 should the check on whether --code
or --manifest
are unset create an option hash containing ignore_cache_save
set to true, that is passed to the find
face along with Puppet[:certname]
in order to be able to disable save in cache at https://github.com/puppetlabs/puppet/blob/master/lib/puppet/indirector/indirection.rb#L226?
Basically I'm thinking this would mean that running with --code
or --manifest
would cache the retrieved catalog, while running directly to understand the graph generated would only use the cache for reading and ever update it even if it had expired?
Thanks for considering the edge cases. I created some confusion here as well.
First, for your question: mgmt will use one of three options depending on the invocation:
mgmt run puppet /path/to/manifest.pp
will internally launch puppet mgmtgraph print --manifest /path/to/manifest.pp
.mgmt run puppet '<puppet code inline>'
will internally launch puppet mgmtgraph print --code
mgmt run puppet agent
will just launch puppet mgmtgraph print
with no argumentsThe first two run a standalone compiler (like a master, see puppet apply
) and I'm fairly certain the resulting catalog will not ever be written to any cache.
The third was meant to connect to the master and receive the catalog. You have discovered that this is broken. Your patch fixes this. It has been my observation that puppet catalog find
(or the Ruby API call from the code we're looking at) will never write (or read) the catalog either. I have not proven this 100%, but I'm not extremely worried. Even if mgmt run puppet agent
were to overwrite an existing cache, it would do it with a catalog that is indistinguishable from what puppet agent
would have received and written. (Unless I'm missing odd edge cases, that is.)
Does this address your concerns?
Hi @electrofelix hope all is well!
Do you suppose we can continue with this some time in March?
You know what? This is the worst March ever :(
That it is, hopefully get back to this soon
Ohai I'm at a conference. Finally back to hacking a little bit 😇
I have created #25 with fix-up work identified in earlier comments here.
Therefor, I'm finally merging this finally, code additions pending.
Thank you so so so much for this contribution!
Match the
puppet catalog download
command by setting the terminus to 'rest' and disabling caching of the result.The
puppet catalog find
command can fail to retrieve the catalog from the puppet master unless using--terminus rest
, while thepuppet catalog download
command works consistently.Defaulting to match the download behaviour that in turn calls find on the backend seems like it will work for more environments.
Fixes #16