autopkg / recipes

Recipes for AutoPkg

All download or pkg recipes should have a StopProcessingIf processor #286

Closed: grahampugh closed this issue 5 years ago

grahampugh commented 5 years ago

Recipes without a StopProcessingIf processor carry on through all of their processors, regardless of whether the source/downloaded installer has changed. For some reason this doesn't seem to be a problem with .munki recipes, otherwise I suspect it would have been dealt with long ago. As I am not a Munki user, I haven't dug into why that is.

At present, I have to make my own .jss recipes and place a StopProcessingIf processor at the start of each one. But this is far from ideal, because it means the .pkg recipe always runs in full: a waste of time and resources when the download is unchanged.

For cross-compatibility with different management systems, either .download or .pkg recipes should by default have a StopProcessingIf processor as soon as possible after the URLDownloader processor.

I realise that many .munki recipes do not require a .pkg recipe, since Munki can handle application bundles directly. And since .munki recipes don't seem to need the StopProcessingIf processor to prevent unnecessary reimports, perhaps a pragmatic approach would be to add the StopProcessingIf processor as the first step in every .pkg recipe.

gregneagle commented 5 years ago

TL;DR: we're not going to make these changes to the core recipes.

This has been proposed before (I think I even proposed it at one point; after all, I'm the one who created the StopProcessingIf processor), but upon further thought/discussion, we realized it would cause its own set of problems.

Consider this: today I run a Foo.munki recipe. There's a new download available; the recipe successfully downloads the item and creates a package, but the import into Munki fails because the Munki repo is unavailable for some reason. I notice the issue and fix it. Tomorrow the recipe runs again, and though it does not download anything new, it continues on and imports the item into the Munki repo.

If instead it stopped processing because there wasn't a new download, it would also never import the item into the Munki repo. Since that wouldn't be an error, it's likely I'd never notice the issue.

There are several variations on this, but it boils down to the fact that even when there isn't a new download, or no new package needs to be created, there is no guarantee that the item is actually in the Munki repo until the recipe runs and checks. So it has to run all the way through.

You note that this doesn't seem to be a problem for Munki recipes: I'm guessing that this is because the MunkiImporter processor can tell if an item is already in the Munki repo before importing it. If the JSSImporter can't determine that an item has already been imported into the JSS, that's arguably a defect of the JSSImporter.

Most/many recipes have three main stages: download, package, import. A large number of Munki recipes are only download and import, because Munki knows how to install from "drag-n-drop" disk images. The URLDownloader processor used by most download recipes tries not to redownload something it already has, using ETags, file sizes and modification dates to determine whether the remote file has changed. For Munki recipes, we've already established that the MunkiImporter processor tries not to reimport an item that has already been imported.
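As an illustration of that caching pattern (this is not AutoPkg's code; URLDownloader stores these values as extended attributes on the downloaded file), a conditional HTTP request in Python might look like this:

    import urllib.error
    import urllib.request

    def download_if_changed(url, etag=None, last_modified=None):
        # Returns (data, etag, last_modified); data is None if unchanged.
        req = urllib.request.Request(url)
        if etag:
            req.add_header("If-None-Match", etag)
        if last_modified:
            req.add_header("If-Modified-Since", last_modified)
        try:
            with urllib.request.urlopen(req) as resp:
                return (resp.read(),
                        resp.headers.get("ETag"),
                        resp.headers.get("Last-Modified"))
        except urllib.error.HTTPError as err:
            if err.code == 304:  # Not Modified: the cached copy is still good
                return None, etag, last_modified
            raise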

That leaves packaging. Usually items get packaged on every run of a recipe, regardless of whether the source has changed. That could probably be improved. When I last looked at it, it seemed like a lot of work to do "correctly", and I had other, more important things to spend time on. I wasn't as concerned about the "waste of time and resources" here.

I would propose that if this is of concern to you, you need to focus on improving the efficiency of packaging and of importing into JSS. Obviously you can do what you want in your own recipes and add StopProcessingIf processors if you'd like, but I don't support adding them to download recipes as a matter of course/habit.

grahampugh commented 5 years ago

> If the JSSImporter can't determine that an item has already been imported into the JSS, that's arguably a defect of the JSSImporter.

Actually, that's not so much a defect of the JSSImporter processor as a defect of Jamf Pro. The package object in the Jamf Pro API has no "version" string or hash to refer to, so a meaningful comparison with the package on the Jamf Pro distribution point would involve downloading the package from the repository. Alternatively, I suppose JSSImporter could maintain its own pkginfo catalog, similar to Munki's method - I might have to look into that.
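To make that idea concrete, here is a minimal sketch of such a local catalog; the file location, layout and helper names are invented, not anything JSSImporter actually does:

    import hashlib
    import json
    import pathlib

    # Hypothetical receipt file; JSSImporter keeps no such catalog today.
    RECEIPTS = pathlib.Path(
        "~/Library/AutoPkg/Cache/jss_receipts.json").expanduser()

    def _sha256(path):
        return hashlib.sha256(pathlib.Path(path).read_bytes()).hexdigest()

    def _load():
        return json.loads(RECEIPTS.read_text()) if RECEIPTS.exists() else {}

    def already_imported(pkg_path):
        # True if this exact package content was recorded as imported before.
        return _load().get(pathlib.Path(pkg_path).name) == _sha256(pkg_path)

    def record_import(pkg_path):
        receipts = _load()
        receipts[pathlib.Path(pkg_path).name] = _sha256(pkg_path)
        RECEIPTS.write_text(json.dumps(receipts, indent=2))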

On the point of the "waste of time and resources" of the .pkg recipes running every time, there is an issue of scalability here for users of all management platforms. With a large recipe list, I've seen administrators discussing having to be inventive and split their recipe lists across different days because the entire run takes too long. And since CodeSignatureVerifier steps were added to many recipes, run times of .download recipes have also lengthened, as deep verification can take considerable time.

The length of a recipe-list run could be drastically shortened if code verification and packaging of large installers did not happen on every run when the download is unchanged. We see examples such as MATLAB, MacTeX and (until it broke) the Adobe CCP apps taking very long times to package up. These recipes simply cannot be part of a normal recipe run unless we put StopProcessingIf processors in their download recipes.

A possible solution that would help all Jamf/AutoPkg administrators, as well as those with long recipe lists, without them having to maintain forks of download recipes, would be to add the StopProcessingIf processor to download recipes, but with no predicate (assuming the processor will continue every time if there is no predicate?). The predicate value is then available to override. People using .jss recipes and those with large recipe lists could then override the predicate value to achieve what they require, while those of you unconcerned with the longer run times would not have to change anything.

gregneagle commented 5 years ago

So try that out (StopProcessingIf processor with overridable predicate) on some of your own recipes and see how it goes/what you learn.

And I still encourage you to look into improving the efficiency of some of the other things you mention, because my original point still stands: just because you don't have a new download doesn't mean the item has been properly imported into your (Jamf) repo. Stopping a recipe run because the download is unchanged is not (generically) the right approach.

grahampugh commented 5 years ago

Hi Greg,

I take your point about the JSSImporter processor, and will talk with the current developers about whether we could include some sort of local pkginfo data.

However, despite this, and given the efficiency gains I have listed, I still see value in being able to stop the run after the URLDownloader processor has finished. After some testing, I now have a viable workflow for recipes that I would like to propose.

Download recipes could have a StopProcessingIf processor after the URLDownloader processor and before the CodeSignatureVerifier processor, with an input key determining the value of predicate. The default value would be FALSEPREDICATE, which always evaluates to False, so the processing continues (https://developer.apple.com/library/archive/documentation/Cocoa/Conceptual/Predicates/Articles/pSyntax.html).
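For reference, this is roughly how such a predicate behaves when evaluated against the recipe environment (StopProcessingIf uses Foundation's NSPredicate; the should_stop helper below is invented for illustration):

    from Foundation import NSPredicate  # PyObjC bridge, as used by AutoPkg

    def should_stop(env, predicate_format="FALSEPREDICATE"):
        predicate = NSPredicate.predicateWithFormat_(predicate_format)
        return bool(predicate.evaluateWithObject_(env))

    env = {"download_changed": False}
    print(should_stop(env))                               # False: keep going
    print(should_stop(env, "download_changed == FALSE"))  # True: stop the run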

Then, the value of predicate can be overridden in the JSS recipe (or in an override) by adding a different value to the STOPPROCESSINGIF_PREDICATE Input key, such as download_changed == False.

The default value of FALSEPREDICATE could be coded into the StopProcessingIf processor so that no default value had to be supplied in the Download recipe.
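A sketch of how that default might look in the processor itself, based on the way AutoPkg processors declare input variables (the real StopProcessingIf may differ in detail):

    from autopkglib import Processor
    from Foundation import NSPredicate

    class StopProcessingIf(Processor):
        # Hypothetical variant with a built-in default predicate.
        input_variables = {
            "predicate": {
                "required": False,
                "default": "FALSEPREDICATE",  # never matches: always continue
                "description": "NSPredicate-style predicate tested against env.",
            }
        }
        output_variables = {
            "stop_processing_recipe": {
                "description": "True if the predicate matched.",
            }
        }

        def main(self):
            predicate = NSPredicate.predicateWithFormat_(self.env["predicate"])
            self.env["stop_processing_recipe"] = bool(
                predicate.evaluateWithObject_(self.env))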

To function, the following would need to be added to Download recipes:

    <key>Process</key>
    <array>
                 ...
        <dict>
            <key>Processor</key>
            <string>StopProcessingIf</string>
        </dict>
                ...
    </array>

or, if the StopProcessingIf processor was not amended to include a default of FALSEPREDICATE, we would need:

    <key>Input</key>
    <dict>
        <key>STOPPROCESSINGIF_PREDICATE</key>
        <string>FALSEPREDICATE</string>
                ...
    </dict>
...
    <key>Process</key>
    <array>
                 ...
        <dict>
            <key>Arguments</key>
            <dict>
                <key>predicate</key>
                <string>%STOPPROCESSINGIF_PREDICATE%</string>
            </dict>
            <key>Processor</key>
            <string>StopProcessingIf</string>
        </dict>
                ...
    </array>

Then, an override or JSS recipe could have:

    <key>Input</key>
    <dict>
        <key>STOPPROCESSINGIF_PREDICATE</key>
        <string>download_changed == False</string>
                ...
    </dict>

This would mean the process list is stopped at the correct and most efficient point, rather than at the start of the JSS recipe. If the recipe failed, you could always run it again with -k STOPPROCESSINGIF_PREDICATE='FALSEPREDICATE' to ensure it ran through all processors regardless of the output of the URLDownloader processor.
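For example (with Foo.jss as a placeholder recipe name):

    # normal run: stops early when the download is unchanged, per the override
    autopkg run Foo.jss

    # force a full run regardless of the download result
    autopkg run Foo.jss -k STOPPROCESSINGIF_PREDICATE='FALSEPREDICATE'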

Even better: since we can use a default of FALSEPREDICATE, the StopProcessingIf behaviour could even be built into the URLDownloader or EndOfCheckPhase processors (or indeed into many other processors, as an optional input) so that Download recipes did not have to be altered at all.

gregneagle commented 5 years ago

I still think and advocate that a better approach would be to modify the CodeSignatureVerifier processor to short-circuit if the download is not new, to modify the PkgCreator processor to short-circuit if its input is not new, and to modify the JSSImporter to short-circuit if its input is not new.

This would, of course, be more work, but all existing recipes would benefit without needing to convince the recipe authors to insert new keys.
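A rough sketch of that short-circuit shape against AutoPkg's Processor API; the change test shown (just consulting download_changed) is deliberately naive, and as noted earlier in the thread a real implementation would also need to confirm the downstream artifact actually exists:

    from autopkglib import Processor

    class ShortCircuitingPkgCreator(Processor):
        # Hypothetical: skip the expensive build when the input is unchanged.

        def main(self):
            # download_changed is set by URLDownloader earlier in the run
            if not self.env.get("download_changed", True):
                self.output("Input unchanged; skipping package build.")
                return
            self.build_package()

        def build_package(self):
            # stand-in for the real (expensive) packaging step
            self.output("Building package...")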

grahampugh commented 5 years ago

Agreed that there are ways to avoid having to amend lots of recipes to achieve this.

For anyone else who ends up here, I've made a PR to the AutoPkg repo that proposes altering the StopProcessingIf and URLDownloader processors to achieve the desired goal of this issue.

timsutton commented 5 years ago

Disclaimer: I don't use AutoPkg at my current workplace so it has been a while for me. But, I do remember that the issue of how to prevent some of the "intermediate" tasks in typical recipes from running was something that came up for people from time to time.

I'd say firstly that, wherever possible, widely-used processors that take a long time to do their work should be able to detect when that work isn't needed and skip it.

For JSSImporter, agreed that while Jamf doesn't make it too easy, there should be a way to identify if an item doesn't need to be re-imported. If it's not possible to do that reliably enough solely from the filename (having a single "package item" containing a name and a package version in one string always struck me as inadequate), then is there a way to attach arbitrary text, tags or other metadata to a package in Jamf?

CodeSignatureVerifier could potentially store some kind of "done" file containing a checksum of the input, so that it can trust it has already verified a particular input - similar to how ETag and Last-Modified headers are stored on downloaded files as extended attributes (and as Munki does).
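One way to realize that "done" file idea, using a checksum stored in an extended attribute (the attribute name and helpers are invented; this uses the third-party xattr module):

    import hashlib
    import xattr  # third-party: pip install xattr

    DONE_ATTR = "com.example.autopkg.verified-sha256"  # hypothetical name

    def _digest(path):
        with open(path, "rb") as f:
            return hashlib.sha256(f.read()).hexdigest()

    def is_already_verified(path):
        try:
            return xattr.getxattr(path, DONE_ATTR).decode() == _digest(path)
        except OSError:  # attribute not set yet
            return False

    def mark_verified(path):
        xattr.setxattr(path, DONE_ATTR, _digest(path).encode())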

PkgCreator already has its own naive "short-circuiting": it just compares the identifier and the version. But my recollection from past use is that the time "wasted" is often spent on the initial copying or decompression, or otherwise setting up the package's payload, before PkgCreator is finally called. So if a recipe uses PkgCreator, the problem isn't that it can't figure out it already has an App-1.2.3.pkg matching the version extracted from its last download (or website, or however); it's that at the "start copying stuff and setting up the payload" steps there is no mechanism to "look ahead", see that you're going to ask for a package built from that payload, and notice you've already got it cached locally.

If you are talking about large recipe lists that also involve building really large packages, I can share the very simple but effective tactic I often used, which was to simply run with --check and then only take action on recipes which had new downloads. I didn't automate that, but here is an example of something using basic tools in a Jenkins-based setup I maintained for some time: https://github.com/timsutton/autopkg-ci/blob/master/steps/autopkg_run.py#L90-L94

^ this boils down to using --report-plist so that I could ingest the plist and then do something based on the results. If I had automated something for only running recipes which had changed downloads, I'd have taken that same approach but parsed the results from the successful downloads. Today I ran this:

    $ autopkg run GoogleChrome.pkg -c --report-plist foo.plist

And that produced this:

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
    <plist version="1.0">
    <dict>
        <key>failures</key>
        <array/>
        <key>summary_results</key>
        <dict>
            <key>url_downloader_summary_result</key>
            <dict>
                <key>data_rows</key>
                <array>
                    <dict>
                        <key>download_path</key>
                        <string>/Users/timsutton/Library/AutoPkg/Cache/com.github.autopkg.pkg.googlechrome/downloads/GoogleChrome.dmg</string>
                    </dict>
                </array>
                <key>header</key>
                <array>
                    <string>download_path</string>
                </array>
                <key>summary_text</key>
                <string>The following new items were downloaded:</string>
            </dict>
        </dict>
    </dict>
    </plist>

While that's not the best format in the world, and it doesn't contain an explicit key for the recipe identifier that generated each line item (that could easily be added), the plist still has all the information I'd need to kick off a run of only the recipes that had changed downloads. Something like this might solve your particular issue in a way that lets you tailor the behaviour to your own needs.
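For what it's worth, a minimal sketch of that follow-up step, reading the report above with Python's plistlib (mapping download paths back to recipe identifiers would still need the extra key mentioned):

    import plistlib

    with open("foo.plist", "rb") as f:
        report = plistlib.load(f)

    rows = (report.get("summary_results", {})
                  .get("url_downloader_summary_result", {})
                  .get("data_rows", []))
    for row in rows:
        # each row corresponds to a new download from this run
        print(row["download_path"])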

Finally, I think it's just interesting to note that AutoPkg has always had somewhat of a balancing act, in that it's not at its core a config management or build tool designed from the ground up with a caching system. It's much, much "leaner", and biased towards the very specific tasks of packaging administration. As a result, it can be hard to optimize certain areas without spending the time to make some logic smarter, or rethinking how AutoPkg "plans" its work in advance given whatever state it has to work with... but that's definitely a larger undertaking.

grahampugh commented 5 years ago

Hi Tim,

Thanks for your answer. Using a check-only run isn't something I'd considered, and is interesting. One problem is that this relies on people having added the EndOfCheckPhase processor to their Download recipes. I just did a quick survey: many recipes do have it, but some don't, some place it after the code signature verification, and in one example it appeared before the URLDownloader processor.

I have moved on from the idea that everyone should have to edit their recipes, in favour of having the option to override whether the recipe should stop if the URLDownloader processor reports no change. I'd be interested to hear your feedback on the idea: https://github.com/autopkg/autopkg/pull/452 It's actually a very small change, but it would require a lot of recipe trust revalidations.

In the meantime, I have now implemented a small change to the JSSImporter processor which adds the option to stop the run if the package name is unchanged (an unfortunate restriction of Jamf Pro, short of a larger rework of JSSImporter to record values locally in the Cache somewhere). This addresses the main issue that JSSImporter users had reported: the re-uploading of policies, smart groups, etc.
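For illustration, the name check might look something like the sketch below; the python-jss lookup call and exception handling are my assumptions about that library's API, not a quote of the actual change:

    import jss  # python-jss, the library JSSImporter is built on

    def package_name_exists(j, pkg_name):
        # True if a package object with this name already exists on the JSS.
        # NOTE: assumes jss.JSS instances expose a Package() lookup that
        # raises when the object is missing; verify against python-jss docs.
        try:
            j.Package(pkg_name)
            return True
        except Exception:  # deliberately broad: the exact exception is assumed
            return False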

Cheers, Graham