apache / infrastructure-actions

Apache infrastructure
https://infrastructure.apache.org/

Use artifacts to cache the build #51

Open sebbASF opened 5 months ago

sebbASF commented 5 months ago

The PR looks for a GFM build artifact in the repo housing the action. If one is found, it is downloaded and the build is skipped.
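As a rough illustration of the fetch-or-build decision described above, here is a minimal shell sketch. The names `fetch_artifact`, `build_gfm`, `get_gfm`, and `ARTIFACT_FILE` are hypothetical placeholders, not taken from the PR; the real action would substitute its own download and build steps.

```shell
#!/bin/sh
# Hypothetical placeholder: succeeds only if an artifact file is present.
# ARTIFACT_FILE is an assumed variable, not a name used by the real action.
fetch_artifact() {
    [ -f "${ARTIFACT_FILE:-/nonexistent}" ]
}

# Hypothetical placeholder for the real GFM build (the 15+ second step).
build_gfm() {
    echo "building GFM"
}

# Use the artifact if it can be fetched; otherwise fall back to building.
get_gfm() {
    if fetch_artifact; then
        echo "artifact found, build skipped"
    else
        build_gfm
    fi
}
```

Calling `get_gfm` with no artifact present falls through to the build, so a missing artifact never breaks the run; it only costs the build time.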

It can take 15+ seconds to build GFM, which is a large proportion of the run time, so it seems worth doing.

[A previous version of this PR also checked the calling repo for the artifact, and would create an artifact if none was found there either. However, this could create artifacts in lots of repos, and would be an unexpected side effect of the build action.]

sebbASF commented 5 months ago

A job is also needed to create the artifact in the action repo when necessary. This is provided by #56.

gstein commented 5 months ago

Should we create a distinct action for this, e.g. infrastructure-actions/build-gfm? ... It seems there is a fair bit of logic around this, which could be pulled out into a separate action for clarity of maintenance. ... I don't have a particular opinion either way. The current action.yml isn't so large as to be confusing.

All that said, I do think that caching the artifact is the right approach. No need to keep rebuilding it for $N sites and runs.

gstein commented 5 months ago

Oh. I just saw your reference to #56 ... but my query is: why duplicate build logic between that and this one? And #56 has very different-looking build code (it has a lot of cron/expiration checks) ... could we roll up all the logic around building/stashing the artifact into a new Action?

sebbASF commented 5 months ago

AFAICT, actions don't have a way to exit cleanly early. One would need to set a flag and check it in every subsequent step of the action. [Or use a hack and exit with an error, but catch it at the end and force a successful exit. Ugh!]
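To make the flag workaround concrete, here is a minimal sketch of the script side of it. `GITHUB_OUTPUT` is the file GitHub Actions provides for step outputs; the function name `set_skip_flag` and the output name `skip_build` are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# Sketch of the "set a flag" workaround (names are hypothetical).
# Records whether the build can be skipped as a step output.
set_skip_flag() {
    artifact_path=$1
    if [ -e "$artifact_path" ]; then
        echo "skip_build=true" >> "$GITHUB_OUTPUT"
    else
        echo "skip_build=false" >> "$GITHUB_OUTPUT"
    fi
}

# Every later step of the action would then need its own guard, e.g.
#   if: steps.<step id>.outputs.skip_build != 'true'
# where <step id> is the id of the step that ran set_skip_flag.
```

The cost is the one noted above: the guard has to be repeated on each subsequent step, since nothing stops the action as a whole.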

Also an action can only be called as a separate step; it cannot be invoked as part of a step that has a run clause.

So I cannot see any way to extract an action that would simplify the code.

However, I think the build sections could be simplified slightly by moving the pushd $WORKDIR/popd statements into build-cmark.sh
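One way this could look, as a sketch only: the script changes into the work directory itself, so callers no longer need their own pushd/popd. This version uses a subshell with `cd` rather than pushd/popd (the effect on the caller is the same, and it also works in plain sh). The function name `build_in_workdir` and the `build.log` file are hypothetical, and the real build commands are replaced by a placeholder.

```shell
#!/bin/sh
# Hypothetical sketch: run the build inside the given work directory
# without changing the caller's current directory.
build_in_workdir() {
    workdir=$1
    (
        # The subshell keeps the directory change local to the build.
        cd "$workdir" || exit 1
        # Placeholder for the real build commands.
        echo "built in $(pwd)" > build.log
    )
}
```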

Note also that the build-artifacts job is only intended for use on the actions repo, so I'm not sure it helps to extract some of the logic into a separate action.

gstein commented 5 months ago

build-cmark.sh is fair game, as it is only used within this repository/action. Use your best judgement. Thanks!

gstein commented 5 months ago

Offhand, let the Action define the LIBCMARKDIR, and hand that to build-cmark.sh. Is that your thinking?

sebbASF commented 5 months ago

> Offhand, let the Action define the LIBCMARKDIR, and hand that to build-cmark.sh. Is that your thinking?

That would require other changes to build-cmark.sh, as it would potentially have to move the lib directory to the input location. So I did not do that.

However it now occurs to me that it would simplify the Docker build if it could specify LIBCMARKDIR without having to know the directory structure of the tar file. Only the build file should know that.

I might change it again...

gstein commented 5 months ago

At a minimum, build-cmark.sh could simply mv cmark.so $LIBDIR_GIVEN_TO_US

sebbASF commented 5 months ago

> At a minimum, build-cmark.sh could simply mv cmark.so $LIBDIR_GIVEN_TO_US

I think it may need the whole of the lib directory, but that is no harder to move.
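A minimal sketch of moving the whole lib directory to a caller-supplied location might look like this. `LIBCMARKDIR` is the name from the discussion; the function name `install_lib` and its `build_lib_dir` argument (wherever the build leaves its lib directory) are assumptions, not the actual contents of build-cmark.sh.

```shell
#!/bin/sh
# Hypothetical sketch: move everything from the build's lib directory
# into the location the caller asked for via LIBCMARKDIR.
install_lib() {
    build_lib_dir=$1
    # Fail early with a message if the caller did not set LIBCMARKDIR.
    target=${LIBCMARKDIR:?LIBCMARKDIR must be set}
    mkdir -p "$target"
    # Move the whole lib directory contents, not just a single .so file.
    mv "$build_lib_dir"/* "$target"/
}
```

With this shape, only the build script needs to know where the build output lands; callers such as the Docker build just set `LIBCMARKDIR`.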

sebbASF commented 4 months ago

AFAICT the apt version of GFM is designed as a stand-alone executable. There is an associated library, but it is not in the layout expected by the ASF gfm plugin. Either the layout would have to be emulated by the action, or the plugin needs work to use the executable (which might be slower?)

I think this is a job for later, once the action is known to be working OK.

It also appears to be a bit slower than fetching an artifact.

(Later) I tried installing it and setting up the expected links. However, the gfm plugin fails with:

libcmark-gfmextensions.so: undefined symbol: core_extensions_ensure_registered

I doubt it is worth trying to fix that.

sebbASF commented 4 months ago

One more consideration:

Using actions/cache doesn't help with the first build in a new repo (not sure if it helps with the first build when a new branch is cloned), whereas the cached artifacts are always available to new branches and new repos.

assignUser commented 4 months ago

> Also, changing from BuildBot to GH CI is a big change, and the fewer other changes that are made, the easier it is to debug problems. Updates to GFM and Pelican versions can be made later.

:+1: makes sense, was just wondering :)

> whereas the cached artifacts are always available to new branches and new repos.

True, but they also add a considerable amount of code/process to maintain, and with the build only taking 15 seconds I'd lean towards 'no code is the best code' ^^ If the build was more substantial, this would be a great setup, but for 15 seconds it seems a bit overkill. An alternative would be to use a docker image in the step to run pelican, or even a complete docker action, but I don't really have any experience with Pelican so :shrug: Using ghcr.io caching plus a minimized image makes that pretty fast too.

> not sure if it helps with the first build when a new branch is cloned

If there is a cache on the base branch then yes.

sebbASF commented 4 months ago

I tried using a Docker image and that was slower.

assignUser commented 4 months ago

> I tried using a Docker image and that was slower.

To clarify before I go down a rabbit hole to test this: did you use the Dockerfile and build the image in the workflow, or did you pre-build a docker image, host it on ghcr.io, and use that in the workflow?

sebbASF commented 4 months ago

I think I used ghcr.io. This took quite a while to load.