CliMA / slurm-buildkite

Run buildkite jobs on a slurm cluster

Shared initialization step? #8

Open simonbyrne opened 4 years ago

simonbyrne commented 4 years ago

In our old slurmci we have a separate init job that installs the packages and does a round of precompilation. This isn't such an issue with buildkite as agents have a cache that they can use (reducing the need to reinstall packages), but if we wanted to do something like this we would need a way to share the initialized cache between the downstream agents.

simonbyrne commented 4 years ago

One idea from the JuliaCon BoF on Julia in Production: build a system image of all the dependencies, and invalidate it only if the Manifest changes.
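A rough sketch of the invalidation idea, assuming the system image is cached on a filesystem shared by the agents and keyed on a hash of the Manifest contents. The function names, the cache layout, and the `deps-<hash>.so` file name are hypothetical, and the actual image build (for example with PackageCompiler.jl) is left out:

```python
import hashlib
from pathlib import Path


def sysimage_path(manifest: Path, cache_dir: Path) -> Path:
    """Return the cached sysimage path for this Manifest (hypothetical layout)."""
    # Key the image on the Manifest contents, so any change in the
    # resolved dependencies maps to a different file name.
    digest = hashlib.sha256(manifest.read_bytes()).hexdigest()[:16]
    return cache_dir / f"deps-{digest}.so"


def needs_rebuild(manifest: Path, cache_dir: Path) -> bool:
    """True when no image has been built yet for the current Manifest."""
    return not sysimage_path(manifest, cache_dir).exists()
```

With this layout, an unchanged Manifest always maps to the same image, so downstream jobs can reuse it without any extra bookkeeping; stale images would have to be cleaned up separately.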

simonbyrne commented 4 years ago

We could even build this into a Singularity container?

jakebolewski commented 4 years ago

I think that would work as long as nothing has to be written to the Singularity container at runtime (after the build step).

simonbyrne commented 4 years ago

I think Singularity containers are immutable.

simonbyrne commented 4 years ago

How about this:

jakebolewski commented 4 years ago

If we could do this with a singleton slurm job instead of as extra logic in the agent, I think that would be preferable, as it is a bit more flexible. Is this possible with slurm?

simonbyrne commented 4 years ago

On second thoughts, I agree an explicit extra step would be better.

simonbyrne commented 4 years ago

> Is this possible with slurm?

Yes, you give it a unique job name (say `--job-name=buildkite-init`), and then use `--dependency=singleton`. Slurm will then run only one job with that name (per user) at a time; any others wait until the earlier one has finished.
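As a minimal sketch, the submission could be assembled like this. The helper name and the `init.sh` script are hypothetical; only the two `sbatch` flags come from the comment above:

```python
import shlex
from typing import List


def singleton_init_command(script: str, job_name: str = "buildkite-init") -> List[str]:
    """Build an sbatch argv for an init job that runs one instance at a time."""
    return [
        "sbatch",
        f"--job-name={job_name}",   # shared name that identifies the init job
        "--dependency=singleton",   # wait for earlier jobs with the same name and user
        script,                     # hypothetical init script path
    ]


if __name__ == "__main__":
    # Print the command rather than running it, so this works without Slurm.
    print(shlex.join(singleton_init_command("init.sh")))
```

Passing the list to `subprocess.run` on a login node would submit the job. If several pipelines submit it at once, the init jobs queue up behind each other rather than running concurrently.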

jakebolewski commented 4 years ago

OK, I think this should be straightforward then.

simonbyrne commented 1 year ago

This might be a nice solution: https://github.com/JuliaCI/DepotCompactor.jl/