CodeGat commented 5 months ago

Background

In a similar vein to the scheduled Bitwise Reproducibility testing on ACCESS-NRI/access-om2-configs, we want to test, on a schedule, that rebuilding different versions of models from scratch produces the same build. Since we use spack, a good hallmark of a reproducible model build would be an unchanged spack.lock file.

Potential Solutions

On an on.schedule.cron trigger, get the list of tags that we care about (similar to access-om2-configs with its config/released-configs.json), then get the associated GitHub Release assets, download the spack.lock, and compare it against the spack.lock of a rebuild of the model at that version on Gadi.
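As a rough sketch of the comparison step only (not the download or the rebuild on Gadi), something like the following could diff the released spack.lock against the rebuilt one. It assumes the lockfile is JSON with a top-level `concrete_specs` mapping from hash to a spec dict containing `name` and `version`, which should be checked against the lockfile version we actually produce. The function names are hypothetical.

```python
import json


def load_spec_hashes(lock_text: str) -> dict[str, str]:
    """Map each concrete spec hash to a "name@version" label.

    Assumes spack.lock is JSON with a top-level "concrete_specs" mapping
    (hash -> spec dict). This layout is an assumption to verify.
    """
    lock = json.loads(lock_text)
    specs = lock.get("concrete_specs", {})
    return {h: f"{s.get('name')}@{s.get('version')}" for h, s in specs.items()}


def compare_locks(released_text: str, rebuilt_text: str) -> dict[str, list[str]]:
    """Report specs whose hashes appear in only one of the two lockfiles."""
    released = load_spec_hashes(released_text)
    rebuilt = load_spec_hashes(rebuilt_text)
    return {
        "only_in_released": sorted(
            released[h] for h in released.keys() - rebuilt.keys()
        ),
        "only_in_rebuilt": sorted(
            rebuilt[h] for h in rebuilt.keys() - released.keys()
        ),
    }
```

Two empty lists would mean every concrete spec hash matched. Anything else lists the packages that differ, which is likely more useful in a scheduled job's log than a plain file diff.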

Questions

Should this repository be where this type of check is done?
If we try to rebuild based on the spack.lock file, it will always produce the same build. See the bottom of https://spack.readthedocs.io/en/latest/environments.html#creating-a-managed-environment - we are guaranteed to produce the same build. If so, this seems like a vacuous check: we would be comparing spack.lock against itself if we do a `spack env create my-env spack.lock && spack env activate my-env && spack install` - the spack.lock will not change.
If we try to rebuild based on the spack.yaml file, it may well never produce the exact same spack.lock. The concretization logic is arcane wizardry: the things we specify in the spack.yaml will not change, but the dependencies may well change with the whims of the concretizer. The build may still do exactly what it says on the tin, just with a patch version bump to some random dependency. In this case, this workflow will almost certainly always fail. In which case...
If we have a more lax definition, how do we know when something has changed enough to break build reproducibility? Patch version bumps to dependencies are ok, but not minor/major bumps? Something else?
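To make the "more lax definition" question concrete, here is a minimal sketch of one possible policy: classify each dependency's version change as patch, minor or major, and accept only patch bumps. This is not a decided rule. The function names and the default policy are hypothetical, and spack versions are not guaranteed to follow semantic versioning, so anything that does not parse as dotted numbers is reported as "unknown".

```python
import re


def parse_version(version: str) -> tuple[int, ...]:
    """Leading numeric components of a dotted version, e.g. "4.7.4" -> (4, 7, 4)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def classify_bump(old: str, new: str) -> str:
    """Classify a version change as same, patch, minor, major, other or unknown."""
    if old == new:
        return "same"
    old_parts, new_parts = parse_version(old), parse_version(new)
    if not old_parts or not new_parts:
        return "unknown"  # e.g. branch names or git refs
    # Pad to three components so "1.2" compares like "1.2.0".
    o = (old_parts + (0, 0, 0))[:3]
    n = (new_parts + (0, 0, 0))[:3]
    for label, a, b in zip(("major", "minor", "patch"), o, n):
        if a != b:
            return label
    return "other"  # differs only in a suffix or a fourth component


def changed_packages(old: dict[str, str], new: dict[str, str]) -> dict[str, str]:
    """Map each package whose version differs to its bump classification."""
    changes = {}
    for name in sorted(old.keys() | new.keys()):
        if name not in new:
            changes[name] = "removed"
        elif name not in old:
            changes[name] = "added"
        else:
            kind = classify_bump(old[name], new[name])
            if kind != "same":
                changes[name] = kind
    return changes


def within_policy(old, new, allowed=("patch",)) -> bool:
    """True if every change between the two package sets is an allowed kind."""
    return all(kind in allowed for kind in changed_packages(old, new).values())
```

The inputs are plain name-to-version dicts, which could be extracted from the two spack.lock files. Note that a version-only check ignores changes to variants, compilers and build flags, which also alter the spec hash, so this would be at best one part of a lax definition.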

Pinging @aidanheerdegen

CodeGat commented 5 months ago

Also pinging @harshula for his thoughts

aidanheerdegen commented 3 months ago

I think it is worth seeing if it is even still possible to rebuild exactly: we don't need to check that spack.lock is reproduced, just check whether an exact rebuild from spack.lock even works.

Or maybe build from spack.yaml and see if we can recreate spack.lock? If not, why not? If hashing changes with every spack version then this is probably not worth the effort.

If we can't rebuild exactly from spack.lock, do we need to generate a PR against a known test config to test for bit repro? Would we then update the spack.lock if we reproduce?

There was a desire to update the software stack and test for bit repro (External Dependency CI). Is this effectively that process?

ACCESS-NRI / ACCESS-OM2

Scheduled `build reproducibility` tests #39
