CliMA / Oceananigans.jl

🌊 Julia software for fast, friendly, flexible, ocean-flavored fluid dynamics on CPUs and GPUs
https://clima.github.io/OceananigansDocumentation/stable
MIT License

Use DataDeps.jl and store regression data outside of the repository? #1086

Closed: ali-ramadhan closed 2 years ago

ali-ramadhan commented 3 years ago

Right now regression data takes up a significant amount of space in the repo. I suppose this is not a huge issue, since only developers/contributors git clone the repo, while users can just ] add Oceananigans.

But a potential solution would be to store regression data elsewhere and access it using DataDeps.jl. This might be especially good if we want more/larger regression tests. And it wouldn't increase the repo size every time you have to change the regression data.

We could maybe store the data on engaging? Ideally it should be hosted somewhere with near-100% uptime: we're already maintaining Buildkite, which fails sometimes, so we should try to reduce the number of possible failure points in our CI pipeline.


Copy-pasted some analysis below:

I think regression files currently take up ~17.1 MiB of space in the git repo while a fresh clone of the repo is ~43 MiB (images and convergence plots probably take up several MiB).

Here's a listing of all files in git history over 300 KiB (command from https://stackoverflow.com/a/42544963):

018186272590  328KiB test/data_rayleigh_benard_regression_000001100.jld
19db949aaae8  328KiB test/data_rayleigh_benard_regression_000001000.jld
424080660c53  328KiB test/data_rayleigh_benard_regression_000001000.jld
a7e1d690d6b5  328KiB test/data_rayleigh_benard_regression_000001100.jld
72744372e5c4  361KiB test/regression_tests/data/thermal_bubble_regression.nc
4ce9699176ee  363KiB test/deep_convection_regression_10.nc
c15f95e2bf3a  364KiB test/regression_tests/data/thermal_bubble_regression.nc
6f28044e3b56  366KiB docs/src/verification/convergence_plots/gaussian_advection_diffusion_error_convergence.png
194fdf47099b  392KiB docs/src/verification/convergence_plots/gaussian_advection_diffusion_error_convergence.png
2f9d5e8650d7  420KiB docs/src/verification/convergence_plots/cosine_advection_diffusion_error_convergence.png
db8f742e7c95  446KiB docs/src/verification/convergence_plots/cosine_advection_diffusion_error_convergence.png
0de880b2b97b  468KiB docs/src/verification/plots_stratified_couette_flow_stratified_couette_flow_velocity_temperature_slices.png
d277a4e5393b  650KiB test/regression_tests/data/data_rayleigh_benard_regression.jld2
b125bc6f8e9d  709KiB test/regression_tests/data/ocean_large_eddy_simulation_VerstappenAnisotropicMinimumDissipation_10000.jld2
f5c1a7736324  709KiB test/regression_tests/data/ocean_large_eddy_simulation_VerstappenAnisotropicMinimumDissipation_10010.jld2
0b493fa7dd14  709KiB test/regression_tests/data/ocean_large_eddy_simulation_SmagorinskyLilly_10000.jld2
ad020f12370b  709KiB test/regression_tests/data/ocean_large_eddy_simulation_SmagorinskyLilly_10010.jld2
9879b0da29c0  709KiB test/regression_tests/data/ocean_large_eddy_simulation_VerstappenAnisotropicMinimumDissipation_10010.jld2
c170cc80cd64  709KiB test/regression_tests/data/ocean_large_eddy_simulation_VerstappenAnisotropicMinimumDissipation_10000.jld2
a5a23cbaaace  709KiB test/regression_tests/data/ocean_large_eddy_simulation_SmagorinskyLilly_10000.jld2
b62c38aea554  709KiB test/regression_tests/data/ocean_large_eddy_simulation_SmagorinskyLilly_10010.jld2
9765742b042b  713KiB test/regression_tests/data/rayleigh_benard_iteration1000.jld2
d6932dc59613  713KiB test/regression_tests/data/rayleigh_benard_iteration1100.jld2
5b796cdfdf8e  718KiB test/regression_tests/data/ocean_large_eddy_simulation_SmagorinskyLilly_iteration10010.jld2
ba4645921310  718KiB test/regression_tests/data/ocean_large_eddy_simulation_SmagorinskyLilly_iteration10000.jld2
3519eeb0dea0  718KiB test/regression_tests/data/ocean_large_eddy_simulation_VerstappenAnisotropicMinimumDissipation_iteration10010.jld2
fbf720bf84dc  718KiB test/regression_tests/data/ocean_large_eddy_simulation_VerstappenAnisotropicMinimumDissipation_iteration10000.jld2
51891abf2cd1  719KiB test/regression_tests/data/ocean_large_eddy_simulation_SmagorinskyLilly_iteration10010.jld2
c48525b35c1b  719KiB test/regression_tests/data/ocean_large_eddy_simulation_SmagorinskyLilly_iteration10000.jld2
41f8e56c345f  719KiB test/regression_tests/data/ocean_large_eddy_simulation_VerstappenAnisotropicMinimumDissipation_iteration10000.jld2
a7a57fa8fdc7  719KiB test/regression_tests/data/ocean_large_eddy_simulation_VerstappenAnisotropicMinimumDissipation_iteration10010.jld2
0ee7298c84ad  731KiB test/thermal_bubble_golden_master_model_checkpoint_10.jld
bddab0c2f590  924KiB test/regression_tests/data/data_rayleigh_benard_regression.jld2
937939cc1ef2  990KiB docs/src/verification/convergence_plots/gaussian_advection_diffusion_solutions.png
841a7461932f  1.0MiB docs/src/verification/convergence_plots/gaussian_advection_diffusion_solutions.png
061ab36b8d44  1.3MiB docs/src/verification/convergence_plots/cosine_advection_diffusion_solutions.png
2f48fac8a7f5  1.4MiB paper/free_convection_and_baroclinic_instability.png
7ef3d2c84f36  1.4MiB docs/src/verification/convergence_plots/cosine_advection_diffusion_solutions.png
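
For anyone who wants to regenerate a listing like this, here is a minimal sketch of a pipeline that produces the same three-column format. It is not necessarily the exact command from the linked answer. It assumes GNU coreutils (for numfmt) and 40-character SHA-1 object ids; the function name list_large_blobs and its byte-threshold argument are made up for illustration.

```shell
# Print every blob in the full git history larger than a byte threshold,
# smallest first, as: <12-char object id> <human-readable size> <path>.
# list_large_blobs is a hypothetical helper name; 307200 bytes = 300 KiB.
# Steps: list all objects, look up type/id/size, keep blobs only,
# filter by size, sort by size, shorten the id, humanize the size.
list_large_blobs() {
    local min_bytes="${1:-307200}"
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
        sed -n 's/^blob //p' |
        awk -v min="$min_bytes" '$2 + 0 > min + 0' |
        sort --numeric-sort --key=2 |
        cut -c 1-12,41- |
        numfmt --field=2 --to=iec-i --suffix=B --padding=7 --round=nearest
}
```

Calling list_large_blobs from the repository root uses the 300 KiB default; passing a different byte count (for example list_large_blobs 1048576) changes the cutoff. Because it walks all reachable objects, files that were later deleted or replaced still show up, which is why the same path can appear more than once.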

glwagner commented 3 years ago

Another use, as mentioned by @christophernhill, is to host initial condition / state data for examples (which would only be downloaded if users need to run the examples in question?)

glwagner commented 2 years ago

We do this now.