CliMA / ClimaOcean.jl

🌎 Framework for realistic regional-to-global ocean simulations, and coupled ocean + sea-ice simulations based on Oceananigans and ClimaSeaIce. Basis for the ocean and sea-ice component of CliMA's Earth system model.
https://clima.github.io/ClimaOceanDocumentation/dev/
MIT License

Should we use `Scratch` and/or should we provide symlinks to scratch data? #145

Open glwagner opened 2 months ago

glwagner commented 2 months ago

I'd like to start this discussion here. The README for Scratch.jl states:

Because the scratch space location on disk is not very user-friendly, scratch spaces should, in general, not be used for storing files that the user must interact with through a file browser. In that event, packages should simply write out to disk at a location given by the user.

I think that at least some users will want to interact with the files that configure a ClimaOcean simulation. For example, we may want to inspect the high-resolution bathymetry file that was used to generate the bathymetry for a given test case, or the atmospheric forcing data. That is common in typical workflows with other packages. Possibly, with ClimaOcean this need will be reduced (we will see). But for now I think it's best to assume that users want to interact with the downloaded files, which means that we probably don't want to use Scratch at the moment.

@simone-silvestri

glwagner commented 2 months ago

I guess a benefit of using Scratch is that users will only have to download huge files or datasets once per filesystem, and then they can use that data in many projects.
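To make the "download once, reuse everywhere" idea concrete, here is a minimal sketch of a download-if-missing helper. The name `fetch_once` and its `fetch` keyword are hypothetical, not ClimaOcean API; `cache_dir` stands in for whatever directory Scratch (or the user) provides, and `fetch` defaults to the `Downloads.download` standard-library function so a stub can be swapped in for testing.

```julia
using Downloads

"""
    fetch_once(url, cache_dir; filename = basename(url), fetch = Downloads.download)

Hypothetical helper: return the local path of `filename` inside `cache_dir`,
downloading `url` only if that file is not already present.
"""
function fetch_once(url, cache_dir; filename = basename(url), fetch = Downloads.download)
    mkpath(cache_dir)                      # create the cache directory if needed
    path = joinpath(cache_dir, filename)
    if !isfile(path)
        # Download to a temporary name first, so an interrupted download
        # never leaves a partial file that looks complete.
        tmp = path * ".part"
        fetch(url, tmp)
        mv(tmp, path; force = true)
    end
    return path
end
```

Because the file lives in a shared directory, any other project that points at the same `cache_dir` skips the download entirely.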

Another possibility is to try to do both, i.e., use Scratch to avoid redownloading, but also document the location of the data somehow so that users can inspect it.
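One lightweight way to "document the location" is to let users print what is in the cache and where it lives. A minimal sketch, assuming the cache is a flat directory of data files; `cache_contents` and `report_cache` are hypothetical names, not existing ClimaOcean functions.

```julia
"""
    cache_contents(cache_dir)

Hypothetical helper: list the regular files in `cache_dir` (sorted by name)
as `name => size_in_bytes` pairs. Returns an empty list if the directory is missing.
"""
function cache_contents(cache_dir)
    isdir(cache_dir) || return Pair{String,Int}[]
    files = filter(isfile, readdir(cache_dir; join = true))  # skip subdirectories
    return [basename(f) => filesize(f) for f in files]
end

"""
    report_cache(cache_dir; io = stdout)

Hypothetical helper: print the cache location and its contents to `io`.
"""
function report_cache(cache_dir; io = stdout)
    println(io, "Cached data directory: ", cache_dir)
    for (name, nbytes) in cache_contents(cache_dir)
        println(io, "  ", name, " (", nbytes, " bytes)")
    end
    return nothing
end
```

Calling something like `report_cache(dir)` after a download would tell the user exactly where the data went, while the files themselves stay in the shared cache.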

simone-silvestri commented 2 months ago

Right, it is indeed convenient to have the data in the local directory for inspection, so a hybrid implementation might be advantageous.

glwagner commented 2 months ago

For those interested, Scratch is currently used within `Bathymetry.__init__()`:

https://github.com/CliMA/ClimaOcean.jl/blob/8a229c2a058ea207bf83df54cd33aa49c690faf9/src/Bathymetry.jl#L25-L29

which defines the global variable `download_bathymetry_cache` (a directory path), which is then used here:

https://github.com/CliMA/ClimaOcean.jl/blob/8a229c2a058ea207bf83df54cd33aa49c690faf9/src/Bathymetry.jl#L83
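A minimal sketch of the pattern described above, not a copy of the linked code: a module-level global holding the cache directory, assigned in `__init__` so it is computed when the module loads rather than fixed at precompile time, and then used as a default keyword argument. The module name `BathymetrySketch`, the function `bathymetry_path`, and the use of `mktempdir()` are hypothetical stand-ins; in ClimaOcean the directory comes from Scratch.jl (e.g. via its `@get_scratch!` macro).

```julia
module BathymetrySketch

# Hypothetical stand-in for the package's cache-directory global.
# It starts empty and is filled in at load time by __init__.
download_cache = ""

function __init__()
    # ClimaOcean uses Scratch.jl here; mktempdir() is a dependency-free
    # substitute so that this sketch runs on its own.
    global download_cache = mktempdir()
    return nothing
end

# The cache is only a default: a caller can pass their own `dir`, which is
# what the Scratch.jl README recommends for files users interact with.
# The default is evaluated at call time, so it sees the value set in __init__.
bathymetry_path(filename; dir = download_cache) = joinpath(dir, filename)

end # module
```

Keeping the directory as an overridable keyword default is one way to serve both camps in this thread: casual users get the shared cache, and users who want the files somewhere visible can pass their own path.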

glwagner commented 2 months ago

@simone-silvestri wrote: "Right, it is indeed convenient to have the data in the local directory for inspection, so a hybrid implementation might be advantageous."

Could we just create a symbolic link to the cached data in the current working directory?
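The symlink idea can be sketched with Julia's built-in filesystem functions (`symlink`, `islink`, `readlink`, `rm`). This is a minimal sketch, assuming the data lives in a single cache directory; `link_cache` and the default link name `"data"` are hypothetical, not part of ClimaOcean.

```julia
"""
    link_cache(cache_dir; link = joinpath(pwd(), "data"))

Hypothetical helper: create (or refresh) a symbolic link at `link` pointing to
`cache_dir`, so cached downloads are visible from the working directory without
being copied. Refuses to replace a real file or directory.
"""
function link_cache(cache_dir; link = joinpath(pwd(), "data"))
    target = abspath(cache_dir)
    if islink(link)                          # true even for a dangling link
        readlink(link) == target && return link   # already correct: nothing to do
        rm(link)                             # stale link: removes the link only
    elseif ispath(link)
        error("$link exists and is not a symbolic link; not overwriting it")
    end
    symlink(target, link)
    return link
end
```

Removing the link later with `rm(link)` deletes only the link, so the shared cache stays intact for other projects. Note that creating symbolic links on Windows may require extra permissions.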

simone-silvestri commented 2 months ago

I like that solution!