CESR-lab / ucla-roms

GNU General Public License v3.0

Reduce data holdings in this repository #26

Status: Open. matt-long opened this issue 3 weeks ago.

matt-long commented 3 weeks ago

The ROMS repository has become unwieldy because it contains large data collections, for example in Examples/input_data. We're interested in deploying ROMS rapidly, and the repository's size limits how flexibly we can clone it.

These datasets likely change much less often than the source code, so every clone carries a lot of extra overhead.

I propose that we redesign how the data are held, for example by moving them into a separate ROMS-Inputdata repository, or by exploring other solutions (e.g., https://dvc.org/).
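The core idea behind a separate data repository (and behind tools such as DVC) is that the code repository keeps only a small manifest of file paths and checksums, while the bulky files live elsewhere and are fetched on demand. Below is a minimal sketch of that pattern, assuming a plain HTTP data store. `MANIFEST`, `BASE_URL`, and `fetch_inputs` are hypothetical names for illustration, not part of ROMS or DVC.

```python
import hashlib
import urllib.request
from pathlib import Path

# Hypothetical manifest kept in the code repository: relative path -> SHA-256.
# The real file list and checksums would come from the separate data store.
MANIFEST = {
    # "Examples/input_data/<file>.nc": "<sha256 hex digest>",
}

# Hypothetical base URL of a separate data store (e.g. a ROMS-Inputdata repo).
BASE_URL = "https://example.org/ROMS-Inputdata/"


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large NetCDF files are never read whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch_inputs(manifest, dest, base_url=BASE_URL):
    """Download each file that is missing or stale, then verify its checksum.

    Returns the list of relative paths that were actually downloaded.
    """
    dest = Path(dest)
    fetched = []
    for rel_path, expected in manifest.items():
        target = dest / rel_path
        if target.exists() and sha256_of(target) == expected:
            continue  # already present and up to date; skip the download
        target.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(base_url + rel_path, target)
        if sha256_of(target) != expected:
            raise ValueError(f"checksum mismatch for {rel_path}")
        fetched.append(rel_path)
    return fetched
```

With this layout, updating a dataset changes one checksum line in the code repository instead of adding another multi-megabyte blob to its history, and a code-only clone stays small.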

dafyddstephenson commented 3 weeks ago

Good idea, Matt. The input_data directory itself isn't too large, but I believe its files have been changed several times. Git cannot usefully diff binary NetCDF files, so its history effectively keeps a full copy of every revision of these very similar files, and all of those copies are downloaded to the local system on git clone. If the files were uploaded to a fresh repo with a clean history, they would have a much smaller footprint, as long as they aren't changed going forward. However, the files would also need to be purged from the current repo's history.
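One way to check how much of the clone size comes from repeated versions of the same data files is to list every blob reachable from any ref and sum the sizes per path. The sketch below uses `git rev-list --objects --all` piped into `git cat-file --batch-check`; both are standard git commands, but the helper names are illustrative. Note that `%(objectsize)` reports uncompressed size (`%(objectsize:disk)` would give the packed size instead).

```python
import subprocess
from collections import defaultdict


def parse_batch_check(text):
    """Sum blob sizes per path from `git cat-file --batch-check` output.

    Each line is expected to look like "<type> <sha> <size> <path>".
    Returns {path: (number_of_versions, total_bytes)}.
    """
    counts = defaultdict(int)
    totals = defaultdict(int)
    for line in text.splitlines():
        parts = line.split(" ", 3)  # maxsplit=3 keeps spaces inside paths
        # Skip commits and trees; only file contents (blobs) matter here.
        if len(parts) < 4 or parts[0] != "blob":
            continue
        _, _, size, path = parts
        counts[path] += 1
        totals[path] += int(size)
    return {p: (counts[p], totals[p]) for p in counts}


def blob_sizes_by_path(repo="."):
    """Run git on `repo` and return per-path history sizes."""
    objects = subprocess.run(
        ["git", "-C", repo, "rev-list", "--objects", "--all"],
        capture_output=True, text=True, check=True,
    ).stdout
    report = subprocess.run(
        ["git", "-C", repo, "cat-file",
         "--batch-check=%(objecttype) %(objectname) %(objectsize) %(rest)"],
        input=objects, capture_output=True, text=True, check=True,
    ).stdout
    return parse_batch_check(report)


def print_heaviest(repo=".", n=10):
    """Print the n paths whose combined history is largest."""
    ranked = sorted(blob_sizes_by_path(repo).items(),
                    key=lambda kv: kv[1][1], reverse=True)
    for path, (versions, total) in ranked[:n]:
        print(f"{total / 1e6:10.1f} MB  {versions:3d} versions  {path}")
```

Calling `print_heaviest()` from the root of a clone would show whether the files under Examples/input_data dominate the history and how many versions of each are stored, which is the information needed before deciding what to purge.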