ammarhakim / gkylzero

Lowest, compiled layer of Gkeyll infrastructure.
MIT License

[DR] Move all data out of GkeyllZero repo #524

Open ammarhakim opened 1 week ago

ammarhakim commented 1 week ago

The goal of this very brief Design Review is to propose that we move all data out of the G0 repo and into its own repo. There are several reasons for doing this:

  1. Data is currently being added to the G0 repo haphazardly, making it hard to tell what is essential and what was added for testing and is no longer needed.
  2. Diffs involving large binary data in a code repo become slower and slower.
  3. A separate repo would allow the required data to be installed properly, so that the main G0 code works in cases where data is needed.

For this, I recommend a new repo that contains specialized code to download data (atomic physics data, for example) as well as other static data. This repo would optionally be installed using mkdeps in G0.
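As a rough illustration of what the "specialized code to download data" could look like, here is a minimal Python sketch that fetches one file and verifies it against a known SHA-256 checksum before installing it. The function names, the `.part` temporary suffix, and the idea of pinning a checksum per file are all assumptions made for this example; nothing here reflects an existing G0 or mkdeps interface.

```python
import hashlib
import shutil
import urllib.request
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch_data_file(url: str, dest: Path, expected_sha256: str) -> Path:
    """Download `url` to `dest` unless a verified copy is already there.

    Hypothetical helper: the URL and checksum would come from whatever
    manifest the data repo ends up defining.
    """
    # Skip the download if a copy with the right checksum already exists.
    if dest.exists() and sha256_of(dest) == expected_sha256:
        return dest

    dest.parent.mkdir(parents=True, exist_ok=True)
    tmp = dest.with_name(dest.name + ".part")

    # Download to a temporary file first so a failed or corrupt download
    # never replaces a good copy.
    with urllib.request.urlopen(url) as response, tmp.open("wb") as out:
        shutil.copyfileobj(response, out)

    if sha256_of(tmp) != expected_sha256:
        tmp.unlink()
        raise ValueError(f"checksum mismatch for {url}")

    tmp.replace(dest)
    return dest
```

Pinning a checksum for each file would make the data installation reproducible and would make it obvious when an upstream source changes a file.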

Those who use ADAS or need other data will need to be involved so that we do this properly and do not disrupt our present workflow too much.

JonathanGorard commented 1 week ago

I agree wholeheartedly with this suggestion. We should emphatically not be using standard version control for the files in the G0 /data directory: standard Git is optimized for versioning code, not binary or data files, and we don't want to unnecessarily slow down our code diffs in the long term as the G0 repository continues to grow.

A reasonable solution for the separate data repository would be to use Git LFS, which is designed specifically for versioning large binaries and other data files. It should be possible for us to transfer the version control history from the G0 code repository into the new repo when we make this transition.
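To make the Git LFS suggestion a little more concrete, the sketch below scans a data directory and produces the `.gitattributes` lines that would route large files through LFS. The line format is the one `git lfs track` writes; the function name, the size threshold, and the choice to group files by extension are assumptions for illustration only.

```python
from pathlib import Path


def lfs_attribute_lines(data_dir: Path, min_bytes: int = 1_000_000) -> list[str]:
    """Build .gitattributes lines for file types that should go through Git LFS.

    Hypothetical helper: any extension with at least one file of
    `min_bytes` or more is tracked. Files without an extension are skipped.
    """
    extensions = set()
    for path in data_dir.rglob("*"):
        if path.is_file() and path.suffix and path.stat().st_size >= min_bytes:
            extensions.add(path.suffix)

    # Same attribute set that `git lfs track "*.ext"` appends to .gitattributes.
    return [
        f"*{ext} filter=lfs diff=lfs merge=lfs -text"
        for ext in sorted(extensions)
    ]
```

The returned lines could be reviewed by hand and then written to the new repo's `.gitattributes`, so that the list of LFS-tracked types stays a deliberate choice rather than an automatic one.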