ATFutures / who-data-archived-report1

Contains the original git history used to generate results for first report

How to store and update data? #2

Open mpadge opened 6 years ago

mpadge commented 6 years ago

All the mucking about with my gitstructor script only ended up saving us about 200MB, and we've still got a 400MB .git in a 600MB repo. Files will need updating, but this kinda bloat is really undesirable. What to do?

The data need to sit inside a git repo, and they need to be directly accessible to other repos such as flowlayers. @Robinlovelace Can you see this working with piggyback? Other repos need direct access, not downloads via piggyback::pb_...() functions, yet with piggyback the data are not actually held in the repo itself. In my admittedly limited vision of things, this would require a who-data repo that was largely empty on github, but that locally pulled any modified versions via a series of piggyback calls. The local version would thus differ from the github version in having all files present in the repo and ready to use.
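Roughly what I have in mind, as a sketch only (the repo slug and tag below are placeholders, not a settled name):

```r
# pull all release assets into a local clone of the (hypothetical) who-data
# repo, so the files sit on disk even though github itself holds none of them
library(piggyback)

pb_download(repo = "ATFutures/who-data", # placeholder slug for the proposed repo
            tag  = "latest",             # assets attached to the most recent release
            dest = ".")                  # drop the files into the local repo root
```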

We'd then need some kind of hash system, so I guess we could use storr to control the updates. Each package which used who-data would then make an initial storr call to compare hashes, and in response to any changes would update the corresponding files in who-data via a piggyback call. A bit messy, but it should result in a tightly inter-woven and stable system for sharing a common repo of potentially very large data.
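A very rough sketch of how that check might look, assuming who-data ships a storr of expected file hashes that is refreshed whenever new release assets are uploaded (repo slug, paths, and file names are all illustrative only):

```r
library(piggyback)
library(storr)
library(digest)

# local key-value store holding the expected hash for each release asset
hashes <- storr_rds("who-data/.hashes")

sync_file <- function(f, repo = "ATFutures/who-data") {
    local_hash <- if (file.exists(f)) digest(f, file = TRUE) else NA_character_
    expected   <- if (hashes$exists(f)) hashes$get(f) else NA_character_
    if (!identical(local_hash, expected)) {
        # local copy missing or stale: refresh it from the current release
        pb_download(file = basename(f), repo = repo, dest = dirname(f))
    }
}

sync_file("who-data/some-data-file.Rds")  # illustrative file name only
```

The upload side would need a matching step that writes the new hashes into the store whenever assets change, but that's the general shape.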

Alternative

Re-start the repo (now that it has a roughly stable structure), let all the other repos stay as they are, and just periodically use gitstructor to clean things up. I'm not sure I'm in favour of that, because the first solution is likely to be more scalable and (ever our aim here) future-proof.

Thoughts?

Robinlovelace commented 6 years ago

I think a repo that is largely empty of data on GitHub / GitLab (but not necessarily of code), with the data held in the releases thanks to piggyback and kept up-to-date locally by an initial load script, is a good plan for open access data. I'm not sure about the alternative: periodically cleaning things up sounds like a maintenance burden and raises the question of how frequently it would need doing.

My understanding is that git was never intended to deal with binaries, so I'm keen on keeping any large files (~1 MB+) out. We had a mission cleaning up the geocompr repo and used bfg for that: https://rtyley.github.io/bfg-repo-cleaner/#download