Closed by eldobbins 2 months ago
Asked Will in Slack. He suggested https://git-lfs.com/. Jody concurred.
So we do have LFS available through the university GitHub Enterprise account, and it looks like a few repos are already using it. There is apparently an additional cost that comes with using LFS, but I'm not sure how much, how it gets charged, or whether the university just absorbs it.
One other repo is https://github.com/acep-uaf/thearcticprogram.net. It has a `.gitattributes` that looks like:

```
*.pdf filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
```
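For reference, `git lfs track` is what writes lines like those. A minimal sketch of what tracking our database file would mean, with the attribute line written by hand so the effect is explicit (run in a throwaway directory; the `*.db` pattern is an assumption based on the `aetr.db` file discussed below):

```shell
# Work in a scratch directory so nothing real is touched.
cd "$(mktemp -d)"

# This is the line `git lfs track "*.db"` would append to .gitattributes:
printf '*.db filter=lfs diff=lfs merge=lfs -text\n' >> .gitattributes

# Confirm the pattern is registered:
grep -c 'filter=lfs' .gitattributes   # → 1
```

In a real repo you would run `git lfs install` once and then `git lfs track "*.db"`, and commit the resulting `.gitattributes`.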
Jesse says there is a 4 GB file limit. That's OK.
`aetr.db` with prices in it is 400 KB. From the GitHub docs:

> Git is not designed to handle large SQL files. To share large databases with other developers, we recommend using a file sharing service.
There is the `git filter-repo` command (see sensitive files). Costs:

> One data pack costs $5 per month, and provides a monthly quota of 50 GiB for bandwidth and 50 GiB for storage.
And there is 1 GB free per account (UAF? ACEP? me?).
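Back-of-envelope, hedged on the numbers quoted above: at roughly 400 KB per snapshot of `aetr.db`, even the free 1 GiB tier holds a few thousand full copies:

```shell
# 1 GiB of free LFS storage expressed in KiB, divided by ~400 KiB per copy:
echo $(( (1024 * 1024) / 400 ))   # → 2621
```

Note that Git LFS stores each version as a full object (no deltas), so every committed revision of the DB counts against the quota separately.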
From "About Git Large File Storage":

> Git LFS cannot be used with GitHub Pages sites.
This seems like a great solution to our binary db headaches. However, the interference between GitHub Pages and LFS grenades this for our use case: the LFS-tracked database is present in the repo directory, but GitHub Pages serves only the small LFS pointer file, not the actual database, so our site can't use it. For shame.
@ianalexmac do you think we can close this issue now that we have moved away from the database idea?
We can copy info that we learned here into the wiki.
@eldobbins I think that's a great idea. When we revisit this, I think we should make a separate repo that builds the DB and pushes it to a public GCS storage bucket. Then we can set up an Action in this repo that pulls the data from the bucket and stashes it as CSVs. That way the DB build stays nicely separate (for permissions etc.), and this repo will have local CSVs, so it will be nice and responsive. The data in the repo will be up to date, and we'll be interfacing with an actual database.
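A hypothetical sketch of what that Action could look like. The bucket name, paths, secret name, and schedule are all placeholders, not anything we've set up; the `auth`/`setup-gcloud` actions are Google's published ones:

```yaml
# Sketch only: bucket, paths, and secret names are made up.
name: refresh-data
on:
  schedule:
    - cron: "0 6 * * 1"   # e.g. weekly
  workflow_dispatch:
jobs:
  pull-csvs:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: google-github-actions/auth@v2
        with:
          credentials_json: ${{ secrets.GCP_SA_KEY }}   # assumed secret name
      - uses: google-github-actions/setup-gcloud@v2
      - name: Pull CSVs from the bucket
        run: gsutil -m cp "gs://our-db-bucket/csv/*.csv" data/   # bucket is a placeholder
      - name: Commit refreshed CSVs
        run: |
          git config user.name "github-actions"
          git config user.email "actions@github.com"
          git add data/*.csv
          git commit -m "Refresh CSVs from bucket" || echo "No changes"
          git push
```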
In preparation for an Action that recreates the database from the CSV files, set up DVC to write that file to Drive instead.
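The DVC setup could look roughly like this. This is a sketch under assumptions: the Drive folder ID is a placeholder, and the remote name `gdrive` is arbitrary:

```shell
# One-time setup sketch; <drive-folder-id> is a placeholder, not a real ID.
pip install "dvc[gdrive]"

dvc init
dvc remote add -d gdrive "gdrive://<drive-folder-id>"

# Track the database with DVC instead of git:
dvc add aetr.db
git add aetr.db.dvc .gitignore
git commit -m "Track aetr.db with DVC"

# Upload the actual file to Drive:
dvc push
```

After that, collaborators (or an Action) run `dvc pull` to fetch `aetr.db` from Drive, while git only carries the small `.dvc` pointer file.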