acep-uaf / aetr-web-book-2024

Alaska Electricity Trends Report as a web book
https://acep-uaf.github.io/aetr-web-book-2024/
Creative Commons Attribution Share Alike 4.0 International

Set up large-file version control so we can write that big database over and over #30

Closed: eldobbins closed this issue 2 months ago

eldobbins commented 3 months ago

In preparation for an Action that recreates the database from the CSV files, set up DVC to write that file to Drive instead.
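For reference, the DVC setup would look roughly like this. This is a hedged sketch: the file path `data/aetr.db` and the Drive folder ID are placeholders, and the Google Drive remote requires the `dvc[gdrive]` extra plus an OAuth flow on first push.

```shell
# one-time setup in the repo (assumes dvc[gdrive] is installed)
dvc init
dvc remote add -d gdrive gdrive://<drive-folder-id>   # placeholder folder ID

# track the regenerated database with DVC instead of git
dvc add data/aetr.db                 # hypothetical path to the big database
git add data/aetr.db.dvc data/.gitignore
git commit -m "Track database with DVC"
dvc push                             # uploads the file to the Drive remote
```

After this, the Action that rebuilds the database would rerun `dvc add` and `dvc push`, while git only stores the small `.dvc` pointer file.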

eldobbins commented 3 months ago

Asked Will in Slack. He suggested https://git-lfs.com/. Jody concurred.

So we do have LFS available through the university GitHub Enterprise account, and it looks like a few of our repos are already using it. There is apparently an additional cost that comes with using LFS, but I'm not sure how much it is, how it gets charged, or whether the university just absorbs it.

Another repo using it is https://github.com/acep-uaf/thearcticprogram.net. It has a .gitattributes that looks like this:

*.pdf filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text

Jesse says there is a 4 GB file size limit. That's OK for us.
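A .gitattributes like the one above is normally generated by `git lfs track` rather than written by hand. A minimal sketch for this repo might be (the `*.db` pattern and `data/aetr.db` filename are assumptions, not something decided in this thread):

```shell
# one-time per machine: install the LFS git hooks
git lfs install

# tell LFS which files to manage; this appends a line to .gitattributes
git lfs track "*.db"                 # hypothetical pattern for our database

# commit the attributes file and the tracked file together
git add .gitattributes data/aetr.db  # hypothetical database path
git commit -m "Track database with Git LFS"
```

On push, git uploads a small pointer file to the repo and the actual binary to LFS storage, which is what the per-account quota below is charged against.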

eldobbins commented 3 months ago

General notes:

Costs: one data pack costs $5 per month and provides a monthly quota of 50 GiB of bandwidth and 50 GiB of storage. There is also 1 GiB free per account (UAF? ACEP? me?).

From "About Git Large File Storage": Git LFS cannot be used with GitHub Pages sites.

ianalexmac commented 2 months ago

This seems like a great solution to our binary db headaches. However, the incompatibility between GitHub Pages and LFS grenades this for our use case: the LFS-tracked database is present in the repo directory, but our GH Pages website doesn't recognize it. For shame.

eldobbins commented 2 months ago

@ianalexmac do you think we can close this issue now that we have moved away from the database idea?

We can copy info that we learned here into the wiki.

ianalexmac commented 2 months ago

@eldobbins I think that's a great idea. When we revisit this, I think we should make a separate repo that builds the DB and pushes it to a public GCS storage bucket. Then we can set up an Action in this repo that pulls the data from the bucket and stashes it as CSVs. That way, our DB build stays nicely separate (for permissions etc.), and this repo keeps local CSVs, so it will be nice and responsive. The data in the repo will stay up to date, and we'll be interfacing with an actual database.
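The pull side of that proposal could be as simple as fetching objects from the public bucket over HTTPS inside the Action. A sketch, assuming a hypothetical bucket named `aetr-data` and file names that don't exist yet:

```shell
# inside the GitHub Action: fetch current CSVs from the public GCS bucket
# (bucket name and file names are placeholders)
mkdir -p data
for f in capacity generation prices; do
  curl -fsSL -o "data/${f}.csv" \
    "https://storage.googleapis.com/aetr-data/${f}.csv"
done
```

Because the bucket is public, no service-account credentials are needed in this repo; only the DB-building repo would hold write permissions.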

eldobbins commented 2 months ago

Added this discussion to the AETR wiki.