Data page or Zip file to make data downloads prominent

jikaczmarski commented 3 months ago

We would like to see a page where one could access the data right away.

ianalexmac commented 3 months ago

Perhaps we could keep data in SQLite .db format, then run queries to build CSV downloads.

ianalexmac commented 3 months ago

@dayne do we point data downloads towards https://github.com/acep-uaf/ak-energy-statistics-2011_2021 ?

eldobbins commented 3 months ago

Converting this to an Epic because we need to think about how the data is organized as part of this. Scheduling a kick-off meeting for that discussion in early April.

ianalexmac commented 3 months ago

Following our conversation today, @eldobbins, I started to explore how to organize the data. We talked about three directories, which I've created in /data/.

The data/ directory has been built out with three subdirectories: raw, working, and final. Each of the three contains a markdown file with a brief description of what should be there.

raw/ is for CSVs used to build the database. This directory could easily get extremely messy, so it's important to guard against chaos here. In the future, this can be a landing spot for a pipeline script or an import from the workbooks located in the other repo, ak-energy-statistics-2011_2021.
working/ contains the SQLite database, as well as the code to build it. If all goes well, this folder should be easy to keep clean: either a file is in the database or it should live somewhere else.
final/ contains scripts and files for public-facing products, such as CSV downloads for researchers. The scripts here will extract tables from the SQLite database and output CSVs. In the future, this will be triggered by updates to the database and run via an action.
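
As a rough illustration of the final/ step described above, here is a minimal sketch that dumps every table in a SQLite database to its own CSV file. It is written in Python with only the standard library, although the project's own scripts are in R. The function name `export_tables_to_csv` and the paths in the usage note are hypothetical, not taken from the repo.

```python
import csv
import sqlite3
from pathlib import Path


def export_tables_to_csv(db_path, out_dir):
    """Write every user table in the SQLite database to <out_dir>/<table>.csv."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    conn = sqlite3.connect(db_path)
    try:
        # List user tables, skipping SQLite's internal bookkeeping tables.
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
                "ORDER BY name"
            )
        ]
        for table in tables:
            # Names come straight from sqlite_master, so quoting them is enough.
            cursor = conn.execute(f'SELECT * FROM "{table}"')
            header = [col[0] for col in cursor.description]
            target = out_dir / f"{table}.csv"
            with target.open("w", newline="", encoding="utf-8") as fh:
                writer = csv.writer(fh)
                writer.writerow(header)
                writer.writerows(cursor)
            written.append(target)
    finally:
        conn.close()
    return written
```

A call such as `export_tables_to_csv("data/working/aetr.db", "data/final")` (both paths assumed for illustration) would leave one CSV per table in the output directory.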

At the moment, I have price tables and a few capacity tables in the database. The page prices.qmd now reads from the database. capacity.qmd could follow suit, but will need a little tweaking for derived tables and the like. @jikaczmarski, we should chat about this soon.

I'm pivoting to think about code to generate CSV files from the database and make download links. You can see a window into the database on the new data page (live, but not linked in the sidebar, so not quite public).

None of this is permanent, and I'm really looking forward to more talk about organization and workflow.

eldobbins commented 3 months ago

I like this general structure. Could you have subdirectories in raw/ for generation, price, capacity?

ianalexmac commented 3 months ago

It turns out .db and .zip files are both binary formats, so they are not ideal to host in a repo. There was talk of hosting the db on Google Drive, but we may run into permissions issues? The script that builds the database from raw files needs write permissions, while the scripts that render the webpage should not have write permissions, correct?

It seems like a good idea to have an action watch the raw data directory and rebuild the database when changes are made. And if we're going to have a zip of all tables, we need that to rebuild upon changes to the database.
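
The dependency chain here (raw files, then database, then zip) can be sketched as a staleness check: rebuild a target whenever any input is newer than it. This is a local Python sketch under stated assumptions, not the action itself. The helper name `needs_rebuild` and the paths in the usage note are hypothetical. Note that modification times are not preserved by a fresh git checkout, so a hosted action would more likely trigger on changed paths than on timestamps.

```python
from pathlib import Path


def needs_rebuild(raw_dir, target):
    """Return True when target is missing or older than any file under raw_dir."""
    target = Path(target)
    if not target.exists():
        return True
    target_mtime = target.stat().st_mtime
    # Any input modified after the target was built makes the target stale.
    return any(
        path.stat().st_mtime > target_mtime
        for path in Path(raw_dir).rglob("*")
        if path.is_file()
    )
```

The same check could be applied twice: once with the raw directory and the database as target, and once with the directory holding the database and the zip as target, so each product rebuilds only when its inputs change.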

It feels like we're slow-walking towards a rudimentary pipeline with pub/sub actions and maybe an ephemeral VM for building out the database and zip. GCP rocks for this sort of stuff, but I need to upskill in order to set it up. I'd like to expand my skills in this direction anyway, so it might be the perfect time to learn? @jikaczmarski sounds interested too!

eldobbins commented 3 months ago

See the comment on #31, which uses a GitHub Action to make a zip file. Creating a ZIP file is therefore a reasonable thing to do; otherwise GitHub would not allow this action.
Also, in a discussion via GChat we pretty much decided to put off making a DB file until we have worked out the infrastructure for it.
Also, a new series of issues was created in https://github.com/acep-uaf/ak-energy-statistics-2011_2021 that should make that repo suitable for linking to, with the caveat that it is "intermediate" data.

Potential new directory layout

data/
- raw/ - README with a link to https://github.com/acep-uaf/ak-energy-statistics-2011_2021 as "intermediate" data, plus any files that get reformatted before they are used for plots
- working/ - data used for plots, divided into categories because each category was processed differently
  - aetr_data_package.json - metadata file begun in #33 and expanded to document all the files (subdirectories allowed)
  - capacity/ - all the CSVs for this section
  - consumption/ - all the CSVs for this section
  - generation/ - all the CSVs for this section
  - prices/ - all the CSVs for this section
- final/
  - aetr_2024_data.zip - a Zip file generated from working/ by the GitHub Action in #31
  - aetr_2024_data.db - in the future we can play with this, but not in this version. Or maybe it has a completely different generation mechanism
scripts/ - this directory exists but we aren't using it yet. Can move the R scripts here for tidiness
- reformat/ - scripts that reformat data (price has some, right?)
- plot/ - scripts that make plots
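
To make the working/ to final/ relationship in this layout concrete, here is a minimal local sketch of zipping a directory tree with Python's standard library. It is not the GitHub Action from #31, whose implementation is not shown in this thread. The function name `zip_directory` is hypothetical, and the paths in the usage note simply mirror the proposed layout.

```python
import zipfile
from pathlib import Path


def zip_directory(src_dir, zip_path):
    """Zip every file under src_dir, storing paths relative to src_dir's parent."""
    src_dir = Path(src_dir)
    zip_path = Path(zip_path)
    zip_path.parent.mkdir(parents=True, exist_ok=True)
    # Collect the file list first so the archive contents are deterministic.
    files = sorted(p for p in src_dir.rglob("*") if p.is_file())
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in files:
            # Keep the top-level folder name (e.g. working/) inside the archive.
            archive.write(path, arcname=path.relative_to(src_dir.parent).as_posix())
    return [p.relative_to(src_dir.parent).as_posix() for p in files]
```

Under the layout above, `zip_directory("data/working", "data/final/aetr_2024_data.zip")` would bundle the metadata file and the per-category CSV folders into one download.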

ianalexmac commented 2 months ago

There was a lot of discussion about this topic yesterday. Highlights include:

- instead of zipping the data directory, use tag and release to zip the entire repo for distribution
  - license and other important files ride along
  - researchers have expressed interest in this workflow
- LFS is a great place to store the .db file
  - HOWEVER! It looks like LFS cannot be used with GH Pages, read more here
  - the above could explain why the LFS-hosted .db file breaks my code
    - at this point, the database may be more headache than it's worth, so pivot back to using CSVs

ianalexmac commented 2 months ago

The data page now has table previews and CSV downloads for the 4 tables that we're currently using to generate the visuals.

ianalexmac commented 2 months ago

@jikaczmarski @eldobbins We're at a stopping point on the data page. We could either close this issue or regroup and decide on changes/features (minus #39, adding a metadata parser and corresponding links).

eldobbins commented 2 months ago

Two more items to do:

restrict data downloads to before 2019
make prettier buttons. Here's the section that is a button so we know what CSS to hit.
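
The first item could look something like the filter below. This is a hedged Python sketch only: the column name `year` is an assumption about the tables, the function name `rows_before_year` is hypothetical, and reading "before 2019" as strictly earlier than 2019 is also an assumption.

```python
import csv
import io


def rows_before_year(csv_text, cutoff_year, year_column="year"):
    """Return CSV text keeping only rows whose year is strictly before cutoff_year."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Assumes the year column holds plain integers such as 2018.
        if int(row[year_column]) < cutoff_year:
            writer.writerow(row)
    return out.getvalue()
```

Applying the filter when the download files are generated, rather than on the page, would keep the restricted years out of the published CSVs entirely.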

jikaczmarski commented 2 months ago

Added consumption data to the data portal.

ianalexmac commented 2 months ago

Modified the download buttons to display pretty names instead of file names.
Added second column to display "Download" as markdown alongside download button.

ianalexmac commented 2 months ago

Data page is in fine shape for now. Closing this issue.

acep-uaf / aetr-web-book-2024

Data page or Zip file to make data downloads prominent #18