deployment-gap-model-education-fund / deployment-gap-model

ETL code for the Deployment Gap Model Education Fund
https://www.deploymentgap.fund/
MIT License

Extract archived gridstatus data #269

Open bendnorman opened 1 year ago

bendnorman commented 1 year ago

Once the data is archived, we'll need to extract it in the ETL!

We'll want to pin the archive version for each ISO queue so our ETL doesn't break when the gridstatus API or data changes. We can have a dictionary that maps each ISO to its pinned version:

# Maps each ISO to the pinned GCS object generation of its archived queue.
ARCHIVE_VERSIONS = {
    "caiso": 1680655040332264,
    "nyiso": 1680655040332298,
    ...
}

We can add some logic that looks in the /data/raw/gridstatus/isoqueue directory to see if the pinned version already exists locally. If it doesn't, download it; if it does, just read the local file.
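A minimal sketch of that check-then-download logic, not a final implementation. The function name `get_archived_queue`, the `download` callable, and the `<iso>-<version>` file naming are all hypothetical; the actual archive file format and naming haven't been decided yet. The downloader is passed in so the caching behavior can be tested without touching GCS.

```python
from pathlib import Path
from typing import Callable

# Cache directory named in this issue; the layout inside it is an assumption.
RAW_DIR = Path("/data/raw/gridstatus/isoqueue")


def get_archived_queue(
    iso: str,
    version: int,
    download: Callable[[str, int], bytes],
    raw_dir: Path = RAW_DIR,
) -> bytes:
    """Return the raw bytes of one pinned archive version.

    The archive is downloaded only when no local copy exists yet.
    """
    # Hypothetical naming scheme: one file per (ISO, version) pair.
    local_path = raw_dir / f"{iso}-{version}"
    if not local_path.exists():
        raw_dir.mkdir(parents=True, exist_ok=True)
        local_path.write_bytes(download(iso, version))
    return local_path.read_bytes()
```

Keying the local file on both the ISO and the version means that bumping a pinned version in ARCHIVE_VERSIONS triggers a fresh download instead of silently reusing a stale file.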

Here's some code I experimented with for accessing archived versions of GCS objects (see the GCS Python docs):

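import google.auth
from google.cloud import storage

# Assumption: credentials and project_id come from Application Default Credentials.
credentials, project_id = google.auth.default()

bucket_url = "object-versioning-test.catalyst.coop"
bucket = storage.Client(credentials=credentials).bucket(bucket_url, user_project=project_id)

# A GCS "generation" identifies one archived version of the object.
blob_name = "test.csv"
blob = bucket.blob(blob_name, generation=1680655040332264)
# download_as_bytes() replaces the deprecated download_as_string().
data = blob.download_as_bytes()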