catalyst-cooperative / pudl-catalog

An Intake catalog for distributing open energy system data liberated by Catalyst Cooperative.
https://catalyst.coop/pudl/
MIT License

Allow local file caching to be disabled when appropriate #6

Closed zaneselvans closed 2 years ago

zaneselvans commented 2 years ago

Local file caching via simplecache:: is hugely valuable when you have a lot of cheap disk and a slower net connection (WFH), but it's not necessarily appropriate in a cloud computing context (e.g. our JupyterHub or CI/CD) where the network is extremely fast, there are no data egress fees, and fast disk is more likely to be constrained.

If we are going to use our Intake data catalog as a primary means of accessing versioned, processed data, the user should be able to turn off caching when appropriate. Is this as easy as not setting PUDL_INTAKE_CACHE so there's no designated location for the cache? Or can it / should it be set explicitly in the arguments to the data source?
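To make the two options concrete, here is a minimal sketch of how caching could be made optional, assuming the data source URL is an fsspec path with simplecache:: chained in front of it. The helper name build_urlpath and the example bucket URL are hypothetical, and treating an unset PUDL_INTAKE_CACHE as "no caching" is an assumption for illustration, not confirmed catalog behavior.

```python
import os
from typing import Optional


def build_urlpath(base_url: str, cache_dir: Optional[str] = None) -> tuple:
    """Return (urlpath, storage_options) for fsspec, with caching optional.

    Hypothetical helper for illustration; not part of pudl-catalog.
    """
    if cache_dir:
        # Chain the simplecache protocol in front of the remote URL and
        # tell it where to keep the local copies.
        return (
            f"simplecache::{base_url}",
            {"simplecache": {"cache_storage": cache_dir}},
        )
    # No cache directory given: read straight from the remote filesystem.
    return base_url, {}


# Assumption: an unset or empty PUDL_INTAKE_CACHE means "do not cache".
urlpath, storage_options = build_urlpath(
    "gs://example-bucket/hourly_emissions_epacems.parquet",  # hypothetical URL
    cache_dir=os.environ.get("PUDL_INTAKE_CACHE"),
)
```

The resulting pair could then be handed to something like fsspec.open(urlpath, **storage_options). Whether the switch belongs in an environment variable like this or in an explicit argument to the data source is exactly the open question above.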

zaneselvans commented 2 years ago

Fixed in https://github.com/catalyst-cooperative/pudl-catalog/commit/7fb38ffcd4399b50fabc2f060bca31cde92ef91d