Reading Parquet files stored on the local filesystem through the current PUDL catalog still triggers caching. This slows things down dramatically and quickly uses an enormous amount of disk space. In development especially, when we've just generated data locally, it would be nice to work with it through the same mechanism as remote data (the data catalog), but not if that means unnecessary caching running continuously in the background.
Identify a way to disable caching when we're working with local data. Ideally this would happen automatically, without the user having to think about it. Maybe it's as simple as making the `simplecache::` prefix on `urlpath` conditional on the value of `PUDL_INTAKE_PATH`, using Jinja templating features?
If that's not possible, then maybe caching can be turned off with an argument that the user passes to the data source.