Currently, I/O is strewn around this code base, and we keep reinventing the wheel as a result. We are using many different ways of interacting with the filesystem and GCS. These include:
gcsfs/fsspec
vcm.cloud.gcs (wrapper routines around the basic google python api)
vcm.cloud.gsutil (wrappers around gsutil)
vcm.cloud.remote_data (some gcsfs and intake).
vcm.cubedsphere.io (manual open_mfdataset and ds.to_netcdf calls). Moreover, combine_subtiles should probably be moved to vcm.combining.
vcm.convenience (many I/O functions using a random assortment of utilities; some are unused and some are not).
some of the DataFlow pipelines use Apache Beam's fileio API, which is similar in flavor to fsspec.
vcm.fv3_restarts uses fsspec.
This situation is much more complicated than the functionality we need. I think we should develop a coherent API for I/O operations. Some of these operations are specific to GFDL data (e.g. tiles/subtiles), while others are more general purpose (e.g. opening a remote netCDF file lazily). I think intake is designed to solve this problem, but we can always write our own custom solutions.
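To make the idea of a single entry point concrete, here is a minimal sketch of what a thin I/O layer on top of fsspec might look like. The function names `open_file` and `list_files` are hypothetical and do not exist in vcm today; the only library calls used are `fsspec.open` and `fsspec.core.url_to_fs`.

```python
import fsspec
from fsspec.core import url_to_fs


def open_file(url, mode="rb"):
    """Open a local or remote file from a URL (hypothetical helper).

    The URL scheme ("gs://", "file://", "memory://", ...) selects the
    backend, so callers never touch gsutil or the Google API directly.
    Use the result as a context manager.
    """
    return fsspec.open(url, mode=mode)


def list_files(url):
    """Return the sorted paths found directly under ``url`` (hypothetical helper)."""
    fs, path = url_to_fs(url)
    return sorted(fs.ls(path, detail=False))


if __name__ == "__main__":
    # The in-memory filesystem stands in for GCS here, so the sketch
    # runs without credentials or network access.
    with open_file("memory://io-sketch/example.txt", "wb") as f:
        f.write(b"hello")

    with open_file("memory://io-sketch/example.txt", "rb") as f:
        print(f.read())

    print(list_files("memory://io-sketch"))
```

With something like this in place, the GFDL-specific routines (tiles/subtiles) could take URLs and call these helpers instead of choosing their own backend. A lazy netCDF opener could plausibly be built the same way by handing the opened file object to xarray, but that part is not sketched here.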