Clean up I/O routines

nbren12 commented 4 years ago

Currently, I/O is strewn around this codebase, and we keep reinventing the wheel as a result. We are using many different ways of interacting with the filesystem and GCS. These include

gcsfs/fsspec
vcm.cloud.gcs (wrapper routines around the basic google python api)
vcm.cloud.gsutil (wrappers around gsutil)
vcm.cloud.remote_data (some gcsfs and intake)
vcm.cubedsphere.io (manual open_mfdataset and ds.to_netcdf calls). Moreover, combine_subtiles should probably be moved to vcm.combining.
vcm.convenience (many I/O functions using a random assortment of utilities, some unused and some not)
Some of the Dataflow pipelines use Apache Beam's fileio API, which is similar in flavor to fsspec.
vcm.fv3_restarts uses fsspec.

Obviously, this situation is much more complicated than the functionality we need warrants. I think we should develop a coherent API for I/O operations. Some of these operations are specific to GFDL data (e.g. tiles/subtiles), while others are more general purpose (e.g. lazily opening a remote netCDF file). I think intake is designed to solve this problem, but we can always write our own custom solutions.
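To make the idea of a single entry point concrete, here is a rough sketch, not a proposal for the actual fv3net API. The names `register_opener`, `open_path`, and `tile_paths` are hypothetical, and the `<prefix>.tile<N>.nc` naming with six tiles is an assumption about the GFDL layout. It uses only the standard library; in practice fsspec already provides this kind of scheme-based dispatch, so a real version would likely delegate to it.

```python
from typing import IO, Callable, Dict, List
from urllib.parse import urlparse

# Hypothetical registry mapping a URL scheme to the function that opens it.
_OPENERS: Dict[str, Callable[[str, str], IO]] = {}


def register_opener(scheme: str, opener: Callable[[str, str], IO]) -> None:
    """Register the backend used for URLs with the given scheme."""
    _OPENERS[scheme] = opener


def open_path(url: str, mode: str = "rb") -> IO:
    """Open a local path or URL through the backend matching its scheme."""
    scheme = urlparse(url).scheme or "file"
    try:
        opener = _OPENERS[scheme]
    except KeyError:
        raise ValueError(f"no I/O backend registered for scheme {scheme!r}")
    return opener(url, mode)


def _open_local(url: str, mode: str) -> IO:
    # Accept both plain paths and file:// URLs.
    parsed = urlparse(url)
    path = parsed.path if parsed.scheme == "file" else url
    return open(path, mode)


register_opener("file", _open_local)


def tile_paths(prefix: str, n_tiles: int = 6) -> List[str]:
    """Return per-tile file names, assuming a '<prefix>.tile<N>.nc' layout."""
    return [f"{prefix}.tile{i}.nc" for i in range(1, n_tiles + 1)]
```

A GCS backend could be added with one more `register_opener("gs", ...)` call wrapping gcsfs, so callers would only ever import `open_path` and the GFDL-specific helpers rather than choosing among the modules listed above.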

nbren12 commented 3 years ago

I think vcm.convenience.get_root should be deleted.

nbren12 commented 2 years ago

Still a bit of an issue imo.

ai2cm / fv3net

Clean up I/O routines #88