We just have a documentation burden: explaining load_layers vs open_dataset vs open_mfdataset, and explaining that MLDataset.load/Dataset.load are typically not called but can be used to convert from lazy to eager.
I took some notes that could be the start of documentation on file loading choices (a bit of thread drift here into documentation needs):
I have one file, so I use load_layers. The file can be:
NetCDF
HDF4 / HDF5
GeoTiff
Grib
Questions:
What if my "one file" for NetCDF is actually a URL for an OPeNDAP endpoint? Is load_layers broken in this case? This should work the same way as xarray.open_dataset with a NetCDF URL.
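One way load_layers could tell a remote endpoint from a local path is to inspect the URL scheme before deciding how to open it. The following is only a sketch: looks_like_remote_url and REMOTE_SCHEMES are hypothetical names, not part of xarray_filters or xarray, and the URL in the usage lines is a placeholder.

```python
from urllib.parse import urlparse

# Hypothetical helper: schemes treated as "remote" in this sketch.
REMOTE_SCHEMES = {"http", "https"}


def looks_like_remote_url(filename_or_obj):
    """Return True if the argument is a string with an http(s) scheme."""
    if not isinstance(filename_or_obj, str):
        return False
    return urlparse(filename_or_obj).scheme.lower() in REMOTE_SCHEMES


# Placeholder inputs, for illustration only.
print(looks_like_remote_url("https://example.org/thredds/dodsC/data.nc"))
print(looks_like_remote_url("/data/scene.nc"))
```

A check like this would let load_layers hand URLs straight to the same code path xarray.open_dataset uses, rather than treating them as file-system paths.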
I have several GeoTiffs (say 8 or 15), each a separate satellite band:
Pass the directory containing them as "filename" to load_layers, and use LayerSpec to define which bands you want from that directory.
I have many files that can be loaded with xarray.open_mfdataset:
Just use xarray.open_mfdataset, then call MLDataset(dset) on the result. open_mfdataset supports NetCDF and Grib; not sure about HDF5.
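The three cases above can be summarized as a small decision helper. This is a sketch of the reasoning only, not an existing function: choose_loader and the strings it returns are hypothetical, and it only names the route to take rather than calling load_layers or xarray.

```python
import os

# Hypothetical labels for the three routes described in the notes.
SINGLE_FILE = "load_layers on the single file"
BAND_DIRECTORY = "load_layers on the directory, with LayerSpec selecting bands"
MANY_FILES = "xarray.open_mfdataset, then MLDataset(dset)"


def choose_loader(paths, is_dir=os.path.isdir):
    """Suggest a loading route for one path or a sequence of paths.

    is_dir is injectable so the logic can be exercised without touching disk.
    """
    if isinstance(paths, str):
        paths = [paths]
    paths = list(paths)
    if not paths:
        raise ValueError("expected at least one path")
    if len(paths) > 1:
        # Many files that xarray can combine.
        return MANY_FILES
    if is_dir(paths[0]):
        # A directory of GeoTiffs, one band per file.
        return BAND_DIRECTORY
    return SINGLE_FILE


print(choose_loader(["jan.nc", "feb.nc"]))
```

The OPeNDAP URL question is deliberately left out of this helper, since the notes leave it open.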
Here's the signature for open_dataset. Maybe we want to walk through how some of its arguments compare to the arguments of load_layers:
filename_or_obj: Same in load_layers, except that it may be a directory in the case of GeoTiffs.
group : This group argument for open_dataset is called layer_spec in load_layers. A list of LayerSpec objects can control which groups are loaded and the load_layers style of LayerSpec applies to all the file types, not just NetCDF.
decode_cf : We should make sure load_layers decodes according to CF conventions by default, equivalent to passing decode_cf=True.
mask_and_scale : This would be nice to support, but not critical. Here's the help:
If True, replace array values equal to _FillValue with NA and scale values according to the formula original_values * scale_factor + add_offset, where _FillValue, scale_factor and add_offset are taken from variable attributes (if they exist). If the _FillValue or missing_value attribute contains multiple values a warning will be issued and all array values matching one of the multiple values will be replaced by NA.
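To make the help text concrete, here is a minimal sketch of the decoding it describes, written in plain Python on a list of numbers. It is not xarray's implementation; the function name and sample values are made up for illustration, and it handles only a single _FillValue.

```python
import math


def mask_and_scale(values, fill_value=None, scale_factor=1.0, add_offset=0.0):
    """Replace fill_value with NaN, then apply value * scale_factor + add_offset."""
    decoded = []
    for value in values:
        if fill_value is not None and value == fill_value:
            # Values equal to _FillValue become NA.
            decoded.append(math.nan)
        else:
            decoded.append(value * scale_factor + add_offset)
    return decoded


# Illustrative raw values with -9999 standing in for _FillValue.
raw = [10, -9999, 30]
decoded = mask_and_scale(raw, fill_value=-9999, scale_factor=0.5, add_offset=100.0)
print(decoded)  # [105.0, nan, 115.0]
```

If load_layers supported this, the fill value, scale factor and offset would come from each variable's attributes rather than being passed by hand.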
The notes above are from @PeterDSteinberg's discussion in PR https://github.com/ContinuumIO/xarray_filters/pull/44#discussion_r150029450.