Is your change request related to a problem? Please describe.
Reading the EFTS format from netCDF and wrangling it into a convenient xarray dataset is relatively painless. Creating a new xarray dataset, however, proves rather awkward, at least (but not only) when following the typical creation workflow used with the netCDF4 bindings in R and MATLAB.
All values of the data dimensions need to be known upfront and created before the dataset. This is not a problem as such, but it is a straitjacket.
To write coordinate values after data creation, you cannot simply set the underlying numpy values; you have to use assign_coords, which, as far as I can see, creates a whole new dataset in memory. This is already a potential problem when loading from disk and then overriding the station name dimension and the date/time indices, although in that case the data could be lazily loaded. For in-memory datasets, I foresee a fairly large penalty, depending on how it is done.
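The pre-allocate, fill, then relabel workflow described above can be sketched as follows. This is a minimal sketch on a hypothetical, simplified (time, station) layout; the variable name `rain` and the station IDs are illustrative only, not the EFTS conventions. Data variables can be filled in place through `.values`, while index coordinates are replaced with `assign_coords`, which returns a new Dataset. In recent xarray versions `assign_coords` appears to make a shallow copy, so the data arrays are shared rather than duplicated; the `np.shares_memory` check is a way to confirm this on a given installation.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical, simplified layout (not the full EFTS conventions).
n_time, n_station = 4, 3

# Dimension sizes must be known upfront, so start with placeholder coordinates.
ds = xr.Dataset(
    {"rain": (("time", "station"), np.full((n_time, n_station), np.nan))},
    coords={
        "time": pd.date_range("2000-01-01", periods=n_time, freq="D"),
        "station": np.arange(n_station),
    },
)

# Data variables can be filled in place through the underlying numpy array.
ds["rain"].values[:] = np.random.default_rng(0).random((n_time, n_station))

# Index coordinates are replaced via assign_coords, which returns a new Dataset.
ds2 = ds.assign_coords(station=["A001", "B002", "C003"])

# Check whether the data variable is shared between the two Datasets.
print(np.shares_memory(ds["rain"].values, ds2["rain"].values))
```

If the check prints `True`, the relabelling cost is limited to the coordinates themselves rather than a full copy of the data.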
The big question is whether xarray is a good fit for future cases where we need to write to disk while working with a large dataset. This may or may not need to be in scope, given how much RAM is available these days. Still, science has a way of filling available memory.
Describe the solution you'd like / Describe alternatives you've considered
Possibly use the netCDF4 bindings directly to create a new file on disk, then load it into xarray.