Currently, our class initialization methods are quite massive. They do all of the following things (roughly):
Check that class parameters make sense
Open the raw datasets (lazily or eagerly depending on provided kwarg use_dask)
Prepare raw dataset for further processing including
selecting the relevant times
selecting the relevant fields
making sure that latitude is ascending
concatenating along the longitude if it is a global dataset
choosing a spatial subdomain of the data that covers the ROMS domain (to give better performance in next steps)
Fill in land values
Regrid
Write regridded variables into xarray.Dataset that is accessible via the .ds class attribute
Validate the regridded variables by doing a NaN check on a single time slice (this will always be done eagerly); NaNs over the ocean suggest that the raw data did not cover the full ROMS domain
We should probably split up the initialization method into a number of smaller methods for the following reason: If the user initializes their object with use_dask = False, steps 2-6 will be performed eagerly and could take a long time. However, if the user is not even interested in the computed regridded variables, but for example only wants to do a .to_yaml() call (which would simply write the class kwargs to YAML), that is a lot of wasted resources.
Suggested change
Here is one option for splitting the inizialization class into smaller methods:
Object() # initialization, now only doing step 1
Object.create() # steps 2-6
Object.validate() # step 7
Example use cases
This enables a sequence like
Object(use_dask=False)
Object.to_yaml()
in which case the Object.ds attribute would not be present yet.
A verb like .write() might be better than .create() (because we still create the object even when .create() has not yet been called), but this seems along the right lines.
Status Quo
Currently, our class initialization methods are quite massive. They do all of the following things (roughly):
use_dask
).ds
class attributeWe should probably split up the initialization method into a number of smaller methods for the following reason: If the user initializes their object with
use_dask = False
, steps 2-6 will be performed eagerly and could take a long time. However, if the user is not even interested in the computed regridded variables, but for example only wants to do a.to_yaml()
call (which would simply write the class kwargs to YAML), that is a lot of wasted resources.Suggested change
Here is one option for splitting the inizialization class into smaller methods:
Example use cases
This enables a sequence like
in which case the
Object.ds
attribute would not be present yet.The user could also do things entirely lazily via
before plotting or saving (which requires eager computation).
cc @TomNicholas