CWorthy-ocean / roms-tools

Tools for setting up and running ROMS simulations
https://roms-tools.readthedocs.io
GNU General Public License v3.0
9 stars 4 forks source link

Split up initialization method #144

Open NoraLoose opened 4 hours ago

NoraLoose commented 4 hours ago

Status Quo

Currently, our class initialization methods are quite massive. They do all of the following things (roughly):

  1. Check that class parameters make sense
  2. Open the raw datasets (lazily or eagerly depending on provided kwarg use_dask)
  3. Prepare raw dataset for further processing including
    • selecting the relevant times
    • selecting the relevant fields
    • making sure that latitude is ascending
    • concatenating along the longitude if it is a global dataset
    • choosing a spatial subdomain of the data that covers the ROMS domain (to give better performance in next steps)
  4. Fill in land values
  5. Regrid
  6. Write regridded variables into xarray.Dataset that is accessible via the .ds class attribute
  7. Validate the regridded variables by doing a NaN check on a single time slice (this will always be done eagerly); NaNs over the ocean suggest that the raw data did not cover the full ROMS domain

We should probably split up the initialization method into a number of smaller methods for the following reason: If the user initializes their object with use_dask = False, steps 2-6 will be performed eagerly and could take a long time. However, if the user is not even interested in the computed regridded variables, but for example only wants to do a .to_yaml() call (which would simply write the class kwargs to YAML), that is a lot of wasted resources.

Suggested change

Here is one option for splitting the inizialization class into smaller methods:

Object() # initialization, now only doing step 1
Object.create() # steps 2-6
Object.validate() # step 7

Example use cases

This enables a sequence like

Object(use_dask=False)
Object.to_yaml()

in which case the Object.ds attribute would not be present yet.

The user could also do things entirely lazily via

Object(use_dask=True)
Object.create(validate=False) # lazy
Object.create(validate=True) # step 2-6 lazy, step 7 eager

before plotting or saving (which requires eager computation).

cc @TomNicholas

TomNicholas commented 4 hours ago

A verb like .write() might be better than .create() (because we still create the object even when .create() has not yet been called), but this seems along the right lines.