COSIMA / cosima-cookbook

Framework for indexing and querying ocean-sea ice model output.
https://cosima-recipes.readthedocs.io/en/latest/
Apache License 2.0
58 stars 25 forks source link

Re-factor database extension and support cell_methods in ExperimentExplorer #274

Closed aidanheerdegen closed 3 years ago

aidanheerdegen commented 3 years ago

Re-factor database extension and support cell_methods in ExperimentExplorer.

The re-factor has been on the cards for a while, and seemed like the best time to dive in.

Added hybrid properties model and is_restart to to NCFile. model infers the realm of the model from the ncfile path. is_restart is a boolean to identify restart variables and also inferred from the ncfile path.

Added hybrid property is_coordinate to CFVariable. This returns a boolean value which is True if this is a coordinate, False otherwise. It is inferred from the value of units.

All hybrid properties are accessible as a python method from a returned object and as a column in a query.

This is moving logic from explorer into the classes themselves to simplify and speed up using the explorer GUI elements.

Added inferred properties to get_variables when argument inferred is True.

get_variables can now return all variables from all experiments if experiment is None. In this case it will also return the experiment name.

Updated tests, but far from comprehensive.

Closes #273

aidanheerdegen commented 3 years ago

The circleci check is just timing out, so can be ignored for the moment.

Still haven't actually added the cell_methods selector I realised, so will turn back into a draft ....

aidanheerdegen commented 3 years ago

This is a working version but still have a few things to work on:

  1. Need more tests, covering existing functionality and new cell methods addition
  2. Too slow to start Experiment Explorer (15s) and Database explorer (22s CPU, 60s Walltime)
  3. Sluggish when selecting/deselecting filtering coordinates and restart vars. Maybe triggering some event loop stuff?
  4. Selecting cell methods works, but the time range is for all variables with the same frequency. Currently have no way to query for a time range based on a var + frequency + cell_methods, so don't have any way to update the time range. So it can be (almost certainly is) misleading.
aidanheerdegen commented 3 years ago

New approach fixes the slow start-up times for the DatabaseExplorer and ExperimentExplorer

> %%time
> dbx = DatabaseExplorer(session=session)
CPU times: user 16.3 s, sys: 1.04 s, total: 17.3 s
Wall time: 17.3 s

> %%time
> ee = ExperimentExplorer(session=session, experiment='01deg_jra55v140_iaf')
CPU times: user 6.58 s, sys: 498 ms, total: 7.08 s
Wall time: 7.07 s

Just as a data point for later comparison the current COSIMA master DB is indexing 115 experiments containing 379590 individual files, 9.2M variables with 56M individual attributes.

No longer slow to filter restarts and coordinates.

aidanheerdegen commented 3 years ago

what's the python version that conda-analysis3 runs?

3.9

aidanheerdegen commented 3 years ago

21.04 uses 3.8. I don't think it is a big deal. It is some weird pytest related error. Doesn't happen outside of testing AFAICT.

aidanheerdegen commented 3 years ago

I think this is ready for review now @angus-g if you have time.

angus-g commented 3 years ago

I'll take a look now!

aidanheerdegen commented 3 years ago

Thanks for checking and making those changes.

Regarding tests, if I'm honest no there aren't enough, but I ran out of motivation as creating them is a bit painful. I have added a test to make sure the hybrid model property is consistent between the two methods of invocation.

I'll take a look at the cell_methods stuff to make sure there is appropriate test coverage.

aidanheerdegen commented 3 years ago

I take that back, clearly a more motivated me added a fair bit of test coverage for cell_methods. I think I'm happy to merge if you are @angus-g.