CMIP-Data-Request-coleads / cmip7_request_scripts


Python classes #3

Open CMIP-Data-Request-coleads opened 2 months ago

CMIP-Data-Request-coleads commented 2 months ago

The TISG meeting on Sept 12th discussed the benefits of having code organised around a clearly defined set of python classes.

The request_classes.py script does this in a sense: it has classes for Record and Table, for example, and each individual table of the request can then be realised as an instance of the Table class. This provides a lot of flexibility. If an extra table is added in Airtable, it will be handled automatically. This approach was also used in dreqPy. The flexibility comes with a drawback: it makes the class structure quite abstract, and we don't have read-the-docs compatible documentation associated with each instance of the Table class.
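To make the generic approach concrete, here is a minimal sketch. It is not the code in request_classes.py; the Record and Table class names come from the description above, while the `raw` export layout, the field names and the record ids are assumptions made up for illustration.

```python
class Record:
    """One row of a table; field values are reachable as attributes."""

    def __init__(self, uid, fields):
        self.uid = uid
        self.fields = dict(fields)

    def __getattr__(self, key):
        # Only called when normal attribute lookup fails.
        try:
            return self.__dict__["fields"][key]
        except KeyError:
            raise AttributeError(key) from None


class Table:
    """Generic container: one instance per table of the request."""

    def __init__(self, name, rows):
        self.name = name
        self.records = {uid: Record(uid, fields) for uid, fields in rows.items()}

    def __len__(self):
        return len(self.records)


# Hypothetical export layout: {table name: {record id: {field: value}}}
raw = {
    "Variable": {"rec1": {"name": "tas", "units": "K"}},
    "Opportunity": {"rec2": {"name": "Example opportunity"}},
}

# Every table found in the export becomes a Table instance, so a table
# added later in the database needs no code change here.
tables = {name: Table(name, rows) for name, rows in raw.items()}
```

The trade-off described above is visible here: nothing in the code says what a "Variable" is or which fields it has, so there is nowhere to hang per-table documentation.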

On the other hand, Matt suggested having a Python class per table, e.g. one for "Variable", one for "Opportunity", etc. This might be achieved by converting the request_classes.py Table into a base class and creating a class per table which can be edited and documented. I've created a template here: request_classes.py.template. The risk of this approach is potential divergence between documentation in the code and documentation in the database, but I think there are more advantages, particularly in enabling transparency about different Python methods being added to different tables.
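A rough sketch of the class-per-table idea follows. It does not reproduce request_classes.py.template; the `table_name` attribute, the `names` method, the `TABLE_CLASSES` registry and the `build_table` helper are all hypothetical names chosen for illustration.

```python
class Table:
    """Base class holding behaviour common to every table."""

    table_name = None  # overridden by each subclass

    def __init__(self, rows):
        self.rows = dict(rows)

    def __len__(self):
        return len(self.rows)


class Variable(Table):
    """The "Variable" table. Per-table documentation can live in this docstring."""

    table_name = "Variable"

    def names(self):
        """Table-specific method: sorted variable names."""
        return sorted(row["name"] for row in self.rows.values())


class Opportunity(Table):
    """The "Opportunity" table."""

    table_name = "Opportunity"


# Hypothetical registry mapping table names to their dedicated classes.
TABLE_CLASSES = {cls.table_name: cls for cls in (Variable, Opportunity)}


def build_table(name, rows):
    """Use the dedicated subclass if one exists, else the generic base class."""
    table = TABLE_CLASSES.get(name, Table)(rows)
    table.table_name = name
    return table
```

Falling back to the base class for unknown names is one way to keep the "new tables are handled automatically" property while still documenting the tables that have their own class.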

matthew-mizielinski commented 1 month ago

Apologies that it has taken me so long to get back to this.

I've had a look over the request_classes.py.template file and this isn't too far off how I was thinking about it, although I would be tempted by a very simple dataclass-based structure because it is so easy to follow when reading the code.
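As an illustration of what a dataclass-based structure could look like (a sketch only, not taken from the notebook), the class below declares a handful of fields and builds itself from a record dictionary. The field names and the `from_record` helper are assumptions.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Variable:
    """A requested variable; only a few illustrative attributes are kept."""

    name: str
    units: str
    frequency: str

    @classmethod
    def from_record(cls, record):
        """Build an instance from a dict, ignoring keys that are not declared fields."""
        return cls(**{f.name: record[f.name] for f in fields(cls)})


# Hypothetical record with one extra key that is silently dropped.
tas = Variable.from_record(
    {"name": "tas", "units": "K", "frequency": "mon", "extra": 1}
)
print(tas)
```

The fields are listed once at the top of the class, and `__init__`, `__repr__` and `__eq__` are generated, which is where most of the readability gain comes from.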

I've re-implemented the software pattern I used for CMIP6, using the JSON file in this repository (this was very straightforward with the new structures), in a notebook. It pulls out only a few of the attributes for a minimal set of objects along the route between "experiment" and "variable".

The notebook can be found here. Remember that this is illustrative code; for a "real" API there are a number of other steps I would take, based on prior experience, to ensure we can extract priorities, time slices, etc.
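For readers without the notebook to hand, the following sketch shows the general shape of walking from an experiment to its variables through linked objects. It is not the notebook's code: the intermediate classes (ExperimentGroup, Opportunity, VariableGroup), their attributes, the `variables_for_experiment` function and the sample data are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Variable:
    name: str


@dataclass
class VariableGroup:
    name: str
    variables: List[Variable] = field(default_factory=list)


@dataclass
class Experiment:
    name: str


@dataclass
class ExperimentGroup:
    name: str
    experiments: List[Experiment] = field(default_factory=list)


@dataclass
class Opportunity:
    name: str
    experiment_groups: List[ExperimentGroup] = field(default_factory=list)
    variable_groups: List[VariableGroup] = field(default_factory=list)


def variables_for_experiment(experiment_name, opportunities):
    """Return sorted names of variables requested for one experiment."""
    names = set()
    for opp in opportunities:
        # Does this opportunity include the experiment via any of its groups?
        requested = any(
            exp.name == experiment_name
            for group in opp.experiment_groups
            for exp in group.experiments
        )
        if requested:
            for var_group in opp.variable_groups:
                names.update(var.name for var in var_group.variables)
    return sorted(names)


# Hypothetical content, for illustration only.
opportunities = [
    Opportunity(
        name="Example opportunity",
        experiment_groups=[ExperimentGroup("core", [Experiment("historical")])],
        variable_groups=[VariableGroup("basic", [Variable("tas"), Variable("pr")])],
    ),
    Opportunity(
        name="Other opportunity",
        experiment_groups=[ExperimentGroup("scen", [Experiment("future")])],
        variable_groups=[VariableGroup("ocean", [Variable("tos")])],
    ),
]
```

Calling `variables_for_experiment("historical", opportunities)` collects the variables of every opportunity that includes that experiment. Priorities and time slices are deliberately left out of this sketch.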

sol1105 commented 1 month ago

I added a notebook that showcases how to manage the DReq content (following up on my PR). It adapts @matthew-mizielinski's illustrative example using the PR content and adds a small function to map objects between bases.