NCAR / ADF

A unified collection of Python scripts used to generate standard plots from CAM outputs.
Creative Commons Attribution 4.0 International

Need to perform common calculations once in a single location #133

Open nusbaume opened 2 years ago

nusbaume commented 2 years ago

New feature type

New infrastructure or infrastructure enhancement

What is this new feature?

Right now many plotting scripts repeat various common calculations, such as vertical interpolation and seasonal averaging. This redundancy hurts the performance of the ADF and also makes these calculations difficult to modify in the future, as they are spread across multiple Python files. It would therefore be better if these calculations were done in a single location, or set of locations, before any plotting or statistical analyses are done, such as during the re-gridding or climatology generation phase.

Assistance required?

No, I will make a PR when the feature is ready

Extra info

This may be best done by adding new averaging and regridding scripts to the ADF that do these calculations in a modular and easily editable fashion before the plotting and analysis scripts are run.
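
As a rough illustration of the "calculate once, reuse everywhere" idea, here is a minimal sketch in plain Python. All names in it (`SEASONS`, `seasonal_means`, `plot_script_a`, `plot_script_b`) are hypothetical and are not part of the ADF; the averaging is a simple unweighted mean over months, used only as a stand-in for whatever the real averaging script would do.

```python
from statistics import fmean

# Hypothetical season definitions (calendar month numbers); not taken from the ADF.
SEASONS = {
    "DJF": (12, 1, 2),
    "MAM": (3, 4, 5),
    "JJA": (6, 7, 8),
    "SON": (9, 10, 11),
}


def seasonal_means(monthly_climo):
    """Return {season: mean} from a {month_number: value} monthly climatology.

    Assumption: an unweighted mean over the months in each season,
    which ignores differences in month length.
    """
    return {
        season: fmean(monthly_climo[month] for month in months)
        for season, months in SEASONS.items()
    }


# The averaging step runs once, before any plotting script is called.
monthly = {month: float(month) for month in range(1, 13)}
season_cache = seasonal_means(monthly)


# Each (hypothetical) plotting script reads the shared result
# instead of repeating the seasonal average itself.
def plot_script_a(cache):
    return cache["DJF"]


def plot_script_b(cache):
    return cache["JJA"]
```

With this layout, changing how seasons are averaged means editing one function rather than every plotting script.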

andrewgettelman commented 2 years ago

I'm going to throw a big note of caution here. There is a tradeoff between the efficiency that Jesse is discussing and modularity and ease of code development. What happens when data are not just monthly means? I don't think the ADF infrastructure should do too much work. Fine if we want to make some of our core plotting work with this, but we should not require it. Think what happens with other use cases.

The integrated calls where there are scripts that call other functions make development more complicated (in my experience).

Fine to make recommended functions/scripts available if plots want to use them.

nusbaume commented 2 years ago

The way I see it, the only new addition to the ADF infrastructure is one new variable that records the different available types of model data (monthly history files, monthly time series, monthly climatologies, seasonal climatologies, climatologies on pressure levels, etc.) and where they are located. Something like this is likely needed to support daily and sub-daily model data anyway.

Otherwise, all of the actual calculations will be done in separate averaging and re-gridding scripts, as is done currently. The plotting scripts are then simply expected to check this new model data variable to see if the model data they want is available.

TL;DR: if a user wants seasonal climatologies or pressure-level model data, they either need to calculate it themselves in their plotting scripts OR add the associated averaging/re-gridding script to the relevant script list. Either way it's the responsibility of the user and not the ADF itself (with one of the two options being more computationally efficient).
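
To make the proposal above concrete, here is a minimal sketch of what such a "model data variable" and the check in a plotting script could look like. Everything here is an assumption for illustration: the names `available_data`, `register_data`, `find_data`, and `seasonal_plot`, the data-type keys, and the example paths are hypothetical and do not reflect the actual ADF implementation.

```python
# Hypothetical registry: maps a model data type to where it is stored.
# The keys and paths below are made up for illustration.
available_data = {
    "monthly_climo": "/path/to/climo",
}


def register_data(registry, data_type, location):
    """Called by an averaging/re-gridding script after it writes its output."""
    registry[data_type] = location


def find_data(registry, data_type):
    """Return the location of a data type, or None if it was never generated."""
    return registry.get(data_type)


def seasonal_plot(registry):
    """A plotting script that uses pre-computed data when it is available."""
    location = find_data(registry, "seasonal_climo")
    if location is not None:
        # An averaging script was in the script list, so reuse its output.
        return f"read seasonal climatology from {location}"
    # Nothing was registered, so the script does the calculation itself.
    return "compute seasonal climatology inside the plotting script"
```

In this sketch the plotting script never requires the pre-computed data: if no averaging script registered a seasonal climatology, it falls back to doing the calculation on its own.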