generate_time_averages - Githubissues

ilaflott commented 1 year ago

I think this is in a "usable enough" state to begin reviewing. There's some fluff in here I need to trim, some mess to tidy up, etc... but the major parts are in place. Here's the rundown of the big pieces:

1. conda environment changes/README

First I needed to understand the environment requirements for running my module, so i set about including the required packages in the environment.yaml/re-building/re-testing. After a few cycles of this I ran into some issues running tests I defined for generate_time_averages and not seeing changes I made to my module reflected in the results.

I realized pytest was using a cached, old version of my module in my .local folder, which is generally touched by pip install. I was able to figure out that after conda env create --file environment.yml, the pip install . step is redundant if one includes all the necessary packages in the conda recipe. By avoiding that step, .local doesn't get any cached copies of fre_python_tools modules, and we can avoid module-import confusion in the conda recipe.

I edited the miniconda recipe in the README accordingly. Note that this changes the way we should call pytest, and we should do so as a module, like python -m pytest tests/ from the root directory (fre-python-tools).

2. fre_python_tools/generate_time_averages, the module itself, of course

There's a few different functions here. generate_cdo_timavg is the focus, and the most flexible. generate_nco_timavg does not work and will be removed. generate_frenctools_timavg does work in theory, but without a conda recipe in a specific channel, cannot be used in a portable manner (though i hear this is in the works). generate_frepythontools_timavg only does a full average of a variable, i.e. using all time points available in the targeted netcdf file, and is very inflexible, though it's at least actually a python-native routine.

3. corresponding tests for generate_time_averages

I have some bash scripts that generate-time-averaged output via CLI tools to compare to, and that is their only purpose. There's some basic testing to check for the existence of that output, check for the directory containing files for inputs, check that the pytest call is from the correct directory, and that generate_cdo_timavg creates files when asked to do so.

ilaflott commented 1 year ago

After pruning and cleaning up, I want to summarize the changes by-file to facilitate a good review.

module stuff in fre_python_tools/generate_time_averages/

The big part, generate_time_averages.py

For command-line usage, which calls the script as __main__, executes the bottom few lines, which then has a corresponding entry-point into the execution of the main function with the argument parser.

For calling the script as an imported module, expected use is via the generate_time_average function, which demands a specific package for the computation approach, an input file, an output file, and the type of averaging requested. Currently, one can average together monthly info, seasonal info, or all available info.

rose-app.conf, __init__.py and README.md

The rose configuration is for Canopy integration, the init file for python import schema. The README is a barebones one that simply describes what generate_time_averages.py does from a birds-eye-view.

conda stuff

significant edits to environment.yml

made after playing around with the conda-only recipe a bit and realizing the pip-install step was unnecessary, and perhaps "dangerous". The python-cdo package requirements in conda-forge channel don't grab the right cdo, if at all, so it must be included here explicitly. I moved the pip-installs of pytest and metomi.isodatetime into conda-packages in the yml, but left pip as a user whom uses fre-python-tools may very well want to use pip to try something out.

small edits to meta.yaml and setup.cfg 3 new lines to tell pip/pytest etc. that generate_time_averages is an importable python module within fre-python-tools.

testing stuff

tests/test_generate_time_averages.py

A basic, first-pass set of tests I came up with to guide further development down-the-road. reads test input files (described below). I haven't done it yet, but I'd like to include some tests that check the time-averaged #'s against those of the CLI tool to verify that no new behavior is introduced beyond what we want.

timavgs_CLI_examples.sh and run_timavgs_CLI_examples.sh in tests/

Two bash scripts that produce time averages via the CLI tool for comparisons against the python-produced time-averages. The netCDF output is already included in the repo, but the scripts to recreate them are included as well out of caution. Short and sweet to go over, I hope!

README.md and five netCDF files in tests/time_avg_test_files/*.nc

The README simply states what the directory is for (test file input + test output files). Of the netCDF files here, two of these are input test data to produce time-averages from, and the other three are time-averages produced via the cdo CLI tool for comparison averages produced via the generate_time_averages python module. there shouldn't be anything too surprising here.

ceblanton commented 1 year ago

I couldn't get the native timeaverager to work using the test input dataset. It appeared to work, but the resulting file was empty.


(fre-python-tools-dev) c2b:~/git/fre-python-tools%>generate-time-averages tests/time_avg_test_files/atmos.197901-198312.LWP.nc out3.nc
calling generate time averages for file: tests/time_avg_test_files/atmos.197901-198312.LWP.nc
calling generate_frepythontools_timavg for file: tests/time_avg_test_files/atmos.197901-198312.LWP.nc
var=LWP
lon # 1/288
^Clon # 33/288
lon # 65/288
lon # 97/288
lon # 129/288
lon # 161/288
lon # 193/288
lon # 225/288
lon # 257/288

(fre-python-tools-dev) c2b:~/git/fre-python-tools%>ncdump -h out3.nc
netcdf out3 {
}

ilaflott commented 1 year ago

differences in #'s seen/confirmed! working on the fre-py-tools version functionality too.

Thought that was worked out. Time for another test...

ilaflott commented 1 year ago

I couldn't get the native timeaverager to work using the test input dataset. It appeared to work, but the resulting file was empty.

(fre-python-tools-dev) c2b:~/git/fre-python-tools%>generate-time-averages tests/time_avg_test_files/atmos.197901-198312.LWP.nc out3.nc
calling generate time averages for file: tests/time_avg_test_files/atmos.197901-198312.LWP.nc
calling generate_frepythontools_timavg for file: tests/time_avg_test_files/atmos.197901-198312.LWP.nc
var=LWP
lon # 1/288
^Clon # 33/288
lon # 65/288
lon # 97/288
lon # 129/288
lon # 161/288
lon # 193/288
lon # 225/288
lon # 257/288

(fre-python-tools-dev) c2b:~/git/fre-python-tools%>ncdump -h out3.nc
netcdf out3 {
}

output now written out as it should be. along with dimensions and variable metadata.

side bonus- i figured out how to bring the runtime down to minutes.

new test for the fre_python_tools averager added, all tests currently passing.

still looking into differences between the final averages between cdo, fre-nctools, and this approach.

ilaflott commented 1 year ago

As of latest commit, I still do not know why cdo doesn't reproduce fre-nctool's timavg.csh's answers. However, the fre-python-tools version does seem to bitwise reproduce fre-nctool's answer.

ilaflott commented 1 year ago

To check my claim, grab the latest changes, run pytest first to produce the fre-python-tools avgs, then run the CLI examples in examples to produce the fre-nctools ones and a ncdiff output file with the fre-python-tools avgs.

NOAA-GFDL / fre-python-tools

generate_time_averages #19