MetOffice / EUCP_WP5_Lines_of_Evidence

EUCP lines of evidence comparison tool

review of mean_process.py and fix_lambert.py #2

Open nhsavage opened 2 years ago

nhsavage commented 2 years ago

To review the code for correctness

nhsavage commented 2 years ago

Review of mean_process.py

fix_coord_metadata: Please explain more clearly what is wrong with the coordinate system for this model, and why does rounding to 9 decimal places help?

subtract half an hour to fix section:

* Does this assume that the data is hourly and should end on a month boundary? Do you need to test these assumptions?

Generating output filename: The date section has an else clause that raises an error. For safety, don't we need the same for the frequency section?

General nitpicks (probably not worth bothering about for this type of use-once code):

* fix_coord_metadata - better style to document purpose in a docstring

* would be better to use more functional decomposition

* main code: instead of globbing - can't you just do `cubes = iris.load(file_path + '/*')`?

nhsavage commented 2 years ago

@thomascrocker

nhsavage commented 2 years ago

review of fix_lambert.py

All seems good. It would be useful to tidy up the comments and redundant code, plus add an explanation of what was wrong with the original files and how it was fixed (by copying from the hourly data - was this OK in the first place, or did you edit this one manually?)

thomascrocker commented 2 years ago

Review of mean_process.py

fix_coord_metadata: Please explain more clearly what is wrong with the coordinate system for this model, and why does rounding to 9 decimal places help?

There were rounding-error (machine-precision) level differences in the false_easting and false_northing values for this model, which led to iris complaints about differing coord systems further down the line when concatenating. Rounding to 9 dp was enough to eliminate them. I decided rounding at that level wouldn't make any meaningful difference to any model, so I left it as is to catch any other models it might have been an issue for.
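For illustration, a minimal sketch of that kind of fix, assuming an iris cube whose x/y coords carry a Lambert-conformal-style coord system (the function and argument names here are illustrative, not the script's actual code):

```python
def fix_coord_metadata(cube, ndp=9):
    """Round projection offsets so coord systems compare equal across files (sketch)."""
    for axis in ("x", "y"):
        cs = cube.coord(axis=axis).coord_system
        if cs is None or not hasattr(cs, "false_easting"):
            continue
        # Machine-precision noise in these values makes iris treat
        # otherwise-identical coord systems as different at concatenate time.
        if cs.false_easting is not None:
            cs.false_easting = round(cs.false_easting, ndp)
        if cs.false_northing is not None:
            cs.false_northing = round(cs.false_northing, ndp)
    return cube
```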

subtract half an hour to fix section:

* Does this assume that the data is hourly and should end on a month boundary? Do you need to test these assumptions?

Most of the data this script was used to process was hourly. The expected end date was always a month boundary, i.e. the files were CMOR-like and ended in, for example, 200512312330.nc, so each file being processed represented one calendar year of data. I did also process a small amount of daily data, but I don't think this was ever an issue with it. If I recall rightly, the hourly data that it was a problem for had the point value of the time coord representing the end of the time period rather than the midpoint, hence it spilling into the next day (and month) for the final point in the file. Subtracting 30 minutes from the entire points series resolved it.
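A minimal sketch of that shift, assuming the time coord has linear units such as `hours since ...` (the function name is illustrative):

```python
import datetime

def subtract_half_hour(cube):
    """Shift time points back 30 minutes so end-of-period stamps become
    mid-period ones and the final point stays inside the month (sketch)."""
    tc = cube.coord("time")
    # date2num(reference + 30 min) gives 30 minutes expressed in the
    # coord's own units, since date2num(reference) == 0 by definition
    reference = tc.units.num2date(0)
    half_hour = tc.units.date2num(reference + datetime.timedelta(minutes=30))
    tc.points = tc.points - half_hour
```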

Generating output filename: The date section has an else clause that raises an error. For safety, don't we need the same for the frequency section?

Yes, it would be appropriate. If I'm going to run this code again I'll add one.
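For example, something along these lines would mirror the date section's guard (names are hypothetical, since the script's actual branches aren't shown here):

```python
def frequency_token(frequency):
    """Map a frequency string to its filename component (hypothetical sketch)."""
    if frequency == "1hr":
        return "1hr"
    if frequency == "day":
        return "day"
    # Fail loudly rather than silently building a bad output filename
    raise ValueError(f"Unrecognised frequency: {frequency!r}")
```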

General nitpicks (probably not worth bothering about for this type of use-once code)

* fix_coord_metadata - better style to document purpose in a docstring

* would be better to use more functional decomposition

Yep, fair enough. If this was to become more reusable code, those are all sensible improvements.

* main code: instead of globbing - can't you just do `cubes = iris.load(file_path + '/*')`?

I think I might have run into problems when I tried an iris.load and then iteration over the cubelist... Either iris tried to merge the data, which caused issues (although one can get around that using iris.load_raw), or it could have been that these were hourly data files that were tens of GB in size. Ultimately, globbing and processing each file individually kept everything manageable.
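For reference, a sketch of the glob-and-loop pattern being described, with hypothetical helper names:

```python
import glob
import iris

def process_all(file_path):
    """Process files one at a time to keep memory bounded (sketch)."""
    for path in sorted(glob.glob(file_path + "/*.nc")):
        cube = iris.load_cube(path)         # assumes exactly one cube per file
        cube = fix_coord_metadata(cube)     # hypothetical per-cube fix-ups
        iris.save(cube, output_name(path))  # output_name is hypothetical too
```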

thomascrocker commented 2 years ago

review of fix_lambert.py

All seems good. It would be useful to tidy up the comments and redundant code, plus add an explanation of what was wrong with the original files and how it was fixed (by copying from the hourly data - was this OK in the first place, or did you edit this one manually?)

I can't remember exactly... And unfortunately the server that was hosting the original files has now been shut down (funding to keep it going ran out, or some other project politics). Ultimately, there was a set of data from this particular model that was missing the native grid coordinates and the coordinate system info. Thankfully that data was present in a different set of the hourly files, so the script takes that data and applies it retrospectively to the other files.
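A rough sketch of that kind of repair with iris, assuming the two file sets share the same grid (file names and details are illustrative, since the originals are gone):

```python
import iris

# Take the native grid coords, with their Lambert coord system, from a
# donor hourly file and graft them onto a file where they are missing.
donor = iris.load_cube("hourly_file_with_grid_info.nc")
target = iris.load_cube("file_missing_grid_info.nc")

for axis in ("x", "y"):
    src = donor.coord(axis=axis)
    dim = donor.coord_dims(src)[0]
    # Assumes both files share the same grid shape and dimension order
    target.add_dim_coord(src.copy(), dim)

iris.save(target, "fixed_file.nc")
```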