SciTools / iris

A powerful, format-agnostic, and community-driven Python package for analysing and visualising Earth science data
https://scitools-iris.readthedocs.io/en/stable/
BSD 3-Clause "New" or "Revised" License
635 stars 283 forks source link

Sanitization of degree variables #6216

Open znichollscr opened 3 days ago

znichollscr commented 3 days ago

📚 Documentation

I read a file with iris that had units of "degrees_north". After the file was read, the units were simply "degrees". I believe the line that does this is https://github.com/SciTools/iris/blob/d1125eb613e3c49c4128e004284e572ad337cad1/lib/iris/fileformats/_nc_load_rules/helpers.py#L906. I tried to find a reference/explanation of this in the docs but couldn't, would it be possible to add such an explanation? I assume it is something like, "Irrespective of the exact degrees unit used, we convert to the much simpler 'degrees' as this simplifies plotting and unit handling."

trexfeathers commented 1 day ago

Thanks for getting in touch @znichollscr. The level of detail here is beyond what we would want in the documentation, so contacting us on GitHub is exactly what we would like people to do 👍

It would be good for us to make this clearer in the documentation: #6218

stephenworsley commented 1 day ago

From @SciTools/peloton - I'm afraid can't say for certain what the motivation for this logic is since it was implemented 12 years ago, before most of us were in the team (@bjlittle, @pp-mo are you aware of how this decision was made?). I believe the idea here is due to the fact that UDUNITS has multiple names for degrees (degrees_n, degrees_north) which would complicate code too much if they were conserved throughout, at various points in the code, iris would check if units are degrees and behave accordingly. There may well be an argument for conserving this information, though that may be a bit bit of a breaking change at this point. I could see an argument for sanitising both "degrees_north" and "degrees_n" to both be something like "degrees_north" as that simplifies while conserving relevant information. At any rate, a better code comment would be nice to have so we don't lose this information to future generations.

bjlittle commented 1 day ago

Jeez, that's an old decision going back some ...

If I recall, we'd seen a whole spectrum of permutations for this unit from degrees, degrees_n, degrees_north, degree_n, degree_north et al.

Essentially we wanted to rationalise this and opted for degrees for both latitude and longitude. There may have been the caveat that udunits2 didn't parse the south permutations out the box e.g.,

> udunits2 -H degrees_north -W degrees
    1 degrees_north = 1 degrees
    x/degrees = (x/degrees_north)

> udunits2 -H degrees_east -W degrees
    1 degrees_east = 1 degrees
    x/degrees = (x/degrees_east)

> udunits2 -H degrees_west -W degrees
    1 degrees_west = -1 degrees
    x/degrees = -1*(x/degrees_west)

> udunits2 -H degrees_south -W degrees
udunits2: Don't recognize "degrees_south"

That still seems to oddly be the case, I guess.

So, yes, it was a pragmatic decsion from what I recall.

@pp-mo Anything to add here?

znichollscr commented 1 day ago

Hi all, thanks for the insights, super helpful. The decision to sanitise does make lots of sense. It does break the round-tripping I'm doing, but my use case is a super edge case in my opinion and the fix on my side is four lines so I'd suggest that there isn't anything to be done on the iris side. I'll just link to this issue in the fix and we can all move on with life. Cheers!

(Feel free to close this issue, but also fine if you want to keep it open for future tracking)