Closed rsignell-usgs closed 3 years ago
I have been thinking a lot about this problem lately and I would like ask some questions and share some ideas on the matter.
Could we figure out a way to gracefully handle this in Iris using some additional metadata attributes, and if it works well, then propose the metadata attribute solution to CF for more general use?
@rsignell-usgs what kind of metadata do you suggest? If I understand the problem correctly there is more than one way around it: in the structured grid case, for example, some people average (z)eta while others "trim" the grid. Correct? (I am unfamiliar with the solution(s) for the unstructured grid case. Still, it seems to be an user decision as well.)
Using the netcdf_promote
to get all formula terms does not help. The first part of this notebook shows that the reference_terms['ocean_s_coordinate_g1'] = 's c depth_c'
syntax is not working (or am I using it wrong?). I can get either s
, depth_c
or C
, but not all of them together.
Even if that works iris would return the following warnings/exception when using the ocean AuxCoordFactory
on a structured staggered grid:
iris/fileformats/cf.py:1038: UserWarning: Ignoring formula terms variable u'h' referenced by data variable u'v' via variable u's_rho': Dimensions (u'eta_rho', u'xi_rho') do not span (u'time', u's_rho', u'eta_v', u'xi_v')
warnings.warn(msg)
iris/fileformats/cf.py:1038: UserWarning: Ignoring formula terms variable u'zeta' referenced by data variable u'v' via variable u's_rho': Dimensions (u'time', u'eta_rho', u'xi_rho') do not span (u'time', u's_rho', u'eta_v', u'xi_v')
warnings.warn(msg)
iris/fileformats/cf.py:1038: UserWarning: Ignoring formula terms variable u'h' referenced by data variable u'u' via variable u's_rho': Dimensions (u'eta_rho', u'xi_rho') do not span (u'time', u's_rho', u'eta_u', u'xi_u')
warnings.warn(msg)
iris/fileformats/cf.py:1038: UserWarning: Ignoring formula terms variable u'zeta' referenced by data variable u'u' via variable u's_rho': Dimensions (u'time', u'eta_rho', u'xi_rho') do not span (u'time', u's_rho', u'eta_u', u'xi_u')
warnings.warn(msg)
iris/fileformats/_pyke_rules/compiled_krb/fc_rules_cf_fc.py:1388: UserWarning: Gracefully filling 'time' dimension coordinate masked points
warnings.warn(msg.format(str(cf_coord_var.cf_name)))
iris/fileformats/netcdf.py:439: UserWarning: Unable to find coordinate for variable u'zeta'
'{!r}'.format(name))
iris/fileformats/netcdf.py:439: UserWarning: Unable to find coordinate for variable u'h'
'{!r}'.format(name))
ValueError: Unable to construct ocean sigma over z coordinate factory due to insufficient source coordinates.
That breaks the loading of ROMS with iris. (ROMS is one the most popular ocean models.)
In conclusion:
reference_terms
and calculating the z-coord
manually does not seem to be the spirit of iris to me. If I have to do that I would keep using python-netCDF4
and reduce the software dependencies of my scripts.u
and v
).z-coord
ourselves. IMHO the best solution would be something similar to a call_back
function, that would take care of either trimming or averaging (z)eta. That way, whatever the user decides to do with the grid mismatch, it would be an explicit defined function. (I am not familiar to iris call_back
functions and I do not know if this option is possible.)Thoughts?
PS: @rsignell-usgs My suggestions will not add extra metadata nor it will be a solution to CF standards for more general use. But it would get u
, v
, t
and s
as cubes :grin:
@ocefpaf , I agree. Without a way to handle staggered grid in CF, I like your idea of a phased development approach here:
Phase 1: Iris is modified to issue a warning and refuse to build the vertical aux coordinate instead of raising an exception. I imagine this could be implemented quickly.
Phase 2: Iris allows for a callback so that folks can build their own vertical aux coordinate.
Iris devs, what do you think?
Phase 1: Iris is modified to issue a warning and refuse to build the vertical aux coordinate instead of raising an exception. I imagine this could be implemented quickly.
Yes should be simple - possibly as simple as "try...except ValueError: issue warning".
Phase 2: Iris allows for a callback so that folks can build their own vertical aux coordinate.
Makes sense to have it under user control. Is it feasible/sensible to have standard schemes provided by Iris? The API will need some thought.
@rhattersley and @rsignell-usgs I was wondering how easy it would to implement phase 1:
Phase 1: Iris is modified to issue a warning and refuse to build the vertical aux coordinate instead of raising an exception. I imagine this could be implemented quickly.
First I tried to change this line to issue a warning instead of raising a ValueError
:
https://github.com/SciTools/iris/blob/master/lib/iris/fileformats/netcdf.py#L418
Then, as suggest above, I changed here to a try/except case like this:
https://github.com/SciTools/iris/blob/master/lib/iris/fileformats/netcdf.py#L504
try:
_load_aux_factory(engine, cube)
except ValueError as e:
warnings.warn('{}'.format(e))
This approach is hackish but it does succeed in loading all the variables. It also prints the proper warnings when we have missing (or *non-defined) formula terms.
The first block I found to get this working for staggered grids was with a lingering problem when handling missing units attributes. PR https://github.com/SciTools/iris/pull/1106 helped, but when a formula term has missing units (and by missing I mean not even an empty string) iris still throw that term away. See the notebook below:
http://nbviewer.ipython.org/gist/ocefpaf/28647e7bbb75fe81b512
Most datasets we have do not specify any units for formula terms, not even the empty string. I believe that is allowed by the CF rules.
Units are not required for dimensionless quantities. A variable with no units attribute is assumed to be dimensionless. However, a units attribute specifying a dimensionless unit may optionally be included.
Before proceeding I wonder what is iris devs take on this. Should iris load formula terms missing units as dimensionless?
@rhattersley , just want to chime in here that this has been and remains a blocker for python work on our ocean model/data comparison project (we can't compare ocean glider or AUV data without it), so if Phase #1 (replacing the exception with a warning) is relatively easy for Iris devs to implement, we would really appreciate it. I had to give my last presentation on model/data comparison using Matlab! ;-(
I went ahead and removed the formula_terms
units problem by making dimensionless all "unknown" units. That worked fine in my local tests:
http://nbviewer.ipython.org/gist/ocefpaf/28647e7bbb75fe81b512
If iris devs think that is an OK interpretation of the CF-rules I can prepare a PR with that.
Should iris load formula terms missing units as dimensionless?
Yes, I think so! I'm a little surprised that wasn't taken care of in #1106. What did you have to change?
If iris devs think that is an OK interpretation of the CF-rules I can prepare a PR with that.
Yes please - thank you!
@rhattersley sorry! One answer and two more questions before a PR.
What did you have to change?
I hacked into aux_factory.py
to convert only the formula terms from unknown
to dimensionless
:
if coord is not None and not coord.units.is_unknown():
coord.units = iris.unit.Unit("1")
I know that this is probably not the right way to do it. BTW PR #1106 changed the pyke rules to:
attr_units = getattr(cf_var, CF_ATTR_UNITS, iris.unit._UNIT_DIMENSIONLESS)
I do not understand how pyke works and I do not understand why when the units are missing they do not became dimensionless as expect.
Still, iris has the concept of "unknown" units, and things like iris.unit.Unit(None)
and iris.unit.Unit("")
will be interpreted as Unit('unknown')
instead of dimensionless. Therefore I am afraid that, even if we fix the loading variable part, we might have a problem in the future when creating and saving a netCDF file with Unit(None)
or Unit("")
.
My questions are:
1) Does it make sense to get rid of the Unit('unknown')
concept?
2) Or it is better to interpret only the formula terms as dimensionless when the units parameter is missing or it is an empty string?
I do not like (1), but it seems to be what CF does. I kind like (2) because it is common for people to assume that the parameters in the formula terms are just dimensionless numbers unless specified. And I do like the fact that iris loads other variables as unknown when the units property is missing. I believe that, salinity and other non-formula term variables, should always declare explicitly that they are dimensionless (to avoid the confusions like "1" or "0.001" for example).
Note that, if we choose (2), we do not need to change how iris behave right now. We do not even need to convert the units in aux_factory.py
let I did above. All we have to do is to allow for coord.units == 'unknown'
when checking the non-dimension terms.
The CF Conventions are pretty clear: In section 3.1 it says "A variable with no units attribute is assumed to be dimensionless." And because a variable with no units attribute obviously doesn't specify the units, it is implied that the units should be assumed to be "1".
But just in case CF changes it's mind in the future, it would seem prudent to assign units of "1" to variables without units only when those variables occur in the formula_terms
.
I do not understand how pyke works and I do not understand why when the units are missing they do not became dimensionless as expect.
I don't understand either! :expressionless: @bjlittle knows that area so I'll check with him why #1106 doesn't take care of this already. If we can get the netCDF loader to take care of defaulting units to dimensionless then that would be nice. So for now I'm voting for option (3) - sort out #1106 so it works for formula terms.
Looking at an ncdump of the file from the notebook I see a number of units attributes that are empty strings rather than being absent. The getattr will return this empty string attr_units = getattr(cf_var, CF_ATTR_UNITS, iris.unit._UNIT_DIMENSIONLESS)
rather than falling back to the default of dimensionless as the attribute is present. Since iris.unit.Unit("")
is equal to iris.unit.Unit('unknown')
you are getting unknown units. If you remove the units attribute entirely from the cf-netcdf, you'll fallback to dimensionless. The question is, if the units attribute is set to ""
, what does that mean in terms of CF?
Grrr.... I didn't notice before that in all the ROMS models around the internet, the units are actually set to null character ""
. I think in this case we should treat them as "1" also, since they are arguably not defined, and since units of "empty" doesn't make any sense.
The question is, if the units attribute is set to "", what does that mean in terms of CF?
Good question! I was interpreting "" (iris.unit.Unit("")
) as missing as well.
Maybe I should have sent this notebook instead: http://nbviewer.ipython.org/d2c58eb67672594ed5c3
There I test "", missing attr and etc:
I have no answer but this message was floating around the CF mailing list: http://mailman.cgd.ucar.edu/pipermail/cf-metadata/2014/057818.html
So I guess that there is nothing in CF about that.
The question is, if the units attribute is set to "", what does that mean in terms of CF?
The CF conventions state: "The value of the units attribute is a string that can be recognized by UNIDATA's Udunits package". UDUNITS treats an empty string as equivalent to '1'. So unless there's a clear reason to assume that one of unknown/no-unit/missing applies (e.g. the variable contains strings) we should be treating these as '1'. I.e. something more like:
attr_units = getattr(cf_var, CF_ATTR_UNITS, iris.unit._UNIT_DIMENSIONLESS)
if not attr_units:
attr_units = '1'
@rhattersley oh, good! We can be CF-Compliant and fix our problem at the same time! :grinning:
Just tested that and it works! Should I send the PR?
http://nbviewer.ipython.org/gist/ocefpaf/5fa5dc8b81a072389e61/staggered_grid_units_issue.ipynb
I just sent a PR (#1483) with that code. If you are OK with that I will send the next PR changing the error to a warning when constructing the AuxCoord (and that would be the end of phase 1 :sweat_smile: ).
With #1304 merged, phase one is complete! Thank you everyone!
In order to maintain a backlog of relevant issues, we automatically label them as stale after 500 days of inactivity.
If this issue is still important to you, then please comment on this issue and the stale label will be removed.
Otherwise this issue will be automatically closed in 28 days time.
This stale issue has been automatically closed due to a lack of community activity.
If you still care about this issue, then please either:
@SciTools/iris-devs
who will re-open on your behalf.
We've had a problem for years with CF conventions and "staggered grid" datasets, where the U and V points on a grid element are not co-located with the depth and free-surface elevation points on that element. The result is that the vertical coordinate is not explicitly defined at the U and V points, and CF reading applications get confused and bomb.
Could we figure out a way to gracefully handle this in Iris using some additional metadata attributes, and if it works well, then propose the metadata attribute solution to CF for more general use?
It would be fantastic if we could provide tools in Iris that would allow folks to read data from staggered grids and do the computations automagically to do things like provide vertical coordinates for a section of velocity extracted from one of these models.
Here are examples of staggered grids.