NCAS-CMS / cf-python

A CF-compliant Earth Science data analysis library
http://ncas-cms.github.io/cf-python
MIT License

Accessibility of key domain information #771

Open bnlawrence opened 6 months ago

bnlawrence commented 6 months ago

In introducing CF concepts I find I want to introduce three separate ideas (see [attached lecture excerpt](https://github.com/NCAS-CMS/cf-python/files/15261595/lecture05-cfconcepts.pdf)).

While I understand this stuff to some extent, I find it difficult to use the cf-python machinery to expose how these ideas apply in actual data files. In particular, e.g. for the "statistics over axes and areas", what's the most elegant way of extracting that information for a given field, domain, and set of coordinate axes? Currently I can see it in the printed summary of a field, but I can't extract it in its own right.

E.g. for this field:

Field: air_temperature (ncvar%tas)
----------------------------------
Data            : air_temperature(time(1980), latitude(143), longitude(144)) K
Cell methods    : area: time(1980): mean
Dimension coords: time(1980) = [1850-01-16 12:00:00, ..., 2014-12-16 12:00:00] gregorian
                : latitude(143) = [-90.0, ..., 90.0] degrees_north
                : longitude(144) = [0.0, ..., 357.5] degrees_east
                : height(1) = [2.0] m
Cell measures   : measure:area (external variable: ncvar%areacella)

Is there an elegant way of extracting air_temperature: cell_methods='area: time : mean', e.g. via some method like f.cell_methods4humans()?

bnlawrence commented 6 months ago

(Because I don't think print(f.cell_methods()) is for humans 😄 )

sadielbartholomew commented 6 months ago

Hi @bnlawrence, thanks for your thoughts. I don't have much time but I'll share some general tips for now.

Firstly, in general if the domain is of concern and not other field information, you can use f.domain to access that.

> Is there an elegant way of extracting air_temperature: cell_methods='area: time : mean' ?

The todict keyword argument is often helpful for getting more information, in particular the construct 'key' that can be used to query the object with a construct() call or a construct-specific method such as cell_method() in this case. For example, using the example field f = cf.example_field(1) to illustrate:

>>> f.cell_methods()
<CF Constructs: cell_method(2)>
>>> f.cell_methods(todict=True)
{'cellmethod0': <CF CellMethod: domainaxis1: domainaxis2: mean where land (interval: 0.1 degrees)>, 'cellmethod1': <CF CellMethod: domainaxis3: maximum>}
>>> f.cell_method('cellmethod0')
<CF CellMethod: domainaxis1: domainaxis2: mean where land (interval: 0.1 degrees)>
>>> f.construct('cellmethod0')
<CF CellMethod: domainaxis1: domainaxis2: mean where land (interval: 0.1 degrees)>

Then you can dig deeper into the object, in this case via:

>>> a = f.construct('cellmethod0')
>>> a
<CF CellMethod: domainaxis1: domainaxis2: mean where land (interval: 0.1 degrees)>
>>> a.__dir__()  # just to see what methods and properties we have available to us
['_Data', '_components', '__module__', '__doc__', '__new__', '__repr__', 'create', '__hash__', '__eq__', '__ne__', 'within', 'where', 'over', 'comment', 'method', 'intervals', 'axes', 'expand_intervals', 'change_axes', 'equivalent', 'inspect', 'write', 'remove_axes', '__deepcopy__', '__docstring_package_depth__', '__docstring_substitutions__', '__init__', '__str__', '_atol', '_custom', '_default', '_del_component', '_equals', '_equals_preprocess', '_get_component', '_has_component', '_identities_iter', '_iter', '_package', '_rtol', '_set_component', 'construct_type', 'copy', 'creation_commands', 'del_axes', 'del_method', 'del_qualifier', 'dump', 'equals', 'get_axes', 'get_method', 'get_qualifier', 'has_axes', 'has_method', 'has_qualifier', 'identities', 'identity', 'qualifiers', 'set_axes', 'set_method', 'set_qualifier', 'sorted', '__doc_template__', '__dict__', '__weakref__', '__getattribute__', '__setattr__', '__delattr__', '__lt__', '__le__', '__gt__', '__ge__', '__reduce_ex__', '__reduce__', '__getstate__', '__subclasshook__', '__init_subclass__', '__format__', '__sizeof__', '__dir__', '__class__']
>>> a.axes
('domainaxis1', 'domainaxis2')
>>> a.method
'mean'
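The pieces above could be combined into a small helper that builds a readable string. This is only a sketch: cell_methods_summary is a hypothetical name, not part of cf-python. It assumes that f.domain_axes(todict=True) and f.domain_axis_identity(key) behave as documented for cf.Field, and that reading an unset qualifier property (within, where, over) raises AttributeError. Intervals and comments are ignored for brevity.

```python
def cell_methods_summary(f):
    """Return the cell methods of field `f` as one readable string.

    Hypothetical helper, not part of cf-python.
    """
    # Keys of the field's domain axis constructs, e.g. 'domainaxis1'
    axis_keys = set(f.domain_axes(todict=True))

    parts = []
    for cm in f.cell_methods(todict=True).values():
        names = []
        for axis in cm.axes:
            if axis in axis_keys:
                # Swap the internal key for a recognisable identity
                names.append(f.domain_axis_identity(axis))
            else:
                # Axes can also be plain names such as 'area'; keep them as-is
                names.append(axis)
        text = "".join(f"{name}: " for name in names) + cm.method
        # Assumption: an unset qualifier raises AttributeError, so getattr
        # falls back to None and the qualifier is skipped
        for qualifier in ("within", "where", "over"):
            value = getattr(cm, qualifier, None)
            if value is not None:
                text += f" {qualifier} {value}"
        parts.append(text)
    return " ".join(parts)
```

With cf-python installed, print(cell_methods_summary(cf.example_field(1))) should then give a single line in the style of the CF cell_methods attribute, with axis identities in place of the domainaxisN keys.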

> (Because I don't think print(f.cell_methods()) is for humans 😄 )

I think it is intended to be the immediate view of the object, not a human-friendly summary. A human-friendly summary is what the representation methods (repr/str) are intended for. But let me know if I have misunderstood your comment.

sadielbartholomew commented 6 months ago

(David's just briefed me in person on the exact background to this and more specifically what you were unhappy with for the API here. But hopefully the comment above is useful anyway...)