NCAS-CMS / cf-python

A CF-compliant Earth Science data analysis library
http://ncas-cms.github.io/cf-python
MIT License
125 stars 19 forks source link

Weighted-mean by area for 1D land-compressed CF Field #731

Closed mcguirepatr closed 8 months ago

mcguirepatr commented 8 months ago

Hi cf-python'ers I am trying to do a global land-only area-weighted average with CF Python of a field that is a 1D land-only array (instead of 2D). I'm using a script that someone else made to get these in the right 1D format instead of the 1D format that JULES makes. But I get an error when I try to do this in CF Python. Can you advise me on this? See below. Patrick

On the CEDA Jasmin supercomputer, I do this:

module load jaspy
python ~pmcguire/cf1dland_v1.py ~pmcguire/for_andyh/JULES2013.nc #I'm attaching the other person's python code[cf1dland_v1.txt](https://github.com/NCAS-CMS/cf-python/files/14462867/cf1dland_v1.txt) on the GitHUB repository as well
cp -p converted.nc ~pmcguire/for_andyh/converted.nc
python
>>> import cf
>>> xx=cf.read('~pmcguire/for_andyh/converted.nc')
>>> xx[0]
<CF Field: long_name=Gridbox surface temperature(time(1), latitude(112), longitude(192)) K>
>>> print(xx[0])
Field: long_name=Gridbox surface temperature (ncvar%Ts)
-------------------------------------------------------
Data            : long_name=Gridbox surface temperature(time(1), latitude(112), longitude(192)) K
Cell methods    : time : mean
Dimension coords: time(1) = [2014-01-01 00:00:00] 365_day
                : latitude(112) = [-55.625, ..., 83.125] degrees_north
                : longitude(192) = [0.9375, ..., 359.0625] degrees_east

print(xx[0].collapse('mean',axes='area',measure='true').array)

[[[281.92977735622253]]] #this seems right for Kelvin, but it isn't area-weighted.

But when I try to do an area-weighted average, I get this error.

>>> print(xx[0].collapse('mean',axes='area',measure='true',weights='area').array)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>

  File "/home/users/pmcguire/.local/lib/python3.10/site-packages/cf/decorators.py", line 71, in precede_with_kwarg_deprecation_check
    operation_method_result = operation_method(self, *args, **kwargs)

  File "/home/users/pmcguire/.local/lib/python3.10/site-packages/cfdm/decorators.py", line 171, in verbose_override_wrapper
    return method_with_verbose_kwarg(*args, **kwargs)

  File "/home/users/pmcguire/.local/lib/python3.10/site-packages/cf/field.py", line 8016, in collapse
    d_weights = f.weights(

  File "/home/users/pmcguire/.local/lib/python3.10/site-packages/cf/field.py", line 4923, in weights
    self._weights_geometry_area(

  File "/home/users/pmcguire/.local/lib/python3.10/site-packages/cf/field.py", line 2237, in _weights_geometry_area
    axis, aux_X, aux_Y, aux_Z = self._weights_yyy(

  File "/home/users/pmcguire/.local/lib/python3.10/site-packages/cf/field.py", line 3036, in _weights_yyy
    raise ValueError(
ValueError: Can't create weights: Need both X and Y nodes to calculate polygon geometry weights
mcguirepatr commented 8 months ago

This is the output when I turn debugging on. Any suggestions? Is there another helpdesk that would work better for this? Patrick

>>>cf.log_level('DEBUG')
>>>print(xx[0].collapse(‘mean’,axes=‘area’,measure=‘true’,weights=‘area’).array)
DEBUG
    all_methods, all_axes, all_within, all_over = ('mean',) [['domainaxis2', 'domainaxis1']] (None,) (None,)
    axes                    = ['domainaxis2', 'domainaxis1']
    method                  = mean
    collapse_axes_all_sizes = Constructs:
{'domainaxis1': <CF DomainAxis: size(112)>,
 'domainaxis2': <CF DomainAxis: size(192)>}
    collapse_axes           = {'domainaxis1': <CF DomainAxis: size(112)>, 'domainaxis2': <CF DomainAxis: size(192)>}
    collapse_axes_sizes     = [112, 192]
    Input weights           = 'area'
Traceback (most recent call last):
  File "/home/users/pmcguire/for_andyh/cf1dland_v2.py", line 207, in <module>
    print(xx[0].collapse('mean',axes='area',weights='area',measure='true').array)
  File "/home/users/pmcguire/.local/lib/python3.10/site-packages/cf/decorators.py", line 71, in precede_with_kwarg_deprecation_check
    operation_method_result = operation_method(self, *args, **kwargs)
  File "/home/users/pmcguire/.local/lib/python3.10/site-packages/cfdm/decorators.py", line 171, in verbose_override_wrapper
    return method_with_verbose_kwarg(*args, **kwargs)
  File "/home/users/pmcguire/.local/lib/python3.10/site-packages/cf/field.py", line 8016, in collapse
    d_weights = f.weights(
  File "/home/users/pmcguire/.local/lib/python3.10/site-packages/cf/field.py", line 4923, in weights
    self._weights_geometry_area(
  File "/home/users/pmcguire/.local/lib/python3.10/site-packages/cf/field.py", line 2237, in _weights_geometry_area
    axis, aux_X, aux_Y, aux_Z = self._weights_yyy(
  File "/home/users/pmcguire/.local/lib/python3.10/site-packages/cf/field.py", line 3036, in _weights_yyy
    raise ValueError(
ValueError: Can't create weights: Need both X and Y nodes to calculate polygon geometry weights
mcguirepatr commented 8 months ago

With embedded print statements, it seems that the _weights_yyy function is being called for some reason with domain_axis=None and geometry_type=polygon. It looks like the next loop over auxiliary_coordinates_1d.items() is not getting entered into (maybe there are no items?). Patrick

mcguirepatr commented 8 months ago

One of the CF-python managers has said: this looks like it might be a cf-python bug (which might be fixed by https://github.com/NCAS-CMS/cf-python/issues/721, publicly released today!). (although it might be something else, too!)

mcguirepatr commented 8 months ago

It looks like on the JASMIN supercomputer, with the module load jaspy, the latest version of cf is 3.15.3. But today's release of cf is 3.17, I think. Do you know when JASMIN will update to 3.17? I will try to clone the cf release for a local installation on JASMIN, to try this out. Patrick

mcguirepatr commented 8 months ago

I tried your 3.16.1 version, which I believe is almost identical to your not-released-yet 3.17 version, based upon your commits in #721 . And the issue I describe here in #731 still isn't fixed by going from (the default on JASMIN's jaspy) 3.15.3 to 3.16.1. But at least my cf code version is more up to date now. Patrick

mcguirepatr commented 8 months ago

I put some print statements in the 3.16.1 code, and in the area_XY( ) function in cf/weights.py, it is saying that xcoord.has_bounds() and ycoord.has_bounds() are both FALSE. So maybe I need to set the bounds somewhere?

I commented out the code that halts area_XY( ) if there are no bounds, and then that leads to a new error message later on: Traceback (most recent call last):

  File "/home/users/pmcguire/for_andyh/cf1dland_v2.py", line 208, in <module>
    print(xx[0].collapse('sum',axes='area',weights='area').array)
  File "/home/users/pmcguire/for_andyh/cf/mixin/propertiesdata.py", line 2469, in array
    return data.array
  File "/home/users/pmcguire/for_andyh/cf/data/data.py", line 5261, in array
    array = self.compute().copy()
  File "/home/users/pmcguire/for_andyh/cf/data/data.py", line 2781, in compute
    a = self.to_dask_array().compute()
  File "/home/users/pmcguire/.local/lib/python3.10/site-packages/dask/base.py", line 342, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/home/users/pmcguire/.local/lib/python3.10/site-packages/dask/base.py", line 628, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/home/users/pmcguire/for_andyh/cf/data/collapse/dask_collapse.py", line 957, in cf_sum_chunk
    raise ValueError(
ValueError: All collapse weights must be positive. Got a weight of 0.0. Consider replacing non-positive values with missing data.
mcguirepatr commented 8 months ago

And I am now wondering why CF is trying to use polygons for this dataset? The dataset has a regular N96 grid. Is it trying to use polygons because it doesn't have the lat & lon bounds defined for it? I have to try to figure out how to define them then. Patrick

davidhassell commented 8 months ago

Hi Patrick,

Thanks for reporting this. There are two things going on here:

  1. As you say, it is not possible to create an area weighted mean when there are no cell bounds, which is for converted.nc.
  2. The error message is unfortunately misleading - it should have said "Can't create weights: Unable to find cell areas". This is a bug that I shall fix.

You can add bounds and then everything will work (no new version of cf-python necessary, I think):

>>> import cf
>>> f = cf.read('converted.nc')[0]
>>> f.dump()
----------------------------------------------------------------------------------------------------
Field: long_name=Concentration of atmospheric CO2, expressed as a mass mixing ratio. (ncvar%co2_mmr)
----------------------------------------------------------------------------------------------------
_FillValue = -1e+20
long_name = 'Concentration of atmospheric CO2, expressed as a mass mixing
             ratio.'
missing_value = -1e+20
units = ''

Data(time(1), latitude(112), longitude(192)) = [[[--, ..., --]]]

Cell Method: time
Cell Method: : mean

Domain Axis: latitude(112)
Domain Axis: longitude(192)
Domain Axis: time(1)

Dimension coordinate: time
    calendar = '365_day'
    standard_name = 'time'
    units = 'seconds since 2010-01-01 00:00:00'
    Data(time(1)) = [2014-01-01 00:00:00] 365_day

Dimension coordinate: latitude
    standard_name = 'latitude'
    units = 'degrees_north'
    Data(latitude(112)) = [-55.625, ..., 83.125] degrees_north

Dimension coordinate: longitude
    standard_name = 'longitude'
    units = 'degrees_east'
    Data(longitude(192)) = [0.9375, ..., 359.0625] degrees_east

>>> # Add the bounds
>>> x = f.coord('X')
>>> x.set_bounds(x.create_bounds())
>>> y = f.coord('Y')
>>> y.set_bounds(y.create_bounds(max=90, min=-90))
>>> f.dump()
----------------------------------------------------------------------------------------------------
Field: long_name=Concentration of atmospheric CO2, expressed as a mass mixing ratio. (ncvar%co2_mmr)
----------------------------------------------------------------------------------------------------
_FillValue = -1e+20
long_name = 'Concentration of atmospheric CO2, expressed as a mass mixing
             ratio.'
missing_value = -1e+20
units = ''

Data(time(1), latitude(112), longitude(192)) = [[[--, ..., --]]]

Cell Method: time
Cell Method: : mean

Domain Axis: latitude(112)
Domain Axis: longitude(192)
Domain Axis: time(1)

Dimension coordinate: time
    calendar = '365_day'
    standard_name = 'time'
    units = 'seconds since 2010-01-01 00:00:00'
    Data(time(1)) = [2014-01-01 00:00:00] 365_day

Dimension coordinate: latitude
    standard_name = 'latitude'
    units = 'degrees_north'
    Data(latitude(112)) = [-55.625, ..., 83.125] degrees_north
    Bounds:units = 'degrees_north'
    Bounds:Data(latitude(112), 2) = [[-90.0, ..., 90.0]] degrees_north

Dimension coordinate: longitude
    standard_name = 'longitude'
    units = 'degrees_east'
    Data(longitude(192)) = [0.9375, ..., 359.0625] degrees_east
    Bounds:units = 'degrees_east'
    Bounds:Data(longitude(192), 2) = [[0.0, ..., 360.0]] degrees_east

>>> # Collapse the field
>>> a = f.collapse('mean', axes='area', weights='area')
>>> print(a)
Field: long_name=Concentration of atmospheric CO2, expressed as a mass mixing ratio. (ncvar%co2_mmr)
----------------------------------------------------------------------------------------------------
Data            : long_name=Concentration of atmospheric CO2, expressed as a mass mixing ratio.(time(1), latitude(1), longitude(1)) 
Cell methods    : time : mean area: mean
Dimension coords: time(1) = [2014-01-01 00:00:00] 365_day
                : latitude(1) = [0.0] degrees_north
                : longitude(1) = [180.0] degrees_east

>>> print(a.array)
[[[0.0005993416998535397]]]

Do let us know if that works for you!

mcguirepatr commented 8 months ago

Hi David: Yes, I created the lat/lon bounds. And it worked great!

See the attached plots that I made after doing this, for the global GPP vs. year (1860-2013) and for the global GPP vs. atmospheric CO2. I did this with a collapse('integral',axes='area',weights='area') to get the global spatially-summed GPP, and a collapse('maximum',axes='area',weights='area') to get the max value of the CO2 over the area. The CO2 should be the same everywhere each year.
(GPP=Gross Primary Productivity, in Petagrams of Carbon per year, which is a measure of planet growth and carbon assimilation). GPPvYear_v1 GPPvCO2_v1