NCAS-CMS / cfdm

A Python reference implementation of the CF data model
http://ncas-cms.github.io/cfdm
MIT License

Allow discrete sampling geometries with 1-d data to be written as ragged arrays, and improve the compression process #287

Closed (davidhassell closed 7 months ago)

davidhassell commented 7 months ago

Currently, a 1-d DSG cannot be compressed so that it is written out to a netCDF file as a ragged array. E.g.

>>> print(dsg)
Field: mole_fraction_of_ozone_in_air (ncvar%O3_TECO)
----------------------------------------------------
Data            : mole_fraction_of_ozone_in_air(ncdim%obs(11160)) ppb
Auxiliary coords: time(ncdim%obs(11160)) = [2017-07-03 11:15:07, ..., 2017-07-03 14:21:06] standard
                : altitude(ncdim%obs(11160)) = [2577.927001953125, ..., 151.16905212402344] m
                : air_pressure(ncdim%obs(11160)) = [751.6758422851562, ..., 1006.53076171875] hPa
                : latitude(ncdim%obs(11160)) = [52.56147766113281, ..., 52.0729866027832] degree_north
                : longitude(ncdim%obs(11160)) = [0.3171832859516144, ..., -0.6249311566352844] degree_east
                : cf_role=trajectory_id(cf_role=trajectory_id(1)) = [STANCO]

This can be solved by making it possible to insert the cf_role=trajectory_id dimension into the data and the appropriate metadata constructs, so that it would look like this (note that the cf_role=trajectory_id construct remains unchanged):

Field: mole_fraction_of_ozone_in_air (ncvar%O3_TECO)
----------------------------------------------------
Data            : mole_fraction_of_ozone_in_air(cf_role=trajectory_id(1), ncdim%obs(11160)) ppb
Auxiliary coords: time(cf_role=trajectory_id(1), ncdim%obs(11160)) = [[2017-07-03 11:15:07, ..., 2017-07-03 14:21:06]] standard
                : altitude(cf_role=trajectory_id(1), ncdim%obs(11160)) = [[2577.927001953125, ..., 151.16905212402344]] m
                : air_pressure(cf_role=trajectory_id(1), ncdim%obs(11160)) = [[751.6758422851562, ..., 1006.53076171875]] hPa
                : latitude(cf_role=trajectory_id(1), ncdim%obs(11160)) = [[52.56147766113281, ..., 52.0729866027832]] degree_north
                : longitude(cf_role=trajectory_id(1), ncdim%obs(11160)) = [[0.3171832859516144, ..., -0.6249311566352844]] degree_east
                : cf_role=trajectory_id(cf_role=trajectory_id(1)) = [STANCO]

This can be done by adding a constructs keyword to cf.Field.insert_dimension that works in the same way as the keyword of the same name on cf.Field.transpose.

Edit: To be clear, this is about allowing a manipulation that turns a 1-d DSG into a 2-d one!
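To illustrate why the extra dimension matters, here is a minimal pure-Python sketch, not the cf implementation. It assumes plain lists stand in for the field's data and one auxiliary coordinate, and the helper names `insert_leading_dimension` and `to_contiguous_ragged` are hypothetical. Once the 1-d values have a size-1 leading feature dimension, they can be flattened into values plus a per-feature count, which is the shape of a contiguous ragged array.

```python
# Hypothetical stand-ins for the field's data and one auxiliary coordinate;
# plain lists are used here instead of real cf constructs.
obs_data = [41.2, 40.8, 39.5, 38.9]            # 1-d: (obs,)
obs_altitude = [2577.9, 1800.0, 900.5, 151.2]  # 1-d: (obs,)


def insert_leading_dimension(values):
    """Wrap a 1-d sequence in a new size-1 leading dimension."""
    return [list(values)]


def to_contiguous_ragged(rows, missing=None):
    """Flatten 2-d (feature, obs) rows into values plus per-feature counts.

    Trailing missing values in each row are dropped.
    """
    flat, counts = [], []
    for row in rows:
        n = len(row)
        while n > 0 and row[n - 1] is missing:
            n -= 1
        flat.extend(row[:n])
        counts.append(n)
    return flat, counts


data_2d = insert_leading_dimension(obs_data)          # shape (1, 4)
altitude_2d = insert_leading_dimension(obs_altitude)  # shape (1, 4)

# With a feature dimension present, the ragged representation is well defined.
flat, counts = to_contiguous_ragged(data_2d)
print(flat, counts)
```

With a single trajectory the count list has one element, equal to the number of observations.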

Whilst we're at it, the compression process in cf.Field.compress could be improved to avoid the following situation: if the data contains trailing missing values at positions where the coordinates have non-missing values, then those coordinate values are currently lost.
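The situation can be sketched in pure Python as follows. This is only an illustration of the idea, not how cf.Field.compress is implemented: `MISSING`, `row_count` and `compress_rows` are hypothetical names, and nested lists stand in for the 2-d data and a coordinate. The point is that the per-row element count is taken from the longer of the data and the coordinate, rather than from the data alone.

```python
MISSING = None  # hypothetical marker for a masked value


def row_count(row):
    """Number of elements in a row once trailing missing values are dropped."""
    n = len(row)
    while n > 0 and row[n - 1] is MISSING:
        n -= 1
    return n


def compress_rows(data_rows, coord_rows):
    """Flatten data and coordinate rows using a shared per-row count.

    The count is the larger of the data count and the coordinate count,
    so coordinate values beyond the last non-missing datum are kept.
    """
    flat_data, flat_coord, counts = [], [], []
    for d, c in zip(data_rows, coord_rows):
        n = max(row_count(d), row_count(c))
        flat_data.extend(d[:n])
        flat_coord.extend(c[:n])
        counts.append(n)
    return flat_data, flat_coord, counts


data = [[1.0, 2.0, MISSING, MISSING], [3.0, MISSING, MISSING, MISSING]]
time = [[10, 20, 30, MISSING], [40, 50, MISSING, MISSING]]

# Counts based on the data alone would drop the coordinate values 30 and 50.
data_only_counts = [row_count(d) for d in data]

flat_data, flat_time, counts = compress_rows(data, time)
print(data_only_counts, counts)
print(flat_time)
```

In this toy example the data-only counts are shorter than the shared counts, which is exactly where coordinate values would go missing.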

PR to follow.