bcdev / nc2zarr

A Python tool that converts NetCDF files to Zarr format
MIT License

Concatenate inputs along dim with missing coordinate #16

Closed forman closed 3 years ago

forman commented 3 years ago

A CCI Greenhouse Gases daily product looks like this:

Dimensions:                (corners_dim: 4, layer_dim: 20, level_dim: 21, sounding_dim: 1692)
Dimensions without coordinates: corners_dim, layer_dim, level_dim, sounding_dim
Data variables:
    time                   (sounding_dim) datetime64[ns] ...
    latitude               (sounding_dim) float32 ...
    longitude              (sounding_dim) float32 ...
    solar_zenith_angle     (sounding_dim) float32 ...
    sensor_zenith_angle    (sounding_dim) float32 ...
    xco2                   (sounding_dim) float32 ...
    xco2_uncertainty       (sounding_dim) float32 ...
    xco2_quality_flag      (sounding_dim) int8 ...
    pressure_levels        (sounding_dim, level_dim) float32 ...
    co2_profile_apriori    (sounding_dim, layer_dim) float32 ...
    xco2_averaging_kernel  (sounding_dim, layer_dim) float32 ...
    pressure_weight        (sounding_dim, layer_dim) float32 ...
    orbit_number           (sounding_dim) int64 ...
    scene_number           (sounding_dim) int16 ...
    state_number           (sounding_dim) int16 ...
    latitude_corners       (sounding_dim, corners_dim) float32 ...
    longitude_corners      (sounding_dim, corners_dim) float32 ...
    altitude               (sounding_dim) float32 ...
    h2o_column             (sounding_dim) float32 ...
    surface_albedo_750nm   (sounding_dim) float32 ...
    surface_albedo_1560nm  (sounding_dim) float32 ...
Attributes:
    title:                     ESA CCI SCIAMACHY WFMD XCO2
    institution:               University of Bremen
    ...

We want to concatenate multiple such files along the dimension sounding_dim, but xarray complains that sounding_dim is not a coordinate. In this case the time variable could be used as the coordinate, since it is monotonically increasing. The steps are (xarray v0.17):

import glob
import xarray as xr

files = glob.glob("*.nc")
datasets = [xr.open_dataset(f, decode_cf=False) for f in files]
datasets = [ds.swap_dims({"sounding_dim": "time"}) for ds in datasets]  # turns time into a coordinate variable
combined_dataset = xr.combine_by_coords(datasets, combine_attrs='override')
# finally, adjust combined_dataset.attrs["time_coverage_end"] to combined_dataset.time[-1]

Ideally, we could implement such exotic rules as a custom pre-processor.
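As a rough sketch of what such a pre-processor might look like, the snippet below wraps the swap_dims step in a plain function and applies it to two small synthetic datasets standing in for daily products. The names swap_dim_preprocessor and make_sample are hypothetical and not part of nc2zarr or xarray. The sample data and the string format written to time_coverage_end are assumptions for illustration only. Unlike the steps above, the sample times are already decoded datetime64 values.

```python
import numpy as np
import xarray as xr


def swap_dim_preprocessor(ds, dim="sounding_dim", coord="time"):
    """Hypothetical pre-processor: make variable `coord` the dimension
    coordinate, replacing the coordinate-less dimension `dim`."""
    return ds.swap_dims({dim: coord})


def make_sample(day, n=3):
    """Build a tiny synthetic stand-in for one daily product (assumed data)."""
    time = (np.datetime64(day) + np.arange(n) * np.timedelta64(1, "h")).astype(
        "datetime64[ns]"
    )
    return xr.Dataset(
        {
            "time": ("sounding_dim", time),
            "xco2": ("sounding_dim", np.linspace(380.0, 390.0, n).astype("float32")),
        },
        attrs={"time_coverage_end": str(time[-1])},
    )


# Apply the pre-processor to each input, then combine along the new "time" coordinate.
datasets = [swap_dim_preprocessor(make_sample(day)) for day in ("2003-01-01", "2003-01-02")]
combined = xr.combine_by_coords(datasets, combine_attrs="override")

# combine_attrs="override" keeps the first dataset's attributes,
# so the coverage end has to be updated from the combined time axis.
combined.attrs["time_coverage_end"] = str(combined.time.values[-1])
```

Keeping the rule in a function of the form dataset in, dataset out means it could be applied per input before concatenation, regardless of how the tool ends up exposing pre-processing hooks.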