TUW-GEO / ascat

Read and visualize data from the Advanced Scatterometer (ASCAT) on board the series of Metop satellites
https://ascat.readthedocs.io/
MIT License

Add read/write to concat-compatible xarray format #54

Closed claytharrison closed 7 months ago

claytharrison commented 11 months ago

This adds methods to CRANcFile and IRANcFile that read NetCDF time series files, stored in either Indexed Ragged Array or Contiguous Ragged Array format, into a compatible Indexed Ragged Array format that is easy to concatenate with other time series from the same grid cell. It also adds a writer that writes these compatible arrays to NetCDF time series files in Contiguous Ragged Array format.

This is achieved by filling out the coordinates and variables in the locations dimension with data for all location_ids in the relevant cell, sorting that dimension by location_id, and then updating the locationIndex for all observations to align with the new ordering. In this way, a given locationIndex refers to the same location_id for any time series produced in a given cell, and they can be concatenated safely.
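
The remapping described above can be sketched with plain NumPy arrays. This is only an illustration, not the code in this PR: align_to_cell and its arguments are hypothetical names, and it assumes the full list of location_ids for the cell is already known.

```python
import numpy as np


def align_to_cell(location_id, location_index, cell_location_ids):
    """Remap a file-local locationIndex onto the sorted, cell-wide location list.

    Hypothetical helper for illustration; not part of the ascat package.
    """
    cell_ids = np.sort(np.asarray(cell_location_ids))
    location_id = np.asarray(location_id)
    if not np.all(np.isin(location_id, cell_ids)):
        raise ValueError("file contains a location_id outside this cell")
    # Position of each file-local location within the sorted cell-wide list.
    new_position = np.searchsorted(cell_ids, location_id)
    # Every observation now points at the cell-wide position of its location.
    return cell_ids, new_position[np.asarray(location_index)]


# Two files from the same cell that list their locations in different orders.
cell = [10, 20, 30]
ids_a, index_a = align_to_cell([30, 10], [0, 0, 1], cell)
ids_b, index_b = align_to_cell([20, 30], [1, 0, 0], cell)

# After alignment the observation-level indices can simply be concatenated.
merged_index = np.concatenate([index_a, index_b])
print(ids_a[merged_index])
```

Because both files now share the same sorted locations dimension, an index value means the same location_id in either file.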

When writing to Contiguous Ragged Array format, observations are ordered by locationIndex and by time, a row_size variable is calculated from the locationIndex, locationIndex is dropped, and appropriate attributes and encoding are set.
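
A minimal sketch of the ordering and row_size steps, again with bare NumPy arrays rather than the xarray datasets the PR works on. The function name indexed_to_contiguous and its signature are assumptions made for this example.

```python
import numpy as np


def indexed_to_contiguous(location_index, time, n_locations):
    """Return the observation order and row_size for a contiguous layout.

    Hypothetical helper for illustration; not part of the ascat package.
    """
    location_index = np.asarray(location_index)
    time = np.asarray(time)
    # Two stable sorts: first by time, then by locationIndex, so that
    # observations end up grouped by location and time-ordered within it.
    order = np.argsort(time, kind="stable")
    order = order[np.argsort(location_index[order], kind="stable")]
    # Number of observations per location, including locations with none.
    row_size = np.bincount(location_index, minlength=n_locations)
    return order, row_size


location_index = np.array([2, 0, 2, 1, 0])
time = np.array([5, 3, 1, 4, 2])
order, row_size = indexed_to_contiguous(location_index, time, n_locations=4)
print(location_index[order], time[order], row_size)
```

Once row_size is stored, locationIndex is redundant, since it can be rebuilt with np.repeat(np.arange(row_size.size), row_size).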

Some questions that still remain:

claytharrison commented 11 months ago

Restructured ragged_array_ts.py to use functions for merging indexed/contiguous datasets together from source files, rather than using class methods on CRANcFile and IRANcFile to create a mergeable "compatible" format file.

merge_netCDFs() takes a list of filenames/paths as an argument, merges them along the observation dimension, and returns the merged dataset, deduplicated and sorted by time. It returns a contiguous array by default, but you can set out_format to "indexed" to override this. The time window for detecting duplicate values, dupe_window, defaults to None, in which case the function sets it to np.timedelta64(10, "m").
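
To make the role of dupe_window concrete, here is one possible deduplication rule, sketched with NumPy. It is an assumption, not necessarily what merge_netCDFs() does: an observation is dropped when it falls less than dupe_window after the previous observation at the same location. drop_near_duplicates is a hypothetical name.

```python
import numpy as np


def drop_near_duplicates(location_id, time, dupe_window=None):
    """Return indices of observations to keep, in time order.

    Hypothetical helper; the duplicate rule here is an assumption.
    """
    if dupe_window is None:
        dupe_window = np.timedelta64(10, "m")
    location_id = np.asarray(location_id)
    time = np.asarray(time, dtype="datetime64[s]")
    # Sort by location, then time, so candidate duplicates are adjacent.
    order = np.argsort(time, kind="stable")
    order = order[np.argsort(location_id[order], kind="stable")]
    loc, t = location_id[order], time[order]
    keep = np.ones(loc.size, dtype=bool)
    same_location = loc[1:] == loc[:-1]
    close_in_time = (t[1:] - t[:-1]) < dupe_window
    keep[1:] = ~(same_location & close_in_time)
    kept = order[keep]
    # Hand the survivors back sorted by time.
    return kept[np.argsort(time[kept], kind="stable")]


location_id = [1, 1, 2, 1]
time = ["2020-01-01T00:00", "2020-01-01T00:05",
        "2020-01-01T00:05", "2020-01-01T01:00"]
kept = drop_near_duplicates(location_id, time)
print(kept)
```

In this example only the second observation at location 1 is dropped; the observation at location 2 with the same timestamp is kept because it belongs to a different location.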

There are helper functions for converting contiguous ragged arrays to indexed ones and vice versa. set_attributes() sets reasonable output attributes on the dataset that merge_netCDFs() returns, and create_encoding() creates a reasonable encoding dictionary to pass to .to_netcdf() when writing the dataset to file. Both set_attributes() and create_encoding() take a user-created attribute/encoding dictionary as an argument; for any key it contains, the user's value overrides the default.
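
The override behaviour can be illustrated with a plain dictionary merge. build_encoding below is a hypothetical stand-in, not the signature of create_encoding(), and the default values are made up for the example; only the pattern of user values replacing defaults is taken from the description above.

```python
def build_encoding(variables, custom_encoding=None):
    """Build a per-variable encoding dict, letting user values win.

    Hypothetical stand-in for illustration; defaults are assumptions.
    """
    defaults = {"zlib": True, "complevel": 4}
    custom_encoding = custom_encoding or {}
    # Later keys win in a dict merge, so user entries replace defaults.
    return {
        var: {**defaults, **custom_encoding.get(var, {})}
        for var in variables
    }


encoding = build_encoding(
    ["sm", "time"],
    custom_encoding={"sm": {"complevel": 9, "dtype": "int16"}},
)
print(encoding)
```

A dictionary of this shape, keyed by variable name, is what xarray's Dataset.to_netcdf() accepts through its encoding argument.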

Things to do yet: