csiro-coasts / emsarray

xarray extension that supports EMS model formats
BSD 3-Clause "New" or "Revised" License

Point extraction #13

Closed: mx-moth closed this issue 1 year ago

mx-moth commented 2 years ago

Use case

Given a dataset covering a wide area with multiple timesteps and depth layers, extract data at a selection of lat/lon points supplied in a CSV.

Example CSV:

lon,lat,name
153.39,-25.80,A1
153.01,-25.79,A2
152.92,-25.61,A3
152.96,-25.41,B1
152.93,-25.19,B2
152.93,-25.19,B3
152.96,-24.64,C1
153.00,-24.47,C2
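A minimal sketch of turning the CSV above into the inputs the proposed `Format.extract_points()` expects, using pandas and shapely. It assumes the coordinate columns are named `lon` and `lat`; `read_points` is a hypothetical helper, not an existing emsarray function.

```python
import io

import pandas as pd
from shapely.geometry import Point

# The first few rows of the example CSV; in practice this would be sites.csv.
CSV_TEXT = """lon,lat,name
153.39,-25.80,A1
153.01,-25.79,A2
152.92,-25.61,A3
"""


def read_points(csv_file, longitude="lon", latitude="lat"):
    """Hypothetical helper: read a CSV of sites, return the dataframe and one Point per row."""
    dataframe = pd.read_csv(csv_file)
    # shapely uses (x, y) ordering, so longitude comes first.
    points = [
        Point(lon, lat)
        for lon, lat in zip(dataframe[longitude], dataframe[latitude])
    ]
    return dataframe, points


dataframe, points = read_points(io.StringIO(CSV_TEXT))
```

Keeping the dataframe alongside the points means the extra columns (such as `name`) remain available to attach to the extracted data later.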

Extract data from dataset.nc at the locations listed in sites.csv and save the point data to sites.nc:

$ emsarray extract-points dataset.nc sites.csv sites.nc

Implementation details

Add a new method Format.extract_points():

from typing import List

import xarray as xr
from shapely.geometry import Point

def extract_points(
    self,
    points: List[Point],
    new_dimension: str = 'point',
) -> xr.Dataset:
    """
    Select data at the given points, discarding all other locations.
    Any variables that do not use any of the surface dimensions are dropped.
    Any horizontal topology information for this dataset is dropped.
    Coordinate variables are kept.

    Parameters
    ----------
    points : list of shapely.geometry.Point
        The points to extract.
    new_dimension : str
        The name of the new dimension used to index the points.

    Returns
    -------
    xarray.Dataset
        A new dataset with data for the given points.
    """

Add a new operation extract_points() that calls the above method and adds the columns from the CSV to the resulting dataset:

def extract_points(
    dataset: emsarray.format.Format,
    points: pandas.DataFrame,
    longitude: str = 'lon',
    latitude: str = 'lat',
) -> xr.Dataset:
    """
    Extract point data from the dataset and combine it with the dataframe
    to make a new dataset.
    The points from the dataframe will be extracted from the dataset
    using `Format.extract_points()`.
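    The columns from the dataframe will be appended to the dataset.

    Parameters
    ----------
    dataset : emsarray.format.Format
        The dataset to extract point data from.
    points : pandas.DataFrame
        A dataframe with one row per point to extract.
    longitude : str
        The name of the dataframe column holding point longitudes.
    latitude : str
        The name of the dataframe column holding point latitudes.
    """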

Add a new command line entry point, extract-points, that opens the dataset using xarray, opens the CSV using pandas, passes both to the operation, and saves the result to the output file.
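A sketch of the command line side using `argparse`. The positional arguments follow the example invocation in the use case; the `--longitude`/`--latitude` options and the `operation` parameter are assumptions for illustration, with `operation` standing in for the proposed `extract_points()` operation.

```python
import argparse

import pandas as pd
import xarray as xr


def build_parser():
    """Argument parser for a hypothetical ``extract-points`` subcommand."""
    parser = argparse.ArgumentParser(
        prog="emsarray extract-points",
        description="Extract data at the points listed in a CSV file.",
    )
    parser.add_argument("input_path", help="netCDF dataset to extract from")
    parser.add_argument("points_path", help="CSV file with one row per point")
    parser.add_argument("output_path", help="netCDF file to write the point data to")
    # Hypothetical options mirroring the operation's keyword arguments.
    parser.add_argument("--longitude", default="lon", help="longitude column name")
    parser.add_argument("--latitude", default="lat", help="latitude column name")
    return parser


def main(operation, argv=None):
    """Run ``operation`` (a stand-in for the proposed operation) from the command line."""
    args = build_parser().parse_args(argv)
    dataframe = pd.read_csv(args.points_path)
    with xr.open_dataset(args.input_path) as dataset:
        point_data = operation(
            dataset, dataframe,
            longitude=args.longitude, latitude=args.latitude,
        )
        # Write while the input is still open, in case the result is lazily loaded.
        point_data.to_netcdf(args.output_path)
```

With the operation available, `main(extract_points, ["dataset.nc", "sites.csv", "sites.nc"])` would mirror the example command.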

/cc @frizwi