OpenSenseAction / poligrain

Simplify common tasks for working with point, line and gridded sensor data, focusing on rainfall observations.
https://poligrain.readthedocs.io
BSD 3-Clause "New" or "Revised" License
2 stars 10 forks source link

Add functionality to select spatial subsets #35

Open nblettner opened 3 months ago

nblettner commented 3 months ago

Selecting spatial subsets is a common task but a little cumbersome to implement particularly for links which include two sites. The idea is a function that takes a dataset or dataarray of sensor / gridded data and the coordinates of a bounding box, and returns a dataset or dataarray with only sensors / pixels in that box. Alternatively, a boolean dataarray could be returned that stores the information whether or not a sensor / pixel is within the bounding box.

Below is a prototype. This function addresses all kinds of data (point, link, grid), which needs to be specified via an argument. An alternative would be three (or two) separate functions. The prototype also distinguishes several modes of what it means for a link to be considered within the bounding box; this might not be necessary though. Any comment if and how such a function should be implemented is appreciated.

def select_spatial_subset(
    ds, lonmin, lonmax, latmin, latmax, sensortype="point", include="full"
):
    """Select / cut a spatial subset defined by its bounding coordinates.

    Parameters
    ----------
    ds: xr.DataArray | xr.Dataset
        Dataset or Dataarray with coordinates `lon` and `lat`
    lonmin, lonmax, latmin, latmax: float
        Minimum / Maximum longitude / latitude each defining one of the
        four boundaries of the subset.
    sensortype: str 'point, link, grid'
        Wich kind of data.
    include: str 'full', 'part', 'mid'
        Defines mode of selection:
        'full': both start and end points) within the domain;
        'part' at least one pole within the domain;
        has no effect if 'sensortype' is not 'link'

    Returns
    -------
    xr.DataArray | xr.Dataset
        Reduced dataset or dataarray
    """

    # for links (CMLs / SMLs)
    if sensortype == "link":
        # consider midpoints
        if include == "mid":
            # calculate midpoints
            lon_mid = (ds.site_0_lon + ds.site_1_lon) / 2
            lat_mid = (ds.site_0_lat + ds.site_1_lat) / 2

            # label sensors
            cond_mid_lon = (lon_mid <= lonmax) & (lon_mid >= lonmin)
            cond_mid_lat = (lat_mid <= latmax) & (lat_mid >= latmin)
            cond = cond_mid_lon & cond_mid_lat

        # consider start and end points
        else:
            # label sensors
            cond_0_lon = (ds.site_0_lon <= lonmax) & (ds.site_0_lon >= lonmin)
            cond_1_lon = (ds.site_1_lon <= lonmax) & (ds.site_1_lon >= lonmin)
            cond_0_lat = (ds.site_0_lat <= latmax) & (ds.site_0_lat >= latmin)
            cond_1_lat = (ds.site_1_lat <= latmax) & (ds.site_1_lat >= latmin)

            # distinguish whether both start and end point
            # or only one of them needs to be inside
            if include == "full":
                cond = cond_0_lon & cond_1_lon & cond_0_lat & cond_1_lat
            elif include == "part":
                cond = (cond_0_lon & cond_0_lat) | (cond_1_lon & cond_1_lat)
            else:
                raise ValueError()

    # for point-like and gridded data
    elif sensortype in ["point", "grid"]:
        # label sensors / pixels
        cond_lon = (ds.lon <= lonmax) & (ds.lon >= lonmin)
        cond_lat = (ds.lat <= latmax) & (ds.lat >= latmin)
        cond = cond_lon & cond_lat

    else:
        raise ValueError()

    # apply selection
    return ds.where(cond, drop=True)
    # alternatively: return cond
cchwala commented 3 months ago

Thanks @nblettner for the prototype. I think this would be a nice addition.

It could also be added as an xarray Accessor, as is done in #29 for the CML plotting functions. Maybe, to simplify the usage, the sensorstype could be inferred (e.g. via default kwarg sensortype=infer, but maybe it could be omitted completely) because there should either be lon and lat or site_0_lon and site_0_lat.

One small additional comment: I would prefer sensor_type over sensortype, and maybe also lat_max over latmax. That helps readability.