Open-EO / openeo-geopyspark-driver

OpenEO driver for GeoPySpark (Geotrellis)
Apache License 2.0
26 stars 4 forks source link

Appropriate use of `last` function #90

Open fdetsch opened 3 years ago

fdetsch commented 3 years ago

I need to reduce an NDVI datacube along the time dimension in a way to keep the last valid (ie. non-null) pixel value. last seemed promising, but I can't get it to work using the following code:

import openeo

con = openeo.connect(
    'https://openeo.vito.be'
).authenticate_basic(
    'test'
    , 'test123'
)

# make dictionary, containing bounding box
urk = {"west": 5.5661, "south": 52.6457, "east": 5.7298, "north": 52.7335}

# make list, containing the temporal interval
t = ["2021-04-26", "2021-04-30"]

# load datacube
datacube = con.load_collection(
    "SENTINEL2_L2A_SENTINELHUB"
    , spatial_extent = urk
    , temporal_extent = t
    , bands = ["B04", "B08"]
)

# calculate ndvi
ndvicube = datacube.ndvi(
    nir = 'B08'
    , red = 'B04'
    , target_band = 'ndvi'
)

# get last valid value per pixel
lastcube = ndvicube.reduce_dimension(
    dimension = 't'
    , reducer = 'last'
)

lastcube.download(
    'last.ncdf'
    , format = 'NetCDF'
)

This always raises an error:

OpenEoApiError: [400] unknown: Unsupported operation: last (arguments: [data])

I also tried to wrap this in an UDF (get_last_value.py)

import xarray
from openeo.udf import XarrayDataCube

def apply_datacube(cube: XarrayDataCube, context: dict) -> XarrayDataCube:
    """
    Get the last value of a timeseries datacube on a per-pixel basis.
    This UDF preserves dimensionality, and assumes an input
    datacube with a temporal dimension 't' as input.
    """
    array: xarray.DataArray = cube.get_array()
    tail_array = last(array.values)
    return XarrayDataCube(
        array=xarray.DataArray(tail_array, dims=array.dims, coords=array.coords)
    )

that is then integrated with

    ## get last value per pixel
    tail_udf = load_udf('/get_last_value.py')

    tailcube = indexcube.apply_dimension(
        code = tail_udf
        , runtime = 'Python'
    )

However, the downloaded result then is a multi-layered (in this case nlayers = 2) datacube, which means no reduction whatsoever takes place.

Is this a backend issue or am I applying last in an unintended way here?

m-mohr commented 3 years ago

This looks like a back-end issue to me, at least the processes are used correctly. I've also tried it in the Web Editor and it also fails there with the same error. It also fails for first, but it works for mean.

@soxofaan Can you also confirm this is a back-end issue and if so, move this over to the corresponding repo?

apply_dimension is not meant to reduce so that's why the UDF example is still multi-layered, I think.

fdetsch commented 3 years ago

Thanks for clarifying on apply_dimension().

@soxofaan Any help on this would be highly appreciated.

soxofaan commented 3 years ago

The VITO backend https://openeo.vito.be indeed does not support the last or first processes in reduce_dimension at the moment

soxofaan commented 3 years ago

The approach with the UDF is a good workaround currently, but your function definition is a bit off from what the VITO backend expects and currently the VITO backend will unfortunately silently skip invalid UDF code.

The VITO backend expects a function called apply_datacube that accepts and returns a cube of the type openeo_udf.api.datacube.DataCube (instead of openeo.udf.XarrayDataCube). Also your UDF uses a function last that is undefined.

So it should be something like this:

udf = '''
import xarray
from openeo_udf.api.datacube import DataCube

def apply_datacube(cube: DataCube, context: dict) -> DataCube:
    array: xarray.DataArray = cube.get_array()
    last = array.tail({"t": 1})
    return DataCube(last)
'''

lastcube = ndvicube.apply_dimension(code=udf, runtime='Python')

lastcube.download('last.ncdf', format='NetCDF')

This downloads a single time layer netcdf for me

I also tried to get this working with reduce_dimension because that would indeed be the more logical choice, but there is another issue popping up there (to be taken elsewhere)

So to be clear: the fact that you got two layers in your output was not due to using apply_dimension, but because apply_dimension was not doing anything due to invalid function signature