Open-EO / openeo-processes-dask

Python implementations of many OpenEO processes, dask-friendly by default.
Apache License 2.0
19 stars 14 forks source link

#237 aggregate_spatial needs to translate incoming geometries to same CRS as the data #238

Closed danielFlemstrom closed 4 months ago

danielFlemstrom commented 6 months ago

If the incoming geometries have a different crs, we now "translate" this to the same CRS as the data (since the xvec function assumes same crs without checking it).

I have added the line that i think should fix this. Please check that its correct and does not have any side-effects that I was unaware of.

ValentinaHutter commented 5 months ago

Thanks for adding this! Would be great if the change could also be test - in this case, tests live here: https://github.com/Open-EO/openeo-processes-dask/blob/main/tests/test_aggregate.py#L100

I think this might also be a nice addition to the xvec code - @masawdah does it make sense to you, to add this in xvec directly and update the definition there?

ValentinaHutter commented 5 months ago

A simple test before the geometry_url test (https://github.com/Open-EO/openeo-processes-dask/blob/main/tests/test_aggregate.py#L150) could look like this:

    gdf = gpd.GeoDataFrame.from_features(polygon_geometry_small, crs="EPSG:4326")
    gdf_equi7 = gdf.to_crs(
        "+proj=aeqd +lat_0=53 +lon_0=24 +x_0=5837287.81977 +y_0=2121415.69617 +datum=WGS84 +units=m +no_defs"
    )
    output_cube_transform = aggregate_spatial(
        data=reduced_cube, geometries=gdf_equi7, reducer=reducer
    )
    assert len(output_cube_transform.dims) == len(output_cube.dims)
    assert output_cube_transform.shape == output_cube.shape
ValentinaHutter commented 5 months ago

I opened a PR on your fork to merge the latest updates of openeo-processes-dask into your branch.

clausmichele commented 5 months ago

@ValentinaHutter trying to create an example using aggregate_spatial I also faced a related bug. pydantic checks the CRS here: https://github.com/Open-EO/openeo-pg-parser-networkx/blob/50e71097929086beca7377ac6d909831a2ecb87c/openeo_pg_parser_networkx/pg_schema.py#L112

which is not compatible of what we get with a geoJSON string, which is a dictionary.

Here is a code to reproduce the issue:

import json
import geopandas as gpd
from openeo.local import LocalConnection
local_conn = LocalConnection("./")

url = "https://earth-search.aws.element84.com/v1/collections/sentinel-2-l2a"
spatial_extent =  {"east": 11.40, "north": 46.52, "south": 46.46, "west": 11.25}
temporal_extent = ["2022-06-01", "2022-06-30"]
bands = ["red", "nir"]
properties = {"eo:cloud_cover": dict(lt=80)}
s2_datacube = local_conn.load_stac(
    url=url,
    spatial_extent=spatial_extent,
    temporal_extent=temporal_extent,
    bands=bands,
    properties=properties,
)

b04 = s2_datacube.band("red")
b08 = s2_datacube.band("nir")
ndvi = (b08 - b04) / (b08 + b04)

sample_geojson_string = """
{"type":"FeatureCollection","features":[{"type":"Feature","properties":{},"geometry":{"coordinates":[[[11.348766682377743,46.48077235403869],[11.348766682377743,46.47721029746353],[11.35563673107842,46.47721029746353],[11.35563673107842,46.48077235403869],[11.348766682377743,46.48077235403869]]],"type":"Polygon"}},{"type":"Feature","properties":{},"geometry":{"coordinates":[[[11.365497271567875,46.48683844486996],[11.362870488240105,46.48441765486314],[11.365658919772045,46.48208023815508],[11.3688514718161,46.48305417398808],[11.367477462075641,46.485725451348884],[11.365497271567875,46.48683844486996]]],"type":"Polygon"}},{"type":"Feature","properties":{},"geometry":{"coordinates":[[[11.372809654880939,46.486050715872665],[11.371383328810936,46.484121615720795],[11.375203845069393,46.482332752634534],[11.37754709654618,46.48349025889712],[11.3788206018977,46.48548953006227],[11.375407607374086,46.486226085126646],[11.372809654880939,46.486050715872665]]],"type":"Polygon"}},{"type":"Feature","properties":{},"geometry":{"coordinates":[[[11.358546395621374,46.47956165241811],[11.358240754320718,46.478158509926004],[11.363283835883294,46.47563276206486],[11.365219564109765,46.47864961374469],[11.362723493271574,46.481245375637826],[11.358546395621374,46.47956165241811]]],"type":"Polygon"}},{"type":"Feature","properties":{},"geometry":{"coordinates":[[[11.35324861290701,46.48243798140945],[11.359004857403477,46.48170137505551],[11.360278362822669,46.483244729307046],[11.359157678053293,46.48527908392737],[11.355082460711827,46.48510371152537],[11.35324861290701,46.48243798140945]]],"type":"Polygon"}}]}
"""

sample_geojson = json.loads(sample_geojson_string)
#The data is in EPSG:32632, we need to convert the geoJSON in EPSG:4326 to that
geoms = gpd.GeoDataFrame.from_features(sample_geojson).set_crs("EPSG:4326").to_crs("EPSG:32632")

# Doesn't work due to Pydantic (AttributeError: 'dict' object has no attribute 'strip')
geojson_32632_dict = json.loads(geoms.to_json())
ndvi_stats = ndvi.aggregate_spatial(geometries = geojson_32632_dict, reducer = "median").execute()

# Doesn't work due to openeo-processes-dask, which expects a dict
geojson_32632_dict["crs"] = 32632
ndvi_stats = ndvi.aggregate_spatial(geometries = geojson_32632_dict, reducer = "median").execute()
danielFlemstrom commented 5 months ago

Hmm, sadly i discovered this : "The support for specifying Coordinate Reference Systems (CRS) other than WGS 84 was removed from the GeoJSON specification in its 2016 version, RFC 7946, which mandates that all coordinates are assumed to be in the WGS 84 system. For applications needing a different CRS, transformations must occur outside the GeoJSON format, as it exclusively assumes WGS 84 coordinates."

As i understand it, this meanst that we cannot specify (other) CRS:s for polygons, only for the bounding box ("west", ....). However, since the rastercube still may be a of different CRS, i guess we still need to convert the polygon to the CRS of the rastercube (which should be possible once in geopandas).

clausmichele commented 5 months ago

Hmm, sadly i discovered this : "The support for specifying Coordinate Reference Systems (CRS) other than WGS 84 was removed from the GeoJSON specification in its 2016 version, RFC 7946, which mandates that all coordinates are assumed to be in the WGS 84 system. For applications needing a different CRS, transformations must occur outside the GeoJSON format, as it exclusively assumes WGS 84 coordinates."

As i understand it, this meanst that we cannot specify (other) CRS:s for polygons, only for the bounding box ("west", ....). However, since the rastercube still may be a of different CRS, i guess we still need to convert the polygon to the CRS of the rastercube (which should be possible once in geopandas).

Exactly. geoJSON doesn't support other CRSs than EPSG:4326. The action points here are:

  1. Convert the CRS to the native of the datacube, like we did in the filter_spatial implementation here: https://github.com/Open-EO/openeo-processes-dask/blob/main/openeo_processes_dask/process_implementations/cubes/_filter.py#L122-L128
  2. Implement a process to load vector cubes from other files (like GeoParquet as mention by @soxofaan)
soxofaan commented 5 months ago
  1. Implement a process to load vector cubes from other files (like GeoParquet as mention by @soxofaan)

FYI: the current approach we use is load_url with a geoparquet file (which supports non-EPSG4326 CRSes). Drawback of that approach is that it requires you to host your geometry on a publicly available URL, which is more cumbersome than just having inline GeoJSON. As a workaround I propose in https://github.com/Open-EO/openeo-processes/issues/498 to also support data URLs in load_url, allowing to embed binary data directly in process graphs, based on a well-established standard. Feel free to weigh in on the discussion there.