Open-EO / openeo-geopyspark-driver

OpenEO driver for GeoPySpark (Geotrellis)
Apache License 2.0

Band filtering does not work #41

Closed: tbanyai closed this issue 4 years ago

tbanyai commented 4 years ago

Using the openEO Python client, I am trying to run the following code:

import openeo

# url, bbox, job_options and the local utils module are defined elsewhere in the script
dataCollection = openeo.connect(url) \
    .load_collection('TERRASCOPE_S2_TOC_V2') \
    .filter_temporal('2019-01-01', '2019-01-10') \
    .filter_bbox(crs="EPSG:4326", **dict(zip(["west", "south", "east", "north"], bbox))) \
    .filter_bands(["TOC-B02_10M", "TOC-B04_10M", "TOC-B08_10M"]) \
    .apply_dimension(utils.load_udf('udf_vito_save_to_public.py'), dimension='t', runtime="Python") \
    .execute_batch("tmp/batchtest.json", job_options=job_options)

But I get the following exception:

DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): openeo-dev.vgt.vito.be:80
DEBUG:urllib3.connectionpool:http://openeo-dev.vgt.vito.be:80 "GET /openeo/0.4.0/ HTTP/1.1" 200 1721
DEBUG:urllib3.connectionpool:http://openeo-dev.vgt.vito.be:80 "GET /openeo/0.4.0/credentials/basic HTTP/1.1" 200 58
DEBUG:urllib3.connectionpool:http://openeo-dev.vgt.vito.be:80 "GET /openeo/0.4.0/collections/TERRASCOPE_S2_TOC_V2 HTTP/1.1" 200 2056
DEBUG:urllib3.connectionpool:http://openeo-dev.vgt.vito.be:80 "POST /openeo/0.4.0/result HTTP/1.1" 500 2229
Traceback (most recent call last):
  File "/home/banyait/eclipse-workspace/openeo-usecases/multisource_phenology_usecase/multisource_phenology_2_usecase.py", line 64, in <module>
    .apply_dimension(utils.load_udf('udf_vito_save_to_public.py'),dimension='t',runtime="Python")\
  File "/home/banyait/eclipse-workspace/openeo-python-client/openeo/rest/imagecollectionclient.py", line 1067, in execute
    return self.session.execute(newbuilder.processes)
  File "/home/banyait/eclipse-workspace/openeo-python-client/openeo/rest/connection.py", line 448, in execute
    return self.post(path="/result", json=req).json()
  File "/home/banyait/eclipse-workspace/openeo-python-client/openeo/rest/connection.py", line 134, in post
    return self.request("post", path=path, json=json, **kwargs)
  File "/home/banyait/eclipse-workspace/openeo-python-client/openeo/rest/connection.py", line 93, in request
    self._raise_api_error(resp)
  File "/home/banyait/eclipse-workspace/openeo-python-client/openeo/rest/connection.py", line 113, in _raise_api_error
    raise exception
openeo.rest.connection.OpenEoApiError: [500] unknown: Traceback (most recent call last):
  File "/data1/hadoop/yarn/local/usercache/openeo/appcache/application_1590661099118_0334/container_e4867_1590661099118_0334_01_000572/pyspark.zip/pyspark/worker.py", line 253, in main
    process()
  File "/data1/hadoop/yarn/local/usercache/openeo/appcache/application_1590661099118_0334/container_e4867_1590661099118_0334_01_000572/pyspark.zip/pyspark/worker.py", line 248, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/data1/hadoop/yarn/local/usercache/openeo/appcache/application_1590661099118_0334/container_e4867_1590661099118_0334_01_000572/pyspark.zip/pyspark/serializers.py", line 140, in dump_stream
    for obj in iterator:
  File "/data1/hadoop/yarn/local/usercache/openeo/appcache/application_1590661099118_0334/container_e4867_1590661099118_0334_01_000572/pyspark.zip/pyspark/util.py", line 55, in wrapper
    return f(*args, **kwargs)
  File "/data3/hadoop/yarn/local/usercache/openeo/appcache/application_1590661099118_0334/container_e4867_1590661099118_0334_01_000002/venv/lib64/python3.6/site-packages/openeogeotrellis/GeotrellisImageCollection.py", line 244, in tilefunction
  File "/data1/hadoop/yarn/local/usercache/openeo/appcache/application_1590661099118_0334/container_e4867_1590661099118_0334_01_000572/venv/lib64/python3.6/site-packages/openeogeotrellis/GeotrellisImageCollection.py", line 212, in _tile_to_datacube
    the_array = xr.DataArray(bands_numpy, coords=coords,dims=dims,name="openEODataChunk")
  File "/data1/hadoop/yarn/local/usercache/openeo/appcache/application_1590661099118_0334/container_e4867_1590661099118_0334_01_000572/venv/lib64/python3.6/site-packages/xarray/core/dataarray.py", line 281, in __init__
    coords, dims = _infer_coords_and_dims(data.shape, coords, dims)
  File "/data1/hadoop/yarn/local/usercache/openeo/appcache/application_1590661099118_0334/container_e4867_1590661099118_0334_01_000572/venv/lib64/python3.6/site-packages/xarray/core/dataarray.py", line 104, in _infer_coords_and_dims
    'coordinate %r' % (d, sizes[d], s, k))
ValueError: conflicting sizes for dimension 'bands': length 3 on the data but length 9 on coordinate 'bands'

I believe the coordinate 'bands' (holding the band names) is not reduced to the filtered band names: the tile data contains only the 3 selected bands, while the coordinate still lists all 9 bands of the collection.
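A minimal reproduction of the same error outside Spark, assuming only numpy and xarray are installed: the tile array holds 3 bands, but the `bands` coordinate passed to `xr.DataArray` still lists 9 names. The helper `build_tile` and the placeholder names `band_0` … `band_8` are hypothetical; only the `DataArray(..., coords=..., dims=..., name="openEODataChunk")` call mirrors the one in the traceback.

```python
import numpy as np
import xarray as xr


def build_tile(band_names, n_bands_in_data):
    """Wrap a (bands, y, x) array in a DataArray, similar to the per-tile call in the traceback.

    Hypothetical helper: sizes and names are stand-ins for a real tile.
    """
    data = np.zeros((n_bands_in_data, 4, 4), dtype=np.float32)
    return xr.DataArray(
        data,
        coords={"bands": band_names},
        dims=("bands", "y", "x"),
        name="openEODataChunk",
    )


all_bands = [f"band_{i}" for i in range(9)]  # stale metadata: all 9 collection bands
kept = ["TOC-B02_10M", "TOC-B04_10M", "TOC-B08_10M"]  # bands left after filter_bands

try:
    build_tile(all_bands, n_bands_in_data=3)  # band names not filtered along with the data
except ValueError as err:
    print(err)  # conflicting sizes for dimension 'bands': ...

tile = build_tile(kept, n_bands_in_data=3)  # band names filtered together with the data
print(tile["bands"].values.tolist())  # ['TOC-B02_10M', 'TOC-B04_10M', 'TOC-B08_10M']
```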

jdries commented 4 years ago

@tbanyai The band metadata of the datacube was not in line with the datacube itself. I found and fixed the cause in load_collection; please confirm whether this also fixes your issue.
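The actual change in load_collection is not shown in this issue. As a hedged illustration of the principle only, the sketch below applies one index list to both the data array and the band-name list, so the two cannot drift apart. `filter_bands_in_sync` and the 9-band name list are hypothetical, assuming a (bands, y, x) numpy array.

```python
import numpy as np


def filter_bands_in_sync(data, band_names, wanted):
    """Select bands from a (bands, y, x) array and its band-name list together.

    Hypothetical helper illustrating metadata kept in line with the data;
    not the driver's actual implementation.
    """
    if data.shape[0] != len(band_names):
        raise ValueError(
            f"metadata lists {len(band_names)} bands but data has {data.shape[0]}"
        )
    missing = [b for b in wanted if b not in band_names]
    if missing:
        raise ValueError(f"unknown bands: {missing}")
    indices = [band_names.index(b) for b in wanted]
    # The same index list selects from the array and from the names
    return data[indices], [band_names[i] for i in indices]


names = [f"TOC-B{i:02d}_10M" for i in range(1, 10)]  # hypothetical 9-band collection
cube = np.arange(9 * 2 * 2).reshape(9, 2, 2)
sub, sub_names = filter_bands_in_sync(cube, names, ["TOC-B02_10M", "TOC-B04_10M", "TOC-B08_10M"])
print(sub.shape, sub_names)  # (3, 2, 2) ['TOC-B02_10M', 'TOC-B04_10M', 'TOC-B08_10M']
```

Raising early on a length mismatch surfaces stale metadata at the filtering step rather than later inside a Spark worker.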