Open-EO / FuseTS

Time series Fusion toolbox integrated with openEO
https://open-eo.github.io/FuseTS/

Cannot execute MOGPR through openEO #79

Closed JanssenBrm closed 1 year ago

JanssenBrm commented 1 year ago

As part of WP5, we are setting up the different services as UDPs (user-defined processes) within openEO, allowing easy integration of each service into an existing workflow. To do this, I've created a separate branch where I added some code to publish the services as UDPs: https://github.com/Open-EO/FuseTS/tree/openeo_publish

To test the code, I've also created a corresponding notebook: https://github.com/Open-EO/FuseTS/blob/openeo_publish/notebooks/UDP/AI4FOOD%20-%20MOGPR.ipynb

However, when executing this notebook, I receive an error that seems to be related to the code in FuseTS:

image

jdries commented 1 year ago

Here the error is clearly that openEO provides an xarray DataArray in the UDF, while the MOGPR implementation expects an xarray Dataset.

JanssenBrm commented 1 year ago

I was able to get one step further by correctly setting the typings for the fit_transform function and converting the openEO datacube to an xarray dataset in the UDF:

return XarrayDataCube(MOGPRTransformer().fit_transform(cube.get_array().to_dataset(dim='bands')))

However, I'm not sure if this is the best solution here. Anyway, this now results in a new error (j-8b2ada7a1e9242aea6bbf91f7e1fc15e):

image

jdries commented 1 year ago

Importing plotting libraries on a backend is generally something we try to avoid. GPy may allow skipping that import by setting library = none in a config file like: https://github.com/SheffieldML/GPy/blob/devel/GPy/defaults.cfg

The difficult bit is how to create this config:

home = os.getenv('HOME') or os.getenv('USERPROFILE') or ''
user_file = os.path.join(home,'.config','GPy', 'user.cfg')
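One way to create that file from Python could look like the sketch below. It resolves the home directory the same way as the snippet above and writes a user.cfg that sets library = none. The section name `plotting` is an assumption taken from the linked defaults.cfg, and `write_gpy_user_config` is a hypothetical helper name, not part of GPy or FuseTS.

```python
import configparser
import os
from pathlib import Path


def write_gpy_user_config(home=None):
    """Write ~/.config/GPy/user.cfg with the plotting library disabled.

    `home` can be passed explicitly (handy for testing); otherwise it is
    resolved from HOME or USERPROFILE, as in the snippet quoted above.
    """
    home = home or os.getenv("HOME") or os.getenv("USERPROFILE") or ""
    cfg_path = Path(home) / ".config" / "GPy" / "user.cfg"
    cfg_path.parent.mkdir(parents=True, exist_ok=True)

    config = configparser.ConfigParser()
    # Assumption: the option lives in a [plotting] section, as in defaults.cfg.
    config["plotting"] = {"library": "none"}
    with open(cfg_path, "w") as fh:
        config.write(fh)
    return cfg_path


# Intended use inside the UDF, before GPy is imported for the first time:
#   write_gpy_user_config()
#   import GPy
```

The call has to run before the first import of GPy, since the config is presumably read at import time. Whether the UDF process is allowed to write to the home directory depends on the backend.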

JanssenBrm commented 1 year ago

Thanks for the tip! I was able to move one step further by setting the user config. However, now another error seems to be popping up.

image

jdries commented 1 year ago

Thanks for figuring out the previous issue, that helps a lot! For this one, it looks like GPy is receiving some unexpected input. The general way to debug this is to add UDF debug logging to see what goes in when the call occurs, assuming of course that running it locally does work.

https://open-eo.github.io/openeo-python-client/udf.html#logging-from-the-udfs

A typical case is when openEO hits a patch of nodata, possibly outside of original image boundaries.
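A minimal sketch of such a check, assuming nodata shows up as NaN in the UDF input: it counts NaN values in a patch and passes the summary to the `inspect` helper described on the linked logging page. `summarize_values` and `log_patch` are hypothetical names, and the local fallback for `inspect` is only there so the sketch runs without openeo installed.

```python
import logging
import math

try:
    # On an openEO backend, inspect() sends messages to the job logs.
    from openeo.udf.debug import inspect
except ImportError:
    # Local stand-in (an assumption for this sketch) when openeo is absent.
    def inspect(data=None, message="", code="User", level="info"):
        logging.getLogger("udf").info("%s: %s", message, data)


def summarize_values(values):
    """Count all entries and NaN entries in a flat iterable of numbers."""
    total = 0
    nan_count = 0
    for value in values:
        total += 1
        if math.isnan(value):
            nan_count += 1
    return {
        "total": total,
        "nan": nan_count,
        "all_nodata": total > 0 and nan_count == total,
    }


def log_patch(values, label="MOGPR input"):
    """Log a nodata summary of one patch and return it."""
    summary = summarize_values(values)
    inspect(data=summary, message=label)
    return summary


# Possible use inside the UDF, just before the fit_transform call:
#   log_patch(cube.get_array().values.ravel())
```

If a job fails, a patch logged with `all_nodata` set to True just before the failure would point to the nodata case described above.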

JanssenBrm commented 1 year ago

I noticed that the documentation also says to use apply_neighborhood, whereas I was using apply_dimension, which isn't correct for MOGPR. However, now I'm running into the following issue (jobID: j-3d5e276c02ef47829bf356f5769035a5):

image

This error looks similar to https://github.com/Open-EO/openeo-geopyspark-driver/issues/434

jdries commented 1 year ago

@JanssenBrm the issue you referenced and encountered has been fixed. Can you try again?

JanssenBrm commented 1 year ago

Thanks for the feedback @jdries! Unfortunately, the error is still there when I test on dev and production. The following jobs were executed:

jdries commented 1 year ago

@JanssenBrm can you try again now on dev? Integration tests were failing for a couple of days, I suspect the fix wasn't deployed yet.

JanssenBrm commented 1 year ago

It seems to be working on the dev environment. Now I'm back on the GPy issue.

jdries commented 1 year ago

The UDF now works, thanks @JanssenBrm for pushing this one.

Image