kvantricht closed this issue 2 months ago
@VictorVerhaert, according to Hans you would already have an inference UDF notebook for grassland watch. Would you be able to share it in a PR so @GriffinBabe can have a look at it?
Yes, I'll add it to the examples on GitHub. If you want (and it fits in our next sprint), I could also take a look at creating a minimal example notebook.
My inference notebook does not use GFMAP, however. I use a shared .py file containing the preprocessing steps: my extraction pipeline (GFMAP-based) applies it after the fetchers, while my inference pipeline just uses load_collection.
For now I would just suggest putting this example in https://github.com/Open-EO/openeo-community-examples and referencing it here.
FYI you can inspect my pipelines here: https://github.com/gisat/grasslandwatch/tree/main/lc_offline
> My inference notebook does not use GFMAP, however.
Ah OK, interesting. Definitely useful, but we should also work on a GFMAP-based inference workflow here.
I assume the functionality of GFMAP for inference would mainly be to split up the spatial extent we want to perform inference on, as well as job management, right?
GFMAP standardizes band names across backends, lays out typical data flow paths, takes care of loading collections and rescaling them into the most efficient datatype, applies collection-specific standardized processes, etc. That goes much broader than just the job-splitting concept.
Yes of course, I meant what would be visible in the example notebook and what to focus on in the explanation. It might indeed be good to emphasize that using the same pipeline for extraction and inference is crucial for accurate results, given the optimizations you mention in the background.
@VictorVerhaert one thing about the extraction pipeline:
The S1 bands are scaled to uint16 in the following code block (in the fetching preprocessing): https://github.com/Open-EO/openeo-gfmap/blob/main/src/openeo_gfmap/fetching/s1.py#L132. This is a memory optimization for openEO, as the collections come in float32 power values. Those values are automatically converted back to decibels in the feature extractor, unless the user disables it with a flag: https://github.com/Open-EO/openeo-gfmap/blob/main/src/openeo_gfmap/features/feature_extractor.py#L110. Now I see here that you perform some compositing operations, so we should probably do that rescaling after preprocessing and before entering the FeatureExtractor.
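To illustrate the round trip described above (this is a hypothetical sketch, not the actual GFMAP code behind the links — the real scaling constants live in `fetching/s1.py` and may differ), assuming a simple linear compression of the power values into the uint16 range:

```python
import numpy as np

# Assumed linear factor mapping S1 backscatter power into the uint16 range.
# Purely illustrative; check fetching/s1.py for the real constants.
POWER_SCALE = 65534.0

def compress_power(power: np.ndarray) -> np.ndarray:
    """Memory optimization: store float32 power values as uint16."""
    return np.clip(power * POWER_SCALE, 0, 65534).astype(np.uint16)

def to_decibels(compressed: np.ndarray) -> np.ndarray:
    """Undo the compression and convert power back to decibels, as the
    feature extractor does by default (unless disabled with a flag)."""
    power = compressed.astype(np.float32) / POWER_SCALE
    return 10.0 * np.log10(np.maximum(power, 1e-10))
```

The point being: any compositing done on the compressed uint16 values happens in the scaled power domain, which is why the rescaling step should sit between preprocessing and the FeatureExtractor.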
@kvantricht @VictorVerhaert
I like the idea of using the common ONNX library. I see online that it is possible to convert any scikit-learn, PyTorch or TensorFlow model to that format. Even CatBoost is directly compatible.
Based on the inference UDF of @VictorVerhaert and the FeatureExtractor functionalities already implemented in GFMAP, I came up with this first idea for a ModelInference base class that a user can override to implement their own model inference pipeline. Please take a look and tell me what you think: https://github.com/Open-EO/openeo-gfmap/blob/a7b0cd7ff05e0de73460776fb148a31d8a0167f4/src/openeo_gfmap/inference/model_inference.py
We could very well also provide a default ModelInference implementation that only requires a path to download the ONNX model and the name of the input tensor as parameters, and that returns the probability values or directly the max-probability argument.
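A rough sketch of what such a default could look like (class and method names here are hypothetical, not the actual `model_inference.py` API; `onnxruntime` is imported lazily since it is not part of the default UDF environment):

```python
from abc import ABC, abstractmethod
import numpy as np

class ModelInference(ABC):
    """Hypothetical base class: users override predict() with their own model."""

    @abstractmethod
    def predict(self, features: np.ndarray) -> np.ndarray:
        """Map a (pixels, bands) feature array to per-class probabilities."""

    def execute(self, features: np.ndarray) -> np.ndarray:
        probabilities = self.predict(features)
        # Default post-processing: return the most probable class per pixel.
        return np.argmax(probabilities, axis=-1)

class ONNXModelInference(ModelInference):
    """Possible default implementation: only needs the model path and the
    name of the input tensor, as suggested above."""

    def __init__(self, model_path: str, input_name: str):
        self.model_path = model_path
        self.input_name = input_name

    def predict(self, features: np.ndarray) -> np.ndarray:
        # onnxruntime must be shipped via udf-dependency-archives for now,
        # so import it lazily inside the UDF.
        import onnxruntime as ort

        session = ort.InferenceSession(self.model_path)
        return session.run(None, {self.input_name: features.astype(np.float32)})[0]
```

Returning either the raw probabilities or the argmax could then be a simple flag on the base class.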
One thing that needs to be taken care of by the user is the ONNX dependency within the openEO job. In the long term this could be included directly in the default openEO UDF environment, but for now we need to specify the .zip file in the udf-dependency-archives parameter at job creation, which is done manually at the moment. Maybe that's something to discuss in the redesign discussion @VincentVerelst
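For reference, a minimal sketch of how that job option could look (the archive URL and the `#onnx_deps` extraction folder are placeholders, assumed from common usage; check the backend documentation for the exact convention):

```python
# Sketch of shipping the ONNX runtime archive with an openEO batch job.
# The URL is a placeholder; "#onnx_deps" names the folder the backend
# extracts the archive into (assumption, to be verified per backend).
job_options = {
    "udf-dependency-archives": [
        "https://example.com/onnx_dependencies.zip#onnx_deps",
    ],
}

# The options would then be passed manually at job creation, e.g.:
# job = cube.execute_batch(job_options=job_options)
```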
On this last point: @HansVRP and I had a similar discussion this morning. I think that in the long run onnxruntime should be included in the standard UDF environment, as we are advising different projects to use ONNX models.
Closed by #88
We need a minimal example showing how external projects can make use of OpenEO-GFMAP functionality for inference purposes: