Open-EO / openeo-processes-python

A Python representation of (most) openEO processes
Apache License 2.0

predict_random_forest should be a reducer #154

Closed LukeWeidenwalker closed 2 years ago

LukeWeidenwalker commented 2 years ago

Currently predict_random_forest takes the dimension over which to apply random_forest inference. As discussed in e.g. https://github.com/Open-EO/openeo-processes/issues/295#issuecomment-993393971, this isn't the intention of the spec; rather, predict_random_forest should be implemented as a reducer that can be applied across any dimension.
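For intuition, here is a minimal stand-alone sketch (plain Python with a toy stand-in model, not the openEO API) of what "predict_random_forest as a reducer" means: the reducer collapses one dimension of the data cube, e.g. mapping each pixel's band vector to a single predicted value.

```python
# Illustrative only: a reducer in openEO collapses one dimension of a data
# cube to a single value per remaining coordinate. predict_random_forest,
# used as a reducer over the "bands" dimension, turns each pixel's vector of
# band values into one prediction. toy_model_predict is a stand-in for the
# trained random forest.

def toy_model_predict(band_values):
    # Stand-in for RandomForest.predict on one sample: here just the mean.
    return sum(band_values) / len(band_values)

def reduce_bands(cube, reducer):
    """Apply `reducer` along the innermost (bands) axis of a nested list."""
    return [[reducer(pixel) for pixel in row] for row in cube]

# A tiny 2x2 spatial cube with 4 bands (B02, B03, B04, B08) per pixel.
cube = [
    [[0.1, 0.2, 0.3, 0.4], [0.5, 0.5, 0.5, 0.5]],
    [[1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 0.0, 0.4]],
]
reduced = reduce_bands(cube, toy_model_predict)  # bands dimension is gone
```

Because the reducer only sees one vector of values at a time, the same callback works unchanged over any dimension the backend chooses to reduce.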

ValentinaHutter commented 2 years ago

The workflow for reduce_dimension is now updated so that predict_random_forest can also be used inside reduce_dimension. reduce_dimension has a context parameter; to use it properly, we need to pass a context dict that specifies the model used in the prediction and the predictors_vars.
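For reference, this is roughly the process-graph JSON such a request boils down to. The node ids and the exact predict_random_forest arguments here are illustrative sketches of the openEO process-graph format, not copied from any backend; the point is that predict_random_forest sits inside the reducer callback of reduce_dimension, and that everything in the graph, including the model reference, must be plain JSON-serializable data.

```python
import json

# Sketch: predict_random_forest embedded as the reducer callback of
# reduce_dimension. Node ids ("loadmodel1", "reduce1", ...) and some
# argument names are hypothetical.
process_graph = {
    "loadmodel1": {
        "process_id": "load_ml_model",
        "arguments": {"id": "jb-22d7ad56-30bd-416a-9221-bb191c1e8a7c"},
    },
    "reduce1": {
        "process_id": "reduce_dimension",
        "arguments": {
            "data": {"from_node": "loadcollection1"},
            "dimension": "bands",
            "context": {"from_node": "loadmodel1"},
            "reducer": {
                "process_graph": {
                    "predict1": {
                        "process_id": "predict_random_forest",
                        "arguments": {
                            "data": {"from_parameter": "data"},
                            "model": {"from_parameter": "context"},
                        },
                        "result": True,
                    }
                }
            },
        },
        "result": True,
    },
}

# Only dicts, lists, strings and booleans: serialization succeeds.
payload = json.dumps(process_graph)
```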

ValentinaHutter commented 2 years ago

As described above, I now tried to use the reduce_dimension process like this:

```python
rf_regr_model = PGNode("load_ml_model", {"model": "jb-22d7ad56-30bd-416a-9221-bb191c1e8a7c"})

boa_sentinel_2_cube = conn.load_collection(
    collection_id="boa_sentinel_2",
    spatial_extent={"west": 10.454955, "east": 10.537297, "south": 46.102185, "north": 46.123657},
    temporal_extent=["2018-05-01", "2018-05-10"],
    bands=["B02", "B03", "B04", "B08"],
)

reduced_prediction = boa_sentinel_2_cube.reduce_dimension(
    reducer="predict_random_forest",
    dimension="bands",
    context={"model": rf_regr_model, "predictors_vars": ["B02", "B03", "B04", "B08"]},
)

prediction_netcdf = reduced_prediction.save_result(format="netCDF")
```

when I try to send this with

```python
job = prediction_netcdf.create_job(title="UC8_predict_rf_regr")
```

I get the following error message:


```
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_10364/1345881284.py in <module>
     12 prediction_netcdf = reduced_prediction.save_result(format="netCDF")
     13
---> 14 job = prediction_netcdf.create_job(title="UC8_predict_rf_regr")

~/python/SRR1_notebooks/vrtlnvrnmnt/lib/python3.8/site-packages/openeo/rest/datacube.py in create_job(self, out_format, title, description, plan, budget, job_options, **format_options)
   1641         # add save_result node
   1642         img = img.save_result(format=out_format, options=format_options)
-> 1643         return self._connection.create_job(
   1644             process_graph=img.flat_graph(),
   1645             title=title, description=description, plan=plan, budget=budget, additional=job_options

~/python/SRR1_notebooks/vrtlnvrnmnt/lib/python3.8/site-packages/openeo/rest/connection.py in create_job(self, process_graph, title, description, plan, budget, additional)
   1069             req["job_options"] = additional
   1070
-> 1071         response = self.post("/jobs", json=req, expected_status=201)
   1072
   1073         if "openeo-identifier" in response.headers:

~/python/SRR1_notebooks/vrtlnvrnmnt/lib/python3.8/site-packages/openeo/rest/connection.py in post(self, path, json, **kwargs)
    171         :return: response: Response
    172         """
--> 173         return self.request("post", path=path, json=json, allow_redirects=False, **kwargs)
    174
    175     def delete(self, path, **kwargs) -> Response:

~/python/SRR1_notebooks/vrtlnvrnmnt/lib/python3.8/site-packages/openeo/rest/connection.py in request(self, method, path, headers, auth, check_error, expected_status, **kwargs)
    100             )
    101         with ContextTimer() as timer:
--> 102             resp = self.session.request(
    103                 method=method,
    104                 url=url,

~/python/SRR1_notebooks/vrtlnvrnmnt/lib/python3.8/site-packages/requests/sessions.py in request(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert, json)
    526             hooks=hooks,
    527         )
--> 528         prep = self.prepare_request(req)
    529
    530         proxies = proxies or {}

~/python/SRR1_notebooks/vrtlnvrnmnt/lib/python3.8/site-packages/requests/sessions.py in prepare_request(self, request)
    454
    455         p = PreparedRequest()
--> 456         p.prepare(
    457             method=request.method.upper(),
    458             url=request.url,

~/python/SRR1_notebooks/vrtlnvrnmnt/lib/python3.8/site-packages/requests/models.py in prepare(self, method, url, headers, files, data, params, auth, cookies, hooks, json)
    317         self.prepare_headers(headers)
    318         self.prepare_cookies(cookies)
--> 319         self.prepare_body(data, files, json)
    320         self.prepare_auth(auth, url)
    321

~/python/SRR1_notebooks/vrtlnvrnmnt/lib/python3.8/site-packages/requests/models.py in prepare_body(self, data, files, json)
    469
    470         try:
--> 471             body = complexjson.dumps(json, allow_nan=False)
    472         except ValueError as ve:
    473             raise InvalidJSONError(ve, request=self)

/usr/lib/python3.8/json/__init__.py in dumps(obj, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, default, sort_keys, **kw)
    232     if cls is None:
    233         cls = JSONEncoder
--> 234     return cls(
    235         skipkeys=skipkeys, ensure_ascii=ensure_ascii,
    236         check_circular=check_circular, allow_nan=allow_nan, indent=indent,

/usr/lib/python3.8/json/encoder.py in encode(self, o)
    197         # exceptions aren't as detailed.  The list call should be roughly
    198         # equivalent to the PySequence_Fast that ''.join() would do.
--> 199         chunks = self.iterencode(o, _one_shot=True)
    200         if not isinstance(chunks, (list, tuple)):
    201             chunks = list(chunks)

/usr/lib/python3.8/json/encoder.py in iterencode(self, o, _one_shot)
    255                 self.key_separator, self.item_separator, self.sort_keys,
    256                 self.skipkeys, _one_shot)
--> 257         return _iterencode(o, 0)
    258
    259     def _make_iterencode(markers, _default, _encoder, _indent, _floatstr,

/usr/lib/python3.8/json/encoder.py in default(self, o)
    177
    178         """
--> 179         raise TypeError(f'Object of type {o.__class__.__name__} '
    180                         f'is not JSON serializable')
    181

TypeError: Object of type PGNode is not JSON serializable
```

ValentinaHutter commented 2 years ago

Am I allowed to use a reference like rf_regr_model inside the context of reduce_dimension? @soxofaan have you maybe seen this before? If not, maybe you could give me a hint on whom to ask about this :)

jdries commented 2 years ago

Why not use the simpler example that I sent:

```python
predicted = boa_sentinel_2_cube.predict_random_forest(
    model="jb-22d7ad56-30bd-416a-9221-bb191c1e8a7c",
    dimension="bands"
)
```

If that doesn't work, check out the implementation at openeo-python-client/openeo/rest/datacube.py:1814. In any case, avoid using PGNode directly unless you really know what you're doing. (It's not the kind of code you want ending up in a demonstration.)

soxofaan commented 2 years ago

Indeed, a regular user should not have to create PGNode objects themselves; if they do, that's a bug or missing feature in the Python client.

In version 0.10.0 Connection.load_ml_model() was added for your use case (https://open-eo.github.io/openeo-python-client/api.html?highlight=load_ml_model#openeo.rest.connection.Connection.load_ml_model)

A couple of notes about this call:

```python
cube.reduce_dimension(
    reducer="predict_random_forest",
    dimension="bands",
    context={"model": rf_regr_model, "predictors_vars": ["B02", "B03", "B04", "B08"]},
)
```

ValentinaHutter commented 2 years ago

Thank you very much for the quick replies! I will see how I can update our implementation so that I use the model context properly and avoid using PGNode and predictors_vars.