Open-EO / openeo-processes

Interoperable processes for openEO's big Earth observation cloud processing.
https://processes.openeo.org
Apache License 2.0

fit_*_random_forest: Drop parameter max_variables? #358

Status: Open. m-mohr opened this issue 2 years ago.

m-mohr commented 2 years ago

As touched on in the dev telco today: another option would be to leave mtry/max_variables out of the spec for now (back-ends will be able to pick a good enough default behavior) and only introduce it once we are sure how we are going to handle it.

_Originally posted by @soxofaan in https://github.com/Open-EO/openeo-processes/pull/351#discussion_r833482314_
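To make "back-ends pick a good enough default" concrete, here is a minimal sketch of how a back-end might turn `max_variables` into the number of predictors tried per split (mtry). The symbolic values and the `resolve_max_variables` helper are hypothetical. When the parameter is omitted, the sketch assumes R randomForest's defaults: floor(sqrt(p)) for classification and max(floor(p/3), 1) for regression.

```python
import math
from typing import Optional, Union


def resolve_max_variables(
    max_variables: Optional[Union[int, str]],
    num_features: int,
    task: str = "classification",
) -> int:
    """Hypothetical helper: number of candidate predictors per split (mtry)."""
    if num_features < 1:
        raise ValueError("need at least one feature")
    if max_variables is None:
        # Parameter left out of the request: the back-end picks a default.
        # Here we borrow R randomForest's defaults (an assumption).
        if task == "classification":
            return max(math.isqrt(num_features), 1)
        return max(num_features // 3, 1)
    if isinstance(max_variables, int):
        if not 1 <= max_variables <= num_features:
            raise ValueError("max_variables must be in [1, num_features]")
        return max_variables
    # Hypothetical symbolic values, each clamped to at least 1 below.
    rules = {
        "all": num_features,
        "sqrt": math.isqrt(num_features),
        "log2": num_features.bit_length() - 1,  # floor(log2(p)) for p >= 1
        "onethird": num_features // 3,
    }
    if max_variables not in rules:
        raise ValueError(f"unsupported max_variables: {max_variables!r}")
    return max(rules[max_variables], 1)


print(resolve_max_variables(None, 16))                # 4
print(resolve_max_variables(None, 16, "regression"))  # 5
print(resolve_max_variables("log2", 16))              # 4
```

Leaving the parameter out of the spec corresponds to always taking the `None` branch, so each back-end decides which default applies.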

jdries commented 2 years ago

+1 for this one. Internally, we've even reached the conclusion that random forest is nice for textbook examples, but that many real-world use cases rely on newer types of ML, such as boosting methods. Hence, investing a lot of time in getting the parameters right might not make sense.

m-mohr commented 2 years ago

On the other hand, I don't see why the current definition is problematic. Back-ends can now choose which options they allow by adapting the schema/enum.
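On the back-end side, adapting the enum could look roughly like the following: start from the standard definition and drop the values the back-end does not implement before publishing it. The process fragment and the `restrict_enum` helper are hypothetical. They are loosely modeled on the openEO process JSON layout, where a parameter's `schema` is either a single JSON Schema or a list of alternatives.

```python
import copy
from typing import Any

# Hypothetical fragment of a standard process definition.
STANDARD = {
    "id": "fit_class_random_forest",
    "parameters": [
        {
            "name": "max_variables",
            "schema": [
                {"type": "integer", "minimum": 1},
                {"type": "string", "enum": ["all", "log2", "onethird", "sqrt"]},
            ],
        }
    ],
}


def restrict_enum(process: dict[str, Any], param: str, supported: set[str]) -> dict[str, Any]:
    """Return a copy of `process` whose enum(s) for `param` keep only supported values."""
    result = copy.deepcopy(process)  # never mutate the shared standard definition
    for parameter in result["parameters"]:
        if parameter["name"] != param:
            continue
        schema = parameter["schema"]
        alternatives = schema if isinstance(schema, list) else [schema]
        for alternative in alternatives:
            if "enum" in alternative:
                alternative["enum"] = [v for v in alternative["enum"] if v in supported]
    return result


backend_view = restrict_enum(STANDARD, "max_variables", {"sqrt", "all"})
print(backend_view["parameters"][0]["schema"][1]["enum"])  # ['all', 'sqrt']
```

The order of the standard enum is preserved. The integer alternative is left untouched, so a back-end that supports only symbolic values would have to remove that alternative as well.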

soxofaan commented 2 years ago

> Back-ends can now choose which options they allow by adapting the schema/enum.

We should avoid that, I think. It's bad for the user experience if the process docs at openeo.org, the docs of the Python client (online or in the user's IDE), and the process definitions at openeo.vito.be diverge. Beyond the user experience, it also makes merging at the aggregator/federation level a lot harder.
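Here is why divergence complicates federation. An aggregator has to publish one definition for a parameter whose enum each back-end may have narrowed differently. One conservative strategy, assumed here and not taken from the actual openEO aggregator, is to advertise only the intersection and to track what each back-end offers beyond it. The `merge_enums` helper and the back-end names are hypothetical.

```python
from typing import Iterable


def merge_enums(
    per_backend: dict[str, Iterable[str]],
) -> tuple[list[str], dict[str, list[str]]]:
    """Intersect one parameter's enum across back-ends.

    Returns the values every back-end accepts, plus, per back-end,
    the values it accepts that are not accepted everywhere.
    """
    sets = {name: set(values) for name, values in per_backend.items()}
    if not sets:
        return [], {}
    common = set.intersection(*sets.values())
    extras = {name: sorted(vals - common) for name, vals in sets.items() if vals - common}
    return sorted(common), extras


common, extras = merge_enums({
    "backend_a": ["all", "log2", "sqrt"],
    "backend_b": ["onethird", "sqrt"],
})
print(common)  # ['sqrt']
print(extras)  # {'backend_a': ['all', 'log2'], 'backend_b': ['onethird']}
```

Whatever the aggregator chooses, it faces a trade-off. Either users lose values that some back-ends support, or the aggregator must route requests to specific back-ends based on the value chosen.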

m-mohr commented 2 years ago

We have that in many places because it just doesn't work otherwise with the many underlying implementations. That's why we have the JSON Schemas in the first place: so that clients can adapt to them. Ideally, you'd implement everything, but that's not realistic (see the recent issues about max_variables, the ARD processes, aggregate_temporal_period, etc.).
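On the client side, adapting means checking user input against the schema the back-end actually publishes rather than the one at processes.openeo.org. The following is a minimal sketch that assumes a hypothetical back-end definition of `max_variables`. The `accepts` helper covers only a tiny subset of JSON Schema. For full validation, a library such as the Python `jsonschema` package would be the usual choice.

```python
from typing import Any


def accepts(parameter_schema: Any, value: Any) -> bool:
    """Check `value` against a back-end's parameter schema.

    Only handles the `enum`, `type: integer` and `minimum` keywords;
    a real client would use a full JSON Schema validator instead.
    """
    alternatives = parameter_schema if isinstance(parameter_schema, list) else [parameter_schema]
    for schema in alternatives:
        if "enum" in schema:
            if value in schema["enum"]:
                return True
        elif schema.get("type") == "integer":
            is_int = isinstance(value, int) and not isinstance(value, bool)
            minimum = schema.get("minimum")
            if is_int and (minimum is None or value >= minimum):
                return True
    return False


# Hypothetical schema published by a back-end that only implements "sqrt".
backend_schema = [
    {"type": "integer", "minimum": 1},
    {"type": "string", "enum": ["sqrt"]},
]
print(accepts(backend_schema, "sqrt"))  # True
print(accepts(backend_schema, "log2"))  # False
print(accepts(backend_schema, 0))       # False
```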

Just search for "enum" in openeo-processes and you'll already find over 20 processes with a customizable enum. And this doesn't even include the "options" objects in some processes.
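That search can also be scripted. The sketch below assumes a local checkout of openeo-processes with one JSON file per process at the top level. That layout is an assumption, and no particular count is claimed here. The script lists processes whose parameter schemas contain an `enum` anywhere. It ignores return values and does not separately account for the "options" objects.

```python
import json
from pathlib import Path
from typing import Any


def _has_enum(node: Any) -> bool:
    """Recursively look for an "enum" keyword in a JSON Schema fragment."""
    if isinstance(node, dict):
        return "enum" in node or any(_has_enum(v) for v in node.values())
    if isinstance(node, list):
        return any(_has_enum(v) for v in node)
    return False


def processes_with_enum(directory: str) -> list[str]:
    """Return ids of processes in `directory` with an enum in any parameter schema."""
    hits = []
    for path in sorted(Path(directory).glob("*.json")):
        process = json.loads(path.read_text(encoding="utf-8"))
        if not isinstance(process, dict):
            continue  # skip JSON files that are not process definitions
        parameters = process.get("parameters", [])
        if any(_has_enum(p.get("schema")) for p in parameters):
            hits.append(process.get("id", path.stem))
    return hits
```

For example, `len(processes_with_enum("path/to/openeo-processes"))` would give the number of matching processes in that checkout.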

The Python client diverges by design, so I don't buy that argument.