Machine Learning for openEO

Open-EO / openeo-processes

Interoperable processes for openEO's big Earth observation cloud processing.

https://processes.openeo.org

Apache License 2.0

Machine Learning for openEO #441

Open m-mohr opened 1 year ago

m-mohr commented 1 year ago

Adds the basic ML processes again #416 (supersedes #418)
Generalizes the predict functionalities #368 (supersedes #396)
...

Potentially interesting for "bring your own model": https://onnx.ai/

m-mohr commented 1 year ago

Variants discussed in the ML meeting:

datacube = load_collection("s2", temporal_extent = ..., spatial_extent = ..., bands = ...)
model = load_ml_model("my-model-id")

# variant 1
function fn(data) {
    return predict_ml_model(data, model)
}
datacube2 = reduce_dimension(datacube, dimension = "bands", reducer = fn)

function fn2(data) {
    return predict_ml_model_probabilities(data, model)
}
datacube3 = apply_dimension(datacube, dimension = "bands", target_dimension = "probabilities", process = fn2)

# variant 2
datacube2 = predict_ml_model(datacube, dimension = "bands", model = model)
datacube3 = predict_ml_model_probabilities(datacube, dimension = "bands", model = model)
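
To make the difference between the two variants concrete, here is a minimal NumPy sketch. It is not the openEO API: the (bands, y, x) array layout, the `ToyModel` class, and the toy re-implementations of `reduce_dimension` and `predict_ml_model` are assumptions made purely for illustration. Variant 1 passes a callback that sees one pixel's band values at a time. Variant 2 hands the whole cube and the dimension name to the predict function.

```python
import numpy as np

# Hypothetical stand-in for a trained model: it classifies a pixel by
# thresholding the mean of its band values. Not an openEO object.
class ToyModel:
    def predict(self, band_values):
        return 1 if float(np.mean(band_values)) > 0.5 else 0

DIMS = ("bands", "y", "x")  # assumed dimension order of the toy cube

def reduce_dimension(cube, dimension, reducer):
    """Variant 1: collapse `dimension` by calling `reducer` on each 1-D slice."""
    axis = DIMS.index(dimension)
    return np.apply_along_axis(reducer, axis, cube)

def predict_ml_model(cube, dimension, model):
    """Variant 2: the process is told which dimension holds the features."""
    axis = DIMS.index(dimension)
    return np.apply_along_axis(model.predict, axis, cube)

model = ToyModel()
cube = np.array([[[0.9, 0.1], [0.2, 0.8]],
                 [[0.7, 0.3], [0.4, 0.6]]])  # shape (bands=2, y=2, x=2)

variant1 = reduce_dimension(cube, "bands", lambda data: model.predict(data))
variant2 = predict_ml_model(cube, "bands", model)
```

In this toy setting both variants return the same (y, x) array. The difference is only in who owns the iteration over the feature dimension: the generic reducer machinery in variant 1, or the prediction process itself in variant 2.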

m-mohr commented 1 year ago

Some things from the ML meeting:

Regularization may consist of (and is mapped to openEO processes):

resample resolution -> resample_spatial
temporal aggregation -> aggregate_temporal_period
filter extent -> filter_bbox / filter_spatial
cloud removal -> cloud_detection or masking based on cloud bands

-> combine these into a new openEO process that exposes the commonly used arguments with reasonable defaults
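
As a rough sketch of what such a combined process could look like, the following Python snippet builds an ordered list of steps from a small options object with defaults. Everything here is hypothetical: the names `RegularizationOptions` and `regularize_steps`, the default values, the step order, and the argument names are illustrative assumptions and have not been checked against the process specifications. Only the process names on the right-hand side of the mapping above are taken from the thread.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RegularizationOptions:
    # All field names and defaults below are illustrative assumptions.
    resolution: float = 10.0      # target pixel size for resample_spatial
    period: str = "month"         # period for aggregate_temporal_period
    bbox: Optional[dict] = None   # spatial extent; None means "do not filter"
    remove_clouds: bool = True    # whether to include a cloud removal step

def regularize_steps(options=None):
    """Return the ordered (process name, arguments) pairs to apply."""
    opts = options or RegularizationOptions()
    steps = []
    if opts.bbox is not None:
        steps.append(("filter_bbox", {"extent": opts.bbox}))
    if opts.remove_clouds:
        # "cloud_detection" is the name used in the thread; masking on
        # cloud bands would be the alternative mentioned there.
        steps.append(("cloud_detection", {}))
    steps.append(("resample_spatial", {"resolution": opts.resolution}))
    steps.append(("aggregate_temporal_period", {"period": opts.period}))
    return steps

default_steps = regularize_steps()
custom_steps = regularize_steps(
    RegularizationOptions(
        bbox={"west": 5.0, "south": 51.0, "east": 6.0, "north": 52.0},
        remove_clouds=False,
    )
)
```

A caller who is happy with the defaults only needs `regularize_steps()`, while the individual processes remain available for anyone who needs finer control.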

m-mohr commented 1 year ago

datacube = load_collection("s2", temporal_extent = ..., spatial_extent = ..., bands = ...)
model = load_ml_model("my-model-id")

# variant 1 (does NOT work with the current definition in this PR)
function fn(data) {
    let values = ml_predict(data, model)
    return array_element(values, 0)
}
datacube2 = reduce_dimension(datacube, dimension = "bands", reducer = fn)

function fn2(data) {
    return ml_predict(data, model)
}
datacube3 = apply_dimension(datacube, dimension = "bands", target_dimension = "predictions", process = fn2)

# variant 2 (works with the current definition in this PR)
datacube2 = ml_predict(datacube, model)
datacube2 = drop_dimension(datacube2, "predictions")

datacube3 = ml_predict(datacube, model)
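
The role of the "predictions" dimension in variant 2 can be illustrated with a small NumPy sketch. This is a toy model of the idea, not the process definition from the PR: the (bands, y, x) layout, the function signatures, and the lambda "models" are assumptions for illustration. The one behavior carried over from openEO is that drop_dimension only succeeds when the dimension has a single label left.

```python
import numpy as np

def ml_predict(cube, predict_fn):
    """Toy version: (bands, y, x) input -> (predictions, y, x) output."""
    bands, height, width = cube.shape
    pixels = cube.reshape(bands, -1).T  # one row of band values per pixel
    preds = np.array([np.atleast_1d(predict_fn(p)) for p in pixels])
    return preds.T.reshape(-1, height, width)

def drop_dimension(cube, axis=0):
    """Remove a dimension, but only if it holds exactly one label."""
    if cube.shape[axis] != 1:
        raise ValueError("dimension has more than one label")
    return np.squeeze(cube, axis=axis)

cube = np.arange(8, dtype=float).reshape(2, 2, 2) + 1.0  # (bands=2, y=2, x=2)

# A single-valued prediction per pixel: the new dimension has length 1
# and can be dropped to get a plain (y, x) result.
single = ml_predict(cube, lambda p: p.sum())
flat = drop_dimension(single)

# Several values per pixel (e.g. class probabilities): the dimension
# has length 2 here and has to be kept.
multi = ml_predict(cube, lambda p: p / p.sum())
```

With a single-valued model the extra dimension carries no information and can be dropped. With a multi-valued model the same call produces what variant 1 needed apply_dimension and a target dimension for.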

PondiB commented 1 year ago

@m-mohr, is there a reason why the keyword "fit" is used in the naming convention instead of "train"? I will work on data cube regularization and open a separate PR for it.

m-mohr commented 1 year ago

Just to align with fit_curve, I guess. Train is also fine...

PondiB commented 1 year ago

Cool

PondiB commented 10 months ago

@m-mohr would renaming these two so that they start with an "ml_" prefix be a possible alternative? load_ml_model() -> ml_load_model(), save_ml_model() -> ml_save_model()

m-mohr commented 10 months ago

Why? The current proposal follows the load_* and save_result schema.

PondiB commented 10 months ago

Cool, it makes sense to follow the existing schema. My initial thought was that it would be easier for a general user if most of the ML operations started with the same prefix, i.e. "ml_".

m-mohr commented 8 months ago

The STAC ML Model extension may get deprecated in favor of this one: https://github.com/crim-ca/dlm-extension

@PondiB I think it would be great to get in touch with the folks there so that we can make sure it also works for openEO.

PondiB commented 8 months ago

Sure, thanks. I just saw the notification about it and will follow up with them.