Open-EO / openeo-processes

Interoperable processes for openEO's big Earth observation cloud processing.
https://processes.openeo.org
Apache License 2.0
49 stars 15 forks source link

New definitions for fit_curve and predict_curve #420

Closed m-mohr closed 1 year ago

m-mohr commented 1 year ago

Make fit_curve usable in apply_dimension, scrap gap filling for predict_curve, clarify no data handling, etc.

LukeWeidenwalker commented 1 year ago

@ValentinaHutter you know this better than I do! :)

soxofaan commented 1 year ago

Small side question:

throws an InvalidValues exception if invalid values are encountered

Does "encountered" here mean just in the input data, or also during the fitting procedure?

ValentinaHutter commented 1 year ago

I agree that finite numbers should be valid.

Small side question:

throws an InvalidValues exception if invalid values are encountered

Does "encountered" here mean just in the input data, or also during the fitting procedure?

In our implementation we actually do not consider invalid values in the fitting procedure.... At least, we explicitely exclude NaN values that naturally occur in the datasets, there. When an infinite number is in the dataset, an exception is raised and the fitting cannot be started. Let me know if we should change this :)

m-mohr commented 1 year ago

It was meant to cater for invalid values in the input data, but not sure whether and how we should cater for other cases?

clausmichele commented 1 year ago

This process must be able to handle invalid data internally and should not return an error if some invalid points are present in the time series.

m-mohr commented 1 year ago

@clausmichele So you are saying we should remove the InvalidValues exception and all mentions of it and instead say that invalid values must be dealt with smoothly? Is this possible in all cases?

clausmichele commented 1 year ago

Indeed. I can't think about all the possible scenarios, but the one for which we created this process, fitting an harmonic curve to each pixel of a raster-cube, must handle NaNs occurring due to cloud masked or or invalid pixels along the time series. Should we make this part explicit? Adding a default ignore_nodata = true ? I don't know if there could be a case where the user wants to be warned if there are invalid values in the raster-cube/time series.

soxofaan commented 1 year ago

Doesn't it actually all depend on how the callback provided in the function parameter works? https://github.com/Open-EO/openeo-processes/blob/0dd3ab0d81f67506547136532af39b5c9a16771e/proposals/fit_curve.json#L38-L43

If that function properly supports handling invalid data, then the whole fit_curve call will work. If it doesn't, then fir_curve will fail when given data that contains invalid values.

So this seems like to user's responsibility to make it work with or without invalid data support?

Or, would it be a feature of fit_curve to skip parts of the raster data cube with invalid values and only provide function the parts where the x input is fully valid?

clausmichele commented 1 year ago

@soxofaan yes exactly! So fit_curve shouldn't check the input data, but it's the fitting function that takes care of it.

m-mohr commented 1 year ago

I cherry-picked the single word change from this PR which seemed okay. The issues that came up now are somewhat related, but not strictly related to the intial PR. So having the first action done, we can now discuss additional changes here independantly.

About removing the InvalidValues error, what happens if a data cube containing strings is passed? The fitting function is defined for numbers only according to the schemas. I think with the wordsmithing updates from @soxofaan the process description makes sense to me, but please correct me if you think otherwise.

m-mohr commented 1 year ago

Rough example from today's discussions:

labels = dimension_labels(datacube) seconds = array_apply(labels, fn = to_seconds) cube2 = apply_dimension(datacube, fn, "t", seconds)

function fit: params = fit_curve(context, [1,1,1], fn);

predictions = predict_curve(cube2)