Closed m-mohr closed 1 year ago
@ValentinaHutter you know this better than I do! :)
Small side question:
throws an
InvalidValues
exception if invalid values are encountered
Does "encountered" here mean just in the input data, or also during the fitting procedure?
I agree that finite numbers should be valid.
Small side question:
throws an
InvalidValues
exception if invalid values are encounteredDoes "encountered" here mean just in the input data, or also during the fitting procedure?
In our implementation we actually do not consider invalid values in the fitting procedure.... At least, we explicitely exclude NaN values that naturally occur in the datasets, there. When an infinite number is in the dataset, an exception is raised and the fitting cannot be started. Let me know if we should change this :)
It was meant to cater for invalid values in the input data, but not sure whether and how we should cater for other cases?
This process must be able to handle invalid data internally and should not return an error if some invalid points are present in the time series.
@clausmichele So you are saying we should remove the InvalidValues
exception and all mentions of it and instead say that invalid values must be dealt with smoothly? Is this possible in all cases?
Indeed. I can't think about all the possible scenarios, but the one for which we created this process, fitting an harmonic curve to each pixel of a raster-cube, must handle NaNs occurring due to cloud masked or or invalid pixels along the time series.
Should we make this part explicit? Adding a default ignore_nodata = true
? I don't know if there could be a case where the user wants to be warned if there are invalid values in the raster-cube/time series.
Doesn't it actually all depend on how the callback provided in the function
parameter works?
https://github.com/Open-EO/openeo-processes/blob/0dd3ab0d81f67506547136532af39b5c9a16771e/proposals/fit_curve.json#L38-L43
If that function properly supports handling invalid data, then the whole fit_curve
call will work. If it doesn't, then fir_curve
will fail when given data that contains invalid values.
So this seems like to user's responsibility to make it work with or without invalid data support?
Or, would it be a feature of fit_curve
to skip parts of the raster data cube with invalid values and only provide function
the parts where the x
input is fully valid?
@soxofaan yes exactly! So fit_curve
shouldn't check the input data, but it's the fitting function
that takes care of it.
I cherry-picked the single word change from this PR which seemed okay. The issues that came up now are somewhat related, but not strictly related to the intial PR. So having the first action done, we can now discuss additional changes here independantly.
About removing the InvalidValues error, what happens if a data cube containing strings is passed? The fitting function is defined for numbers only according to the schemas. I think with the wordsmithing updates from @soxofaan the process description makes sense to me, but please correct me if you think otherwise.
Rough example from today's discussions:
labels = dimension_labels(datacube) seconds = array_apply(labels, fn = to_seconds) cube2 = apply_dimension(datacube, fn, "t", seconds)
function fit: params = fit_curve(context, [1,1,1], fn);
predictions = predict_curve(cube2)
Make fit_curve usable in apply_dimension, scrap gap filling for predict_curve, clarify no data handling, etc.