Open pcuestas opened 7 months ago
Hi, thank you for opening this issue. I think this is another method that is not well defined for FDataIrregular.
I'm trying to apply FPCA using this code. https://fda.readthedocs.io/en/stable/auto_examples/plot_fpca_inverse_transform_outl_detection.html#sphx-glr-auto-examples-plot-fpca-inverse-transform-outl-detection-py
My functions map $\mathbb{R}^3 \to \mathbb{R}$.
Can FPCA be implemented for FDataIrregular too?
(Or should I open another issue?)
Hello, @ooodragon94.
As I understand it, your case is very different from the one I outlined in this issue. There are ways to implement FPCA for irregular data, but we haven't implemented them yet, as FDataIrregular is a very recent addition to the package. You should definitely open another issue explaining in detail the type of data that you have and what you want to do.
The development efforts tend to be steered towards what users request, so it will be very useful to know what you would like to have in the package.
After discussing this issue with @vnmabus and Alberto Suárez, we concluded that the integral of a functional data object should always be the integral over its domain $D$, and not over the interval bounded by the endpoints of the discretization grid (called $D_i$ in the original issue description). This is discussed in depth in #619.
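In https://github.com/GAA-UAM/scikit-fda/pull/610, I have implemented the changes explained above; that is, dividing each integral by the measure $V_i$ of the smallest interval $D_i$ that contains the $i$-th curve's discretization points. For the MAE, this reads:
$$MAE = \frac{1}{\sum_i w_i}\sum_{i=1}^N w_i \frac{1}{V_i}\int_{D_i} |X_i(t) - \hat X_i(t)|\ dt.$$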
However, once the integral of discretized datasets is properly defined over the domain of the functional data object (see #619), these scores must be redefined so that the integrals are divided by the domain's measure $V$, instead of $V_i$. For example, the MAE formula will be:
$$MAE = \frac{1}{\sum_i w_i}\sum_{i=1}^N w_i \frac{1}{V}\int_D |X_i(t) - \hat X_i(t)|\ dt.$$
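The two normalizations discussed here ($V_i$ per curve versus the domain measure $V$) can be compared with a small pure-Python sketch. This is only an illustration under stated assumptions, not the code in the pull request: `trapezoid` and `irregular_mae` are hypothetical helpers, each curve is a list of sorted points with at least two distinct values, and `y_true` and `y_pred` are assumed to be sampled at the same points. The sketch does not extrapolate outside the sampled interval, so passing `domain` only changes the normalizing constant; how the integral over $D$ should treat the unsampled part is the question discussed in #619.

```python
def trapezoid(t, y):
    """Trapezoidal rule for samples y taken at the sorted points t."""
    return sum(
        (t[k + 1] - t[k]) * (y[k] + y[k + 1]) / 2.0
        for k in range(len(t) - 1)
    )


def irregular_mae(points, y_true, y_pred, weights=None, domain=None):
    """Weighted MAE between irregularly sampled curves (hypothetical helper).

    points[i], y_true[i] and y_pred[i] hold the sample points and values of
    the i-th curve. If domain=(a, b) is given, each integral is divided by
    V = b - a; otherwise it is divided by V_i, the length of the smallest
    interval containing the i-th curve's points.
    """
    if weights is None:
        weights = [1.0] * len(points)
    total = 0.0
    for t, x, x_hat, w in zip(points, y_true, y_pred, weights):
        abs_err = [abs(a - b) for a, b in zip(x, x_hat)]
        integral = trapezoid(t, abs_err)
        if domain is not None:
            measure = domain[1] - domain[0]  # V
        else:
            measure = t[-1] - t[0]  # V_i
        total += w * integral / measure
    return total / sum(weights)


# Two curves with a constant absolute error of 1; the second one is only
# sampled on the first half of the domain [0, 1].
points = [[0.0, 0.5, 1.0], [0.0, 0.25, 0.5]]
y_true = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
y_pred = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]

print(irregular_mae(points, y_true, y_pred))                     # 1.0
print(irregular_mae(points, y_true, y_pred, domain=(0.0, 1.0)))  # 0.75
```

With the $V_i$ normalization both curves contribute an error of 1. Dividing the same integrals by $V$ makes the curve with the narrower sampled interval count for less, which is why the treatment of the integral outside $D_i$ matters once $V$ is used.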
Motivation
Computing scores between `FDataIrregular` objects is a missing functionality of the package, and it can be useful when measuring the quality of conversions from irregular objects to basis representation.
Desired functionality
Compute scores when both `y_true` and `y_pred` are `FDataIrregular` objects.
How to implement each score?
There is a big problem when implementing scores for `FDataIrregular`: the mean of an `FDataIrregular` object is not well defined. Most of the scores (for FData objects) involve computing the mean of an FData object.
We can work around this issue in some cases, namely when we want the `"uniform_average"` of the score and not the `"raw_values"`. An example where we can avoid computing the mean is `mean_absolute_error`. The mean absolute error is defined this way:
$$MAE(t) = \frac{1}{\sum_i w_i}\sum_{i=1}^N w_i |X_i(t) - \hat X_i(t)|.$$
To avoid having to calculate the mean of the `FDataIrregular` when `multioutput="uniform_average"`, we can change the order of the mean and the integral. That is, instead of:
$$\frac{1}{V}\int_D \frac{1}{\sum_i w_i}\sum_{i=1}^N w_i |X_i(t) - \hat X_i(t)|\ dt,$$
we can use:
$$\frac{1}{\sum_i w_i}\sum_{i=1}^N w_i \frac{1}{V_i}\int_{D_i} |X_i(t) - \hat X_i(t)|\ dt,$$
where $D_i$ and $V_i$ correspond to the domain of the $i$-th irregular curve and its Lebesgue measure, respectively. I am not sure whether using $D_i$ and $V_i$ instead of the whole domain $D$ and its volume $V$ is the best choice. Perhaps it would be less confusing not to bother computing the $V_i$'s, but I believe that the result would be less accurate, implicitly giving more weight to curves that have more spread-out points.
This idea can be applied to `mean_absolute_error`, `mean_absolute_percentage_error`, `mean_squared_error` and `mean_squared_log_error`. I am going to implement these in feature/scoring-fdatairregular.
r2_score
I believe that the `r2_score` cannot be implemented for the `FDataIrregular` case, as its definition is to compare how well `y_pred` predicts the values of `y_true` in relation to how well the mean does, and the mean is not defined.
A possible implementation of `r2_score` for `FDataIrregular` objects would be to just compute the `r2_score` of `(y_true.values, y_pred.values)`. However, I do not think this is a good option, as it disregards the functional structure of the curves, ignoring the points where they are measured, and the mean of the values does not have the same meaning as in the other cases (`FDataGrid` and `FDataBasis`). Moreover, a user can call `r2_score(y_true.values, y_pred.values)` explicitly, so I do not think we should implement this score for irregular data, as it is not properly defined.
The case of `explained_variance_score` is very similar to that of `r2_score`.
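As a rough illustration of the objection to computing the score on the raw values, the sketch below pools all observed values and applies the usual $1 - SS_{res}/SS_{tot}$ definition. `pooled_r2` is a hypothetical helper written for this example (it is not a scikit-fda or scikit-learn function), and the data are made up. The measurement points never enter the computation, so two datasets with the same values but very different sampling locations get the same score.

```python
def pooled_r2(true_values, pred_values):
    """R^2 computed on flattened values, ignoring where they were measured."""
    mean = sum(true_values) / len(true_values)
    ss_res = sum((a - b) ** 2 for a, b in zip(true_values, pred_values))
    ss_tot = sum((a - mean) ** 2 for a in true_values)
    return 1.0 - ss_res / ss_tot


# Same observed and predicted values, measured at different points.
values_true = [0.0, 2.0, 0.0, 2.0]
values_pred = [0.0, 2.0, 0.0, 1.0]
points_evenly_spread = [0.0, 0.3, 0.6, 0.9]
points_clustered = [0.0, 0.01, 0.02, 0.9]

# The points are not an argument of pooled_r2, so both samplings
# necessarily produce the same score.
score_spread = pooled_r2(values_true, values_pred)
score_clustered = pooled_r2(values_true, values_pred)
print(score_spread == score_clustered)
```

Here the mean used in $SS_{tot}$ is a plain average of the pooled values, not the mean function of the sample, which is the reason given above for not treating this as a functional `r2_score`.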