Closed aberges-grd closed 2 years ago
The small difference you observe can be explained by the weights used in the Simpson quadrature for FDataGrid, which are not uniform:

import numpy as np
import scipy.integrate

# f1, f2 are the two arrays from the report
np.sqrt(np.sum((f1 - f2)**2))  # 4.256318990810707

# Recover the quadrature weights by integrating the identity matrix:
# [0.41666667, 1.08333333, 1, 1, ..., 1, 1, 1.08333333, 0.41666667]
weights = scipy.integrate.simpson(np.eye(100))

np.sqrt(np.sum((f1 - f2)**2 * weights))  # 4.2542000336339605
This difference becomes smaller as the number of grid points grows, because the only four points whose weights differ from 1 become irrelevant in a large sum.
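To illustrate the convergence claim above, here is a small sketch (the two sample functions and the grid sizes are arbitrary choices, not from the report) comparing the plain and Simpson-weighted L2 distances as the grid grows:

```python
import numpy as np
import scipy.integrate

def relative_gap(n):
    """Relative gap between plain and Simpson-weighted L2 distance on n points."""
    x = np.linspace(0, 1, n)
    # Two arbitrary smooth functions sampled on the grid.
    f1, f2 = np.sin(2 * np.pi * x), np.cos(2 * np.pi * x)
    # Quadrature weights, recovered by integrating the identity matrix.
    weights = scipy.integrate.simpson(np.eye(n))
    plain = np.sqrt(np.sum((f1 - f2) ** 2))
    weighted = np.sqrt(np.sum((f1 - f2) ** 2 * weights))
    return abs(plain - weighted) / plain

# The gap shrinks as the grid is refined: only the four boundary
# weights differ from 1, and they are swamped by a large sum.
print(relative_gap(100), relative_gap(1000))
```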
Ah, I see. I was reporting this because, in my case, the small differences were enough to change the result of an agglomerative clustering algorithm (where the skfda distances give a better clustering), so I was very confused.
As an edit (for completeness' sake), scipy's euclidean accepts a w parameter that does just that weighting:

from scipy.spatial.distance import euclidean
...
euclidean(f1, f2, weights)  # 4.254200037595696
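As a quick sanity check, scipy.spatial.distance.euclidean with a weight vector reproduces the manual weighted sum exactly; here with stand-in random data in place of the f1, f2 from the report:

```python
import numpy as np
import scipy.integrate
from scipy.spatial.distance import euclidean

# Stand-in data; the report's f1, f2 are not shown.
rng = np.random.default_rng(0)
f1, f2 = rng.standard_normal(100), rng.standard_normal(100)

# Simpson quadrature weights for a 100-point grid.
weights = scipy.integrate.simpson(np.eye(100))

manual = np.sqrt(np.sum((f1 - f2) ** 2 * weights))
via_scipy = euclidean(f1, f2, w=weights)
print(manual, via_scipy)  # the two agree to floating-point precision
```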
Describe the bug
Assume we have two arrays describing two smooth functions. When creating an FDataGrid object with support = range(100), the computed euclidean (L2) distance is not the same when I use l2_distance on the FDataGrid objects as when I use it on the numpy arrays directly.

To Reproduce
Code to reproduce the behavior:
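The reproduction snippet itself is not shown above. A minimal sketch of the discrepancy using only numpy/scipy (with hypothetical example functions, and mimicking the Simpson weighting that the reply attributes to FDataGrid rather than calling skfda itself) could look like:

```python
import numpy as np
import scipy.integrate

# Two smooth functions sampled on range(100), as in the report.
x = np.arange(100)
f1 = np.sin(x / 10)
f2 = np.cos(x / 10)

# Plain euclidean distance on the raw arrays.
plain = np.sqrt(np.sum((f1 - f2) ** 2))

# The Simpson-weighted version, equivalent to what the reply
# describes for l2_distance on FDataGrid objects.
weights = scipy.integrate.simpson(np.eye(100))
weighted = np.sqrt(np.sum((f1 - f2) ** 2 * weights))

print(plain, weighted)  # close, but not identical
```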
Expected behavior
Given that l2_distance is equivalent to the euclidean distance for the example given (the domain of the function is the range 0..99), I'd expect l2_distance to give the same value as when you call it on numpy arrays.

Version information