Open a-berg opened 3 years ago
First, I will explain things as they are in the current development branch, as we will release it as a new version in a few days. Remember that with pip you can install a package from a Github branch, even if it has not yet been released in PyPI.
A metric is a Python callable with the following definition (in your case MetricElementType
should be FDataGrid
, as DTW cannot be implemented for FDataBasis
as far as I know):
class Metric(Protocol[MetricElementType]):
"""Protocol for a metric between two elements of a metric space."""
@abstractmethod
def __call__(
self,
__e1: MetricElementType,
__e2: MetricElementType,
) -> np.ndarray:
pass
It receives two FDataGrid
objects with the same length, and returns a 1d array where the i-th element has the distance between the i-th elements of both FDataGrid
.
You can either create a function with the prototype of __call__
, or create a class (and maybe inherit from Metric[FDataGrid]
so that Mypy helps you with the typing). The class is better if you want to pass parameters to the metric (such as p
for a Lp metric).
In order to implement the function, you probably will need to access the grid_points
attribute of the involved FDataGrid
instances, as well as the actual function values in the data_matrix
attribute. Those are explained in https://fda.readthedocs.io/en/latest/auto_tutorial/plot_getting_data.html#the-fdatagrid-class .
I am not familiar with dtw-python
. I assume that it works only with univariate scalar-valued functions. In that case you probably will need to use gridpoints[0]
and data_matrix[..., 0]
in order to pick the grid points for the first (and only) dimension, and the values of the first (and only) coordinate of the function return.
That would be enough to use it. If you want to optimize also the pairwise computation of that metric, it is also possible to override the default pair-by-pair computation, as it is done for the L2 case here:
I also wanted to say that if it is easier for you, scikit-fda is designed to be as compatible as possible with scikit-learn, and we reuse its hyperparameter tuning utilities. You can even mix both in the same Pipeline, see here for examples:
https://fda.readthedocs.io/en/latest/auto_tutorial/plot_skfda_sklearn.html
Note that as dtw-python
has a GPL license, I cannot use it in our project, which has an incompatible BSD3 license. Thus, I cannot accept a PR that uses that library. If you implement it on your own or you use a more permissive library I could accept that implementation.
First of all let me thank you for this package, and say: great work!
Is your feature request related to a problem? Please describe. I am working with timeseries data. For now I have applied K-means with L2 norm with okay-ish results, however I wanted to try other distance functions to see if these results could be improved. This distance is already implemented in the package
dtw-python
. After some hours I haven't been able to make FDA K-means work with dtw, I have been trying to wrap the function but I am at a loss without knowing more about the inner workings of your library.Describe the solution you'd like Some documentation on how to adapt pre-existing distance functions to your API. I am willing to contribute code if some guidance is provided.
Describe alternatives you've considered I have considered switching to
scikit-learn
. After all, DTW is meant for timeseries data so the functional approach that skfda provides isn't as critical as when applying Euclidean distance. I however think that in order to automate the process and do hyperparameter optimization, it's better if I only use skfda.Additional context