giotto-ai / giotto-tda

A high-performance topological machine learning toolbox in Python
https://giotto-ai.github.io/gtda-docs
Other
832 stars 171 forks source link

KNeighborsGraph "available" in scikit-learn 0.22 as KNeighborsTransformer #108

Open ulupo opened 4 years ago

ulupo commented 4 years ago

Description

A novelty of scikit-learn 0.22 is the introduction of the KNeighborsTransformer, which implements in the case of single point clouds (rather than collections thereof) exactly what we implemented in graphs.KNeighborsGraph.

While we cannot simply drop our version and straight-up use scikit-learn's due to our insistence on collections of objects (diagrams, graphs, etc), this situation raises two points in my opinion:

gtauzin commented 3 years ago

Thanks Umbe, I agree with both your points. Indeed there are many things to get from sklearn by just making it available on a collection of objects and decorators are definitely the way to go, but I believe for praticality and documentation purposes each class should be independently wrapped.

Also I would love to have MDS act on a collection of distance matrices, even if is just for visualization purposes.

ulupo commented 3 years ago

Thanks @gtauzin, yes having been prompted by you to look at this again I think we can do a good job of this. In principle, any dimensionality reduction algorithm (IsoMap, MDS, UMAP, you name it) could and should be turned into a version for collections in an easy way. Individually wrapped is fine of course, but what is the main argument against a factory function that works universally for anything in, say, sklearn.manifold? The interface would be always the same and we would offload the documentation work to scikit-learn, simply pointing out that calling factory(class)(**kwargs) makes an instance of a version of class which acts on collections. It would be easy to keep in sync with scikit-learn this way, too (say some parameters in classes in sklearn.manifold change, for instance).

We do something very close to this in ParallelClustering for Mapper, though we do not expose that class to the user.