GUDHI / gudhi-devel

The GUDHI library is a generic open source C++ library, with a Python interface, for Topological Data Analysis (TDA) and Higher Dimensional Geometry Understanding.
https://gudhi.inria.fr/
MIT License

Archipelago - dict transformer for vectorizing persistence diagrams #1017

Closed: martinroyer closed this 3 months ago

martinroyer commented 9 months ago

A transformer that dictionary-wraps persistence diagram vectorizers, i.e. objects from gudhi.representations.vector_methods. One provides the vectorizers (through either island or island_dict), and the Archipelago object will fit on, then transform (i.e. vectorize), a list or pandas Series of persistence diagrams. The object is consistent with the sklearn API.

So the difference between Archipelago and the methods from gudhi.representations.vector_methods is that Archipelago operates on "whole" persistence diagrams, e.g. [(0, (0.0, 2.34)), (0, (0.0, 0.956)), (1, (0.536, 0.856)), (2, (1.202, 1.734))], whereas the methods from vector_methods tend to operate on a single homology dimension, e.g. a list of numpy arrays of points in $\mathbb{R}^2$.
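To make the two formats concrete, here is a minimal sketch in plain Python of how a "whole" diagram can be grouped into per-dimension point lists. The helper name `split_by_dimension` is hypothetical and is not part of GUDHI or of this PR.

```python
from collections import defaultdict


def split_by_dimension(diagram):
    """Group (dim, (birth, death)) pairs into {dim: [(birth, death), ...]}.

    Hypothetical helper for illustration only; not a GUDHI function.
    """
    by_dim = defaultdict(list)
    for dim, (birth, death) in diagram:
        by_dim[dim].append((birth, death))
    return dict(by_dim)


# The "whole" diagram quoted in the comment above.
diagram = [(0, (0.0, 2.34)), (0, (0.0, 0.956)), (1, (0.536, 0.856)), (2, (1.202, 1.734))]
per_dim = split_by_dimension(diagram)
# per_dim[0] holds the two dimension-0 points, per_dim[1] and per_dim[2] one point each.
```

Each value of `per_dim` is then the kind of single-dimension input that a per-dimension vectorizer would expect, up to conversion to a numpy array.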

So far it feels nice to have something like Archipelago(island_dict={2: BettiCurve(resolution=4), 0: TopologicalVector(threshold=3)}) work, and it appears to work on both lists of diagrams and pandas.Series of diagrams.

This PR also contains minor edits to Atol, such as a default quantiser.

martinroyer commented 3 months ago

So I implemented what we discussed in February in this PR: the ability (for Atol) to handle empty diagrams (on fit and transform), which translates naturally into the same ability for a dict transformer.

I added a test file that checks whether a vectorizer can handle various scenarios with input persistence diagrams (new format): fit, transform, fit when empty, transform when empty, pandas' set_output, and ColumnTransformer.

We could rewrite the tests and keep only those that interest us most if we want a sort of common interface for vectorization classes down the line, and add classes to the tests once they do what we want them to do.

I removed Archipelago because I believe its use case is well handled by scikit-learn's ColumnTransformer (see the example in the test file).
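As a rough illustration of that idea (this is not the example from the test file), the sketch below applies one vectorizer per homology dimension with ColumnTransformer. It assumes the diagrams are stored in a pandas DataFrame with one column per dimension; the column names `H0` and `H1` are made up, and `LifetimeStats` is a hypothetical stand-in vectorizer, not a GUDHI class. The stand-in also tolerates empty diagrams.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.compose import ColumnTransformer


class LifetimeStats(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in vectorizer: (number of points, total lifetime)."""

    def fit(self, X, y=None):
        # Nothing to learn for this toy vectorizer.
        return self

    def transform(self, X):
        rows = []
        for diag in X:
            # reshape(-1, 2) turns an empty diagram into an array of shape (0, 2).
            points = np.asarray(diag, dtype=float).reshape(-1, 2)
            lifetimes = points[:, 1] - points[:, 0]
            rows.append([len(lifetimes), lifetimes.sum()])
        return np.array(rows)


# One row per sample, one column per homology dimension (assumed layout).
df = pd.DataFrame({
    "H0": [[(0.0, 2.34), (0.0, 0.956)], [(0.0, 1.5)]],
    "H1": [[(0.536, 0.856)], []],  # the second sample has an empty H1 diagram
})

# A scalar column name makes ColumnTransformer pass a 1D Series of diagrams.
per_dimension = ColumnTransformer([
    ("h0", LifetimeStats(), "H0"),
    ("h1", LifetimeStats(), "H1"),
])
features = per_dimension.fit_transform(df)  # shape (2, 4): two features per column
```

In this layout the dictionary of vectorizers that Archipelago took as island_dict becomes the list of (name, transformer, column) triples, which is the sense in which ColumnTransformer covers the same ground.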

martinroyer commented 3 months ago

So this PR has now become only an "Atol robust update" coupled with some interesting tests for a shared vectorisation method interface. I can redraft it into one or two PRs if we want things to be clearer.

martinroyer commented 3 months ago

This is continued in PR #1096.