Closed: martinroyer closed this PR 3 months ago.
So in this PR I implemented what we discussed in February: the ability for Atol to handle empty diagrams (on fit and on transform), which translates naturally into the same ability for a dict transformer.
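To make the empty-diagram behaviour concrete, here is a minimal numpy sketch of an ATOL-style vectorizer that tolerates empty diagrams on both fit and transform. It is not gudhi's Atol: the class name ToyAtol, the evenly-spaced choice of centers (a stand-in for a real quantiser such as KMeans), the single fixed scale, and the "centers at the origin" fallback when every training diagram is empty are all assumptions made for illustration. Empty diagrams simply contribute no points during fit and map to the zero vector during transform.

```python
import numpy as np


class ToyAtol:
    """Hypothetical ATOL-style vectorizer (NOT gudhi's Atol) that tolerates empty diagrams.

    Each diagram is an (n, 2) array of (birth, death) points; n may be 0.
    """

    def __init__(self, n_centers=2, scale=1.0):
        self.n_centers = n_centers
        self.scale = scale

    def fit(self, X, y=None):
        # Pool the points of all diagrams; empty diagrams contribute nothing.
        pooled = [np.asarray(d, dtype=float).reshape(-1, 2) for d in X]
        points = np.concatenate(pooled) if pooled else np.empty((0, 2))
        if len(points) == 0:
            # Assumption of this sketch: with no training points at all, fall back
            # to centers at the origin so the output width stays fixed.
            self.centers_ = np.zeros((self.n_centers, 2))
        else:
            # Stand-in for a real quantiser (e.g. KMeans): evenly spaced picks from
            # the pooled points, with repeats if there are fewer points than centers.
            idx = np.linspace(0, len(points) - 1, self.n_centers).round().astype(int)
            self.centers_ = points[idx]
        return self

    def transform(self, X):
        features = np.zeros((len(X), self.n_centers))
        for i, d in enumerate(X):
            d = np.asarray(d, dtype=float).reshape(-1, 2)
            if len(d) == 0:
                continue  # an empty diagram maps to the zero vector
            # Squared distances from every point to every center: (n_points, n_centers).
            sq_dist = ((d[:, None, :] - self.centers_[None, :, :]) ** 2).sum(axis=2)
            # Gaussian contrast to each center, summed over the diagram's points.
            features[i] = np.exp(-sq_dist / self.scale).sum(axis=0)
        return features


# Fit on a batch that contains an empty diagram, then transform another such batch.
train = [np.array([[0.0, 1.0], [0.5, 2.0]]), np.empty((0, 2))]
vec = ToyAtol(n_centers=2).fit(train)
out = vec.transform([np.empty((0, 2)), np.array([[0.0, 1.0]])])
```

The key design choice is that the output width depends only on n_centers, never on the data, so a batch mixing empty and non-empty diagrams still yields a rectangular feature matrix.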
I added a test file that checks whether a vectorizer can handle various scenarios with input persistence diagrams (new format): fit, transform, fit when empty, transform when empty, pandas' set_output, and ColumnTransformer.
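The scenarios listed above can be sketched as a reusable check that takes a vectorizer factory. This is a hypothetical helper, not the test file from the PR: the function name check_empty_diagram_scenarios, the sample diagrams, and the requirement that outputs stay finite are assumptions of this sketch, and the pandas set_output and ColumnTransformer cases are left out because they need scikit-learn and pandas. A trivial TotalPersistence class (also hypothetical) stands in for a real single-dimension vectorizer.

```python
import numpy as np


def check_empty_diagram_scenarios(make_vectorizer):
    """Hypothetical helper: run a vectorizer through fit/transform scenarios,
    using (n, 2) arrays as single-dimension persistence diagrams."""
    regular = [np.array([[0.0, 1.0], [0.2, 0.7]]), np.array([[0.1, 0.4]])]
    with_empty = regular + [np.empty((0, 2))]
    only_empty = [np.empty((0, 2)), np.empty((0, 2))]

    # 1. Plain fit then transform; remember the output width.
    reference = make_vectorizer().fit(regular).transform(regular)
    assert reference.shape[0] == len(regular)
    width = reference.shape[1]

    # 2. Transform a batch containing an empty diagram.
    out = make_vectorizer().fit(regular).transform(with_empty)
    assert out.shape == (len(with_empty), width)

    # 3. Fit when some diagrams are empty.
    out = make_vectorizer().fit(with_empty).transform(regular)
    assert out.shape == (len(regular), width)

    # 4. Fit and transform when every diagram is empty (finiteness is this sketch's choice).
    out = make_vectorizer().fit(only_empty).transform(only_empty)
    assert out.shape == (len(only_empty), width)
    assert np.all(np.isfinite(out))


class TotalPersistence:
    """Toy stand-in vectorizer: one feature per diagram, the sum of (death - birth)."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return np.array([[float(np.sum(d[:, 1] - d[:, 0]))] for d in X])


check_empty_diagram_scenarios(TotalPersistence)
```

Passing a factory (the class itself) rather than an instance keeps each scenario independent, since every scenario fits a fresh object.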
If we want a common interface for vectorization classes down the line, we could rewrite this file to keep only the tests that matter most to us, and add classes to the test once they behave the way we want.
I removed Archipelago because I believe scikit-learn's ColumnTransformer already handles this use case well (see the example in the test file).
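As a rough illustration of how ColumnTransformer can take over Archipelago's role, here is a sketch with one DataFrame column per homology dimension and a different vectorizer per column. It is not the example from the PR's test file: PersistenceStatistic and object_column are hypothetical names, and the toy vectorizer stands in for a gudhi one. It relies on documented ColumnTransformer behaviour: when a column is selected with a scalar string, the transformer receives a 1-D pandas Series (here, a Series of (n, 2) arrays).

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.compose import ColumnTransformer


class PersistenceStatistic(TransformerMixin, BaseEstimator):
    """Toy single-dimension vectorizer (hypothetical stand-in for a gudhi one).

    stat="total": sum of (death - birth); any other value: number of points.
    """

    def __init__(self, stat="total"):
        self.stat = stat

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        rows = []
        for d in X:  # X is a pandas Series of (n, 2) arrays when the column key is a string
            d = np.asarray(d, dtype=float).reshape(-1, 2)
            value = np.sum(d[:, 1] - d[:, 0]) if self.stat == "total" else len(d)
            rows.append([float(value)])
        return np.array(rows)


def object_column(arrays):
    # Build a 1-D object array explicitly so pandas keeps one diagram per cell.
    col = np.empty(len(arrays), dtype=object)
    for i, a in enumerate(arrays):
        col[i] = a
    return col


# One row per sample, one column per homology dimension; the second H1 diagram is empty.
df = pd.DataFrame({
    "H0": object_column([np.array([[0.0, 2.0], [0.0, 1.0]]), np.array([[0.0, 0.5]])]),
    "H1": object_column([np.array([[0.5, 0.9]]), np.empty((0, 2))]),
})

# The per-dimension "dict of vectorizers", expressed as (name, transformer, column) triples.
ct = ColumnTransformer([
    ("h0_total", PersistenceStatistic(stat="total"), "H0"),
    ("h1_count", PersistenceStatistic(stat="count"), "H1"),
])
features = ct.fit_transform(df)
```

Each transformer's output block is stacked horizontally in the order of the triples, which is what the dict ordering gave Archipelago.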
So this PR has now become only an "Atol robust update", coupled with some interesting tests for a shared interface for vectorization methods. I can split it into one or two PRs if that makes things clearer.
This is continued in PR #1096.
Transformer that dictionary-wraps persistence diagram vectorizers, i.e. objects from gudhi.representations.vector_methods. One provides persistence diagram vectorizers (by way of either island or island_dict), and the Archipelago object will fit on and transform (i.e. vectorize) lists or series of persistence diagrams (in pandas format). The object is consistent with the sklearn API.

So the difference between Archipelago and the methods from gudhi.representations.vector_methods is that Archipelago operates on "whole" persistence diagrams, e.g.
[(0, (0.0, 2.34)), (0, (0.0, 0.956)), (1, (0.536, 0.856)), (2, (1.202, 1.734))]
whereas methods from vector_methods tend to operate on a single dimension, e.g. a list of numpy arrays of points in $\mathbb{R}^2$.

So far it feels nice to have something like
Archipelago(island_dict={2: BettiCurve(resolution=4), 0: TopologicalVector(threshold=3)})
work, and it also appears to work on both lists of diagrams and pandas.Series of diagrams.

This PR also contains minor edits to Atol, such as the default quantiser.
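The Archipelago idea described above (split each "whole" diagram by homology dimension, then apply one single-dimension vectorizer per dimension) can be sketched in plain numpy. This is a hypothetical illustration, not the PR's implementation: split_by_dimension, DimensionDictVectorizer and the toy PointCount vectorizer are invented names, and concatenating outputs in the dict's key order is an assumption of the sketch.

```python
import numpy as np


def split_by_dimension(diagram, dims):
    """Turn a whole diagram [(dim, (birth, death)), ...] into {dim: (n_dim, 2) array};
    dimensions with no points get an empty (0, 2) array."""
    out = {}
    for dim in dims:
        pts = [pair for d, pair in diagram if d == dim]
        out[dim] = np.array(pts, dtype=float).reshape(-1, 2)
    return out


class DimensionDictVectorizer:
    """Hypothetical Archipelago-like wrapper: one single-dimension vectorizer per
    homology dimension, outputs concatenated in the dict's key order."""

    def __init__(self, vectorizer_dict):
        self.vectorizer_dict = vectorizer_dict

    def fit(self, X, y=None):
        split = [split_by_dimension(diag, self.vectorizer_dict) for diag in X]
        for dim, vec in self.vectorizer_dict.items():
            vec.fit([s[dim] for s in split])
        return self

    def transform(self, X):
        split = [split_by_dimension(diag, self.vectorizer_dict) for diag in X]
        blocks = [vec.transform([s[dim] for s in split])
                  for dim, vec in self.vectorizer_dict.items()]
        return np.hstack(blocks)


class PointCount:
    """Toy single-dimension vectorizer: number of points in each diagram."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return np.array([[len(d)] for d in X], dtype=float)


diagram = [(0, (0.0, 2.34)), (0, (0.0, 0.956)), (1, (0.536, 0.856)), (2, (1.202, 1.734))]
wrapper = DimensionDictVectorizer({2: PointCount(), 0: PointCount()})
features = wrapper.fit([diagram]).transform([diagram, []])  # the second diagram is empty
```

Because every missing dimension becomes a (0, 2) array, an entirely empty diagram flows through the same code path, which is exactly where the empty-diagram robustness of the wrapped vectorizers matters.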