Refactor igraph.Graph output of Nerve, modify `fit` behaviour of Nerve, add `store_edge_elements` kwarg to Nerve and make_mapper_pipeline, add Nerve and ParallelClustering to docs #447
[ ] Bug fix (non-breaking change which fixes an issue)
[x] New feature (non-breaking change which adds functionality)
[x] Breaking change (fix or feature that would cause existing functionality to change)
Description
This PR stemmed from the idea of integrating giotto-tda more tightly with our python-igraph backend for Mapper graphs. Specifically, I believe that storing all Mapper node attributes as a graph-level dictionary (with key "node_metadata") does not follow the recommended practice for storing vertex attributes in igraph.Graph objects, see https://igraph.org/python/doc/tutorial/tutorial.html#setting-and-retrieving-attributes. Instead, it seems to be that one should fully exploit the VertexSeq data structure which is accessible via graph.vs -- and similarly the EdgeSeq data structure which is accessible via graph.es.
In this PR:
Node metadata is stored as vertex attributes accessible by graph.vs[attr_name][node_id] or graph.vs[node_id][attr_name] for attr_name in ["pullback_set_label", "partial_cluster_label", "node_elements"].
"node_id" is removed from node attributes as it always coincided with theigraph.Graph node number anyway.
Sizes of intersections are automatically stored as edge weights, accessible by graph.es["weight"].
A "store_intersections" kwarg has been added to Nerve and make_mapper_pipeline to allow storing indices of node intersections as edge attributes, accessible via graph.es["edge_elements"].
The logic of the Nerve.fit_transform code has been simplified.
The attributes nodes_ and edges_ previously stored by Nerve.fit have been removed. Now the entire graph is stored as graph_ instead.
The documentation of make_mapper_pipeline has been improved
ParallelClustering and Nerve have been exposed in the __init__ and in the oline docs. This is because their docstrings might be useful to the user.
Two new tests have been added to test_nerve to check that the new store_edge_elements kwarg works as expected, and that min_intersection works as expected.
The test coverage for the Mapper visualisation modules has been increased.
Tests have been created for plot_betti_curves and plot_betti_surfaces.
plotly_params kwargs have been added to the plot methods of some transformers in gtda/diagrams/representations which had been forgotten in #441.
Existing tests, the mapper quickstart notebook, and the mapper plotting functions have been adapted.
Any other comments?
The behaviour of the mapper plotting functions is completely unchanged.
@wreise it would be great if you could generate a test dump for the documentation (and notebooks) following these changes, so that we can check they look OK.
Types of changes
Description This PR stemmed from the idea of integrating
giotto-tda
more tightly with ourpython-igraph
backend for Mapper graphs. Specifically, I believe that storing all Mapper node attributes as a graph-level dictionary (with key"node_metadata"
) does not follow the recommended practice for storing vertex attributes inigraph.Graph
objects, see https://igraph.org/python/doc/tutorial/tutorial.html#setting-and-retrieving-attributes. Instead, it seems to be that one should fully exploit theVertexSeq
data structure which is accessible viagraph.vs
-- and similarly theEdgeSeq
data structure which is accessible viagraph.es
.In this PR:
graph.vs[attr_name][node_id]
orgraph.vs[node_id][attr_name]
forattr_name
in["pullback_set_label", "partial_cluster_label", "node_elements"]
."node_id"
is removed from node attributes as it always coincided with theigraph.Graph
node number anyway.graph.es["weight"]
."store_intersections"
kwarg has been added toNerve
andmake_mapper_pipeline
to allow storing indices of node intersections as edge attributes, accessible viagraph.es["edge_elements"]
.Nerve.fit_transform
code has been simplified.nodes_
andedges_
previously stored byNerve.fit
have been removed. Now the entire graph is stored asgraph_
instead.make_mapper_pipeline
has been improvedParallelClustering
andNerve
have been exposed in the__init__
and in the oline docs. This is because their docstrings might be useful to the user.test_nerve
to check that the newstore_edge_elements
kwarg works as expected, and thatmin_intersection
works as expected.plot_betti_curves
andplot_betti_surfaces
.plotly_params
kwargs have been added to theplot
methods of some transformers ingtda/diagrams/representations
which had been forgotten in #441.Any other comments? The behaviour of the mapper plotting functions is completely unchanged.
Checklist
flake8
to check my Python changes.pytest
to check this on Python tests.