giotto-ai / giotto-tda

A high-performance topological machine learning toolbox in Python
https://giotto-ai.github.io/gtda-docs

Refactor igraph.Graph output of Nerve, modify `fit` behaviour of Nerve, add `store_edge_elements` kwarg to Nerve and make_mapper_pipeline, add Nerve and ParallelClustering to docs #447

Closed ulupo closed 4 years ago

ulupo commented 4 years ago


Description

This PR stemmed from the idea of integrating giotto-tda more tightly with our python-igraph backend for Mapper graphs. Specifically, I believe that storing all Mapper node attributes as a graph-level dictionary (with key "node_metadata") does not follow the recommended practice for storing vertex attributes in igraph.Graph objects, see https://igraph.org/python/doc/tutorial/tutorial.html#setting-and-retrieving-attributes. Instead, it seems to me that one should fully exploit the VertexSeq data structure, which is accessible via graph.vs, and similarly the EdgeSeq data structure, which is accessible via graph.es.

In this PR:

  1. Node metadata is stored as vertex attributes accessible by graph.vs[attr_name][node_id] or graph.vs[node_id][attr_name] for attr_name in ["pullback_set_label", "partial_cluster_label", "node_elements"].
  2. "node_id" is removed from node attributes as it always coincided with the igraph.Graph node number anyway.
  3. Sizes of intersections are automatically stored as edge weights, accessible by graph.es["weight"].
  4. A "store_edge_elements" kwarg has been added to Nerve and make_mapper_pipeline to allow storing indices of node intersections as edge attributes, accessible via graph.es["edge_elements"].
  5. The logic of the Nerve.fit_transform code has been simplified.
  6. The attributes nodes_ and edges_ previously stored by Nerve.fit have been removed. Now the entire graph is stored as graph_ instead.
  7. The documentation of make_mapper_pipeline has been improved.
  8. ParallelClustering and Nerve have been exposed in the __init__ and in the online docs. This is because their docstrings might be useful to the user.
  9. Two new tests have been added to test_nerve to check that the new store_edge_elements kwarg works as expected, and that min_intersection works as expected.
  10. The test coverage for the Mapper visualisation modules has been increased.
  11. Tests have been created for plot_betti_curves and plot_betti_surfaces.
  12. plotly_params kwargs have been added to the plot methods of some transformers in gtda/diagrams/representations which had been forgotten in #441.
  13. Existing tests, the mapper quickstart notebook, and the mapper plotting functions have been adapted.
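
To make items 3, 4 and the min_intersection check from item 9 concrete, here is a minimal plain-Python sketch of how edge weights and edge elements relate to node elements. It is not the giotto-tda implementation and does not use igraph: the helper name build_nerve_edges, the dictionary layout of each edge, and the toy data are assumptions made purely for illustration.

```python
from itertools import combinations


def build_nerve_edges(node_elements, min_intersection=1, store_edge_elements=False):
    """Hypothetical helper: link two nodes when they share enough elements.

    node_elements[i] lists the data point indices belonging to node i.
    """
    edges = []
    for (i, elems_i), (j, elems_j) in combinations(enumerate(node_elements), 2):
        common = sorted(set(elems_i) & set(elems_j))
        if len(common) >= min_intersection:
            # The edge weight is the size of the intersection (item 3).
            edge = {"source": i, "target": j, "weight": len(common)}
            if store_edge_elements:
                # Optionally keep the shared indices themselves (item 4).
                edge["edge_elements"] = common
            edges.append(edge)
    return edges


# Toy data (assumed): three nodes with overlapping point indices.
node_elements = [[0, 1, 2], [2, 3, 4], [1, 2, 5]]
edges = build_nerve_edges(node_elements, store_edge_elements=True)
strict_edges = build_nerve_edges(node_elements, min_intersection=2)
```

With this toy data, raising min_intersection to 2 keeps only the edge between nodes 0 and 2, since the other pairs share a single point.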

Any other comments?

The behaviour of the mapper plotting functions is completely unchanged.


ulupo commented 4 years ago

@wreise it would be great if you could generate a test dump for the documentation (and notebooks) following these changes, so that we can check they look OK.

ulupo commented 4 years ago

@lewtun I've made changes following your suggestions.