GUDHI / gudhi-devel

The GUDHI library is a generic open source C++ library, with a Python interface, for Topological Data Analysis (TDA) and Higher Dimensional Geometry Understanding.
https://gudhi.inria.fr/
MIT License

[python] Cover complex usage #659

Open erooke opened 2 years ago

erooke commented 2 years ago

I'm currently comparing the open-source implementations of the Mapper algorithm[^1], and the Python bindings for GUDHI's cover complexes show very unusual performance characteristics in my tests. I was hoping to make sure that I'm calling the library correctly.

To my understanding the following code is the correct way to:

  1. project the data down using PCA
  2. cover the projected data with 8 intervals with 25% overlap
  3. pull back this cover to the original dataset
  4. cluster the pulled-back cover elements by joining any points within 0.1 units of each other
  5. compute the 1-skeleton of the nerve of these clusters

import gudhi
from sklearn.decomposition import PCA

nerve_complex = gudhi.CoverComplex()

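# `data` is the input point cloud, an array of shape (n_samples, n_features).
# Step 1: project onto the first principal component (one filter value per point).
projection = list(
    PCA(n_components=1).fit_transform(data).reshape(-1)
)
# Convert each row to a plain Python list before passing it to GUDHI.
list_data = list(map(list, data))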

nerve_complex.set_type("Nerve")

nerve_complex.set_point_cloud_from_range(list_data)
nerve_complex.set_function_from_range(projection)
nerve_complex.set_color_from_range(projection)

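# Step 4 (intended): graph joining points within 0.1 units of each other
nerve_complex.set_graph_from_rips(0.1)
# Steps 2-3 (intended): 8 intervals with 25% overlap, pulled back through the function
nerve_complex.set_resolution_with_interval_number(8)
nerve_complex.set_gain(0.25)
nerve_complex.set_cover_from_function()
# Step 5 (intended): compute the simplices of the resulting complex
nerve_complex.find_simplices()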

Is there a better way using Gudhi in python to accomplish this pipeline?
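
For reference, here is what steps 2 and 3 mean when written out in plain Python, independent of GUDHI. This is only a sketch under one assumption: "25% overlap" means that consecutive intervals share 25% of their common length. GUDHI's own definition of resolution and gain may differ in detail. The names `interval_cover` and `pull_back` are hypothetical helpers, not part of any library.

```python
def interval_cover(lo, hi, n_intervals, gain):
    """Return n_intervals closed intervals of equal length covering [lo, hi].

    Assumption: consecutive intervals share a fraction `gain` of their length.
    """
    # n intervals of length L, shifted by L * (1 - gain), must span hi - lo.
    length = (hi - lo) / (1 + (n_intervals - 1) * (1 - gain))
    step = length * (1 - gain)
    cover = []
    for i in range(n_intervals):
        start = lo + i * step
        # Pin the last endpoint to hi so rounding cannot drop the maximum.
        end = hi if i == n_intervals - 1 else start + length
        cover.append((start, end))
    return cover


def pull_back(values, intervals):
    """For each interval, list the indices of the points whose value lies in it."""
    return [[i for i, v in enumerate(values) if a <= v <= b] for a, b in intervals]


# Toy filter values standing in for the PCA projection.
values = [i / 99 for i in range(100)]
intervals = interval_cover(min(values), max(values), n_intervals=8, gain=0.25)
preimages = pull_back(values, intervals)
```

With a gain below 50%, only consecutive intervals overlap, so before any clustering the nerve of this cover is a path on 8 nodes.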

[^1]: Singh, Gurjeet & Mémoli, Facundo & Carlsson, Gunnar. (2007). Topological Methods for the Analysis of High Dimensional Data Sets and 3D Object Recognition. Eurographics Symposium on Point-Based Graphics 2007. 91-100. 10.2312/SPBG/SPBG07/091-100.

MathieuCarriere commented 2 years ago

Sorry for the late reply. I think that when you set the type to "Nerve", no clustering is applied, i.e., step 4 of your scheme is not done. Could you tell us what happens with set_type('GIC')?

erooke commented 2 years ago

It seems that clustering is being performed with both "Nerve" and "GIC". If I take my dataset to be a circle and the projection to be the first coordinate, the resulting graph is a circle. As far as I can tell, if clustering were not being performed, the resulting graph would be a straight line instead of a circle.

For completeness, here is the code snippet I used to check the clustering:

from math import cos, sin

import gudhi
import numpy as np

data = list()
projection = list()

for theta in np.linspace(0, 2 * np.pi, num=1000):
    x = cos(theta)
    y = sin(theta)
    projection.append(x)
    data.append([x, y])

nerve_complex = gudhi.CoverComplex()

nerve_complex.set_type("Nerve")

nerve_complex.set_point_cloud_from_range(data)
nerve_complex.set_function_from_range(projection)
nerve_complex.set_color_from_range(projection)

nerve_complex.set_graph_from_rips(0.1)
nerve_complex.set_resolution_with_interval_number(8)
nerve_complex.set_gain(0.25)
nerve_complex.set_cover_from_function()
nerve_complex.find_simplices()
nerve_complex.plot_dot()
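
The circle-versus-line argument can also be checked without GUDHI. The sketch below is a plain-Python stand-in for the pipeline, not GUDHI's implementation: it assumes the same 25%-overlap interval cover described in the first comment, clusters each preimage by joining points within 0.1 units of each other, and compares the result to the unclustered nerve. All function names are hypothetical, and it uses 200 points instead of 1000 to keep the pairwise distance checks cheap.

```python
from itertools import combinations
from math import cos, dist, pi, sin


def interval_cover(lo, hi, n_intervals, gain):
    """Equal-length intervals on [lo, hi]; neighbours share `gain` of their length."""
    length = (hi - lo) / (1 + (n_intervals - 1) * (1 - gain))
    step = length * (1 - gain)
    return [(lo + i * step, hi if i == n_intervals - 1 else lo + i * step + length)
            for i in range(n_intervals)]


def components(items, linked):
    """Connected components of `items` under the symmetric relation `linked`."""
    parent = {x: x for x in items}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(items, 2):
        if linked(a, b):
            parent[find(a)] = find(b)
    groups = {}
    for x in items:
        groups.setdefault(find(x), set()).add(x)
    return list(groups.values())


def mapper_graph(points, values, n_intervals, gain, threshold=None):
    """Nodes are (clustered) preimages; edges join nodes that share a point."""
    nodes = []
    for a, b in interval_cover(min(values), max(values), n_intervals, gain):
        preimage = [i for i, v in enumerate(values) if a <= v <= b]
        if threshold is None:
            if preimage:
                nodes.append(set(preimage))  # no clustering: one node per interval
        else:
            nodes.extend(components(
                preimage, lambda i, j: dist(points[i], points[j]) <= threshold))
    edges = {(s, t) for s, t in combinations(range(len(nodes)), 2)
             if nodes[s] & nodes[t]}
    return nodes, edges


def cycle_rank(nodes, edges):
    """Number of independent loops: edges - nodes + connected components."""
    n_components = len(components(range(len(nodes)), lambda s, t: (s, t) in edges))
    return len(edges) - len(nodes) + n_components


n = 200
points = [(cos(2 * pi * k / n), sin(2 * pi * k / n)) for k in range(n)]
values = [x for x, _ in points]  # project onto the first coordinate

clustered = mapper_graph(points, values, 8, 0.25, threshold=0.1)
unclustered = mapper_graph(points, values, 8, 0.25)
print("clustered cycle rank:", cycle_rank(*clustered))
print("unclustered cycle rank:", cycle_rank(*unclustered))
```

In this sketch the clustered graph has one independent loop (a circle), while the unclustered graph has none (a path). That matches the reasoning that a circular output indicates clustering is being applied.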