Many connected components from sublevel filtration on VietorisRips

wreise commented 2 years ago

Describe the bug: I am using VietorisRipsPersistence to compute sublevel-set persistence on a 1-dimensional complex. For very particular inputs (see the attached file), I obtain diagrams with several essential connected components, whereas there should be only one.

To reproduce: download and unzip arrays.zip, then execute:

import numpy as np
from scipy import sparse
from gtda.homology import VietorisRipsPersistence

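def signal_to_complex(x):
    """Convert x, an array of shape (n,), into a 1-dimensional complex representing an interval."""
    n = x.shape[0]
    # Each edge (i, i + 1) enters the filtration at the larger of its two vertex values.
    edges = np.maximum(x[:-1], x[1:])
    # Vertex values go on the main diagonal, edge values on the first off-diagonals.
    data = [x, edges, edges]
    offsets = [0, 1, -1]

    x_sparse = sparse.diags(data, offsets, shape=(n, n))

    return x_sparse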

samples = np.fromfile("arrays").reshape(2, 400)
complexes = [signal_to_complex(s) for s in samples]
# Workaround: adding a small constant to each signal gives the expected diagrams.
# complexes = [signal_to_complex(s + 1e-10) for s in samples]
Vr = VietorisRipsPersistence(metric="precomputed", homology_dimensions=[0],
                             reduced_homology=False)
dgms = Vr.fit_transform(complexes)
print(dgms)

Expected behavior: the diagram for the first array should contain a single essential component.

Actual behavior: the diagram for the first array contains 6 essential components.

Versions:
Linux-5.4.0-110-generic-x86_64-with-glibc2.27
Python 3.9.7 (default, Sep 16 2021, 13:09:58) [GCC 7.5.0]
NumPy 1.21.5
SciPy 1.8.0
Joblib 1.1.0
Scikit-learn 1.0.2
Giotto-tda 0.5.1

Additional context: interestingly, when a constant is added to the input arrays (before creating the sparse matrices), we obtain the desired behavior.

Attachment: arrays.zip

ulupo commented 2 years ago

Thanks @wreise! So, this is definitely with the PyPI wheels?

wreise commented 2 years ago

You have a point, @ulupo, it's not! :sweat: Let me do that!

wreise commented 2 years ago

I re-downloaded the wheels and the behavior persists.

ulupo commented 2 years ago

Ok, thanks! It would also be interesting to see whether this is something we inherit from ripser.py. Do you think you would be able to test that?

wreise commented 2 years ago

The data contains zeros, which are not stored explicitly in the sparse matrices in complexes. Hence, VietorisRipsPersistence interprets them as missing edges rather than as edges with filtration value 0.
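
A minimal sketch of this effect on a toy signal, using only NumPy and SciPy. The signal x, the helper count_nonzero_edges and the shift of 1.0 are made up for illustration and are not part of giotto-tda. The helper counts the edges that a reader treating zeros as absent entries would still see. Shifting the signal so that every value is strictly positive is one possible workaround, in the spirit of the commented-out line in the reproduction script. Note that the shift moves all finite birth and death values by the same constant.

```python
import numpy as np
from scipy import sparse


def signal_to_complex(x):
    """Same construction as in the report: vertex values on the diagonal,
    edge values max(x[i], x[i + 1]) on the first off-diagonals."""
    n = x.shape[0]
    edges = np.maximum(x[:-1], x[1:])
    return sparse.diags([x, edges, edges], [0, 1, -1], shape=(n, n))


def count_nonzero_edges(complex_):
    """Hypothetical helper: count upper-triangular entries with a nonzero
    value, i.e. the edges that survive if zeros are read as 'no edge'."""
    coo = sparse.coo_matrix(complex_)
    upper = coo.row < coo.col
    return int(np.count_nonzero(coo.data[upper]))


# Toy signal on 4 vertices, so the interval has 3 edges.
# Edge values are max(1, 0) = 1, max(0, 0) = 0 and max(0, 2) = 2.
x = np.array([1.0, 0.0, 0.0, 2.0])
print(count_nonzero_edges(signal_to_complex(x)))  # 2: the zero-valued edge is lost

# Assumed workaround: shift so that the minimum becomes 1.0 (arbitrary choice).
shifted = x - x.min() + 1.0
print(count_nonzero_edges(signal_to_complex(shifted)))  # 3: all edges are kept
```

With the zero-valued edge dropped, vertices 1 and 2 of the toy signal are no longer joined, which is the same mechanism that would split the interval into several essential components.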

giotto-ai / giotto-tda

Many connected components from sublevel filtration on VietorisRips #635