Closed pierre-guillou closed 2 years ago
Can you also use Ripser on high dimensional data, like 10D?
Homology dimension k computations for a dataset of size N require that it be possible in principle to enumerate all (k+1)-simplices of the full simplex with N vertices. In your case (k=2) this means enumerating all 3-simplices (which are 4-tuples of vertices), and this requires that the binomial coefficient (N choose 4) not exceed 2^63 - 1 (the maximum index for signed 64-bit integers). With N=250000, N choose 4 is approximately 1.6e20, while 2^63 is about 9.2e18.
[Edit: I may be off by 1 in some of the above, but the main point remains.]
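As a quick sanity check of the arithmetic above, here is a minimal Python sketch that compares the number of 3-simplices with the signed 64-bit limit. The function names are made up for illustration, and Ripser's actual overflow check may differ in details (including the possible off-by-one mentioned above).

```python
import math

INT64_MAX = 2**63 - 1  # largest signed 64-bit integer


def top_simplex_count(n_vertices: int, homology_dim: int) -> int:
    """Number of (homology_dim + 1)-simplices on n_vertices points,
    i.e. the number of vertex subsets of size homology_dim + 2."""
    return math.comb(n_vertices, homology_dim + 2)


def fits_in_int64(n_vertices: int, homology_dim: int) -> bool:
    """True if every such simplex can get a distinct signed 64-bit index."""
    return top_simplex_count(n_vertices, homology_dim) <= INT64_MAX


# Dataset sizes mentioned in this thread, homology dimension 2.
for n in (120_000, 250_000):
    print(n, top_simplex_count(n, 2), fits_in_int64(n, 2))
```

Under these assumptions, 120k vertices stay below the limit and 250k vertices exceed it, which matches the behaviour reported in the issue.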
After modifying the backend to work with 128-bit integers, you should be able to run Ripser on your dataset. Both @ubauer and @MonkeyBreaker have tried this at some point AFAIK. It can be done!
@MassEast the dimensionality of the data is not an obstacle: Ripser only looks at the matrix of pairwise distances between the points in the dataset. This matrix takes a little longer to compute in higher dimensions, but only marginally so, especially compared to the time it takes to compute persistence.
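To illustrate the point, here is a small pure-Python sketch (not Ripser code; the helper name and the random test data are assumptions) showing that the distance matrix has the same N x N shape whether the points live in 3D or 10D. Only the cost of each individual distance grows with the ambient dimension.

```python
import math
import random


def pairwise_distances(points):
    """Return the full symmetric matrix of Euclidean distances."""
    n = len(points)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i):
            d = math.dist(points[i], points[j])  # works in any dimension
            dist[i][j] = dist[j][i] = d
    return dist


rng = random.Random(0)
cloud_3d = [[rng.random() for _ in range(3)] for _ in range(50)]
cloud_10d = [[rng.random() for _ in range(10)] for _ in range(50)]

d3 = pairwise_distances(cloud_3d)
d10 = pairwise_distances(cloud_10d)

# Both matrices are 50 x 50, regardless of the ambient dimension.
print(len(d3), len(d3[0]), len(d10), len(d10[0]))
```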
Thanks @ulupo for the information. @ubauer @MonkeyBreaker would it be possible for you to post the diff that gets Ripser running with 128-bit integers? (It doesn't seem like a one-line edit.) Thanks!
There is a branch at https://github.com/Ripser/ripser/tree/128bit implementing this. It should be mostly up to date, and it's actually pretty much a one-line edit. Since 128-bit ints seem to be non-standardized, this might not work out of the box with some compilers.
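For a rough idea of how much headroom a wider index type gives, the following Python sketch finds the largest number of vertices N for which N choose (k + 2) still fits. It assumes a signed index type, so 63 usable bits for 64-bit integers and 127 for 128-bit integers. The function name is hypothetical, and the real limits in Ripser may be slightly different.

```python
import math


def max_vertices(index_bits: int, homology_dim: int) -> int:
    """Largest N such that C(N, homology_dim + 2) <= 2**index_bits - 1."""
    limit = 2**index_bits - 1
    k = homology_dim + 2
    lo, hi = k, 2 * k
    # Grow the bracket until comb(hi, k) exceeds the limit.
    while math.comb(hi, k) <= limit:
        lo, hi = hi, hi * 2
    # Binary search, keeping comb(lo, k) <= limit < comb(hi, k).
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if math.comb(mid, k) <= limit:
            lo = mid
        else:
            hi = mid
    return lo


print("64-bit signed index, dimension 2:", max_vertices(63, 2))
print("128-bit signed index, dimension 2:", max_vertices(127, 2))
```

Under these assumptions, the 128-bit bound for dimension 2 lies far above the 1M vertices asked about in the issue, so the index type would no longer be the limiting factor.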
Thanks!
Hello and thank you for your work!
I am trying to use Ripser to compute persistence diagrams of rather large 3D datasets, coming from the Open-SciVis-Datasets website.
However, Ripser throws a std::overflow_error when I try to compute persistence pairs up to dimension 2. If I understand the C++ code correctly, this exception comes from the computation of binomial coefficients, which depend on the size of the input distance matrix and the maximum dimension requested.
This issue might be related to https://github.com/Ripser/ripser/issues/25 and https://github.com/Ripser/ripser/issues/32, which describe similar problems. I already opened an issue in the scikit-tda Python wrapper (I wanted to use Ripser via this Python API), but since the problem lies in the C++ code, it seems to belong here.
Below, you will find a Python script I used to trigger this overflow error. It takes a raw file from Open-SciVis-Datasets and iterates over the edges of the cubical complex to generate a sparse matrix in sparse triplet format that is fed to the Ripser executable. Although the diagram is computed as expected for the smallest datasets (nucleon, marschner_lobb, silicium, up to 120k vertices), the error occurs with fuel and neghip (more than 250k vertices) and with every larger dataset.
Is there a way to circumvent the computation of these binomial coefficients in order to handle large datasets (up to 1M vertices)?
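The conversion described above (grid edges to sparse triplets) can be sketched roughly as follows. This is only an illustration, not the original script: the function name, the x-fastest memory layout, and the choice of the larger endpoint value as edge weight are all assumptions.

```python
from itertools import product


def grid_edges_to_triplets(values, dims):
    """Yield (i, j, weight) for every axis-aligned edge of a regular grid.

    values: flat sequence of scalars, x varying fastest (assumed layout).
    dims:   (nx, ny, nz) grid dimensions.
    weight: larger of the two endpoint values (assumed weighting).
    """
    nx, ny, nz = dims

    def vertex_id(x, y, z):
        return x + nx * (y + ny * z)

    for z, y, x in product(range(nz), range(ny), range(nx)):
        i = vertex_id(x, y, z)
        # Visit the neighbour in +x, +y and +z so each edge appears once.
        for dx, dy, dz in ((1, 0, 0), (0, 1, 0), (0, 0, 1)):
            xx, yy, zz = x + dx, y + dy, z + dz
            if xx < nx and yy < ny and zz < nz:
                j = vertex_id(xx, yy, zz)
                yield i, j, max(values[i], values[j])


# Tiny 2 x 2 x 1 example grid; one "i j weight" line per edge.
for i, j, w in grid_edges_to_triplets([0, 1, 2, 3], (2, 2, 1)):
    print(i, j, w)
```

The number of triplets grows only linearly with the number of vertices, so the sparse input itself stays small; the overflow comes from the binomial coefficients, not from the input size.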
Thanks in advance for your help, Best regards, Pierre Guillou