Closed astamm closed 1 year ago
Is there some randomness in the way the cubical complex computes persistence which could explain such differences?
Hi Aymeric, is this an issue that only happens with R, or is it possible to reproduce it directly with Python? I don't think there is supposed to be any randomness in computing the persistence diagram of a periodic cubical complex, but it is hard to say without a full example, and I don't know what to look at...
Hey Marc. Well I only see through the R interface but I don't see any obvious reason why that would be R-specific as I am only calling Python classes and methods. Specifically the unit test setup in R is:
n <- 10
X <- cbind(seq(0, 1, len = n), seq(0, 1, len = n))
cc <- CubicalComplex$new(top_dimensional_cells = X)
cc$compute_persistence()$persistence()
the Python equivalent of which would be probably something like:
import numpy as np
import gudhi as gd
n = 10
X = np.linspace(0, 1, num = n)
X = [X, X]
cc = gd.CubicalComplex(top_dimensional_cells = X)
cc.compute_persistence()
cc.persistence()
Then we would need to test this Python code on the various platforms I mentioned.
As expected, this prints
[(0, (0.0, inf))]
on linux, whatever the version of python used.
cc.compute_persistence() cc.persistence()
By the way, this is redundant, persistence already calls compute_persistence, so this computes it twice.
As expected, this prints
[(0, (0.0, inf))]
on linux, whatever the version of python used.
Which Python versions did you include in your tests ? 3.8 to 3.10 ? which Linux distribution ? The problem seems to be specific to Ubuntu 20.04.
Ok there is actually no issues with the cubical complex, only with the periodic cc. When I do:
cc = gd.PeriodicCubicalComplex(top_dimensional_cells = X, periodic_dimensions = True)
cc.persistence()
I get on most systems a diagram with points for dimension 0 and 1 but on Ubuntu 20.04 for some Python and R versions, I get something in dimension 2 as well so that .persistence()
, .betti_numbers()
, .num_simplices(),
.persistent_betti_numbers()and
.cofaces_of_persistence_pairs()` does not return the expected value.
cc = gd.PeriodicCubicalComplex(top_dimensional_cells = X, periodic_dimensions = True)
periodic_dimensions
is supposed to be a vector<bool>
. If you pass True, I don't know, maybe it converts True to 1, constructs a vector of size 1, and accessing the second element is then undefined behavior...
It does seem easy to misuse periodic_dimensions
, we should either diagnose the error, or interpret a bool
the same as an array with the right size and constant value.
Actually, when I try passing periodic_dimensions = True
, I get an error
TypeError: 'bool' object is not iterable
I guess it could depend on the version of [cp]ython used, but are you sure that's really the code you tried?
Yes, what I run in R is with
periodic_dimensions = TRUE
which is converted by reticulate::r_to_py()
into True
.
So indeed since in this example it actually would need a length-2 boolean vector, it then has to infer the second element.
In R, this is silently achieved by recycling, i.e., in this case it would automatically extend to c(TRUE , TRUE)
but it might not understand always that it has to recycle because it needs to understand that from the Python side in this case.
Plus, in any event, recycling silently is a bad thing.
So I guess I'll start by passing a boolean vector and see if that improves things.
Ok that seems to be the reason.
With the second element to TRUE
:
> pcc <- PeriodicCubicalComplex$new(
+ top_dimensional_cells = X,
+ periodic_dimensions = c(TRUE, TRUE)
+ )
> pcc$persistence()
# A tibble: 4 × 3
dimension birth death
<int> <dbl> <dbl>
1 2 1 Inf
2 1 0 Inf
3 1 1 Inf
4 0 0 Inf
which is what I get on ubuntu 20.04 when my test fails. And with the second element to FALSE
:
> pcc <- PeriodicCubicalComplex$new(
+ top_dimensional_cells = X,
+ periodic_dimensions = c(TRUE, FALSE)
+ )
> pcc$persistence()
# A tibble: 2 × 3
dimension birth death
<int> <dbl> <dbl>
1 1 1 Inf
2 0 0 Inf
which is what I get on the rest of platforms.
cc.compute_persistence() cc.persistence()
By the way, this is redundant, persistence already calls compute_persistence, so this computes it twice.
I added in all R classes a private attribute that tracks whether the persistence has already been computed. That way, whenever the user calls .compute_persistence()
, it actually only runs if that attribute is FALSE
. Maybe that could be interesting to do so also in the Python classes or even at the C++ level?
I added in all R classes a private attribute that tracks whether the persistence has already been computed. That way, whenever the user calls
.compute_persistence()
, it actually only runs if that attribute isFALSE
.
Then you have to take care to set it back to false whenever someone for instance changes a filtration value or inserts a simplex in the SimplexTree. Also, you need to remember what options were passed to compute_persistence, in case the second call has different options.
Yes you are right. It needs some care in its implementation to get things right.
I think this one was only on the R side, so we can close it.
In preparing the release of rgudhi for enabling the use of GUDHI in R, my CI which runs the install on several platform spotted an issue in the unit tests for
CubicalComplex
and its periodic version: https://github.com/LMJL-Alea/rgudhi/actions/runs/3886901289.Specifically, I request install of
scikit-learn==1.2.0
andgudhi==3.7.1
. So this explains why CI fails with Python 3.7 as latest versions of scikit-learn are not compatible with Python 3.7. But I cannot figure out the rest of the issues. Specifically, I generate fake data which is a list of two identical lists containing 10 numbers from 0 to 1 equally spread. Then I instantiate theCubicalComplex
class or the periodic version and run the method.persistence()
and this gives two very different results according to the external deps version.Specifically, it always fails to match expected value on ubuntu 20.04 (while it finds the expected value on macOS and windows) with:
But still with ubuntu 20.04, it works well with: