giotto-ai / giotto-tda

A high-performance topological machine learning toolbox in Python
https://giotto-ai.github.io/gtda-docs
Other
845 stars 173 forks source link

[BUG] CubicalPersistence unable to handle lists of arrays of variable shapes #499

Closed ulupo closed 4 years ago

ulupo commented 4 years ago

Describe the bug Following the changes in #467 which allow CubicalPersistence to act on lists of arrays, https://github.com/giotto-ai/giotto-tda/blob/1472ec629aeee82236730c6e86b1eb2f7acf3c1c/gtda/homology/cubical.py#L169 will fail when X is a list of arrays of variable shape.

This can be fixed either by checking for the type of X and using different logics for computing the maximum value accordingly, or by aligning the behaviour of CubicalPersistence to the simplicial transformers by making np.inf the default value of infinity_values, until a unified design change is pushed for replacement of infinities that is common to both the simplicial and cubical transformers.

wreise commented 4 years ago

This points to the fact that there are no tests for lists and in particular for the case when infinity_values=None What would you say about changing the tests to

X = [np.array([[2., 2.47942554],
               [2.47942554, 2.84147098],
               [2.98935825, 2.79848711],
               [2.79848711, 2.41211849],
               [2.41211849, 1.92484888]])]

?

ulupo commented 4 years ago

@wreise you are right about adding tests for lists but a list of arrays of the same shape actually works with np.max (it is smart like that), so one really needs to add lists of multiple arrays with different shapes to avoid this regression.

wreise commented 4 years ago

Ups, good catch!

ulupo commented 4 years ago

@gtauzin just assigning you to bring it to your attention as it is relevant to later plans for API change. I can take care of this very easy change one way or the other. Personally, I slightly prefer having simplicial and cubical transformers aligned at all time including now, until we have a universal solution to the infinity_values=None problem. Thus I'd slightly advocate for having infinity_values=np.inf as default for CubicalPersistence for now and removing the max computation, or keeping the None but internally replacing it with np.inf.