GUDHI / gudhi-devel

The GUDHI library is a generic open source C++ library, with a Python interface, for Topological Data Analysis (TDA) and Higher Dimensional Geometry Understanding.
https://gudhi.inria.fr/
MIT License

Atol: tests are failing with scikit-learn 1.4.0 #1031

Closed. VincentRouvreau closed this issue 3 weeks ago

VincentRouvreau commented 4 months ago

With scikit-learn 1.4.0:

from sklearn.cluster import KMeans
from gudhi.representations.vector_methods import Atol
import numpy as np
a = np.array([[1, 2, 4], [1, 4, 0], [1, 0, 4]])
b = np.array([[4, 2, 0], [4, 4, 0], [4, 0, 2]])
c = np.array([[3, 2, -1], [1, 2, -1]])
atol_vectoriser = Atol(quantiser=KMeans(n_clusters=2, random_state=202006))
atol_vectoriser.fit(X=[a, b, c]).centers
# array([[3.75, 2.  , 0.25],
#        [1.  , 2.  , 1.75]])
atol_vectoriser(a)
# array([0.00892619, 0.20804165])
atol_vectoriser(c)
# array([0.44476906, 0.05480506])
atol_vectoriser.transform(X=[a, b, c])
# array([[0.00892619, 0.20804165],
#        [1.19118877, 0.01362205],
#        [0.44476906, 0.05480506]])

These values are not the ones given in the Atol documentation.

mglisse commented 4 months ago

@martinroyer ?

martinroyer commented 4 months ago

This is because we failed (sorry) to set the n_init parameter in KMeans, even though we were warned. If I run the test with scikit-learn 1.3.0, I see the future: FutureWarning: The default value of `n_init` will change from 10 to 'auto' in 1.4. Vincent's results are consistent with what that same test gives when I set KMeans(n_clusters=2, n_init="auto", random_state=202006).

mglisse commented 4 months ago

Isn't this something you already have a fix for in the branch for archipelago?

martinroyer commented 4 months ago

Yes it is done in the archipelago PR https://github.com/GUDHI/gudhi-devel/pull/1017/files

That PR should converge soon (hopefully), so could we wait for it to bring in these fixes?