GUDHI / gudhi-devel

The GUDHI library is a generic open source C++ library, with a Python interface, for Topological Data Analysis (TDA) and Higher Dimensional Geometry Understanding.
https://gudhi.inria.fr/
MIT License

KNearestNeighbors interface homogenization + empty DTMRipsComplex #999

Closed VincentRouvreau closed 8 months ago

VincentRouvreau commented 8 months ago

Motivated by this comment

Current behaviour

from gudhi.point_cloud.dtm import DistanceToMeasure
import numpy
DistanceToMeasure(0, implementation="ckdtree").fit_transform(numpy.random.rand(1000, 4))
# ValueError: zero-size array to reduction operation maximum which has no identity
DistanceToMeasure(0, implementation="sklearn").fit_transform(numpy.random.rand(1000, 4))
# ValueError: Expected n_neighbors > 0. Got 0
DistanceToMeasure(0, implementation="hnsw").fit_transform(numpy.random.rand(1000, 4))
# /home/gailuron/workspace/gudhi/gudhi-devel/build/src/python/gudhi/point_cloud/dtm.py:67: RuntimeWarning: invalid value encountered in true_divide
#   dtm = distances.sum(-1) / self.k
# array([nan, ..., nan],
#       dtype=float32)

# ==========================================================================================

DistanceToMeasure(1001, implementation="ckdtree").fit_transform(numpy.random.rand(1000, 4))
# array([inf, ..., inf])
DistanceToMeasure(1001, implementation="sklearn").fit_transform(numpy.random.rand(1000, 4))
# ValueError: Expected n_neighbors <= n_samples,  but n_samples = 1000, n_neighbors = 1001
DistanceToMeasure(1001, implementation="hnsw").fit_transform(numpy.random.rand(1000, 4))
# RuntimeError: Cannot return the results in a contigious 2D array. Probably ef or M is too small

(This behaviour is inherited from the KNearestNeighbors class.)

Proposal

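To me, sklearn behaves the most appropriately from a user's point of view: it raises a ValueError with an explicit message both for k = 0 and for k larger than the number of points, instead of returning nan or inf or failing with a backend-specific error. The KNearestNeighbors class should expose this same behaviour for every implementation.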
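
A minimal sketch of what such a shared check could look like. The helper name check_n_neighbors and the place it would be called from are assumptions for illustration, not existing GUDHI code; the messages only mirror the sklearn errors quoted above.

```python
def check_n_neighbors(k, n_samples):
    """Hypothetical helper (not part of GUDHI): validate k before any backend is called.

    Raises ValueError with sklearn-like messages so that ckdtree, sklearn and hnsw
    would all fail the same way on invalid input.
    """
    if k <= 0:
        raise ValueError(f"Expected n_neighbors > 0. Got {k}")
    if k > n_samples:
        raise ValueError(
            f"Expected n_neighbors <= n_samples, but n_samples = {n_samples}, n_neighbors = {k}"
        )
    return k


# Usage: a valid k passes through, an invalid k raises before any computation.
check_n_neighbors(10, 1000)
for bad_k in (0, 1001):
    try:
        check_n_neighbors(bad_k, 1000)
    except ValueError as err:
        print(err)
```

Running the check once, before delegating to the selected implementation, would keep the error independent of the backend. Whether it belongs in the fit or the transform step depends on when the number of points is known, which is left open here.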