Thanks for the great package! Here is an example of a failure with exact=False: there are enough samples (3 samples, n_clusters=2), but fitting raises ValueError: k >= n, as if there were not. The same code works fine with exact=True.
import numpy as np
import genieclust
X = np.zeros((3, 768))
k = 2
g = genieclust.Genie(n_clusters=k, gini_threshold=0.01, exact=False)
labels = g.fit_predict(X)
Error trace:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[328], line 7
5 k = 2
6 g = genieclust.Genie(n_clusters=k, gini_threshold=0.01, exact=False)
----> 7 labels = g.fit_predict(X)
File .../lib/python3.8/site-packages/genieclust/genie.py:548, in GenieBase.fit_predict(self, X, y)
520 def fit_predict(self, X, y=None):
521 """
522 Perform cluster analysis of a dataset and return the predicted labels.
523
(...)
546
547 """
--> 548 self.fit(X)
549 return self.labels_
File .../lib/python3.8/site-packages/genieclust/genie.py:1051, in Genie.fit(self, X, y)
972 """
973 Perform cluster analysis of a dataset.
974
(...)
1047
1048 """
1049 cur_state = self._check_params() # re-check, they might have changed
-> 1051 cur_state = self._get_mst(X, cur_state)
1053 if cur_state["verbose"]:
1054 print("[genieclust] Determining clusters with Genie++.", file=sys.stderr)
File .../lib/python3.8/site-packages/genieclust/genie.py:511, in GenieBase._get_mst(self, X, cur_state)
509 cur_state = self._get_mst_exact(X, cur_state)
510 else:
--> 511 cur_state = self._get_mst_approx(X, cur_state)
513 # this might be an "intrinsic" dimensionality:
514 self.n_features_ = cur_state["n_features"]
File .../lib/python3.8/site-packages/genieclust/genie.py:484, in GenieBase._get_mst_approx(self, X, cur_state)
480 d_core = internal.get_d_core(nn_dist, nn_ind, cur_state["M"])
483 if mst_dist is None or mst_ind is None:
--> 484 mst_dist, mst_ind = internal.mst_from_nn(
485 nn_dist,
486 nn_ind,
487 d_core,
488 stop_disconnected=False,
489 verbose=cur_state["verbose"])
490 # We can have a forest here...
492 self.n_samples_ = n_samples
File .../lib/python3.8/site-packages/genieclust/internal.pyx:294, in genieclust.internal.__pyx_fuse_0mst_from_nn()
File .../lib/python3.8/site-packages/genieclust/internal.pyx:381, in genieclust.internal.mst_from_nn()
ValueError: k >= n
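In the meantime, a possible workaround is to retry in exact mode whenever the approximate mode raises this particular error. The sketch below is only an illustration: `fit_predict_with_fallback` and the `make_model` factory are hypothetical names of my own, not part of genieclust, and I am assuming the error message always contains the text "k >= n" as in the trace above.

```python
def fit_predict_with_fallback(make_model, X):
    """Try the approximate mode first; retry in exact mode on "k >= n".

    `make_model` is a hypothetical factory: make_model(exact) must return
    an estimator exposing fit_predict(X).
    """
    try:
        return make_model(False).fit_predict(X)
    except ValueError as err:
        # Only swallow the specific failure seen in the trace above;
        # any other ValueError is re-raised unchanged.
        if "k >= n" not in str(err):
            raise
        return make_model(True).fit_predict(X)


# Usage with the parameters from the report (requires genieclust):
# import genieclust
# labels = fit_predict_with_fallback(
#     lambda exact: genieclust.Genie(
#         n_clusters=2, gini_threshold=0.01, exact=exact),
#     X,
# )
```

This only hides the symptom for small inputs; it does not explain why the approximate path rejects 3 samples in the first place.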