BiAPoL / napari-clusters-plotter

A napari plugin for clustering objects according to their properties.
BSD 3-Clause "New" or "Revised" License

Error when using HDBSCAN #138

Closed: haesleinhuepf closed this issue 1 year ago

haesleinhuepf commented 2 years ago

I just executed HDBSCAN clustering and received an error. We also saw it during the course last week:

File c:\structure\code\napari-clusters-plotter\napari_clusters_plotter\_clustering.py:519, in hdbscan_clustering(reg_props=<pandas DataFrame, 35947 rows x 4 columns>, min_cluster_size=5, min_samples=5)
    515 @catch_NaNs
    516 def hdbscan_clustering(
    517     reg_props: pd.DataFrame, min_cluster_size: int, min_samples: int
    518 ) -> Tuple[str, np.ndarray]:
--> 519     import hdbscan
    521     clustering_hdbscan = hdbscan.HDBSCAN(
    522         min_cluster_size=min_cluster_size, min_samples=min_samples
    523     )
    525     return "HDBSCAN", clustering_hdbscan.fit_predict(reg_props)

File ~\miniconda3\envs\bio39\lib\site-packages\hdbscan\__init__.py:1
----> 1 from .hdbscan_ import HDBSCAN, hdbscan
      2 from .robust_single_linkage_ import RobustSingleLinkage, robust_single_linkage
      3 from .validity import validity_index

File ~\miniconda3\envs\bio39\lib\site-packages\hdbscan\hdbscan_.py:509
    494         row_indices = np.where(np.isfinite(matrix).sum(axis=1) == matrix.shape[1])[0]
    495     return row_indices
    498 def hdbscan(
    499     X,
    500     min_cluster_size=5,
    501     min_samples=None,
    502     alpha=1.0,
    503     cluster_selection_epsilon=0.0,
    504     max_cluster_size=0,
    505     metric="minkowski",
    506     p=2,
    507     leaf_size=40,
    508     algorithm="best",
--> 509     memory=Memory(cachedir=None, verbose=0),
    510     approx_min_span_tree=True,
    511     gen_min_span_tree=False,
    512     core_dist_n_jobs=4,
    513     cluster_selection_method="eom",
    514     allow_single_cluster=False,
    515     match_reference_implementation=False,
    516     **kwargs
    517 ):
    518     """Perform HDBSCAN clustering from a vector array or distance matrix.
    519
    520     Parameters
   (...)
    672            Density-based Cluster Selection. arxiv preprint 1911.02282.
    673     """
    674     if min_samples is None:

TypeError: __init__() got an unexpected keyword argument 'cachedir'

Maybe HDBSCAN was updated and we need to update our code accordingly. The error also pops up when running this from the napari console:

import hdbscan
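
A bare import in any Python session of that environment appears to fail the same way, so napari itself is not involved. A minimal sketch, assuming the hdbscan/joblib combination from this environment, where joblib no longer accepts the cachedir keyword:

# Minimal reproducer outside napari (sketch): the TypeError is raised at import
# time, because the default argument memory=Memory(cachedir=None, verbose=0)
# in hdbscan_.py is evaluated while the module is being imported.
import hdbscan  # TypeError: __init__() got an unexpected keyword argument 'cachedir'
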
haesleinhuepf commented 2 years ago

Just some more version info:

(bio39) C:\structure\code\napari-clusters-plotter>conda list hdbscan
# packages in environment at C:\Users\rober\miniconda3\envs\bio39:
#
# Name                    Version                   Build  Channel
hdbscan                   0.8.28           py39h5d4886f_1    conda-forge
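
Since the traceback points at Memory(cachedir=...), the joblib version in the same environment is the other relevant piece of information. A quick way to check it from Python (a sketch; the joblib version was not recorded in this thread):

import joblib

# Newer joblib releases removed the cachedir keyword of Memory, which is what
# the hdbscan import above trips over.
print(joblib.__version__)
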
lazigu commented 2 years ago

Hi Robert @haesleinhuepf,

This is due to a change in joblib: newer joblib releases no longer accept the cachedir keyword that hdbscan passes to Memory (see the traceback above). I pinned the joblib version in #134 to avoid failing tests (hence issue #135, which is a reminder to remove the pin once the problem is fixed), but I never made a release with the pinned version.
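
For anyone hitting this before a fixed release is out, a pin like the one described would look roughly as follows in a setuptools-based setup.py. This is a hypothetical sketch, not the plugin's actual packaging configuration, and the exact upper bound is an assumption based on the traceback above:

# Hypothetical sketch of a joblib pin (not the plugin's real setup.py; the
# "<1.2.0" bound is assumed, chosen because newer joblib drops the cachedir
# keyword that hdbscan 0.8.28 still passes to Memory).
from setuptools import setup

setup(
    name="napari-clusters-plotter",
    install_requires=[
        "hdbscan",
        "joblib<1.2.0",
    ],
)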

haesleinhuepf commented 2 years ago

Thank you for the lightspeed feedback, @lazigu! There's no hurry to fix this by releasing a new version. I'm just documenting it so that we don't forget it. 🌞

lazigu commented 1 year ago

With the release of napari-clusters-plotter 0.6.0, the plugin ships with the pinned joblib version, so this issue can be closed. Once the issue with joblib is fixed, the pinned version should be removed (https://github.com/BiAPoL/napari-clusters-plotter/issues/135).
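
As a quick sanity check after updating, both the import and a minimal clustering call (the same call the plugin makes in hdbscan_clustering, per the traceback above) should succeed. A sketch on random data:

# Sanity check after updating the plugin/environment (sketch on random data).
import hdbscan
import joblib
import numpy as np

print("joblib", joblib.__version__)  # should be a version compatible with hdbscan

# Mirrors the call made in napari_clusters_plotter/_clustering.py.
clusterer = hdbscan.HDBSCAN(min_cluster_size=5, min_samples=5)
labels = clusterer.fit_predict(np.random.rand(200, 4))
print(np.unique(labels))  # cluster labels; -1 marks noise points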