Getting error when executing clust

diego-perojil commented 2 years ago

Hi,

I've tested clust on two different systems, using both pip and conda, and I'm getting the following error:

/===========================================================================\
|                                   Clust                                   |
|    (Optimised consensus clustering of multiple heterogenous datasets)     |
|           Python package version 1.12.0 (2019) Basel Abu-Jamous           |
+---------------------------------------------------------------------------+
| Analysis started at: Monday 22 November 2021 (13:27:48)                   |
| 1. Reading dataset(s)                                                     |
| 2. Data pre-processing                                                    |
|  - Automatic normalisation mode (default in v1.7.0+).                     |
|    Clust automatically normalises your dataset(s).                        |
|    To switch it off, use the -n 0 option (not recommended).               |
|    Check https://github.com/BaselAbujamous/clust for details.             |
|  - Flat expression profiles filtered out (default in v1.7.0+).            |
|    To switch it off, use the --no-fil-flat option (not recommended).      |
|    Check https://github.com/BaselAbujamous/clust for details.             |
| 3. Seed clusters production (the Bi-CoPaM method)                         |
| 10%                                                                       |
| 20%                                                                       |
| 30%                                                                       |
| 40%                                                                       |
| 50%                                                                       |
| 60%                                                                       |
| 70%                                                                       |
| 80%                                                                       |
Traceback (most recent call last):
  File "/exports/cmvm/eddie/eb/groups/macqueen_lab/Diego/anaconda/envs/cluster/bin/clust", line 10, in <module>
    sys.exit(main())
  File "/exports/cmvm/eddie/eb/groups/macqueen_lab/Diego/anaconda/envs/cluster/lib/python3.10/site-packages/clust/__main__.py", line 101, in main
    clustpipeline.clustpipeline(args.datapath, args.m, args.r, args.n, args.o, args.K, args.t,
  File "/exports/cmvm/eddie/eb/groups/macqueen_lab/Diego/anaconda/envs/cluster/lib/python3.10/site-packages/clust/clustpipeline.py", line 123, in clustpipeline
    ures = unc.uncles(X_summarised_normalised, type='A', GDM=GDM, Ks=Ks, params=params, methods=methods,
  File "/exports/cmvm/eddie/eb/groups/macqueen_lab/Diego/anaconda/envs/cluster/lib/python3.10/site-packages/clust/scripts/uncles.py", line 440, in uncles
    Utmp = [clustDataset(Xloc[l], Ks[ki], methodsDetailedloc[l], GDMloc[:, l], Ng, l) for ki in range(NKs)]
  File "/exports/cmvm/eddie/eb/groups/macqueen_lab/Diego/anaconda/envs/cluster/lib/python3.10/site-packages/clust/scripts/uncles.py", line 440, in <listcomp>
    Utmp = [clustDataset(Xloc[l], Ks[ki], methodsDetailedloc[l], GDMloc[:, l], Ng, l) for ki in range(NKs)]
  File "/exports/cmvm/eddie/eb/groups/macqueen_lab/Diego/anaconda/envs/cluster/lib/python3.10/site-packages/clust/scripts/uncles.py", line 314, in clustDataset
    tmpU = cl.clusterdataset(X, K, methods, datasetID)  # Obtain U's
  File "/exports/cmvm/eddie/eb/groups/macqueen_lab/Diego/anaconda/envs/cluster/lib/python3.10/site-packages/clust/scripts/clustering.py", line 22, in clusterdataset
    U[ms] = ckmeans(X, K, datasetID, methodsloc[ms][1:])
  File "/exports/cmvm/eddie/eb/groups/macqueen_lab/Diego/anaconda/envs/cluster/lib/python3.10/site-packages/clust/scripts/clustering.py", line 52, in ckmeans
    C = skcl.KMeans(K, init=init, max_iter=max_iter, n_init=n_init, n_jobs=njobs).fit(X).labels_
TypeError: KMeans.__init__() got an unexpected keyword argument 'n_jobs'

I am running clust from a directory that contains a subdirectory called data, which holds the data TSV file. The replicate structure file is in the directory where clust is run. My command is: clust data/ -r replicates.txt

Any help will be appreciated, I'm really hoping to use clust in my project.

ngaitan55 commented 2 years ago

I got the same issue when providing only one file. Hope this gets answered.

xiaoyezao commented 2 years ago

I have a similar issue, but the last line of the error message is TypeError: __init__() got an unexpected keyword argument 'n_jobs'

halfbakedsneed commented 2 years ago

For those wondering, scikit-learn deprecated the n_jobs argument of sklearn.cluster.KMeans in 0.23 and removed it in 1.0 (https://scikit-learn.org/0.24/modules/generated/sklearn.cluster.KMeans.html?highlight=kmeans#sklearn.cluster.KMeans).
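To check whether a given KMeans constructor still accepts n_jobs before calling it, one option is to inspect its signature and drop any keyword it does not declare. This is a minimal sketch, not clust's actual code: supported_kwargs and the stand-in class NewStyleKMeans are hypothetical names, used so the example runs without scikit-learn installed. In practice you would pass sklearn.cluster.KMeans as the factory.

```python
import inspect


def supported_kwargs(factory, **kwargs):
    """Return only the keyword arguments that `factory` declares."""
    params = inspect.signature(factory).parameters
    # A factory that takes **kwargs accepts every keyword, so keep them all.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    return {name: value for name, value in kwargs.items() if name in params}


class NewStyleKMeans:
    """Hypothetical stand-in for a constructor that no longer takes n_jobs."""

    def __init__(self, n_clusters=8, n_init=10, max_iter=300):
        self.n_clusters = n_clusters
        self.n_init = n_init
        self.max_iter = max_iter


# n_jobs is silently dropped because the constructor does not declare it.
kwargs = supported_kwargs(NewStyleKMeans, n_clusters=3, n_init=5, n_jobs=2)
model = NewStyleKMeans(**kwargs)
```

Dropping n_jobs this way avoids the TypeError, but it also means the parallelism setting is ignored on newer releases.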

Because clust's dependencies aren't pinned to specific versions, your package manager installs the latest version of each when you install clust (including sklearn, which is published on PyPI as scikit-learn and, as of writing this, is at version 1.0.1).

To remedy this locally, you can downgrade scikit-learn to 0.24.2 (as long as nothing else in the same venv depends on it). This is the last release of scikit-learn whose sklearn.cluster.KMeans constructor still accepts the n_jobs argument.

Run the equivalent of the following in your venv for whatever package manager you're using:

pip uninstall sklearn
pip install scikit-learn==0.24.2
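After reinstalling, you can confirm which scikit-learn release the environment actually picked up. The sketch below assumes Python 3.8+ (for importlib.metadata). The helper names kmeans_accepts_n_jobs and installed_sklearn_version are made up for illustration, and the check relies only on the fact stated above that n_jobs was removed in 1.0.

```python
from importlib.metadata import PackageNotFoundError, version


def kmeans_accepts_n_jobs(sklearn_version: str) -> bool:
    """True if the major version is below 1 (n_jobs was removed in 1.0)."""
    major = int(sklearn_version.split(".")[0])
    return major < 1


def installed_sklearn_version():
    """Return the installed scikit-learn version string, or None if absent."""
    try:
        return version("scikit-learn")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_sklearn_version()
    if found is None:
        print("scikit-learn is not installed in this environment")
    elif kmeans_accepts_n_jobs(found):
        print(f"scikit-learn {found}: KMeans still accepts n_jobs")
    else:
        print(f"scikit-learn {found}: KMeans no longer accepts n_jobs")
```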

Going forward, all dependencies here should be pinned: https://github.com/BaselAbujamous/clust/blob/master/setup.py#L86
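As a rough idea of what constraining the dependency could look like, here is a hypothetical setup.py excerpt. It is not the project's real dependency list: only the scikit-learn entry is shown, and the upper bound comes from the 1.0 removal described above. An exact pin such as scikit-learn==0.24.2 would be the stricter alternative.

```python
# Hypothetical excerpt of a setup.py; only the scikit-learn entry is shown.
INSTALL_REQUIRES = [
    # Upper bound keeps pip from selecting 1.0+, where KMeans rejects n_jobs.
    "scikit-learn<1.0",
    # The project's other dependencies would be constrained the same way.
]

# In setup.py this list would be passed as:
#     setup(..., install_requires=INSTALL_REQUIRES)
```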

diego-perojil commented 2 years ago

Awesome, thanks a lot!

BaselAbujamous / clust

Getting error when executing clust #72