bcgsc / NanoSim

Nanopore sequence read simulator

No module named `sklearn.neighbors.kde` #165

Open RagnarGrootKoerkamp opened 2 years ago

RagnarGrootKoerkamp commented 2 years ago

I did a fresh install of conda (via miniconda3) and installed nanosim.

When running, I get this error:

Traceback (most recent call last):
  File "/home/philae/.local/share/miniconda3/bin/simulator.py", line 2400, in <module>
    main()
  File "/home/philae/.local/share/miniconda3/bin/simulator.py", line 2161, in main
    read_profile(ref_g, number, model_prefix, perfect, args.mode, strandness, dna_type=dna_type, chimeric=chimeric)
  File "/home/philae/.local/share/miniconda3/bin/simulator.py", line 523, in read_profile
    kde_ht = joblib.load(model_prefix + "_ht_length.pkl")
  File "/home/philae/.local/share/miniconda3/lib/python3.9/site-packages/joblib/numpy_pickle.py", line 587, in load
    obj = _unpickle(fobj, filename, mmap_mode)
  File "/home/philae/.local/share/miniconda3/lib/python3.9/site-packages/joblib/numpy_pickle.py", line 506, in _unpickle
    obj = unpickler.load()
  File "/home/philae/.local/share/miniconda3/lib/python3.9/pickle.py", line 1212, in load
    dispatch[key[0]](self)
  File "/home/philae/.local/share/miniconda3/lib/python3.9/pickle.py", line 1528, in load_global
    klass = self.find_class(module, name)
  File "/home/philae/.local/share/miniconda3/lib/python3.9/pickle.py", line 1579, in find_class
    __import__(module, level=0)
ModuleNotFoundError: No module named 'sklearn.neighbors.kde'

It looks similar to https://github.com/bcgsc/NanoSim/issues/61, so I may be able to solve it, but the package should work in a fresh install anyway.

RagnarGrootKoerkamp commented 2 years ago

I got it working after doing

conda install pip
conda install cython
pip install scikit-learn==0.22.1

but, like the linked issue, I now get these deprecation and incompatibility warnings:

/home/philae/.local/share/miniconda3/lib/python3.9/site-packages/sklearn/utils/deprecation.py:144: FutureWarning: The sklearn.neighbors.kde module is  deprecated in version 0.22 and will be removed in version 0.24. The corresponding classes / functions should instead be imported from sklearn.neighbors. Anything that cannot be imported from sklearn.neighbors is now part of the private API.
  warnings.warn(message, FutureWarning)
/home/philae/.local/share/miniconda3/lib/python3.9/site-packages/sklearn/utils/deprecation.py:144: FutureWarning: The sklearn.neighbors.kd_tree module is  deprecated in version 0.22 and will be removed in version 0.24. The corresponding classes / functions should instead be imported from sklearn.neighbors. Anything that cannot be imported from sklearn.neighbors is now part of the private API.
  warnings.warn(message, FutureWarning)
/home/philae/.local/share/miniconda3/lib/python3.9/site-packages/sklearn/utils/deprecation.py:144: FutureWarning: The sklearn.neighbors.dist_metrics module is  deprecated in version 0.22 and will be removed in version 0.24. The corresponding classes / functions should instead be imported from sklearn.neighbors. Anything that cannot be imported from sklearn.neighbors is now part of the private API.
  warnings.warn(message, FutureWarning)
/home/philae/.local/share/miniconda3/lib/python3.9/site-packages/sklearn/base.py:313: UserWarning: Trying to unpickle estimator KernelDensity from version 0.21.3 when using version 0.22.1. This might lead to breaking code or invalid results. Use at your own risk.
  warnings.warn(

SaberHQ commented 2 years ago

Hey Ragnar,

This is a known issue with scikit-learn version incompatibility. Please refer to #131 for some tips from @kmnip and myself. If you install from bioconda, you are less likely to run into installation issues. We think that requirements.txt is overly restrictive and should be updated to avoid these issues. We will take care of that shortly. Thanks for your interest in using NanoSim. Cheers.

RagnarGrootKoerkamp commented 2 years ago

For completeness: As far as I'm aware (this is my first time using conda), I added the bioconda and conda-forge channels and then installed it, so that would mean this problem indeed also happens when installing from bioconda.

iferres commented 2 years ago

Hi, I'm having the same issue, and I noticed that even if I create an env with python3.7, the error keeps pointing to a directory named python3.9 inside the conda directory, the same as in your case: https://github.com/bcgsc/NanoSim/issues/165#issuecomment-1105058924 . In my case it is /opt/conda/envs/mms/lib/python3.9. Of course that directory doesn't exist, but /opt/conda/envs/mms/lib/python3.7 does. I wonder whether a hard-coded path is making NanoSim look for sklearn in a different Python version than the one in which it is actually installed.

When I launch python (3.7), sklearn.neighbors.kde (scikit-learn==0.23) is installed:

$ python
Python 3.7.0 | packaged by conda-forge | (default, Nov 12 2018, 20:15:55) 
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sklearn.neighbors.kde
/opt/conda/envs/mms/lib/python3.7/site-packages/sklearn/utils/deprecation.py:143: FutureWarning: The sklearn.neighbors.kde module is  deprecated in version 0.22 and will be removed in version 0.24. The corresponding classes / functions should instead be imported from sklearn.neighbors. Anything that cannot be imported from sklearn.neighbors is now part of the private API.
  warnings.warn(message, FutureWarning)
>>> quit()

But when I run it I get:

Traceback (most recent call last):
  File "/opt/MMs/libs/../../NanoSim/src/simulator.py", line 2400, in <module>
    main()
  File "/opt/MMs/libs/../../NanoSim/src/simulator.py", line 2359, in main
    read_profile(genome_list, [], model_prefix, perfect, args.mode, strandness, dna_type=dna_type_list, abun=abun,
  File "/opt/MMs/libs/../../NanoSim/src/simulator.py", line 523, in read_profile
    kde_ht = joblib.load(model_prefix + "_ht_length.pkl")
  File "/opt/conda/envs/mms/lib/python3.9/site-packages/joblib/numpy_pickle.py", line 587, in load
    obj = _unpickle(fobj, filename, mmap_mode)
  File "/opt/conda/envs/mms/lib/python3.9/site-packages/joblib/numpy_pickle.py", line 506, in _unpickle
    obj = unpickler.load()
  File "/opt/conda/envs/mms/lib/python3.9/pickle.py", line 1212, in load
    dispatch[key[0]](self)
  File "/opt/conda/envs/mms/lib/python3.9/pickle.py", line 1528, in load_global
    klass = self.find_class(module, name)
  File "/opt/conda/envs/mms/lib/python3.9/pickle.py", line 1579, in find_class
    __import__(module, level=0)
ModuleNotFoundError: No module named 'sklearn.neighbors.kde'

I don't know; maybe this tells you something.
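One way to test the "wrong Python version" hypothesis is to ask the interpreter itself which executable is running and where it would import sklearn from. This is a minimal sketch, not part of NanoSim; the helper name `describe_runtime` is hypothetical. Running it with the same interpreter that launches simulator.py should show whether the python3.7 or the python3.9 tree is actually in use.

```python
import importlib.util
import sys


def describe_runtime(package="sklearn"):
    """Report the running interpreter and where `package` would be imported from."""
    # find_spec returns None when a top-level package cannot be found.
    spec = importlib.util.find_spec(package)
    return {
        "executable": sys.executable,
        "python_version": "%d.%d.%d" % sys.version_info[:3],
        "package_origin": spec.origin if spec is not None else None,
    }


if __name__ == "__main__":
    for key, value in describe_runtime().items():
        print(f"{key}: {value}")
```

If `package_origin` points into a different lib/pythonX.Y directory than expected, the script is being run by another interpreter than the one launched by typing `python`.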

PS. I'm installing and running everything in fresh singularity containers.

kmnip commented 2 years ago

The pretrained models in NanoSim were made using an older version of scikit-learn (e.g. <=0.22.1).

If you have to use these models (instead of creating your own models), then you must use scikit-learn=0.22.1 rather than a newer version. With a newer version of scikit-learn installed, you will get the error No module named 'sklearn.neighbors.kde'.

If you would like to create your own models (instead of using the pretrained models), then NanoSim should work just fine with scikit-learn=1.0.2 from my own experience.
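The version constraint on the pretrained models can be checked before trying to load them. The following is a rough sketch under the assumption that 0.22.1 is the newest usable version, as stated above; `parse_version`, `can_load_pretrained` and `MAX_PRETRAINED_SKLEARN` are illustrative names and are not part of NanoSim or scikit-learn.

```python
# Upper bound taken from the comment above; all names here are illustrative only.
MAX_PRETRAINED_SKLEARN = (0, 22, 1)


def parse_version(text):
    """Convert '0.22.1' to (0, 22, 1).

    Keeps the leading digits of up to three dot-separated components and
    stops at the first component that has no leading digits.
    """
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def can_load_pretrained(installed_version):
    """True if the installed scikit-learn is no newer than the assumed upper bound."""
    return parse_version(installed_version) <= MAX_PRETRAINED_SKLEARN


if __name__ == "__main__":
    try:
        import sklearn
    except ImportError:
        print("scikit-learn is not installed in this environment")
    else:
        ok = can_load_pretrained(sklearn.__version__)
        print(f"scikit-learn {sklearn.__version__}: pretrained models usable = {ok}")
```

A check like this only covers the version number; it does not guarantee that a given .pkl file unpickles cleanly.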

On top of this incompatibility issue, some users also have difficulty with installing all the dependent packages with conda. I strongly recommend that you create a dedicated environment for running NanoSim. If you have issues with conda install being eternally stuck, use mamba instead of conda to install your conda packages: https://github.com/mamba-org/mamba .

So, integrating all these together:

conda create -n nanosim_pretrained
conda activate nanosim_pretrained

mamba install scikit-learn=0.22.1 six samtools pysam pybedtools minimap2 joblib htseq genometools-genometools

Note that here I only specified the version for scikit-learn but not for the other packages. mamba should be able to pick the appropriate versions for the specified packages, python, and numpy, etc.

Hope this helps whoever stumbles upon this issue in the future!

wshuai294 commented 1 year ago

pip install scikit-learn==0.22.1 solved my problem.

HadrienG commented 1 year ago

Hi!

I also ran into this issue recently. Since the pretrained models require scikit-learn <= 0.22.1, wouldn't it be adequate to pin this version in the bioconda recipe?

Best, Hadrien

kmnip commented 1 year ago

Hi @HadrienG ,

We will make a new release that includes updated pretrained models. For the existing models, this specific environment works for me:

requirements.txt

genometools-genometools
htseq=0.11.3
joblib=1.1.0
last
minimap2=2.17
numpy=1.21.5
pybedtools=0.8.1
pysam=0.15.3
samtools
scikit-learn=0.22.1
scipy=1.7.3
six=1.16.0

conda create -n nanosim
conda activate nanosim
mamba install --file requirements.txt -c conda-forge -c bioconda

dpryan79 commented 1 year ago

For those installing this via bioconda, I've now patched the repodata to force scikit-learn >=0.20.0,<=0.22.1. That should hopefully resolve the issue there.