bcgsc / NanoSim

Nanopore sequence read simulator

Issues with simulator.py and sklearn modules #120

Closed greenmna closed 2 years ago

greenmna commented 3 years ago

Hello,

I have been trying to run the simulator.py script to simulate a metagenomic dataset, using the following command on a cluster.

/home/noah/NanoSim/src/simulator.py metagenome -gl /home/noah/metagenomics/mock-metagenome/mock-metagenomes/updated_mock_metagenome_list_for_simulation.tsv -a /home/noah/metagenomics/mock-metagenome/mock-metagenomes/updated_mock_metagenome_abundance_for_simulation_multi_sample.tsv -dl /home/noah/metagenomics/mock-metagenome/mock-metagenomes/updated_mock_metagenome_dna_type_list.tsv -c /home/noah/NanoSim/pre-trained_models/metagenome_ERR3152364_Even/training -t 12 -b guppy-flipflop

Upon running the program, I get an error saying that a module related to scikit-learn is not available, even though the package is installed. The version of scikit-learn I have installed through conda is 0.24.2.

[Screenshot of the error message]

I believe the problem stems from the fact that scikit-learn changed their module naming to put an underscore before the module names. So now instead of sklearn.neighbors.kde it is sklearn.neighbors._kde. I assume this is the case for any other scikit-learn modules that simulations for a genome or transcriptome may need.

I did find a solution by simply downgrading my version of scikit-learn from 0.24.2 to 0.21.3. The script then ran with no issue!

Best regards!

Noah

cheny19 commented 3 years ago

Oh, thanks so much for letting us know! We were aware that certain packages are not compatible, but never knew it was a naming problem. I'll keep this issue open for future reference, and will wait for some time to see if scikit-learn changes it back.

SaberHQ commented 3 years ago

Thanks for reporting this @greenmna

To confirm what you reported: I also encountered this "No module named sklearn.neighbors.kde" error. I have sklearn version 0.24.1 installed in my conda environment. However, the weird thing is that after deactivating and reactivating my conda environment, I was able to run the code without any issue. I did not change anything or downgrade my sklearn version.

We will look into it and probably update the import functions to avoid such an error in future NanoSim releases.
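
As an illustration of what a version-tolerant import could look like, here is a minimal sketch. It is not NanoSim's actual code: the helper name import_first and the KDE_PATHS list are hypothetical, and the module paths are simply the ones mentioned in this thread. It tries the public sklearn.neighbors path first, then the private and legacy module names.

```python
import importlib


def import_first(attr, module_paths):
    """Return (module_path, object) for the first module that provides attr."""
    for path in module_paths:
        try:
            module = importlib.import_module(path)
        except ImportError:
            continue  # module missing in this environment; try the next one
        if hasattr(module, attr):
            return path, getattr(module, attr)
    raise ImportError(f"{attr} not found in any of: {', '.join(module_paths)}")


# Candidate locations taken from this thread (public path first).
KDE_PATHS = [
    "sklearn.neighbors",       # public path
    "sklearn.neighbors._kde",  # private module name
    "sklearn.neighbors.kde",   # legacy module name
]

try:
    path, KernelDensity = import_first("KernelDensity", KDE_PATHS)
    print(f"KernelDensity imported from {path}")
except ImportError as exc:
    # Reached when scikit-learn is not installed at all.
    print(f"scikit-learn is not usable here: {exc}")
```

Trying the public path first means the private and legacy names are only touched when the public one is unavailable.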

lcoombe commented 2 years ago

Hi @SaberHQ and @cheny19 !

Just wanted to add that I got this error too with scikit-learn version 0.24.2+, but got it to work with 0.23.2. I do see that the module was deprecated in 0.22 and removed in 0.24 - so could be worth updating the import in a future release?

/projects/btl/lcoombe/miniconda3/envs/nanosim/lib/python3.7/site-packages/sklearn/utils/deprecation.py:143: FutureWarning: The sklearn.neighbors.kde module is  deprecated in version 0.22 and will be removed in version 0.24. The corresponding classes / functions should instead be imported from sklearn.neighbors. Anything that cannot be imported from sklearn.neighbors is now part of the private API.
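
The thresholds quoted in that warning can be turned into a small check. The sketch below is only an illustration: the function names parse_version and legacy_kde_status are hypothetical, and the cut-offs (deprecated in 0.22, removed in 0.24) are taken from the warning text above rather than verified against every scikit-learn release.

```python
def parse_version(text):
    """Turn '0.24.2' into (0, 24, 2), ignoring suffixes such as 'rc1' or 'dev0'."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first component with no leading digits
        parts.append(int(digits))
    return tuple(parts)


def legacy_kde_status(version_text):
    """Classify the legacy sklearn.neighbors.kde path for a given version string."""
    version = parse_version(version_text)
    if version < (0, 22):
        return "available"
    if version < (0, 24):
        return "deprecated"  # still importable, but emits a FutureWarning
    return "removed"


# Versions reported in this thread.
for reported in ("0.21.3", "0.23.2", "0.24.2"):
    print(reported, legacy_kde_status(reported))

try:
    import sklearn
    print("installed:", sklearn.__version__, legacy_kde_status(sklearn.__version__))
except ImportError:
    print("scikit-learn is not installed in this environment")
```

This matches the reports above: 0.21.3 and 0.23.2 run, while 0.24.2 fails.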

SaberHQ commented 2 years ago

Dear @greenmna and @lcoombe, please note that pull request #158 solves this issue by updating the scikit-learn version in requirements.txt.

The former sklearn.neighbors.kde module was renamed to sklearn.neighbors._kde in the 0.22 series, and the old import path was removed in 0.24, as the warning quoted above notes. If you hit this error, your installed scikit-learn version most likely does not match. Installing the following release solves the problem:

pip install scikit-learn==0.22.1

For more information and help, please check this Stack Overflow question/answer.

I am closing this issue. If anyone finds a similar issue, please feel free to reopen it and we will be more than happy to help you. Thanks.