CAMI-challenge / CAMISIM

CAMISIM: Simulating metagenomes and microbial communities
https://data.cami-challenge.org/participate
Apache License 2.0

No module named 'sklearn.neighbors.kde' #138

Closed: sternp closed this issue 2 years ago

sternp commented 2 years ago

I'm trying to run a Nanopore simulation, but I haven't been able to fix the following error regarding sklearn.neighbors.kde:

2022-07-08 13:39:17 INFO: [MetagenomeSimulationPipeline] Metagenome simulation starting
2022-07-08 13:39:17 INFO: [MetagenomeSimulationPipeline] Validating Genomes
2022-07-08 13:39:17 INFO: [MetadataReader] Reading file: '/home/sternesp/microbiome/users/sternesp/camisim/CAMISIM/defaults/genome_to_id-test.tsv'
2022-07-08 13:39:20 INFO: [MetagenomeSimulationPipeline] Design Communities
2022-07-08 13:39:20 INFO: [CommunityDesign] Drawing strains.
2022-07-08 13:39:20 INFO: [MetadataReader 93881117321] Reading file: '/home/sternesp/microbiome/users/sternesp/camisim/CAMISIM/defaults/metadata-test.tsv'
2022-07-08 13:39:20 INFO: [MetadataReader 37354078098] Reading file: '/home/sternesp/microbiome/users/sternesp/camisim/CAMISIM/defaults/gff_to_id.tsv'
2022-07-08 13:39:20 INFO: [MetadataReader 79339126686] Reading file: '/home/sternesp/microbiome/users/sternesp/camisim/CAMISIM/defaults/genome_to_id-test.tsv'
2022-07-08 13:39:20 INFO: [CommunityDesign] Validating raw sequence files!
2022-07-08 13:39:22 INFO: [NcbiTaxonomy] Building taxonomy tree...
2022-07-08 13:39:22 INFO: [NcbiTaxonomy] Reading 'nodes' file:  '/tmp/tmpt54_a5gq/NCBI/nodes.dmp'
2022-07-08 13:39:31 INFO: [NcbiTaxonomy] Reading 'names' file:  '/tmp/tmpt54_a5gq/NCBI/names.dmp'
2022-07-08 13:39:32 INFO: [NcbiTaxonomy] Reading 'merged' file: '/tmp/tmpt54_a5gq/NCBI/merged.dmp'
2022-07-08 13:39:32 INFO: [NcbiTaxonomy] Done (10s)
2022-07-08 13:39:32 INFO: [MetagenomeSimulationPipeline] Move Genomes
2022-07-08 13:39:32 WARNING: [GenomePreparation 11624708560] File /home/sternesp/microbiome/users/sternesp/camisim/long_read_simulations/sample01-test/source_genomes/GCA_000834435.1_ASM83443v1_genomic.fna existing, skipping
2022-07-08 13:39:32 WARNING: [GenomePreparation 11624708560] File /home/sternesp/microbiome/users/sternesp/camisim/long_read_simulations/sample01-test/source_genomes/GCA_000981485.1_EcoliK12AG100_genomic.fna existing, skipping
2022-07-08 13:39:32 INFO: [MetagenomeSimulationPipeline] Read simulation
2022-07-08 13:39:32 INFO: [GenomePreparation 10358723180] Reading distribution file
2022-07-08 13:39:32 INFO: [GenomePreparation 10358723180] Reading genome location file
2022-07-08 13:39:32 INFO: [GenomePreparation 10358723180] Simulating reads using ReadSimulationNanosim readsimulator...
2022-07-08 13:39:32 INFO: [GenomePreparation 10358723180] Simulating reads from GCA_000834435.1_ASM83443v1_genomic.fna: '/home/sternesp/microbiome/users/sternesp/camisim/long_read_simulations/sample01-test/source_genomes/GCA_000834435.1_ASM83443v1_genomic.fna'
2022-07-08 13:39:32 INFO: [GenomePreparation 10358723180] Simulating reads from GCA_000981485.1_EcoliK12AG100_genomic.fna: '/home/sternesp/microbiome/users/sternesp/camisim/long_read_simulations/sample01-test/source_genomes/GCA_000981485.1_EcoliK12AG100_genomic.fna'

running the code with following parameters:

ref_g /home/sternesp/microbiome/users/sternesp/camisim/long_read_simulations/sample01-test/source_genomes/GCA_000834435.1_ASM83443v1_genomic.fna
model_prefix tools/nanosim_profile/ecoli
out /tmp/tmp54bl7pro/2022.07.08_13.39.17_sample_0/reads/GCA_000834435.1_ASM83443v1_genomic.fna
number [326926]
perfect False
kmer_bias None
basecaller None
dna_type linear
strandness None
sd_len None
median_len None
max_len inf
min_len 50
fastq False
chimeric False
num_threads 1
2022-07-08 13:39:34: /work/microbiome/users/sternesp/conda/envs/camisim/bin/simulator.py genome -n 326926 -r /home/sternesp/microbiome/users/sternesp/camisim/long_read_simulations/sample01-test/source_genomes/GCA_000834435.1_ASM83443v1_genomic.fna -o /tmp/tmp54bl7pro/2022.07.08_13.39.17_sample_0/reads/GCA_000834435.1_ASM83443v1_genomic.fna -c tools/nanosim_profile/ecoli --seed 1333821284 -dna_type linear
2022-07-08 13:39:34: Read in reference 
2022-07-08 13:39:34: Read error profile
2022-07-08 13:39:34: Read KDF of unaligned reads
Traceback (most recent call last):
  File "/work/microbiome/users/sternesp/conda/envs/camisim/bin/simulator.py", line 2319, in <module>
    main()
  File "/work/microbiome/users/sternesp/conda/envs/camisim/bin/simulator.py", line 2088, in main
    read_profile(ref_g, number, model_prefix, perfect, args.mode, strandness, dna_type=dna_type, chimeric=chimeric)
  File "/work/microbiome/users/sternesp/conda/envs/camisim/bin/simulator.py", line 510, in read_profile
    kde_unaligned = joblib.load(model_prefix + "_unaligned_length.pkl")
  File "/work/microbiome/users/sternesp/conda/envs/camisim/lib/python3.9/site-packages/joblib/numpy_pickle.py", line 588, in load
    obj = _unpickle(fobj, filename, mmap_mode)
  File "/work/microbiome/users/sternesp/conda/envs/camisim/lib/python3.9/site-packages/joblib/numpy_pickle.py", line 507, in _unpickle
    obj = unpickler.load()
  File "/work/microbiome/users/sternesp/conda/envs/camisim/lib/python3.9/pickle.py", line 1213, in load
    dispatch[key[0]](self)
  File "/work/microbiome/users/sternesp/conda/envs/camisim/lib/python3.9/pickle.py", line 1529, in load_global
    klass = self.find_class(module, name)
  File "/work/microbiome/users/sternesp/conda/envs/camisim/lib/python3.9/pickle.py", line 1580, in find_class
    __import__(module, level=0)
ModuleNotFoundError: No module named 'sklearn.neighbors.kde'

running the code with following parameters:

ref_g /home/sternesp/microbiome/users/sternesp/camisim/long_read_simulations/sample01-test/source_genomes/GCA_000981485.1_EcoliK12AG100_genomic.fna
model_prefix tools/nanosim_profile/ecoli
out /tmp/tmp54bl7pro/2022.07.08_13.39.17_sample_0/reads/GCA_000981485.1_EcoliK12AG100_genomic.fna
number [348020]
perfect False
kmer_bias None
basecaller None
dna_type linear
strandness None
sd_len None
median_len None
max_len inf
min_len 50
fastq False
chimeric False
num_threads 1
2022-07-08 13:39:35: /work/microbiome/users/sternesp/conda/envs/camisim/bin/simulator.py genome -n 348020 -r /home/sternesp/microbiome/users/sternesp/camisim/long_read_simulations/sample01-test/source_genomes/GCA_000981485.1_EcoliK12AG100_genomic.fna -o /tmp/tmp54bl7pro/2022.07.08_13.39.17_sample_0/reads/GCA_000981485.1_EcoliK12AG100_genomic.fna -c tools/nanosim_profile/ecoli --seed 2310065096 -dna_type linear
2022-07-08 13:39:35: Read in reference 
2022-07-08 13:39:35: Read error profile
2022-07-08 13:39:35: Read KDF of unaligned reads
Traceback (most recent call last):
  File "/work/microbiome/users/sternesp/conda/envs/camisim/bin/simulator.py", line 2319, in <module>
    main()
  File "/work/microbiome/users/sternesp/conda/envs/camisim/bin/simulator.py", line 2088, in main
    read_profile(ref_g, number, model_prefix, perfect, args.mode, strandness, dna_type=dna_type, chimeric=chimeric)
  File "/work/microbiome/users/sternesp/conda/envs/camisim/bin/simulator.py", line 510, in read_profile
    kde_unaligned = joblib.load(model_prefix + "_unaligned_length.pkl")
  File "/work/microbiome/users/sternesp/conda/envs/camisim/lib/python3.9/site-packages/joblib/numpy_pickle.py", line 588, in load
    obj = _unpickle(fobj, filename, mmap_mode)
  File "/work/microbiome/users/sternesp/conda/envs/camisim/lib/python3.9/site-packages/joblib/numpy_pickle.py", line 507, in _unpickle
    obj = unpickler.load()
  File "/work/microbiome/users/sternesp/conda/envs/camisim/lib/python3.9/pickle.py", line 1213, in load
    dispatch[key[0]](self)
  File "/work/microbiome/users/sternesp/conda/envs/camisim/lib/python3.9/pickle.py", line 1529, in load_global
    klass = self.find_class(module, name)
  File "/work/microbiome/users/sternesp/conda/envs/camisim/lib/python3.9/pickle.py", line 1580, in find_class
    __import__(module, level=0)
ModuleNotFoundError: No module named 'sklearn.neighbors.kde'
2022-07-08 13:39:36 INFO: [GenomePreparation 10358723180] Simulating reads finished
2022-07-08 13:39:36 INFO: [MetagenomeSimulationPipeline] Generate gold standard assembly
2022-07-08 13:39:36 INFO: [MetadataReader 63661585664] Reading file: '/home/sternesp/microbiome/users/sternesp/camisim/long_read_simulations/sample01-test/internal/genome_locations.tsv'
2022-07-08 13:39:36 ERROR: [MetagenomeSimulationPipeline] Empty bam file list in line 106
2022-07-08 13:39:36 INFO: [MetagenomeSimulationPipeline] Metagenome simulation aborted

Are you able to provide any insight into this?

Thanks!

AlphaSquad commented 2 years ago

Hi, thank you for your interest in CAMISIM! Unfortunately, this is a known error in NanoSim. You probably have a newer version of scikit-learn than the one with which the NanoSim error models were trained. You can find possible solutions for this problem here: https://github.com/bcgsc/NanoSim/issues/165
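For readers hitting the same traceback, one general workaround for this class of pickle error can be sketched as follows. It is only a sketch and not necessarily what the linked NanoSim issue recommends. It assumes a scikit-learn release in which the kernel density code lives in the private module sklearn.neighbors._kde; the helper name alias_module is hypothetical. Aliasing only repairs the module lookup that pickle performs; scikit-learn does not guarantee that estimators pickled with one version load correctly in another, so installing the scikit-learn version the models were created with remains the safer option.

```python
import importlib
import sys


def alias_module(old_name: str, new_name: str) -> None:
    """Make imports of `old_name` resolve to the module `new_name`.

    pickle looks classes up by the module path stored in the file, so
    registering the current module under its old name lets that lookup succeed.
    """
    if old_name not in sys.modules:
        sys.modules[old_name] = importlib.import_module(new_name)


# Hypothetical usage, run before joblib.load() reads the NanoSim model
# (assumes the installed scikit-learn provides sklearn.neighbors._kde):
#
#   alias_module("sklearn.neighbors.kde", "sklearn.neighbors._kde")
#   kde_unaligned = joblib.load(model_prefix + "_unaligned_length.pkl")
```

The alias has to be registered in the same Python process that loads the pkl file, which here is NanoSim's simulator.py rather than CAMISIM itself.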

sternp commented 2 years ago

Thanks. It seems to get a bit further in the analysis now (albeit throwing many deprecation warnings). However, it now fails with this:

FileNotFoundError: [Errno 2] No such file or directory: 'tools/nanosim_profile/ecoli_ht_length.pkl'

I can't track down this file anywhere. Is it generated by CAMISIM itself?

AlphaSquad commented 2 years ago

Oh, that might actually be due to an update that I pushed just last Friday. Have you updated your repository since then?

Indeed, these files are internal: CAMISIM provides NanoSim models for two different versions of NanoSim (1 and 3). NanoSim3 uses the pkl files, while the older NanoSim1 uses different files. Something went wrong in your setup: ecoli is the model we provide for NanoSim1, but the file ending (pkl) suggests that you are using NanoSim3, for which CAMISIM should use the training model, i.e. the file tools/nanosim_profile/training_ht_length.pkl. If you updated your repository, you will need to set training as the profile in the config file; there is a new nanosim3_config.ini in the defaults folder for reference.
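A quick way to confirm which model a profile directory actually contains is to check for the expected files before starting a run. The sketch below is illustrative only: the helper missing_model_files is hypothetical, and the suffix list covers just the two file names that appear in this thread, so NanoSim may require further files that it does not check.

```python
from pathlib import Path

# Only the two suffixes seen in this thread; NanoSim may need more files.
SUFFIXES_SEEN_IN_THREAD = ("_unaligned_length.pkl", "_ht_length.pkl")


def missing_model_files(profile_dir, model_name, suffixes=SUFFIXES_SEEN_IN_THREAD):
    """Return the expected model files that do not exist on disk."""
    prefix = Path(profile_dir) / model_name
    # Build e.g. <profile_dir>/training_ht_length.pkl for each suffix.
    expected = [prefix.parent / (prefix.name + suffix) for suffix in suffixes]
    return [path for path in expected if not path.is_file()]
```

Calling missing_model_files("tools/nanosim_profile", "training") from the CAMISIM directory should return an empty list when both files are present; a non-empty result for "ecoli" would match the FileNotFoundError reported above.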

The deprecation warnings probably refer to the sklearn.neighbors.kde module? Unfortunately, they cannot be avoided when reading the length pkl file, since it was created (by NanoSim) with an old scikit-learn version.

sternp commented 2 years ago

Great - thanks for your help! Those fixes worked.