MrOlm / drep

Rapid comparison and dereplication of genomes

ValueError: The number of observations cannot be determined on an empty distance matrix #104

Closed nick-youngblut closed 3 years ago

nick-youngblut commented 3 years ago

I'm running drep dereplicate on ~32000 genomes, and getting the following error:

***************************************************
    ..:: dRep dereplicate Step 1. Filter ::..
***************************************************

Will filter the genome list
Loading genomes from a list
Calculating genome info of genomes
100.00% of genomes passed length filtering
***************************************************
    ..:: dRep dereplicate Step 2. Cluster ::..
***************************************************

Running primary clustering
Running pair-wise MASH clustering
  Will split genomes into 7 groups for primary clustering
Traceback (most recent call last):
  File "/ebio/abt3_projects/Georg_animal_feces/bin/llg/.snakemake/conda/bb52dd38/bin/dRep", line 32, in <module>
    Controller().parseArguments(args)
  File "/ebio/abt3_projects/Georg_animal_feces/bin/llg/.snakemake/conda/bb52dd38/lib/python3.8/site-packages/drep/controller.py", line 100, in parseArguments
    self.dereplicate_operation(**vars(args))
  File "/ebio/abt3_projects/Georg_animal_feces/bin/llg/.snakemake/conda/bb52dd38/lib/python3.8/site-packages/drep/controller.py", line 48, in dereplicate_operation
    drep.d_workflows.dereplicate_wrapper(kwargs['work_directory'],**kwargs)
  File "/ebio/abt3_projects/Georg_animal_feces/bin/llg/.snakemake/conda/bb52dd38/lib/python3.8/site-packages/drep/d_workflows.py", line 37, in dereplicate_wrapper
    drep.d_cluster.controller.d_cluster_wrapper(wd, **kwargs)
  File "/ebio/abt3_projects/Georg_animal_feces/bin/llg/.snakemake/conda/bb52dd38/lib/python3.8/site-packages/drep/d_cluster/controller.py", line 179, in d_cluster_wrapper
    GenomeClusterController(workDirectory, **kwargs).main()
  File "/ebio/abt3_projects/Georg_animal_feces/bin/llg/.snakemake/conda/bb52dd38/lib/python3.8/site-packages/drep/d_cluster/controller.py", line 32, in main
    self.run_primary_clustering()
  File "/ebio/abt3_projects/Georg_animal_feces/bin/llg/.snakemake/conda/bb52dd38/lib/python3.8/site-packages/drep/d_cluster/controller.py", line 100, in run_primary_clustering
    Mdb, Cdb, cluster_ret = drep.d_cluster.compare_utils.all_vs_all_MASH(self.Bdb, self.wd.get_dir('MASH'), **self.kwargs)
  File "/ebio/abt3_projects/Georg_animal_feces/bin/llg/.snakemake/conda/bb52dd38/lib/python3.8/site-packages/drep/d_cluster/compare_utils.py", line 115, in all_vs_all_MASH
    Cdb, cluster_ret = cluster_mash_database(Mdb, **kwargs)
  File "/ebio/abt3_projects/Georg_animal_feces/bin/llg/.snakemake/conda/bb52dd38/lib/python3.8/site-packages/drep/d_cluster/compare_utils.py", line 276, in cluster_mash_database
    Cdb, linkage = drep.d_cluster.cluster_utils.cluster_hierarchical(linkage_db, linkage_method= P_Lmethod, \
  File "/ebio/abt3_projects/Georg_animal_feces/bin/llg/.snakemake/conda/bb52dd38/lib/python3.8/site-packages/drep/d_cluster/cluster_utils.py", line 114, in cluster_hierarchical
    linkage = scipy.cluster.hierarchy.linkage(arr, method= linkage_method)
  File "/ebio/abt3_projects/Georg_animal_feces/bin/llg/.snakemake/conda/bb52dd38/lib/python3.8/site-packages/scipy/cluster/hierarchy.py", line 1068, in linkage
    n = int(distance.num_obs_y(y))
  File "/ebio/abt3_projects/Georg_animal_feces/bin/llg/.snakemake/conda/bb52dd38/lib/python3.8/site-packages/scipy/spatial/distance.py", line 2425, in num_obs_y
    raise ValueError("The number of observations cannot be determined on "
ValueError: The number of observations cannot be determined on an empty distance matrix.

The specific command:

dRep dereplicate -sa 0.95 -p 16  --ignoreGenomeQuality  --S_algorithm fastANI  -g genome_paths.txt --  ./drep_output

Any idea why drep is generating an "empty distance matrix"?

My conda env:

# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       1_gnu    conda-forge
biopython                 1.78             py38h497a2fe_1    conda-forge
boost                     1.70.0           py38h9de70de_1    conda-forge
boost-cpp                 1.70.0               h7b93d67_3    conda-forge
bzip2                     1.0.8                h7f98852_4    conda-forge
c-ares                    1.17.1               h36c2ea0_0    conda-forge
ca-certificates           2021.1.19            h06a4308_0
capnproto                 0.6.1                hfc679d8_1    conda-forge
certifi                   2020.12.5        py38h578d9bd_1    conda-forge
checkm-genome             1.1.3                      py_1    bioconda
cycler                    0.10.0                     py_2    conda-forge
dendropy                  4.5.1              pyh3252c3a_0    bioconda
drep                      3.0.1                      py_0    bioconda
fastani                   1.32                 he1c1bb9_0    bioconda
freetype                  2.10.4               h0708190_1    conda-forge
gettext                   0.19.8.1          h0b5b191_1005    conda-forge
gsl                       2.6                  he838d99_2    conda-forge
hmmer                     3.3.1                he1b5a44_0    bioconda
icu                       67.1                 he1b5a44_0    conda-forge
joblib                    1.0.0              pyhd8ed1ab_0    conda-forge
jpeg                      9d                   h516909a_0    conda-forge
kiwisolver                1.3.1            py38h1fd1430_1    conda-forge
krb5                      1.17.2               h926e7f8_0    conda-forge
lcms2                     2.11                 hcbb858e_1    conda-forge
ld_impl_linux-64          2.35.1               hed1e6ac_1    conda-forge
libblas                   3.9.0                7_openblas    conda-forge
libcblas                  3.9.0                7_openblas    conda-forge
libcurl                   7.71.1               hcdd3856_8    conda-forge
libdeflate                1.6                  h516909a_0    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libev                     4.33                 h516909a_1    conda-forge
libffi                    3.3                  h58526e2_2    conda-forge
libgcc-ng                 9.3.0               h2828fa1_18    conda-forge
libgfortran-ng            9.3.0               hff62375_18    conda-forge
libgfortran5              9.3.0               hff62375_18    conda-forge
libgomp                   9.3.0               h2828fa1_18    conda-forge
libidn2                   2.3.0                h516909a_0    conda-forge
liblapack                 3.9.0                7_openblas    conda-forge
libnghttp2                1.41.0               h8cfc5f6_2    conda-forge
libopenblas               0.3.12          pthreads_h4812303_1    conda-forge
libpng                    1.6.37               hed695b0_2    conda-forge
libssh2                   1.9.0                hab1572f_5    conda-forge
libstdcxx-ng              9.3.0               h6de172a_18    conda-forge
libtiff                   4.2.0                hdc55705_0    conda-forge
libunistring              0.9.10               h14c3975_0    conda-forge
libwebp-base              1.2.0                h7f98852_0    conda-forge
lz4-c                     1.9.3                h9c3ff4c_0    conda-forge
mash                      2.2.2                ha61e061_2    bioconda
matplotlib-base           3.3.4            py38h0efea84_0    conda-forge
mummer4                   4.0.0rc1        pl526he1b5a44_0    bioconda
ncurses                   6.2                  h58526e2_4    conda-forge
numpy                     1.19.5           py38h18fd61f_1    conda-forge
olefile                   0.46               pyh9f0ad1d_1    conda-forge
openssl                   1.1.1i               h7f98852_0    conda-forge
pandas                    1.2.1            py38h51da96c_0    conda-forge
patsy                     0.5.1                      py_0    conda-forge
perl                      5.26.2            h36c2ea0_1008    conda-forge
pigz                      2.5                  h27826a3_0    conda-forge
pillow                    8.1.0            py38h357d4e7_1    conda-forge
pip                       21.0.1             pyhd8ed1ab_0    conda-forge
pplacer                   1.1.alpha19                   1    bioconda
prodigal                  2.6.3                h516909a_2    bioconda
pyparsing                 2.4.7              pyh9f0ad1d_0    conda-forge
pysam                     0.16.0.1         py38hbdc2ae9_1    bioconda
python                    3.8.6           hffdb5ce_5_cpython    conda-forge
python-dateutil           2.8.1                      py_0    conda-forge
python_abi                3.8                      1_cp38    conda-forge
pytz                      2020.5             pyhd8ed1ab_0    conda-forge
readline                  8.1                  h27cfd23_0
scikit-learn              0.24.1           py38h658cfdd_0    conda-forge
scipy                     1.6.0            py38hb2138dd_0    conda-forge
seaborn                   0.11.1               ha770c72_0    conda-forge
seaborn-base              0.11.1             pyhd8ed1ab_1    conda-forge
setuptools                52.0.0           py38h06a4308_0
six                       1.15.0             pyh9f0ad1d_0    conda-forge
sqlite                    3.34.0               h74cdb3f_0    conda-forge
statsmodels               0.12.1           py38h5c078b8_2    conda-forge
threadpoolctl             2.1.0              pyh5ca1d4c_0    conda-forge
tk                        8.6.10               hed695b0_1    conda-forge
tornado                   6.1              py38h497a2fe_1    conda-forge
wget                      1.20.1               h22169c7_0    conda-forge
wheel                     0.36.2             pyhd3deb0d_0    conda-forge
xz                        5.2.5                h516909a_1    conda-forge
zlib                      1.2.11            h516909a_1010    conda-forge
zstd                      1.4.8                ha95c52a_1    conda-forge

nick-youngblut commented 3 years ago

I'm getting a different error when running the job locally:

***************************************************
    ..:: dRep dereplicate Step 1. Filter ::..
***************************************************

Will filter the genome list
Loading genomes from a list
Calculating genome info of genomes
100.00% of genomes passed length filtering
***************************************************
    ..:: dRep dereplicate Step 2. Cluster ::..
***************************************************

Running primary clustering
Running pair-wise MASH clustering
  Will split genomes into 7 groups for primary clustering
Traceback (most recent call last):
  File "/ebio/abt3_projects/Georg_animal_feces/bin/llg/.snakemake/conda/bb52dd38/bin/dRep", line 32, in <module>
    Controller().parseArguments(args)
  File "/ebio/abt3_projects/Georg_animal_feces/bin/llg/.snakemake/conda/bb52dd38/lib/python3.8/site-packages/drep/controller.py", line 100, in parseArguments
    self.dereplicate_operation(**vars(args))
  File "/ebio/abt3_projects/Georg_animal_feces/bin/llg/.snakemake/conda/bb52dd38/lib/python3.8/site-packages/drep/controller.py", line 48, in dereplicate_operation
    drep.d_workflows.dereplicate_wrapper(kwargs['work_directory'],**kwargs)
  File "/ebio/abt3_projects/Georg_animal_feces/bin/llg/.snakemake/conda/bb52dd38/lib/python3.8/site-packages/drep/d_workflows.py", line 37, in dereplicate_wrapper
    drep.d_cluster.controller.d_cluster_wrapper(wd, **kwargs)
  File "/ebio/abt3_projects/Georg_animal_feces/bin/llg/.snakemake/conda/bb52dd38/lib/python3.8/site-packages/drep/d_cluster/controller.py", line 179, in d_cluster_wrapper
    GenomeClusterController(workDirectory, **kwargs).main()
  File "/ebio/abt3_projects/Georg_animal_feces/bin/llg/.snakemake/conda/bb52dd38/lib/python3.8/site-packages/drep/d_cluster/controller.py", line 32, in main
    self.run_primary_clustering()
  File "/ebio/abt3_projects/Georg_animal_feces/bin/llg/.snakemake/conda/bb52dd38/lib/python3.8/site-packages/drep/d_cluster/controller.py", line 100, in run_primary_clustering
    Mdb, Cdb, cluster_ret = drep.d_cluster.compare_utils.all_vs_all_MASH(self.Bdb, self.wd.get_dir('MASH'), **self.kwargs)
  File "/ebio/abt3_projects/Georg_animal_feces/bin/llg/.snakemake/conda/bb52dd38/lib/python3.8/site-packages/drep/d_cluster/compare_utils.py", line 110, in all_vs_all_MASH
    genome_chunks = run_mash_on_genome_chunks(genome_chunks, mash_exe, sketch_folder, MASH_folder, logdir,  **kwargs)
  File "/ebio/abt3_projects/Georg_animal_feces/bin/llg/.snakemake/conda/bb52dd38/lib/python3.8/site-packages/drep/d_cluster/compare_utils.py", line 180, in run_mash_on_genome_chunks
    drep.thread_cmds(cmds, logdir=logdir, t=int(p))
  File "/ebio/abt3_projects/Georg_animal_feces/bin/llg/.snakemake/conda/bb52dd38/lib/python3.8/site-packages/drep/__init__.py", line 56, in thread_cmds
    pool.map(thread_cmd_wrapper, tups)
  File "/ebio/abt3_projects/Georg_animal_feces/bin/llg/.snakemake/conda/bb52dd38/lib/python3.8/multiprocessing/pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/ebio/abt3_projects/Georg_animal_feces/bin/llg/.snakemake/conda/bb52dd38/lib/python3.8/multiprocessing/pool.py", line 771, in get
    raise self._value
  File "/ebio/abt3_projects/Georg_animal_feces/bin/llg/.snakemake/conda/bb52dd38/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/ebio/abt3_projects/Georg_animal_feces/bin/llg/.snakemake/conda/bb52dd38/lib/python3.8/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/ebio/abt3_projects/Georg_animal_feces/bin/llg/.snakemake/conda/bb52dd38/lib/python3.8/site-packages/drep/__init__.py", line 51, in thread_cmd_wrapper
    run_cmd(*tup)
  File "/ebio/abt3_projects/Georg_animal_feces/bin/llg/.snakemake/conda/bb52dd38/lib/python3.8/site-packages/drep/__init__.py", line 47, in run_cmd
    call(cmd,stdout=sto, stderr=ste)
  File "/ebio/abt3_projects/Georg_animal_feces/bin/llg/.snakemake/conda/bb52dd38/lib/python3.8/subprocess.py", line 340, in call
    with Popen(*popenargs, **kwargs) as p:
  File "/ebio/abt3_projects/Georg_animal_feces/bin/llg/.snakemake/conda/bb52dd38/lib/python3.8/subprocess.py", line 854, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/ebio/abt3_projects/Georg_animal_feces/bin/llg/.snakemake/conda/bb52dd38/lib/python3.8/subprocess.py", line 1702, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
OSError: [Errno 7] Argument list too long: '/ebio/abt3_projects/Georg_animal_feces/bin/llg/.snakemake/conda/bb52dd38/bin/mash'

...which suggests that the empty distance matrix error was due to lack of memory for my cluster job.

It appears that drep is calling mash with all of the paths to all genomes, which seems to be too long for my ~32000 genomes.

I guess that I'm stuck using fastANI independently of drep

MrOlm commented 3 years ago

Hi Nick,

For 32,000 genomes you'll definitely need to add the argument --multiround_primary_clustering. This requires dRep v3 if you don't already have that installed. More info on it is here: https://drep.readthedocs.io/en/latest/choosing_parameters.html#using-greedy-algorithms

The first error may have been caused by running out of RAM, or it could be a problem with mash failing silently. If you try this again, add the -d argument so that we can troubleshoot if it crashes again.

The second error (the local one) has to do with the command length limit for your bash setup. I'm not sure how to actually change the argument length limit for your bash profile, but lowering the deep argument --primary_chunksize could fix this problem if you hit it again. I've never had this problem with the default 5000, but lowering it to 3000 or so shouldn't result in any noticeable dip in performance.

-Matt
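
For readers hitting the same "Argument list too long" error, here is a minimal sketch of why a chunk of several thousand long paths can overflow the limit, and how to estimate a chunk size that fits. It assumes a Linux-like system where the limit applies to the combined size of the argument strings plus the environment; the function names, the 8-byte pointer size, the 128 KiB reserve, and the 150-character paths are illustrative assumptions, not anything taken from dRep itself.

```python
import os


def argv_bytes(args):
    """Approximate bytes an argument list occupies: each string plus its
    terminating NUL, plus one 8-byte pointer per argument (64-bit assumption)."""
    return sum(len(a.encode()) + 1 + 8 for a in args)


def max_paths_per_call(paths, limit, reserve=128 * 1024):
    """Count how many leading paths fit under `limit` bytes, keeping
    `reserve` bytes free for the environment and the other arguments."""
    budget = limit - reserve
    used = 0
    for count, path in enumerate(paths):
        used += len(path.encode()) + 1 + 8
        if used > budget:
            return count
    return len(paths)


if __name__ == "__main__":
    # Hypothetical paths of 150 characters each, standing in for long cluster paths.
    fake_paths = ["/" + "x" * 149 for _ in range(5000)]
    try:
        limit = os.sysconf("SC_ARG_MAX")  # available on Unix-like systems only
    except (AttributeError, ValueError, OSError):
        limit = 2 * 1024 * 1024  # assumed fallback; a common Linux value
    print("estimated bytes for 5000 paths:", argv_bytes(fake_paths))
    print("paths that fit in one call:", max_paths_per_call(fake_paths, limit))
```

If the second number printed is below the chunk size in use, lowering --primary_chunksize to something under it is the kind of adjustment suggested above.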

nick-youngblut commented 3 years ago

Thanks for the heads up on --multiround_primary_clustering and --primary_chunksize!

The first error was due to a lack of memory.

The cluster admin won't change the max command length. I'm guessing that you are using shorter file paths than me, which is why the 5000 default is working for you. This is a general problem for software that only allows nargs="+" instead of allowing one input file with a list of paths. Thanks for implementing the latter in dRep!

Just one small thing regarding Python CLI dev: many people don't know that argparse allows for dashes throughout a param (e.g., --multiround-primary-clustering and --primary-chunksize), which is a bit easier to type. It's definitely personal preference though. I just thought I'd pass on that FYI, given that the developer is usually the one typing those params a ton (e.g., during all of the software testing), so little things like params that are slightly easier to type can make a difference.
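
A minimal sketch of the argparse behaviour mentioned above. This is a hypothetical stand-alone parser, not dRep's actual argument definitions; it only borrows the two flag names and the default of 5000 from this thread. argparse derives the attribute name from the first long option string and converts its dashes to underscores, and a single argument can be registered under several spellings.

```python
import argparse

# Hypothetical parser for illustration only.
parser = argparse.ArgumentParser(prog="example")

# Both spellings are registered for the same option; the attribute name is
# taken from the first long option, with dashes replaced by underscores.
parser.add_argument("--primary-chunksize", "--primary_chunksize",
                    type=int, default=5000)
parser.add_argument("--multiround-primary-clustering",
                    "--multiround_primary_clustering",
                    action="store_true")

# Mixing the two spellings on one command line works.
args = parser.parse_args(["--primary_chunksize", "3000",
                          "--multiround-primary-clustering"])
print(args.primary_chunksize, args.multiround_primary_clustering)
```

Registering the underscore spelling as a second option string is one way to add the dashed form without breaking command lines that already use the original flags.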

MrOlm commented 3 years ago

Cool, thanks for the heads up! I don't want to change those parameters as they stand, as I don't want to mess with workflows that others have implemented using the current flags, but I'll keep that in mind for the future

-Matt

Gian77 commented 1 year ago

Hello,

Sorry for commenting on this again. I am having a very similar problem here. I am trying to dereplicate across a series of MAGs and I am getting the same "empty distance matrix" error, please see below. I have tried both solutions you suggested before, i.e. --multiround_primary_clustering and --primary_chunksize, but they do not seem to help.

    ..:: dRep dereplicate Step 1. Filter ::..
***************************************************

Will filter the genome list
346 genomes were input to dRep
Calculating genome info of genomes
98.27% of genomes passed length filtering
Running prodigal
Running checkM
0.29% of genomes passed checkM filtering
***************************************************
    ..:: dRep dereplicate Step 2. Cluster ::..
***************************************************

Running primary clustering
Running pair-wise MASH clustering
Traceback (most recent call last):
  File "/mnt/home/benucci/anaconda2/envs/drep/bin/dRep", line 32, in <module>
    Controller().parseArguments(args)
  File "/mnt/home/benucci/anaconda2/envs/drep/lib/python3.9/site-packages/drep/controller.py", line 100, in parseArguments
    self.dereplicate_operation(**vars(args))
  File "/mnt/home/benucci/anaconda2/envs/drep/lib/python3.9/site-packages/drep/controller.py", line 48, in dereplicate_operation
    drep.d_workflows.dereplicate_wrapper(kwargs['work_directory'],**kwargs)
  File "/mnt/home/benucci/anaconda2/envs/drep/lib/python3.9/site-packages/drep/d_workflows.py", line 37, in dereplicate_wrapper
    drep.d_cluster.controller.d_cluster_wrapper(wd, **kwargs)
  File "/mnt/home/benucci/anaconda2/envs/drep/lib/python3.9/site-packages/drep/d_cluster/controller.py", line 179, in d_cluster_wrapper
    GenomeClusterController(workDirectory, **kwargs).main()
  File "/mnt/home/benucci/anaconda2/envs/drep/lib/python3.9/site-packages/drep/d_cluster/controller.py", line 32, in main
    self.run_primary_clustering()
  File "/mnt/home/benucci/anaconda2/envs/drep/lib/python3.9/site-packages/drep/d_cluster/controller.py", line 100, in run_primary_clustering
    Mdb, Cdb, cluster_ret = drep.d_cluster.compare_utils.all_vs_all_MASH(self.Bdb, self.wd.get_dir('MASH'), **self.kwargs)
  File "/mnt/home/benucci/anaconda2/envs/drep/lib/python3.9/site-packages/drep/d_cluster/compare_utils.py", line 115, in all_vs_all_MASH
    Cdb, cluster_ret = cluster_mash_database(Mdb, **kwargs)
  File "/mnt/home/benucci/anaconda2/envs/drep/lib/python3.9/site-packages/drep/d_cluster/compare_utils.py", line 280, in cluster_mash_database
    Cdb, linkage = drep.d_cluster.cluster_utils.cluster_hierarchical(linkage_db, linkage_method= P_Lmethod, \
  File "/mnt/home/benucci/anaconda2/envs/drep/lib/python3.9/site-packages/drep/d_cluster/cluster_utils.py", line 114, in cluster_hierarchical
    linkage = scipy.cluster.hierarchy.linkage(arr, method= linkage_method)
  File "/mnt/home/benucci/anaconda2/envs/drep/lib/python3.9/site-packages/scipy/cluster/hierarchy.py", line 1068, in linkage
    n = int(distance.num_obs_y(y))
  File "/mnt/home/benucci/anaconda2/envs/drep/lib/python3.9/site-packages/scipy/spatial/distance.py", line 2555, in num_obs_y
    raise ValueError("The number of observations cannot be determined on "
ValueError: The number of observations cannot be determined on an empty distance matrix.

This is my conda env:

# packages in environment at /mnt/home/benucci/anaconda2/envs/drep:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
biopython                 1.78             py39h7f8727e_0    anaconda
blas                      1.0                         mkl  
brotli                    1.0.9                h5eee18b_7  
brotli-bin                1.0.9                h5eee18b_7  
bzip2                     1.0.8                h7b6447c_0  
c-ares                    1.18.1               h7f98852_0    conda-forge
ca-certificates           2022.12.7            ha878542_0    conda-forge
certifi                   2022.12.7          pyhd8ed1ab_0    conda-forge
checkm-genome             1.2.2              pyhdfd78af_1    bioconda
cycler                    0.11.0             pyhd3eb1b0_0  
dbus                      1.13.18              hb2f20db_0    anaconda
dendropy                  4.5.2              pyh3252c3a_0    bioconda
drep                      3.4.0              pyhdfd78af_0    bioconda
expat                     2.4.4                h295c915_0    anaconda
fastani                   1.33                 h0fdf51a_0    bioconda
fftw                      3.3.9                h27cfd23_1  
fontconfig                2.13.1               h6c09931_0    anaconda
fonttools                 4.25.0             pyhd3eb1b0_0  
freetype                  2.12.1               h4a9f257_0  
gettext                   0.21.1               h27087fc_0    conda-forge
giflib                    5.2.1                h7b6447c_0  
glib                      2.69.1               h4ff587b_1    anaconda
gsl                       2.7                  he838d99_0    conda-forge
gst-plugins-base          1.14.0               h8213a91_2    anaconda
gstreamer                 1.14.0               h28cd5cc_2    anaconda
hmmer                     3.3.2                h87f3376_2    bioconda
icu                       58.2                 he6710b0_3    anaconda
intel-openmp              2021.4.0          h06a4308_3561  
jbig                      2.1               h7f98852_2003    conda-forge
joblib                    1.1.0              pyhd3eb1b0_0    anaconda
jpeg                      9e                   h7f8727e_0  
kiwisolver                1.4.2            py39h295c915_0    anaconda
krb5                      1.19.2               hac12032_0    anaconda
lcms2                     2.12                 h3be6417_0  
ld_impl_linux-64          2.38                 h1181459_1  
lerc                      3.0                  h295c915_0  
libblas                   3.9.0            12_linux64_mkl    conda-forge
libbrotlicommon           1.0.9                h5eee18b_7  
libbrotlidec              1.0.9                h5eee18b_7  
libbrotlienc              1.0.9                h5eee18b_7  
libcblas                  3.9.0            12_linux64_mkl    conda-forge
libclang                  10.0.1          default_hb85057a_2    anaconda
libcurl                   7.82.0               h7bff187_0    conda-forge
libdeflate                1.10                 h7f98852_0    conda-forge
libedit                   3.1.20210910         h7f8727e_0    anaconda
libev                     4.33                 h516909a_1    conda-forge
libevent                  2.1.12               h8f2d780_0    anaconda
libffi                    3.3                  he6710b0_2  
libgcc                    7.2.0                h69d50b8_2  
libgcc-ng                 12.2.0              h65d4601_19    conda-forge
libgfortran-ng            11.2.0               h00389a5_1  
libgfortran5              11.2.0               h1234567_1  
libgomp                   12.2.0              h65d4601_19    conda-forge
libidn2                   2.3.4                h166bdaf_0    conda-forge
libllvm10                 10.0.1               hbcb73fb_5    anaconda
libnghttp2                1.47.0               h727a467_0    conda-forge
libnsl                    2.0.0                h5eee18b_0  
libpng                    1.6.37               hbc83047_0  
libpq                     12.9                 h16c4e8d_3    anaconda
libssh2                   1.10.0               haa6b8db_3    conda-forge
libstdcxx-ng              11.2.0               h1234567_1  
libtiff                   4.3.0                h542a066_3    conda-forge
libunistring              0.9.10               h7f98852_0    conda-forge
libuuid                   1.41.5               h5eee18b_0  
libwebp                   1.2.4                h11a3e52_0  
libwebp-base              1.2.4                h5eee18b_0  
libxcb                    1.15                 h7f8727e_0    anaconda
libxkbcommon              1.0.1                hfa300c1_0    anaconda
libxml2                   2.9.14               h74e7548_0    anaconda
libxslt                   1.1.35               h4e12654_0    anaconda
libzlib                   1.2.13               h166bdaf_4    conda-forge
lz4-c                     1.9.3                h295c915_1  
mash                      1.1                           0    bioconda
matplotlib                3.5.1            py39h06a4308_1    anaconda
matplotlib-base           3.5.1            py39ha18d171_1    anaconda
mkl                       2021.4.0           h06a4308_640  
mkl-service               2.4.0            py39h7f8727e_0    anaconda
mkl_fft                   1.3.1            py39hd3c417c_0    anaconda
mkl_random                1.2.2            py39h51133e4_0    anaconda
mummer4                   4.0.0rc1        pl5321h87f3376_3    bioconda
munkres                   1.0.7                      py_1    bioconda
ncurses                   6.3                  h5eee18b_3  
nspr                      4.33                 h295c915_0    anaconda
nss                       3.74                 h0370c37_0    anaconda
numpy                     1.23.1           py39h6c91a56_0    anaconda
numpy-base                1.23.1           py39ha15fc14_0    anaconda
openssl                   1.1.1s               h0b41bf4_1    conda-forge
packaging                 21.3               pyhd3eb1b0_0  
pandas                    1.2.3            py39hde0f152_0    conda-forge
pcre                      8.45                 h295c915_0    anaconda
perl                      5.32.1          2_h7f98852_perl5    conda-forge
pillow                    9.2.0            py39hace64e9_1    anaconda
pip                       22.1.2           py39h06a4308_0    anaconda
ply                       3.11             py39h06a4308_0    anaconda
pplacer                   1.1.alpha19          h9ee0642_2    bioconda
prodigal                  2.6.3                hec16e2b_4    bioconda
pyparsing                 3.0.4              pyhd3eb1b0_0    anaconda
pyqt                      5.15.7           py39h6a678d5_1    anaconda
pyqt5-sip                 12.11.0          py39h6a678d5_1    anaconda
pysam                     0.19.0           py39h5030a8b_0    bioconda
python                    3.9.12               h12debd9_1    anaconda
python-dateutil           2.8.2              pyhd3eb1b0_0  
python_abi                3.9                      2_cp39    conda-forge
pytz                      2022.1           py39h06a4308_0    anaconda
qt-main                   5.15.2               h327a75a_7    anaconda
qt-webengine              5.15.9               hd2b0992_4    anaconda
qtwebkit                  5.212                h4eab89a_4    anaconda
readline                  8.2                  h5eee18b_0  
scikit-learn              1.1.1            py39h6a678d5_0    anaconda
scipy                     1.9.3            py39h14f4228_0  
seaborn                   0.11.2             pyhd3eb1b0_0    anaconda
setuptools                59.8.0           py39hf3d152e_1    conda-forge
sip                       6.6.2            py39h6a678d5_0    anaconda
six                       1.16.0             pyhd3eb1b0_1  
sqlite                    3.39.3               h5082296_0  
threadpoolctl             2.2.0              pyh0d69192_0  
tk                        8.6.12               h1ccaba5_0  
toml                      0.10.2             pyhd3eb1b0_0    anaconda
tornado                   6.1              py39h27cfd23_0    anaconda
tzdata                    2022f                h04d1e81_0  
wget                      1.20.3               ha56f1ee_1    conda-forge
wheel                     0.37.1             pyhd3eb1b0_0  
xz                        5.2.6                h5eee18b_0  
zlib                      1.2.13               h166bdaf_4    conda-forge
zstd                      1.5.2                ha4553b6_0 

These are dependencies:

(drep) [benucci@dev-amd20 code]$ dRep check_dependencies
mash.................................... all good        (location = /mnt/home/benucci/anaconda2/envs/drep/bin/mash)
nucmer.................................. all good        (location = /mnt/home/benucci/anaconda2/envs/drep/bin/nucmer)
checkm.................................. all good        (location = /mnt/home/benucci/anaconda2/envs/drep/bin/checkm)
ANIcalculator........................... !!! ERROR !!!   (location = None)
prodigal................................ all good        (location = /mnt/home/benucci/anaconda2/envs/drep/bin/prodigal)
centrifuge.............................. !!! ERROR !!!   (location = None)
nsimscan................................ !!! ERROR !!!   (location = None)
fastANI................................. all good        (location = /mnt/home/benucci/anaconda2/envs/drep/bin/fastANI)

And this is how I call it:

dRep dereplicate \
    -p $cores \
    --multiround_primary_clustering \
    --primary_chunksize 3000 \
    $cait_scratch/c08_binsDereplication_drep \
    -g $cait_scratch/c07_aggregatedBins_dastool/dastool__DASTool_bins/*.fa 

Thanks a lot for your help!

Gian

MrOlm commented 1 year ago

Hi @Gian77 - the problem is that only a single genome is passing the checkM filtering (0.29% of your input). You probably need to relax the checkM filtering criteria.

Best, MO
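
A short worked example of why one surviving genome produces this error. It assumes, as the traceback suggests, that the clustering step hands scipy's linkage a set of pairwise distances between genomes; the helper name below is hypothetical. With n genomes there are n(n-1)/2 distinct pairs, so a single genome leaves nothing to cluster.

```python
def n_pairwise_distances(n_genomes):
    """Number of distinct genome pairs, i.e. entries in a pairwise
    distance array with no self-comparisons and no duplicates."""
    return n_genomes * (n_genomes - 1) // 2


# 346 genomes in, one genome left after filtering, and the two-genome
# minimum needed for any pairwise comparison at all.
for n in (346, 2, 1):
    print(n, "genomes ->", n_pairwise_distances(n), "pairwise distances")
```

So when the filtering step reports that well under 1% of a few hundred genomes passed, checking that line first is usually quicker than reading the traceback.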

Gian77 commented 1 year ago

@MrOlm, WTH, you're right. Sorry, I did not check the output carefully. Thank you, Gian