KwanLab / Autometa

Autometa: Automated Extraction of Genomes from Shotgun Metagenomes
https://autometa.readthedocs.io

No clusters can be found in the dataset #288

Closed ReneKat closed 2 years ago

ReneKat commented 2 years ago

Hello Autometa developers!

Description

I'm getting a warning that no clusters could be found in the dataset and that I should inspect my input contigs file. Here's a quick description of my contigs file:

n_contigs 1327209
contig_bp 5973946762
gap_pct 0.000
ctg_N50 288890
ctg_L50 4806
ctg_N90 143144
ctg_L90 7131
ctg_max 827479
gc_avg 0.49392
gc_std 0.12321

...

Expected Behavior

Produce clusters from dataset

System Environment

_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 2_gnu conda-forge
asttokens 2.0.5 pyhd8ed1ab_0 conda-forge
attrs 21.4.0 pyhd8ed1ab_0 conda-forge
autometa 2.1.0 pyh5e36f6f_0 bioconda
backcall 0.2.0 pyh9f0ad1d_0 conda-forge
backports 1.0 py_2 conda-forge
backports.functools_lru_cache 1.6.4 pyhd8ed1ab_0 conda-forge
beautifulsoup4 4.11.1 pyha770c72_0 conda-forge
bedtools 2.30.0 h468198e_3 bioconda
biopython 1.79 py39hb9d737c_2 conda-forge
boost-cpp 1.74.0 h75c5d50_8 conda-forge
bowtie2 2.2.5 py39h7cff6ad_8 bioconda
brotli 1.0.9 h166bdaf_7 conda-forge
brotli-bin 1.0.9 h166bdaf_7 conda-forge
brotlipy 0.7.0 py39hb9d737c_1004 conda-forge
bzip2 1.0.8 h7f98852_4 conda-forge
ca-certificates 2022.6.15 ha878542_0 conda-forge
cachecontrol 0.12.11 pyhd8ed1ab_0 conda-forge
certifi 2022.6.15 py39hf3d152e_0 conda-forge
cffi 1.15.1 py39he91dace_0 conda-forge
charset-normalizer 2.1.0 pyhd8ed1ab_0 conda-forge
colorama 0.4.5 pyhd8ed1ab_0 conda-forge
cryptography 37.0.4 py39hd97740a_0 conda-forge
cycler 0.11.0 pyhd8ed1ab_0 conda-forge
cython 0.29.30 py39h5a03fae_0 conda-forge
decorator 5.1.1 pyhd8ed1ab_0 conda-forge
diamond 2.0.15 hb97b32f_0 bioconda
executing 0.8.3 pyhd8ed1ab_0 conda-forge
filelock 3.7.1 pyhd8ed1ab_0 conda-forge
fonttools 4.34.4 py39hb9d737c_0 conda-forge
freetype 2.10.4 h0708190_1 conda-forge
gdown 4.4.0 pyhd8ed1ab_0 conda-forge
giflib 5.2.1 h36c2ea0_2 conda-forge
hdbscan 0.8.28 py39hce5d2b2_1 conda-forge
hdmedians 0.14.2 py39hd257fcd_2 conda-forge
hmmer 3.3.2 h87f3376_2 bioconda
htslib 1.11 hd3b49d5_2 bioconda
icu 70.1 h27087fc_0 conda-forge
idna 3.3 pyhd8ed1ab_0 conda-forge
iniconfig 1.1.1 pyh9f0ad1d_0 conda-forge
ipython 8.4.0 py39hf3d152e_0 conda-forge
jbig 2.1 h7f98852_2003 conda-forge
jedi 0.18.1 py39hf3d152e_1 conda-forge
joblib 1.1.0 pyhd8ed1ab_0 conda-forge
jpeg 9e h166bdaf_2 conda-forge
kiwisolver 1.4.3 py39hf939315_0 conda-forge
krb5 1.17.2 h926e7f8_0 conda-forge
lcms2 2.12 hddcbb42_0 conda-forge
ld_impl_linux-64 2.36.1 hea4e1c9_2 conda-forge
lerc 2.2.1 h9c3ff4c_0 conda-forge
libblas 3.9.0 15_linux64_openblas conda-forge
libbrotlicommon 1.0.9 h166bdaf_7 conda-forge
libbrotlidec 1.0.9 h166bdaf_7 conda-forge
libbrotlienc 1.0.9 h166bdaf_7 conda-forge
libcblas 3.9.0 15_linux64_openblas conda-forge
libcurl 7.71.1 hcdd3856_3 conda-forge
libdeflate 1.7 h7f98852_5 conda-forge
libedit 3.1.20191231 he28a2e2_2 conda-forge
libffi 3.4.2 h7f98852_5 conda-forge
libgcc-ng 12.1.0 h8d9b700_16 conda-forge
libgfortran-ng 12.1.0 h69a702a_16 conda-forge
libgfortran5 12.1.0 hdcd56e2_16 conda-forge
libgomp 12.1.0 h8d9b700_16 conda-forge
libiconv 1.16 h516909a_0 conda-forge
liblapack 3.9.0 15_linux64_openblas conda-forge
libllvm11 11.1.0 hf817b99_3 conda-forge
libnsl 2.0.0 h7f98852_0 conda-forge
libopenblas 0.3.20 pthreads_h78a6416_0 conda-forge
libpng 1.6.37 h753d276_3 conda-forge
libssh2 1.10.0 ha56f1ee_2 conda-forge
libstdcxx-ng 12.1.0 ha89aaad_16 conda-forge
libtiff 4.3.0 hf544144_1 conda-forge
libuuid 2.32.1 h7f98852_1000 conda-forge
libwebp 1.2.2 h3452ae3_0 conda-forge
libwebp-base 1.2.2 h7f98852_1 conda-forge
libxcb 1.13 h7f98852_1004 conda-forge
libzlib 1.2.12 h166bdaf_1 conda-forge
llvmlite 0.38.1 py39h7d9a04d_0 conda-forge
lockfile 0.12.2 py_1 conda-forge
lz4-c 1.9.3 h9c3ff4c_1 conda-forge
matplotlib-base 3.5.2 py39h700656a_0 conda-forge
matplotlib-inline 0.1.3 pyhd8ed1ab_0 conda-forge
msgpack-python 1.0.4 py39hf939315_0 conda-forge
munkres 1.0.7 py_1 bioconda
natsort 8.1.0 pyhd8ed1ab_0 conda-forge
ncurses 6.2 h58526e2_4 conda-forge
numba 0.55.2 py39h66db6d7_0 conda-forge
numpy 1.22.4 py39hc58783e_0 conda-forge
openjpeg 2.4.0 hb52868f_1 conda-forge
openssl 1.1.1q h166bdaf_0 conda-forge
packaging 21.3 pyhd8ed1ab_0 conda-forge
pandas 1.4.3 py39h1832856_0 conda-forge
parallel 20160622 1 bioconda
parso 0.8.3 pyhd8ed1ab_0 conda-forge
perl 5.32.1 2_h7f98852_perl5 conda-forge
perl-threaded 5.32.1 hdfd78af_1 bioconda
pexpect 4.8.0 pyh9f0ad1d_2 conda-forge
pickleshare 0.7.5 py_1003 conda-forge
pillow 9.1.1 py39hae2aec6_1 conda-forge
pip 22.1.2 pyhd8ed1ab_0 conda-forge
pluggy 1.0.0 py39hf3d152e_3 conda-forge
popt 1.16 1 bioconda
prodigal 2.6.3 hec16e2b_4 bioconda
prompt-toolkit 3.0.30 pyha770c72_0 conda-forge
pthread-stubs 0.4 h36c2ea0_1001 conda-forge
ptyprocess 0.7.0 pyhd3deb0d_0 conda-forge
pure_eval 0.2.2 pyhd8ed1ab_0 conda-forge
py 1.11.0 pyh6c4a22f_0 conda-forge
pycparser 2.21 pyhd8ed1ab_0 conda-forge
pygments 2.12.0 pyhd8ed1ab_0 conda-forge
pynndescent 0.5.7 pyh6c4a22f_0 conda-forge
pyopenssl 22.0.0 pyhd8ed1ab_0 conda-forge
pyparsing 3.0.9 pyhd8ed1ab_0 conda-forge
pysocks 1.7.1 py39hf3d152e_5 conda-forge
pytest 7.1.2 py39hf3d152e_0 conda-forge
python 3.9.9 h62f1059_0_cpython conda-forge
python-annoy 1.17.0 py39h5a03fae_4 conda-forge
python-dateutil 2.8.2 pyhd8ed1ab_0 conda-forge
python_abi 3.9 2_cp39 conda-forge
pytz 2022.1 pyhd8ed1ab_0 conda-forge
readline 8.1 h46c0cb4_0 conda-forge
requests 2.28.1 pyhd8ed1ab_0 conda-forge
rsync 3.2.3 hfa40b15_4 conda-forge
samtools 1.11 h6270b1f_0 bioconda
scikit-bio 0.5.6 py39h16ac069_4 conda-forge
scikit-learn 0.24.0 py39h4dfa638_0 conda-forge
scipy 1.8.1 py39he49c0e8_0 conda-forge
setuptools 63.1.0 py39hf3d152e_0 conda-forge
six 1.16.0 pyh6c4a22f_0 conda-forge
soupsieve 2.3.1 pyhd8ed1ab_0 conda-forge
sqlite 3.37.0 h9cd32fc_0 conda-forge
stack_data 0.3.0 pyhd8ed1ab_0 conda-forge
tbb 2021.5.0 h924138e_1 conda-forge
threadpoolctl 3.1.0 pyh8a188c0_0 conda-forge
tk 8.6.12 h27826a3_0 conda-forge
tomli 2.0.1 pyhd8ed1ab_0 conda-forge
tqdm 4.64.0 pyhd8ed1ab_0 conda-forge
traitlets 5.3.0 pyhd8ed1ab_0 conda-forge
trimap 1.0.15 pyh5e36f6f_0 bioconda
tsne 0.3.1 py39hcb82e07_3 conda-forge
tzdata 2022a h191b570_0 conda-forge
umap-learn 0.5.3 py39hf3d152e_0 conda-forge
unicodedata2 14.0.0 py39hb9d737c_1 conda-forge
urllib3 1.26.10 pyhd8ed1ab_0 conda-forge
wcwidth 0.2.5 pyh9f0ad1d_2 conda-forge
wheel 0.37.1 pyhd8ed1ab_0 conda-forge
xorg-libxau 1.0.9 h7f98852_0 conda-forge
xorg-libxdmcp 1.1.3 h7f98852_0 conda-forge
xxhash 0.8.0 h7f98852_3 conda-forge
xz 5.2.5 h516909a_1 conda-forge
zlib 1.2.12 h166bdaf_1 conda-forge
zstd 1.5.2 h8a70e8d_2 conda-forge

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 43 bits physical, 48 bits virtual
CPU(s): 96
On-line CPU(s) list: 0-95
Thread(s) per core: 2
Core(s) per socket: 24
Socket(s): 2
NUMA node(s): 2
Vendor ID: AuthenticAMD
CPU family: 23
Model: 49
Model name: AMD EPYC 7352 24-Core Processor
Stepping: 0
Frequency boost: enabled
CPU MHz: 1500.000
CPU max MHz: 2300.0000
CPU min MHz: 1500.0000

Tasks/Command(s)

bash autometa.sh

[08/19/2022 11:06:59 PM DEBUG] autometa.binning.recursive_dbscan: Examining taxonomy: species : uncultured_marine_thaumarchaeote_ad1000_89_f09 : (1, 15)       
[08/19/2022 11:06:59 PM DEBUG] autometa.binning.recursive_dbscan: Examining taxonomy: species : uncultured_marine_thaumarchaeote_km3_66_e12 : (1, 15)          
[08/19/2022 11:06:59 PM DEBUG] autometa.binning.recursive_dbscan: Examining taxonomy: species : uncultured_marine_thaumarchaeote_km3_85_e11 : (1, 15)          
[08/19/2022 11:06:59 PM WARNING] root: Failed to recover any clusters from dataset                                                                             
[08/19/2022 11:06:59 PM WARNING] root: This may be due to too few input contigs, too few marker-containing contigs, or some other reason!                      
[08/19/2022 11:06:59 PM WARNING] root: Please inspect your input data before proceeding with this dataset!                                                     
/home/fch/miniconda3/envs/autometa/lib/python3.9/site-packages/autometa/binning/summary.py:358: DtypeWarning: Columns (1) have mixed types. Specify dtype option on import or set low_memory=False.
  bin_df = pd.read_csv(args.binning_main, sep="\t", index_col="contig")                                                                                        
[08/19/2022 11:08:05 PM INFO] autometa.binning.summary: Retrieving metabins' stats for cluster                                                                 
[08/19/2022 11:08:09 PM INFO] root: Wrote metabin stats to autometa/FCH03_bacteria_metabin_stats.tsv                                                           
[08/19/2022 11:08:16 PM INFO] autometa.binning.summary: Retrieving metabin taxonomies for cluster                                                              
[08/19/2022 11:08:16 PM DEBUG] root: Ranking taxids                                                                                                            
autometa/FCH03.archaea.hdbscan.main.tsv does not exist, skipping...     
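
With this much terminal output, it can help to pull out only the warnings and see which logger emitted them. The following is a minimal sketch, assuming the `[timestamp LEVEL] logger: message` layout seen in the excerpt above; `parse_log` and `LOG_LINE` are hypothetical names for illustration and are not part of Autometa.

```python
import re
from collections import Counter

# Matches lines shaped like "[08/19/2022 11:06:59 PM WARNING] root: message".
LOG_LINE = re.compile(
    r"^\[(?P<time>[^\]]+?) (?P<level>[A-Z]+)\] (?P<logger>[^:]+): (?P<message>.*)$"
)


def parse_log(lines):
    """Yield (level, logger, message) for every line that fits the pattern."""
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match:
            yield match["level"], match["logger"], match["message"]


# A few lines copied from the excerpt above (trailing spaces removed).
sample = [
    "[08/19/2022 11:06:59 PM DEBUG] autometa.binning.recursive_dbscan: "
    "Examining taxonomy: species : uncultured_marine_thaumarchaeote_km3_85_e11 : (1, 15)",
    "[08/19/2022 11:06:59 PM WARNING] root: Failed to recover any clusters from dataset",
    "[08/19/2022 11:06:59 PM WARNING] root: Please inspect your input data "
    "before proceeding with this dataset!",
    "[08/19/2022 11:08:05 PM INFO] autometa.binning.summary: "
    "Retrieving metabins' stats for cluster",
    "autometa/FCH03.archaea.hdbscan.main.tsv does not exist, skipping...",
]

records = list(parse_log(sample))
counts = Counter(level for level, _, _ in records)
warning_messages = [msg for level, _, msg in records if level == "WARNING"]

print(dict(counts))
for msg in warning_messages:
    print("WARNING:", msg)
```

Lines that do not follow the bracketed layout, such as the final "does not exist, skipping..." message, are ignored by the pattern and would need to be inspected separately.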

Sidduppal commented 2 years ago

Hey @ReneKat, can you please post the commands that you're using along with the complete log?

ReneKat commented 2 years ago

Hi @Sidduppal, thanks for getting back to me so quickly. I'm following the bash tutorial, so I have just run:

bash autometa.sh

As for the log file, I do not see one in the autometa output directory. Where does it get written to?

Here's a list of the files in the autometa output dir:

292K FCH03.archaea.5mers.am_clr.bhsne.tsv
51M FCH03.archaea.5mers.am_clr.tsv
6.0M FCH03.archaea.5mers.tsv
29M FCH03.archaea.fna
32M FCH03.archaea.hmmscan.tsv
13M FCH03.archaea.markers.tsv
9.7M FCH03.bacteria.5mers.am_clr.bhsne.tsv
1.7G FCH03.bacteria.5mers.am_clr.tsv
201M FCH03.bacteria.5mers.tsv
1.1G FCH03.bacteria.fna
24M FCH03.bacteria.hdbscan.main.tsv
4.1M FCH03.bacteria.hdbscan.tsv
44M FCH03.bacteria.hmmscan.tsv
19M FCH03.bacteria.markers.tsv
936M FCH03_bacteria_metabins
32K FCH03_bacteria_metabin_stats.tsv
32K FCH03_bacteria_metabin_taxonomy.tsv
5.5G FCH03.coverages.bed.tsv
62M FCH03.coverages.tsv
843M FCH03.eukaryota.fna
4.2G FCH03.filtered.fna
27M FCH03.gc_content.tsv
4.0K FCH03.orfs.errortaxids.tsv
180M FCH03.orfs.lca.tsv
1.6G FCH03.orfs.sseqid2taxid.tsv
4.0K FCH03.stats.tsv
26M FCH03.taxids.tsv
133M FCH03.taxonomy.tsv
2.9G FCH03.unclassified.fna
497M FCH03.viruses.fna

Thanks, Rene

Sidduppal commented 2 years ago

Hey @ReneKat, it looks like Autometa identified bins for the bacterial contigs; they can be found in FCH03_bacteria_metabin_stats.tsv and FCH03_bacteria_metabin_taxonomy.tsv. The warning seems to apply only to the archaeal contigs, where Autometa could not find any bins. Unless you are interested in archaea, you can safely ignore this warning.
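
A quick way to confirm which kingdoms produced binning output is to check for the per-kingdom binning table in the output directory. The sketch below is an illustration only: `kingdoms_with_bins` is a hypothetical helper, and the `<sample>.<kingdom>.hdbscan.main.tsv` pattern is taken from the file listing in this thread, so it may differ for other runs or clustering methods. The demonstration uses a throwaway directory rather than real results.

```python
from pathlib import Path
import tempfile


def kingdoms_with_bins(outdir, sample, kingdoms=("bacteria", "archaea")):
    """Map each kingdom to True if its binning table exists in outdir.

    The file-name pattern mirrors the listing in this thread
    (e.g. FCH03.bacteria.hdbscan.main.tsv); treat it as an assumption.
    """
    outdir = Path(outdir)
    return {
        kingdom: (outdir / f"{sample}.{kingdom}.hdbscan.main.tsv").is_file()
        for kingdom in kingdoms
    }


# Demonstration on a temporary directory that imitates the reported state:
# a bacterial binning table is present, an archaeal one is not.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "FCH03.bacteria.hdbscan.main.tsv").touch()
    status = kingdoms_with_bins(tmp, "FCH03")

for kingdom, found in status.items():
    label = "binning table found" if found else "no binning table"
    print(f"{kingdom}: {label}")
```

On a real run, the first argument would be the Autometa output directory and the second the sample prefix (here `FCH03`).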

ReneKat commented 2 years ago

Okay, great! So it finished successfully then. I wasn't sure, since the last line was just '...skipping...'. Thanks again for getting back to me so quickly. Have a nice weekend, Rene