cmks / DAS_Tool

Contigs of contig2bin files not found in assembly #92

Closed ntromas closed 11 months ago

ntromas commented 1 year ago

Hi,

I am trying to use DAS_Tool with MAGs from Vamb and Metabat2 and I got the error below. I am able to find "196671" in the contigs file and in both contig2bin files. I am probably missing something, but I'm not sure what (a bin name or contig header issue?).

Thanks for your help!

Analyzing assembly
Error: Contigs of contig2bin files not found in assembly: 196671
S43C99.25917NC29776, S43C99.25917NC398043, S43C99.25917NC475611, S43C99.25917NC594844, S43C99.25917NC193533,...

[usernt@narval4 VB]$ grep -A 1 "196671" all_VB_megahit_fin.fa

S8C85.18227NC196671 TCACATGGGACTACACGTTGGCTCGTCCGGCATTGCGCAAGTTGTATGAAAAGGCAAAAA

S13C25.18252NC196671 TTCCTATGTCATGCAAGAGAGCAGCTAGGCGAATGACAAGGTTTGGTCCACCTAAGCGAT

S24C85.18271NC196671 TCTTGAGTTGATCAGCGAGGCTTGCAAGGGACTGCCCGTGCCGTTGGTGGCCATCGGAGG

S28C117.25907NC196671 AGAGCAGTTCCGAATGTAGCATTTGTAGCAACTGCATCTGTTGATGATACTAACCCAGCA

[usernt@narval4 VB]$ grep -A 1 "196671" OUT_vamb/bins/my_contigs2bin_vamb.tsv
S13C25.18252NC196671 S13C25.18252NC1431
S13C25.18252NC182258 S13C25.18252NC1431
S28C117.25907NC196671 S28C117.25907NC788
S28C117.25907NC271831 S28C117.25907NC788
S8C85.18227NC196671 S8C85.18227NC22
S8C85.18227NC79499 S8C85.18227NC22

[usernt@narval4 VB]$ grep -A 1 "196671" OUT_metabat2/my_contigs2bin_metabat.tsv
S28C117.25907NC196671 OUT_metabat2.1012
S28C117.25907NC458606 OUT_metabat2.1012
S8C85.18227NC196671 OUT_metabat2.289
S8C85.18227NC6577 OUT_metabat2.289
S13C25.18252NC196671 OUT_metabat2.456
S13C25.18252NC216889 OUT_metabat2.456

Command used: DAS_Tool-1.1.6/DAS_Tool -i OUT_metabat2/my_contigs2bin_metabat.tsv,OUT_vamb/bins/my_contigs2bin_vamb.tsv -l metabat2,vamb -c all_VB_megahit_fin.fa -o DAS_Run/ --threads 32 --score_threshold 0.6 --search_engine diamond --write_bin_evals --write_bins

cmks commented 1 year ago

Hi @ntromas, I guess the error message can be misleading. What it actually says is that a total of 196671 contigs in the contig2bin files were not found in the assembly FASTA. The contig names listed after that number are examples: S43C99.25917NC29776, S43C99.25917NC398043, S43C99.25917NC475611, S43C99.25917NC594844, S43C99.25917NC193533.

To check, try the following:

grep -A 1 "S43C99.25917NC29776" all_VB_megahit_fin.fa
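To see every affected contig rather than one at a time, something like the following could help. This is only a rough sketch: `missing_contigs` is a hypothetical helper, not part of DAS_Tool, and it assumes that contig IDs contain no whitespace and that the ID to match is the part of each FASTA header between `>` and the first whitespace.

```shell
# Hypothetical helper (not part of DAS_Tool): print every contig ID that
# appears in a contig2bin table but has no matching header in the assembly.
# Usage: missing_contigs ASSEMBLY.fa CONTIG2BIN.tsv [CONTIG2BIN.tsv ...]
missing_contigs() {
    # While reading the first file (the assembly), record the ID of each
    # header line: the text between ">" and the first whitespace
    # (assumption: this is the part that has to match). Sequence lines
    # are skipped.
    # While reading the remaining files (contig2bin tables), print the
    # first column whenever it was never recorded as a header ID.
    awk '
        FILENAME == ARGV[1] { if (/^>/) seen[substr($1, 2)] = 1; next }
        NF && !($1 in seen) { print $1 }
    ' "$@" | sort -u
}
```

For the files in this thread, the call would look like `missing_contigs all_VB_megahit_fin.fa OUT_metabat2/my_contigs2bin_metabat.tsv OUT_vamb/bins/my_contigs2bin_vamb.tsv | head`. Piping to `wc -l` instead gives the number of distinct missing IDs, which may or may not equal the count in the error message depending on how DAS_Tool counts contigs shared between binners.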

ntromas commented 1 year ago

Oh ok! Hum... I should have looked at the error a bit more carefully... Will try to see what happened, as I used the same coassembly... Thanks!