AnantharamanLab / VIBRANT

Virus Identification By iteRative ANnoTation
GNU General Public License v3.0

contig size problem #66

Leytoncito closed this issue 2 years ago

Leytoncito commented 2 years ago

Hello, running the following command:

for file in ../genomes/*.fasta; do VIBRANT_run.py -d /home/bioren/bleyton/.conda/envs/vibrant/share/vibrant-1.2.1/db/databases/ -folder ./ -t 16 -i "$file"; done

I got this error:

Fatal exception (source file p7_pipeline.c, line 697): Target sequence length > 100K, over comparison pipeline limit. (Did you mean to use nhmmer/nhmmscan?)

I think the error has nothing to do with VIBRANT itself; even so, it happens with genomes that consist of a single contig.
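The fatal exception is raised by HMMER (p7_pipeline.c is part of HMMER's source), which VIBRANT calls for its HMM searches. Because VIBRANT kept running after the error here, it is easy to miss which genome triggered it. A minimal Python sketch of the same loop that records the affected genomes, assuming the HMMER message reaches VIBRANT's stdout or stderr as it did on the console here. `build_command`, `hit_length_limit` and `run_all` are hypothetical helpers; the flags are copied from the command above.

```python
import subprocess
from pathlib import Path
from typing import List

# Fragment of the HMMER fatal exception quoted in this issue.
HMMER_LIMIT_MSG = "Target sequence length > 100K"


def build_command(db_dir: str, genome: Path, threads: int = 16) -> List[str]:
    """Argument list equivalent to the body of the shell loop (same flags)."""
    return ["VIBRANT_run.py", "-d", db_dir, "-folder", "./",
            "-t", str(threads), "-i", str(genome)]


def hit_length_limit(output: str) -> bool:
    """True if captured output contains HMMER's length-limit exception."""
    return HMMER_LIMIT_MSG in output


def run_all(db_dir: str, genome_dir: str) -> List[Path]:
    """Run VIBRANT on every *.fasta in genome_dir; return genomes that hit the limit."""
    flagged = []
    for genome in sorted(Path(genome_dir).glob("*.fasta")):
        # Passing a list (no shell) means paths with spaces need no quoting.
        result = subprocess.run(build_command(db_dir, genome),
                                capture_output=True, text=True)
        if hit_length_limit(result.stdout + result.stderr):
            flagged.append(genome)
    return flagged
```

For example, `run_all("/path/to/databases/", "../genomes")` would return the list of genome files whose run printed the exception, which narrows down where to look for an unusually long protein.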

This got me wondering:

1) Is there a limitation in these tools when looking for phages in contigs longer than 100k?
2) Is it appropriate to use VIBRANT on genomes with long contigs?
3) Phage prediction by VIBRANT still works, so this error does not stop VIBRANT's execution. Why do you think this happens?

Thank you very much in advance for your answers; they will help me make better sense of the results.

Benjamin

KrisKieft commented 2 years ago

Hi,

This is an odd issue that I haven't seen before, and I'm unsure what's causing it. I've routinely used VIBRANT on whole bacterial genomes of 1-5 Mbp, and there certainly isn't a genome size limit. You can use VIBRANT with whole microbial genomes or with long-read contigs. Maybe one of your genomes has a strange ORF prediction pattern that's causing a single protein to be huge? Could you check the length of each predicted ORF and see whether any are exceptionally long (>100k amino acids)?
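One way to run that check is to scan the predicted protein FASTA (for example, the Prodigal `.faa` output) for sequences above HMMER's limit. A minimal sketch, assuming the 100K in the error message means 100,000 residues and that a trailing `*` stop marker should not be counted; the helper names are hypothetical and the FASTA path is whatever protein file your run produced.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Limit taken from the HMMER message above ("Target sequence length > 100K").
HMMER_MAX_LEN = 100_000


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (id, sequence) pairs; the id is the first word of each header."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def oversized_proteins(lines: Iterable[str], limit: int = HMMER_MAX_LEN) -> Dict[str, int]:
    """Return {protein_id: length} for proteins longer than `limit` residues."""
    sizes = {}
    for pid, seq in read_fasta(lines):
        length = len(seq.rstrip("*"))  # ignore a trailing stop-codon marker
        if length > limit:
            sizes[pid] = length
    return sizes
```

Usage: `with open("proteins.faa") as fh: print(oversized_proteins(fh))` prints the IDs and lengths of any offending proteins; an empty dict means nothing exceeds the limit.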

Leytoncito commented 2 years ago

Thanks for the reply; there was indeed an extremely long translated protein.