apcamargo / genomad

geNomad: Identification of mobile genetic elements
https://portal.nersc.gov/genomad/

Inconsistency in number of genomes reported by genomad #75

Closed valentynbez closed 4 months ago

valentynbez commented 4 months ago

Hello,

Thanks for your tool. I have observed an inconsistency between the number of viral genomes reported in the log and in the output file. Where is this discrepancy coming from? Am I missing something?

Command:

genomad end-to-end --threads 4 --disable-nn-classification \
    contigs.fa \
    contigs.genomad \
    genomad_db/

Genomad summary log:

[14:36:23] 640,769 plasmid(s) and 487,809 virus(es) were identified.  

Number of sequences in summary file:

$ grep ">" contigs.genomad/contigs.genomad_summary/contig_virus.fna | wc -l
276159

geNomad version: 1.7.4 Database version: 1.7

apcamargo commented 4 months ago

Is your .fna file the *_virus.fna from the *_summary directory?

valentynbez commented 4 months ago

Yes, it is. I corrected the file path in the issue.

apcamargo commented 4 months ago

You're right, this shouldn't happen. Do you have a minimal example that I could use to reproduce the issue?

valentynbez commented 4 months ago

So here's another example. This time I ran the command with neural network classification (the job I originally reported was huge):

> zgrep ">" SAMEA7041137.contigs.min1000_summary/SAMEA7041137.contigs.min1000_virus.fna | wc -l
1564
> zgrep ">" SAMEA7041137.contigs.min1000_summary/SAMEA7041137.contigs.min1000_plasmid.fna | wc -l
2212

geNomad summary log:

[00:30:54] 2,207 plasmid(s) and 1,556 virus(es) were identified.

I cannot attach the contigs because the file is too big (120k contigs). Weirdly, everything is fine with a smaller input.

apcamargo commented 4 months ago

I couldn't reproduce the issue with the datasets that I have on hand.

Can you try counting the headers differently? Here are some options:

rg -c "^>" seq.fna
grep -c "^>" seq.fna
seqkit fx2tab -ni seq.fna | wc -l

How are your headers formatted?
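
As a rough sketch, the checks above could be wrapped in two small shell helpers. The function names `count_headers` and `duplicate_ids` are made up for illustration and are not part of geNomad. The first counts only lines that start with ">", and the second lists sequence IDs that appear more than once, which is one possible way for a header count to differ from what you expect:

```shell
# Hypothetical helpers for illustration; not part of geNomad.

# Count FASTA records by matching ">" only at the start of a line.
count_headers() {
    grep -c '^>' "$1"
}

# Print sequence IDs (the first word of each header) that occur more than once.
duplicate_ids() {
    grep '^>' "$1" | cut -d ' ' -f 1 | sort | uniq -d
}
```

For example, `count_headers seq.fna` prints the number of records, and `duplicate_ids seq.fna` prints nothing if every ID is unique.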

valentynbez commented 4 months ago

Okay, I found the issue. It's completely unrelated to geNomad, sorry :/

apcamargo commented 4 months ago

No worries! Good that you figured it out.