valentynbez closed 4 months ago
Is your .fna file the *_virus.fna from the *_summary directory?
Yes, it is, I corrected the file path in the issue.
You're right, this shouldn't happen. Do you have a minimal example that I could use to reproduce the issue?
So here's another example. This time I ran the command with neural network classification (the reported job was huge):
> zgrep ">" SAMEA7041137.contigs.min1000_summary/SAMEA7041137.contigs.min1000_virus.fna | wc -l
1564
> zgrep ">" SAMEA7041137.contigs.min1000_summary/SAMEA7041137.contigs.min1000_plasmid.fna | wc -l
2212
geNomad summary log:
[00:30:54] 2,207 plasmid(s) and 1,556 virus(es) were identified.
I cannot attach the contigs because the file is too big (120k contigs). Weirdly, everything is fine with a smaller input.
I couldn't reproduce the issue with the datasets that I have on hand.
Can you try counting the headers differently? Here are some options:
rg -c "^>" seq.fna
grep -c "^>" seq.fna
seqkit fx2tab -ni seq.fna | wc -l
How are your headers formatted?
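If none of those tools are available, the same cross-check can be sketched in Python. This is only a minimal sketch, not part of geNomad: the function names `open_maybe_gzip` and `count_fasta_headers` are made up for illustration, and it assumes a standard FASTA layout where every record starts with a line beginning with `>` and the sequence ID is the first whitespace-delimited token. It counts only lines that start with `>` (like `grep -c "^>"`), reads plain or gzipped files, and also reports repeated IDs, which may help when answering the header-format question.

```python
import gzip
from pathlib import Path


def open_maybe_gzip(path):
    """Open a FASTA file as text, handling gzip if the magic bytes match."""
    path = Path(path)
    with path.open("rb") as handle:
        magic = handle.read(2)
    if magic == b"\x1f\x8b":  # gzip magic number
        return gzip.open(path, "rt")
    return path.open("rt")


def count_fasta_headers(path):
    """Return (header_lines, unique_ids, sorted_duplicate_ids).

    Only lines that START with '>' are counted, so a '>' character
    appearing elsewhere in a line is ignored.
    """
    seen = set()
    duplicates = set()
    total = 0
    with open_maybe_gzip(path) as handle:
        for line in handle:
            if not line.startswith(">"):
                continue
            total += 1
            # Assumption: the ID is the first token after '>'.
            parts = line[1:].split()
            seq_id = parts[0] if parts else ""
            if seq_id in seen:
                duplicates.add(seq_id)
            seen.add(seq_id)
    return total, len(seen), sorted(duplicates)


if __name__ == "__main__":
    import sys

    for fasta in sys.argv[1:]:
        n_headers, n_unique, dups = count_fasta_headers(fasta)
        print(f"{fasta}: {n_headers} header lines, {n_unique} unique IDs")
        if dups:
            print(f"  repeated IDs (first 5): {dups[:5]}")
```

Running the script on the *_virus.fna and *_plasmid.fna files and comparing the header count with the number in the summary log shows whether the difference comes from the file itself or from the way the headers were counted.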
Okay, found an issue. Completely unrelated to geNomad, sorry :/
No worries! Good that you figured it out.
Hello,
Thanks for your tool. I have observed an inconsistency between the number of viral genomes reported in the log and in the output file. Where is this discrepancy coming from? Am I missing something?
Command:
geNomad summary log:
Number of sequences in summary file:
geNomad version: 1.7.4
Database version: 1.7