Open F4NG666 opened 2 months ago
The summary files should only include sequences classified as viruses (<prefix>_virus_summary.tsv) or plasmids (<prefix>_plasmid_summary.tsv). Sequences not present in the summary were either not classified as viruses or plasmids, or they were classified but didn't pass the post-classification filters. These filters can be disabled by using the --relaxed flag.
The taxonomy file only contains sequences that were assigned to a taxon. Sequences missing from this file did not match any taxonomically-informative markers. If you expected all sequences to match a marker, you can try increasing the search sensitivity (e.g., -s 7), but this will increase execution time and memory usage.
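To see exactly which input sequences are absent from a given output file, one option is to compare sequence IDs directly. The sketch below is a minimal illustration, not part of geNomad: the helper names (`fasta_ids`, `tsv_first_column`, `missing_from`) are hypothetical, and it assumes that the first column of each TSV holds the sequence name and that the file has a single header row. If the tool renames entries in the output (for example, for sub-regions of a sequence), the comparison would need adjusting.

```python
import csv


def fasta_ids(path):
    """Collect sequence IDs from a FASTA file (text after '>' up to the first whitespace)."""
    ids = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    ids.add(fields[0])
    return ids


def tsv_first_column(path):
    """Collect the first-column values of a TSV file, skipping one header row.

    Assumption: the first column contains the sequence name.
    """
    ids = set()
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        next(reader, None)  # skip the header row
        for row in reader:
            if row:
                ids.add(row[0])
    return ids


def missing_from(fasta_path, tsv_path):
    """Return the sorted input IDs that do not appear in the TSV file."""
    return sorted(fasta_ids(fasta_path) - tsv_first_column(tsv_path))
```

Calling `missing_from("input.fna", "<prefix>_virus_summary.tsv")` (with your own file names) would list the sequences that were not reported as viruses; the same call against the plasmid summary or the taxonomy file works the same way.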
Hi,
I hope this message finds you well.
I am currently using geNomad to analyze a dataset of 39,910 sequences. However, I’ve noticed discrepancies in the output files that I would like clarified:
The summary file contains only 38,449 rows.
The taxonomy file generated by the annotation module contains 39,888 rows.
Could you please help me understand why there is a difference in the number of rows between the input sequences and these output files? Specifically, I would like to know where and why the sequences might have been removed or filtered out.
Thank you for your assistance!
Best regards, Fang