ParBLiSS / FastANI

Fast Whole-Genome Similarity (ANI) Estimation
Apache License 2.0

Multi-genome reference and query FASTAs for many-to-many queries #123

Open · ryneches opened this issue 1 year ago

When comparing many small genomes, it is often impractical to create a separate file for each genome. For example, IMG/VR v4.1 contains 5,576,197 viral genomes. Creating that many files is not feasible on most file systems, particularly in HPC environments, which typically use network or parallel file systems such as NFS and Lustre.

For this situation, how about accepting a single FASTA file containing all contigs, plus a query.txt and a reference.txt that assign contigs to genomes (one genome per line), structured something like this?

{genome_name}\t{contig_1},{contig_2},{contig_3}....\n
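To make the proposal concrete, here is a minimal Python sketch of how a tool might read a mapping file in the layout above and group the contigs of one multi-FASTA by genome. This is not FastANI code, and the function names are hypothetical. It also assumes that a contig ID is the FASTA header text up to the first whitespace, that blank lines are ignored, and that each contig belongs to at most one genome.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def parse_genome_map(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Parse lines of the form '{genome_name}\t{contig_1},{contig_2},...'."""
    genomes: Dict[str, List[str]] = {}
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line.strip():
            continue  # assumption: blank lines are allowed and skipped
        name, sep, contigs = line.partition("\t")
        if not sep:
            raise ValueError(f"line {lineno}: expected a tab after the genome name")
        if name in genomes:
            raise ValueError(f"line {lineno}: duplicate genome name {name!r}")
        # Drop empty entries, e.g. from a trailing comma.
        genomes[name] = [c for c in contigs.split(",") if c]
    return genomes


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (contig_id, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            parts = line[1:].split()
            # Assumption: the contig ID is the header up to the first whitespace.
            header, chunks = (parts[0] if parts else ""), []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def group_contigs(
    fasta_lines: Iterable[str], genome_map: Dict[str, List[str]]
) -> Dict[str, List[Tuple[str, str]]]:
    """Stream one multi-FASTA and bucket its contigs by genome name."""
    contig_to_genome: Dict[str, str] = {}
    for genome, contigs in genome_map.items():
        for contig in contigs:
            if contig in contig_to_genome:
                raise ValueError(f"contig {contig!r} assigned to more than one genome")
            contig_to_genome[contig] = genome
    grouped: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for contig_id, seq in read_fasta(fasta_lines):
        genome = contig_to_genome.get(contig_id)
        if genome is not None:  # contigs not listed in the map are ignored
            grouped[genome].append((contig_id, seq))
    return dict(grouped)


if __name__ == "__main__":
    # Hypothetical example data: two genomes, three contigs in one FASTA.
    mapping = parse_genome_map(["vOTU_1\tctg_a,ctg_b\n", "vOTU_2\tctg_c\n"])
    fasta = [">ctg_a\n", "ACGT\n", ">ctg_b desc\n", "GGCC\n", ">ctg_c\n", "TTAA\n", "AC\n"]
    print(group_contigs(fasta, mapping))
```

Because the FASTA is streamed line by line and only the contig-to-genome index is held up front, this layout needs just three files no matter how many genomes are involved. A real implementation would probably also want to report contigs that are listed in the map but missing from the FASTA, which this sketch silently skips.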