When comparing many small genomes, creating an individual file for each genome is often impractical. For example, IMG/VR v4.1 contains 5,576,197 viral genomes. Creating this many files is a problem on most file systems, particularly in HPC environments, where network file systems such as NFS and Lustre are usually deployed.
For this situation, how about accepting a single FASTA file containing all contigs, together with a query.txt and a reference.txt structured something like this?