cheberling closed 5 years ago
Hi!
Thanks for filing this issue! It is a good idea, although I think by default --genomes should accept an arbitrary number of genomes while treating them as multifasta. iss could have the behavior you are requesting with a --draft flag replacing the --genomes flag (or complementing it if you have a mix of draft and complete genomes).
Yes, I like your idea too! How difficult would it be to implement something like this?
It will require significant changes in how InSilicoSeq handles input genomes, but it is doable. I've started working on it in #72
new --draft option in version 1.3.0 🎉
Great, thanks! I look forward to trying it out once I get a chance.
Currently, InSilicoSeq treats each contig (fasta record) of a draft genome assembly (one with more than one contig) as a separate organism. Most bacterial genome assemblies contain more than one contig. By the looks of it, to use InSilicoSeq today I would have to concatenate all contigs from each genome into one fasta record per genome, and put all of those records into a single fasta file as a 'metagenome'. This requires extra preprocessing on the user's part as well as extra storage space, and a new 'metagenome' file has to be constructed for each run of the software. For very similar queries (metagenomes containing many of the same genomes), much of that extra storage would hold redundant information.
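For anyone reading along, the concatenation workaround described above could be sketched roughly as follows. This is only an illustration using the Python standard library, not InSilicoSeq code: the function name `concatenate_contigs`, the `genome_id` argument, and the 60-column line wrapping are all assumptions made for the example.

```python
def concatenate_contigs(fasta_text, genome_id, width=60):
    """Merge every record of a multi-FASTA string into one record.

    Hypothetical helper for illustration: `genome_id` becomes the header
    of the single merged record, and `width` controls line wrapping.
    """
    parts = []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip contig headers and blank lines
        parts.append(line)
    sequence = "".join(parts)
    # Re-wrap the merged sequence into fixed-width lines.
    wrapped = "\n".join(
        sequence[i:i + width] for i in range(0, len(sequence), width)
    )
    return f">{genome_id}\n{wrapped}\n"


# Tiny in-memory draft assembly with two contigs.
draft = ">contig_1\nACGT\nAC\n>contig_2\nGGTT\n"
merged = concatenate_contigs(draft, "genome1")
print(merged)
```

Note that joining contigs this way creates artificial junctions between sequences that are not adjacent in the real genome, so any read simulated across a junction would be chimeric. That is one more reason native handling of draft assemblies would be preferable to this preprocessing step.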
Might it be possible to allow a separate command line argument for each genome, so that the original draft genome assemblies can be supplied to InSilicoSeq without having to construct new files? The --genomes option might look like this:
--genomes genome1.fasta [genome2.fasta] [genome3.fasta] ...
where the first genome is required and the rest are optional.
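To make the proposed syntax concrete, here is a minimal sketch of how an option taking one required file plus any number of optional ones can be declared with Python's `argparse`. It is not taken from the InSilicoSeq source; the parser name `iss-sketch` and the `build_parser` function are made up for the example, and only the `--genomes` option name comes from this issue.

```python
import argparse


def build_parser():
    """Build a toy parser; 'iss-sketch' is a hypothetical program name."""
    parser = argparse.ArgumentParser(prog="iss-sketch")
    parser.add_argument(
        "--genomes",
        nargs="+",        # one or more values: the first is required
        metavar="FASTA",
        required=True,
        help="one fasta file per genome; each file may hold several contigs",
    )
    return parser


# Parse an explicit argument list instead of sys.argv for the demo.
args = build_parser().parse_args(
    ["--genomes", "genome1.fasta", "genome2.fasta", "genome3.fasta"]
)
print(args.genomes)
```

With `nargs="+"`, passing `--genomes` with no file at all is rejected by the parser, which matches the "first genome is required and the rest are optional" behavior described above.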