HadrienG / InSilicoSeq

:rocket: A sequencing simulator
https://insilicoseq.readthedocs.io
MIT License

Multi-fasta subset simulation #183

Closed Shahab-Sa closed 3 years ago

Shahab-Sa commented 4 years ago

Hi Hadrien, I have a problem simulating a subset of the genomes present in a multi-fasta file using iss version 1.5.1. Suppose the multi-fasta file contains three genomes: A, B and C. If the abundance file lists only A and B with their abundances, the simulation fails with both of these commands:

iss generate --genomes multi.fasta --model miseq --output test --n_reads 1K --abundance_file AB.tsv

iss generate --genomes multi.fasta --model miseq --output test --n_reads 1K --abundance_file AB.tsv --n_genomes 2

ERROR:iss.app:Fasta record not found in abundance file: 'NC_022663.1' (which is genome C)

HadrienG commented 3 years ago

Hi,

Sorry for the delay, this issue fell through the cracks apparently. I'll take a look and come back to you.

/Hadrien

Shahab-Sa commented 3 years ago

Hi, Thanks!

HadrienG commented 3 years ago

Hi!

I took some time to look at your issue, and I'm not really satisfied with any way of implementing support for incomplete abundance files at this time.

If you want to use abundance files, I would remove the unwanted genomes from your multi-fasta for now. I am closing this issue, but I don't rule out implementing this in the future.

Best, Hadrien.
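
For readers who land on this thread, the workaround suggested above (removing the unwanted genomes from the multi-fasta) could be sketched roughly as follows. This is not part of InSilicoSeq. It is a minimal standalone script that assumes the abundance file is tab-separated with the record ID in the first column, and that a FASTA record ID is the header text up to the first whitespace. The function names and the output file name `AB.fasta` are hypothetical.

```python
import csv


def read_abundance_ids(abundance_path):
    """Collect record IDs from the first column of a tab-separated file.

    Assumption: one genome per line, formatted as "<record_id>\t<abundance>".
    """
    with open(abundance_path, newline="") as handle:
        return {
            row[0].strip()
            for row in csv.reader(handle, delimiter="\t")
            if row and row[0].strip()
        }


def subset_fasta(fasta_path, wanted_ids, output_path):
    """Copy only the records whose ID is in wanted_ids.

    Returns the list of record IDs that were written, in file order.
    """
    written = []
    keep = False
    with open(fasta_path) as src, open(output_path, "w") as dst:
        for line in src:
            if line.startswith(">"):
                # Assumption: the record ID is the header up to the first space.
                fields = line[1:].split()
                record_id = fields[0] if fields else ""
                keep = record_id in wanted_ids
                if keep:
                    written.append(record_id)
            if keep:
                # Header and sequence lines of a kept record are copied as-is.
                dst.write(line)
    return written
```

With the files from this issue, one would call `subset_fasta("multi.fasta", read_abundance_ids("AB.tsv"), "AB.fasta")` and then rerun the original command with `--genomes AB.fasta` in place of `--genomes multi.fasta`. Comparing the returned list against the IDs in the abundance file is a quick way to spot entries that have no matching FASTA record.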