HadrienG / InSilicoSeq

:rocket: A sequencing simulator
https://insilicoseq.readthedocs.io
MIT License

Need help in simulation of whole metagenome data #252

Closed. sahilrishav2 closed this issue 4 months ago

sahilrishav2 commented 4 months ago

Hi,

I am trying to simulate paired-end metagenome reads from FASTA files of bacterial genomes. First, is it necessary to provide multiple draft FASTA files individually, or can I merge all the draft FASTA files into a single multi-FASTA file, as with complete genomes, where only one multi-FASTA file is needed? Second, why does the tool not give the output I expect, namely each genome ID (RefSeq) with its abundance value? When I run the tool, the output looks like this:

head miseq_reads_abundance.txt
NZ_MTYW01000001.1 0.00013815698345148888
NZ_MTYW01000010.1 0.004262029865089716
NZ_MTYW01000100.1 0.0008520376008117435
NZ_MTYW01000101.1 0.0007903018578518176
NZ_MTYW01000102.1 0.0011015537049246815

The problem is that a complete genome can itself contain multiple FASTA sequences (chromosome sequences and plasmid sequences). Is there an option that reports the abundance of each bacterial genome in the metagenome reads, instead of the abundance of each FASTA sequence? For example:

Bacteria A 0.7
Bacteria B 0.3

Thank you in advance

HadrienG commented 4 months ago

If you have input files with several contigs, you can use the --draft option instead of --genomes.
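If a per-contig abundance file has already been written, one possible workaround is to sum the contig values per genome afterwards. The sketch below is not part of InSilicoSeq: `sum_abundance_per_genome` is a hypothetical helper. It assumes that each genome is stored in its own FASTA file, that the file name without its extension is the genome name, that the first word of each FASTA header matches the ID in the abundance file, and that the abundance values can simply be added.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of InSilicoSeq): sum per-contig abundances
# into one value per genome, where a genome is one FASTA file.
# Usage: sum_abundance_per_genome ABUNDANCE_FILE FASTA [FASTA ...]
sum_abundance_per_genome() {
    local abundance_file
    abundance_file=$1
    shift
    awk -v abund="$abundance_file" '
        # The abundance file is read first: remember the value of each contig ID.
        FILENAME == abund { abundance[$1] = $2; next }

        # First line of each FASTA file: derive the genome name from the file name.
        FNR == 1 {
            genome = FILENAME
            sub(/.*\//, "", genome)       # drop the directory part
            sub(/\.[^.]*$/, "", genome)   # drop the extension
        }

        # Header line: the contig ID is the first word after ">".
        /^>/ {
            id = substr($1, 2)
            if (id in abundance) total[genome] += abundance[id]
        }

        END { for (g in total) printf "%s\t%.10g\n", g, total[g] }
    ' "$abundance_file" "$@" | sort
}

# Example call (genomes/ is an assumed directory name):
# sum_abundance_per_genome miseq_reads_abundance.txt genomes/*.fasta
```

Contigs that appear in a FASTA file but not in the abundance file are ignored, and the values are printed with up to 10 significant digits.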

sahilrishav2 commented 4 months ago

Thank you, but I have one more query. Is it necessary to provide multiple draft FASTA files individually? For example, if I have 100 draft sequences, should I pass the input files to the --draft option like this:

iss generate --draft draft1.fasta draft2.fasta draft3.fasta draft4.fasta draft5.fasta

and so on? Writing out the names of 100 draft sequences individually would be difficult. I also tried merging them into one multi-FASTA file named draft, but after running the tool, the abundance file looked like this:

draft 0.6

but I want an abundance file like this:

draft1 0.6
draft2 0.5
draft3 0.4
draft4 0.7
draft5 0.1

Kindly guide me regarding this concern.

sahilrishav2 commented 4 months ago

Thank you for your time, I solved it. I had to write a bash script for it.

HadrienG commented 4 months ago

Glad you solved it! Otherwise, more often than not, you can use a wildcard:

iss generate --draft *.fasta ...
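A bare wildcard is passed through unchanged when nothing matches it, so a small wrapper can be a safer variant in scripts. The following is only a sketch: `run_with_drafts` is a hypothetical helper, not part of InSilicoSeq. It appends every `.fasta` file in a directory to a command and refuses to run if there are none. It assumes `--draft` is given last, so that the file names land directly after it.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of InSilicoSeq).
# Usage: run_with_drafts DIR COMMAND [ARGS...]
# Appends every DIR/*.fasta file to COMMAND [ARGS...] and runs the result.
run_with_drafts() {
    local dir f found
    dir=$1
    found=0
    shift
    for f in "$dir"/*.fasta; do
        # An unmatched glob stays a literal pattern, so keep real files only.
        if [ -f "$f" ]; then
            set -- "$@" "$f"
            found=$((found + 1))
        fi
    done
    if [ "$found" -eq 0 ]; then
        echo "no .fasta files found in $dir" >&2
        return 1
    fi
    "$@"
}

# Example call (path/to/drafts is an assumed directory; "..." stands for
# the remaining options, which are left out here as in the command above):
# run_with_drafts path/to/drafts iss generate ... --draft
```

Because the file names are added one by one as separate arguments, names containing spaces are handled without extra quoting.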