HadrienG / InSilicoSeq

:rocket: A sequencing simulator
https://insilicoseq.readthedocs.io
MIT License
176 stars 32 forks

Experimental groups #227

Closed: magibc closed this issue 10 months ago

magibc commented 1 year ago

Hello,

First of all, thanks for your tool. I would like to use it for benchmarking in metagenomics experiments. I would like to create three datasets: one with low complexity, one with medium complexity, and one with high complexity. However, I could not find how to define different experimental groups in InSilicoSeq. How can I achieve this?

Thanks in advance,

Magi.

magibc commented 1 year ago

Dear @HadrienG ,

After thinking about it (and sorry if this is a stupid question; I am teaching myself metagenomics out of curiosity), I understand that to create experimental groups I would have to repeat the following command (toy example)

iss generate --cpus 8 --genomes SRS121011.fasta --model hiseq --output hiseq_reads_sample2 --seed 123 --abundance_file control.txt

iss generate --cpus 8 --genomes SRS121011.fasta --model hiseq --output hiseq_reads_sample3 --seed 124 --abundance_file control.txt

as many times as the number of samples desired for each experimental group.

Then, for a second experimental group, I would change the abundance file slightly in order to check for differences in downstream analyses of beta diversity/differential abundance...

Would this be a good approach?
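One way to "change the abundance file slightly" could be sketched as below. This is only an illustration under assumptions: it assumes the abundance file is tab-separated with a genome id and an abundance per line, and that the abundances should sum to 1. The helper `perturb_abundance`, the genome ids, and the name `treated.txt` are all hypothetical and not part of InSilicoSeq.

```shell
#!/bin/sh
# Sketch: derive a second group's abundance file from the control one.
# Assumption: the file is tab-separated "genome_id<TAB>abundance".
set -eu

# Hypothetical helper: multiply one genome's abundance by a fold change,
# then renormalise all values so they sum to 1 again.
# $1 = input abundance file, $2 = genome id to change, $3 = fold change
perturb_abundance() {
    awk -v target="$2" -v fold="$3" '
        BEGIN { FS = OFS = "\t" }
        {
            id[NR] = $1
            value[NR] = ($1 == target) ? $2 * fold : $2
            total += value[NR]
        }
        END {
            for (i = 1; i <= NR; i++)
                printf "%s\t%.6f\n", id[i], value[i] / total
        }
    ' "$1"
}

# Toy control file with made-up genome ids, written to a scratch directory.
workdir="$(mktemp -d)"
printf 'genome_A\t0.5\ngenome_B\t0.3\ngenome_C\t0.2\n' > "$workdir/control.txt"

# Second group: genome_C is four times more abundant before renormalising.
perturb_abundance "$workdir/control.txt" genome_C 4 > "$workdir/treated.txt"
cat "$workdir/treated.txt"
```

Renormalising means that raising one genome lowers all the others, which is worth keeping in mind when interpreting differential abundance results downstream.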

To create a high-complexity simulated dataset, would you recommend using an abundance file or a coverage file? In InSilicoSeq, the abundance values do NOT represent the relative proportion of each reference genome in the resulting simulated data, so I understand that it would be better to use a coverage file.

Thanks again,

Magi.

HadrienG commented 10 months ago

Hi!

Sorry for the slow response. You are right: since InSilicoSeq does not support experimental groups out of the box, you'd have to run the program several times with different abundance/coverage values. Note that changing the seed changes where the errors are introduced in the reads, but does not influence the abundance values if an abundance file is supplied.

As for coverage vs. abundance, this is a matter of personal preference; you can refer to https://github.com/HadrienG/InSilicoSeq/issues/127 for more information.
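The repeated runs described above can be scripted. The following is a minimal sketch, not an official workflow: it reuses only the flags from the toy example earlier in this thread, and assumes one abundance file per group named `control.txt` and `treated.txt` (the second name, the replicate count, and the output naming scheme are placeholders). By default it only prints the commands. Since the seed does not change the abundances, replicates within a group share the same abundance profile.

```shell
#!/bin/sh
# Sketch: one `iss generate` run per sample, one abundance file per group.
# Assumptions: <group>.txt files exist; names and counts are placeholders.
set -eu

GENOMES="SRS121011.fasta"
REPLICATES=3
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = actually run iss

# Build the command for one sample, then print or execute it.
# $1 = group name, $2 = replicate number, $3 = seed
run_sample() {
    set -- iss generate --cpus 8 --genomes "$GENOMES" --model hiseq \
        --output "hiseq_reads_${1}_rep${2}" \
        --seed "$3" --abundance_file "${1}.txt"
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

seed=123
for group in control treated; do
    rep=1
    while [ "$rep" -le "$REPLICATES" ]; do
        run_sample "$group" "$rep" "$seed"
        seed=$((seed + 1))   # a different seed for every sample
        rep=$((rep + 1))
    done
done
```

Running it with `DRY_RUN=0` would execute the six commands instead of printing them.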

Best, Hadrien