Closed xuechunxu closed 3 years ago
Hey, unfortunately I am not entirely sure what you are planning to do. In my understanding, shotgun metagenomics entails complete genomes being simulated, but it sounds like you only want to simulate functional genes?
Hey, I want to simulate metagenome datasets for which I know the relative abundance of functional genes. For example, the relative abundance of gene A would be known in the CAMISIM output. I would use functional genes instead of complete genomes, but I don't think that can work.
Another question: what is the meaning of "seed" in the defaults/mini_config.ini file? And can I set the size of the simulated reads?
If you only want to simulate reads from these functional genes you would need to use these as your "genomes".
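As a rough sketch of what using genes as "genomes" could look like in practice, the snippet below splits a multi-FASTA text of gene sequences into one FASTA file per gene, so that each gene can be listed as its own input sequence. The function name split_genes and the one-file-per-gene layout are assumptions for illustration, not something CAMISIM prescribes; check the CAMISIM documentation for the input files it actually expects.

```python
from pathlib import Path


def split_genes(multi_fasta_text, out_dir):
    """Write each FASTA record to its own file and return {gene_id: path}.

    Hypothetical helper: the gene ID is taken as the first word of the
    header line and is assumed to be safe to use as a file name.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Collect the sequence lines belonging to each record.
    records = {}
    gene_id = None
    for line in multi_fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            gene_id = line[1:].split()[0]
            records[gene_id] = []
        elif gene_id is not None:
            records[gene_id].append(line)

    # Write one FASTA file per gene.
    paths = {}
    for gene_id, seq_lines in records.items():
        path = out_dir / f"{gene_id}.fasta"
        path.write_text(f">{gene_id}\n" + "\n".join(seq_lines) + "\n")
        paths[gene_id] = path
    return paths
```

The returned mapping of gene ID to file path can then be used to build whatever ID-to-file listing the simulator needs.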
A seed is used to ensure reproducibility: since the read simulators work with randomness, setting the same seed for the random number generators ensures that the output is the same for two runs.
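To illustrate the general idea of seeding (this is plain Python, not CAMISIM's own code), the toy function below draws random read start positions. The function name, genome length and read count are made up for the example.

```python
import random


def sample_read_starts(seed, genome_length=1000, n_reads=5):
    """Draw pseudo-random read start positions from a seeded generator."""
    rng = random.Random(seed)  # private generator, fully determined by the seed
    return [rng.randrange(genome_length) for _ in range(n_reads)]


run_a = sample_read_starts(seed=42)
run_b = sample_read_starts(seed=42)

# Two runs with the same seed produce identical "reads".
print(run_a == run_b)
```

Changing the seed changes the sequence of random draws, which is why two runs only match when the seed and all other inputs are the same.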
The size of the reads is controlled with the fragments_size_mean and fragment_size_standard_deviation parameters in the config file. The size parameter describes the size per sample (in Gigabases).
Sorry, I didn't make it clear about the size of the reads. I mean the size of the file of simulated reads, that is, the file anonymous_reads.fq.gz.
Then the size parameter will be the controlling factor. The size of the read file will be roughly number_of_samples * size (in GB).
Since the file itself is compressed afterwards, the actual size might be a little less.
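The rule of thumb above can be written down as a one-line calculation. This is only a back-of-the-envelope sketch: the function name is hypothetical, and the result is an upper estimate before compression.

```python
def estimated_total_size(number_of_samples, size):
    """Rough total amount of simulated data: samples times size per sample."""
    return number_of_samples * size


# Example: 5 samples with size = 0.5 each.
print(estimated_total_size(5, 0.5))
```

Because the read files are gzip-compressed, the files on disk will usually come out somewhat smaller than this estimate.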
I used the same data and parameters to run CAMISIM, but the result is different every time.
Cyanobacteria.zip
This is the file I used, and I ran python metagenomesimulation.py defaults/mini_config.ini
I set number_of_samples=5 and the same relative abundance of genomes for each sample. I expected the five simulated samples to contain the same reads, but they are different.
If you want to manually set your abundance distributions, you need to add a parameter to the config: distribution_file_paths in the CommunityDesign section. This parameter points to your abundance files, which have to be tab-separated files with genome_ID and abundance. See also here.
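A minimal sketch of producing such a tab-separated abundance file with Python's csv module follows. The function name, the genome IDs and the absence of a header line are assumptions for illustration; the reply above only states that each line holds a genome_ID and an abundance.

```python
import csv


def write_abundance_file(path, abundances):
    """Write one 'genome_ID<TAB>abundance' line per entry (no header line, by assumption)."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for genome_id, abundance in abundances.items():
            writer.writerow([genome_id, abundance])


# Hypothetical IDs and abundances, one file per sample.
example = {"geneA": 0.7, "geneB": 0.3}
```

Calling write_abundance_file("sample0_abundance.tsv", example) would write two lines, one per ID; the path of each such file is what distribution_file_paths would then point to.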
If you have done this and set the seed parameter, then two subsequent runs of CAMISIM will produce the same output. The reads of the individual samples will still differ, though: by design, CAMISIM does not draw the exact same reads from the same genomes in two different samples. If you want identical samples, you would have to start two separate CAMISIM runs with the same seed.
Hello,
Can I model abundance distributions of functional genes and simulate the corresponding shotgun metagenome datasets?
Thanks very much!
chunxu