CAMI-challenge / CAMISIM

CAMISIM: Simulating metagenomes and microbial communities
https://data.cami-challenge.org/participate
Apache License 2.0

Genome distributions change during the simulation #102

Closed MaxenceQueyrel closed 3 years ago

MaxenceQueyrel commented 3 years ago

I am trying to simulate metagenomes with a given number of genomes drawn from specific distributions. However, the number of reads per genome does not match the distribution files I gave to CAMISIM at all. I noticed that, after the simulation, the files in the distribution folder differ from the ones I supplied, and the new ones seem consistent with the number of simulated reads. Maybe I missed something in the configuration, but I don't know why this happens. Could you please give me more details on how to handle this?

AlphaSquad commented 3 years ago

Hey, that is indeed strange. Can you confirm that you added your distribution files to the distribution_file_paths option? If you attach the config file you are using, I can investigate what is going wrong.

MaxenceQueyrel commented 3 years ago

Hello, yes, I added the distribution files to the distribution_file_paths option.

This is my config file:

[Main]
seed=324466
phase=0
max_processors=2
dataset_id=simulation
output_directory=./output_dir
temp_directory=./tmp_dir
gsa=True
pooled_gsa=True
anonymous=True
compress=1

[ReadSimulator]
readsim=./CAMISIM/tools/art_illumina-2.3.6/art_illumina
error_profiles=./CAMISIM/tools/art_illumina-2.3.6/profiles
samtools=./CAMISIM/tools/samtools-1.3/samtools
profil=mbarc
size=0.1
type=art
fragments_size_mean=270
fragment_size_standard_deviation=27

[CommunityDesign]
distribution_file_path=./HV.13_3.tsv,./LV.9_3.tsv
ncbi_taxdump=./CAMISIM/tools/ncbi-taxonomy_20170222.tar.gz
strain_simulation_template=./CAMISIM/scripts/StrainSimulationWrapper/sgEvolver/simulation_dir
number_of_samples=2

[community0]
metadata=./metadata.tsv
id_to_genome_file=./genome_to_id.tsv
id_to_gff_file=
genomes_total=272
genomes_real=272
max_strains_per_otu=1
ratio=1
mode=differential
log_mu=1
log_sigma=2
gauss_mu=1
gauss_sigma=1
view=False

If this looks correct to you, maybe the error comes from another file?

AlphaSquad commented 3 years ago

There might be a typo in the name of your option. Could you try replacing the option name distribution_file_path with distribution_file_paths (i.e. distribution_file_paths=./HV.13_3.tsv,./LV.9_3.tsv) and see if this helps?

MaxenceQueyrel commented 3 years ago

Oh yes, you are right, I forgot the "s" in distribution_file_paths. It is working now. I thought CAMISIM would raise an error when an unknown option was entered. Thank you for the help.

AlphaSquad commented

CAMISIM should report options that are not parsed. I will look into that. Thanks!
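
Until such a check exists, a misspelled option like the one in this thread can be caught before running a simulation. The following is only a minimal sketch using Python's standard configparser and difflib modules, not CAMISIM's own validation. The KNOWN_OPTIONS set and the find_unknown_options helper are hypothetical, and the option names are limited to those that appear in this thread (with the corrected plural spelling), so the list is not complete.

```python
import configparser
import difflib

# Assumption: option names for [CommunityDesign] are taken only from this
# thread (with the corrected plural spelling). This is NOT CAMISIM's full list.
KNOWN_OPTIONS = {
    "CommunityDesign": {
        "distribution_file_paths",
        "ncbi_taxdump",
        "strain_simulation_template",
        "number_of_samples",
    },
}


def find_unknown_options(config_text, known=KNOWN_OPTIONS):
    """Return (section, option, suggestion) for each unrecognised option."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    problems = []
    for section, allowed in known.items():
        if not parser.has_section(section):
            continue
        for option in parser.options(section):
            if option not in allowed:
                # Suggest the closest known name, if any is similar enough.
                close = difflib.get_close_matches(option, sorted(allowed), n=1)
                problems.append((section, option, close[0] if close else None))
    return problems


# The misspelled option from the config posted above.
sample = """
[CommunityDesign]
distribution_file_path=./HV.13_3.tsv,./LV.9_3.tsv
number_of_samples=2
"""

for section, option, suggestion in find_unknown_options(sample):
    hint = f" (did you mean '{suggestion}'?)" if suggestion else ""
    print(f"[{section}] unknown option '{option}'{hint}")
```

Run against the config from this issue, the check flags distribution_file_path and suggests distribution_file_paths. Sections not listed in KNOWN_OPTIONS are skipped rather than reported, so the allowed names would have to be extended per section before relying on it.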