Closed SR-Martin closed 2 years ago
As far as I can see what you did is supposed to work. I could reproduce this behaviour and am investigating.
Hey, I think I found the source of this bug and fixed it. Can you test whether it works for you? Note that for better results with simulated strains you should provide a .gff file containing the coding sequences of the genomes to be simulated such that these aren't evolved.
Hi Adrian,
Thanks for looking at this so quickly. I'll get the new version and test it asap.
This seems to have fixed the problem; however, I am encountering another error. It's not clear to me yet whether this is a problem with the installation (I am using it on an HPC, which makes things a bit more complex) or a bug in the software. Here is the output anyway:
2020-12-17 12:50:29 WARNING: [Validator 10177412903] No gff file (gene annotation) was given. Simulating strains without such a file can break genes.
2020-12-17 12:50:29 INFO: [Validator 10177412903] Simulating strain evolution of 'Genome12.0'
2020-12-17 12:50:29 INFO: [Validator 10177412903] Simulating strain evolution of 'Genome14.0'
Failure in sgEvolver at /usr/local/bin/scripts/StrainSimulationWrapper/sgEvolver/simujobrun.pl line 49.
Failure in sgEvolver at /usr/local/bin/scripts/StrainSimulationWrapper/sgEvolver/simujobrun.pl line 49.
Task failed with return code: 255, task: cd /tmp/tmp3nv212/Genome12.0.strains; /usr/local/bin/scripts/StrainSimulationWrapper/sgEvolver/simujobrun.pl defaults/genomes/GCA_000231385.3_ASM23138v3.fa /tmp/tmp3nv212/tmpizoVzO 5633002653896701005 >> /tmp/tmp3nv212/Genome12.0.strains/GCA_000231385.3_ASM23138v3.fa.sim.log
Task failed with return code: 255, task: cd /tmp/tmp3nv212/Genome14.0.strains; /usr/local/bin/scripts/StrainSimulationWrapper/sgEvolver/simujobrun.pl defaults/genomes/GCA_000006785.2_ASM678v2.fa /tmp/tmp3nv212/tmpizoVzO 6180714611745142895 >> /tmp/tmp3nv212/Genome14.0.strains/GCA_000006785.2_ASM678v2.fa.sim.log
2020-12-17 12:50:30 ERROR: [Validator 10177412903] Task failed with return code: 255, task: cd /tmp/tmp3nv212/Genome12.0.strains; /usr/local/bin/scripts/StrainSimulationWrapper/sgEvolver/simujobrun.pl defaults/genomes/GCA_000231385.3_ASM23138v3.fa /tmp/tmp3nv212/tmpizoVzO 5633002653896701005 >> /tmp/tmp3nv212/Genome12.0.strains/GCA_000231385.3_ASM23138v3.fa.sim.log
2020-12-17 12:50:30 ERROR: [Validator 10177412903] Task failed with return code: 255, task: cd /tmp/tmp3nv212/Genome14.0.strains; /usr/local/bin/scripts/StrainSimulationWrapper/sgEvolver/simujobrun.pl defaults/genomes/GCA_000006785.2_ASM678v2.fa /tmp/tmp3nv212/tmpizoVzO 6180714611745142895 >> /tmp/tmp3nv212/Genome14.0.strains/GCA_000006785.2_ASM678v2.fa.sim.log
2020-12-17 12:50:30 ERROR: [Validator 10177412903] Simulation of strains failed.
I'll look into this and see if I can find the cause of the failure.
Ah yes, you will need to use absolute paths in the genome_to_id.tsv file for this to work.
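For anyone hitting the same failure, here is a minimal sketch of how the paths could be rewritten in bulk. It assumes genome_to_id.tsv (and genome_to_gff.tsv) are tab-separated with the ID in the first column and the file path in the second; the function name `absolutize_paths` and the output file name are made up for illustration and are not part of CAMISIM.

```python
import csv
import os


def absolutize_paths(tsv_in, tsv_out, base_dir):
    """Copy a two-column TSV, making the path in column 2 absolute.

    Relative paths are resolved against base_dir (assumed to be the
    directory they were written relative to, e.g. the CAMISIM checkout).
    Returns the list of resolved paths that do not exist on disk.
    """
    missing = []
    with open(tsv_in, newline="") as src, open(tsv_out, "w", newline="") as dst:
        reader = csv.reader(src, delimiter="\t")
        writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
        for row in reader:
            if len(row) < 2:  # skip blank or malformed lines
                continue
            genome_id, path = row[0], row[1]
            # os.path.join leaves already-absolute paths untouched
            abs_path = os.path.abspath(os.path.join(base_dir, path))
            if not os.path.isfile(abs_path):
                missing.append(abs_path)
            writer.writerow([genome_id, abs_path])
    return missing
```

Something like `absolutize_paths("defaults/genome_to_id.tsv", "genome_to_id.abs.tsv", os.getcwd())`, run from the CAMISIM directory, would write a copy with absolute paths and return any entries that point to files that are not there. The config would then need to reference the new file.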
Great, this seems to be working now. Thanks for your help!
Hello, I observe a similar error when simulating strain evolution.
Failure in sgEvolver at /path/CAMISIM/scripts/StrainSimulationWrapper/sgEvolver/simujobrun.pl line 49.
2022-07-13 15:08:02 ERROR: [MetagenomeSimulationPipeline] [Errno 2] No such file or directory: '/tmp/tmpxayr4vkn/Genome17.0.strains/Taxon014.fasta' -> '/tmp/tmpxayr4vkn/Genome17.0.strains/simulated_Genome17.0.Taxon014.fna' in line 83
2022-07-13 15:08:02 INFO: [MetagenomeSimulationPipeline] Metagenome simulation aborted
I followed the solutions suggested in this issue as well as in issue #132, but they didn't resolve the problem. Could you provide some hints on how to fix this error? Thank you.
Hi, just looking at this error it seems most likely that the sgEvolver itself failed (line 49 is the call to sgEvolver). Could you post your complete log (and your config file) if you still have it available? Unfortunately it is hard to tell what is going wrong just from this message alone.
The config.ini file:
seed=632741178
phase=0
max_processors=8
dataset_id=RL
output_directory=/home/users/yazhini.a01/software/CAMISIM
temp_directory=/tmp
gsa=True
pooled_gsa=True
anonymous=True
compress=1

[ReadSimulator]
readsim=/home/users/yazhini.a01/software/CAMISIM/tools/art_illumina-2.3.6/art_illumina
error_profiles=/home/users/yazhini.a01/software/CAMISIM/tools/art_illumina-2.3.6/profiles
samtools=/home/users/yazhini.a01/software/CAMISIM/tools/samtools-1.3/samtools
profile=mbarc
base_profile_name=
profile_read_length=
size=0.1
type=art
fragments_size_mean=270
fragment_size_standard_deviation=27

[CommunityDesign]
distribution_file_paths=
ncbi_taxdump=/home/users/yazhini.a01/software/CAMISIM/tools/ncbi-taxonomy_20170222.tar.gz
strain_simulation_template=/home/users/yazhini.a01/software/CAMISIM/scripts/StrainSimulationWrapper/sgEvolver/simulation_dir
number_of_samples=20

[community0]
metadata=/home/users/yazhini.a01/software/CAMISIM/defaults/metadata.tsv
id_to_genome_file=/home/users/yazhini.a01/software/CAMISIM/defaults/genome_to_id.tsv
id_to_gff_file=/home/users/yazhini.a01/software/CAMISIM/defaults/genome_to_gff.tsv
genomes_total=5
num_real_genomes=3
max_strains_per_otu=2
ratio=1
mode=differential
log_mu=1
log_sigma=2
gauss_mu=1
gauss_sigma=1
view=False
The terminal output is given below:
2022-07-13 18:03:06 INFO: [MetagenomeSimulationPipeline] Metagenome simulation starting
2022-07-13 18:03:06 INFO: [MetagenomeSimulationPipeline] Validating Genomes
2022-07-13 18:03:06 INFO: [MetadataReader] Reading file: '/home/users/yazhini.a01/software/CAMISIM/defaults/genome_to_id.tsv'
2022-07-13 18:03:24 INFO: [MetagenomeSimulationPipeline] Design Communities
2022-07-13 18:03:24 INFO: [CommunityDesign] Drawing strains.
2022-07-13 18:03:24 INFO: [MetadataReader 1918002902] Reading file: '/home/users/yazhini.a01/software/CAMISIM/defaults/metadata.tsv'
2022-07-13 18:03:24 INFO: [MetadataReader 9013202836] Reading file: '/home/users/yazhini.a01/software/CAMISIM/defaults/genome_to_gff.tsv'
2022-07-13 18:03:24 INFO: [MetadataReader 46447426715] Reading file: '/home/users/yazhini.a01/software/CAMISIM/defaults/genome_to_id.tsv'
2022-07-13 18:03:24 INFO: [CommunityDesign] Validating raw sequence files!
2022-07-13 18:03:27 INFO: [Validator 31395689975] Simulating strain evolution of 'Genome17.0'
Failure in sgEvolver at /home/mpg01/MBPC/yazhini.a01/software/CAMISIM/scripts/StrainSimulationWrapper/sgEvolver/simujobrun.pl line 49.
2022-07-13 18:03:27 ERROR: [MetagenomeSimulationPipeline] [Errno 2] No such file or directory: '/tmp/tmp7f3v_qmh/Genome17.0.strains/Taxon014.fasta' -> '/tmp/tmp7f3v_qmh/Genome17.0.strains/simulated_Genome17.0.Taxon014.fna' in line 83
2022-07-13 18:03:27 INFO: [MetagenomeSimulationPipeline] Metagenome simulation aborted
Okay, thank you, it looks fine in principle. I am investigating this; could you meanwhile run CAMISIM with the -debug flag and pipe the log to a file to see if anything more shows?
Thank you.
With the -debug flag, some files are written to the tmp folder. Here are some details from sgEvolver.err
Unhandled gnException: Exception FileNotOpened thrown from Unknown() in gnFileSource.cpp 67
Called by Unknown()
Exited with code 65280
and from GCA_000242255.3_ASM24225v3.fa.sim.log
Executing /home/mpg01/MBPC/yazhini.a01/software/CAMISIM/scripts/StrainSimulationWrapper/sgEvolver/sgEvolver --stop-codon-bias=0.98 --ancestral-gff=defaults/gffs/Genome17.0-GCF_000242255.2_genomic.gff --accessory-gff=defaults/gffs/Genome17.0-GCF_000242255.2_genomic.gff --indel-size=1 --indel-freq=0.05 --small-ht-freq=0.05 --small-ht-size=200 --large-ht-freq=0.005 --inversion-freq=0.005 --large-ht-min=10000 --large-ht-max=60000 --random-seed=358327309434766444 --inversion-size=50000 template.tree defaults/genomes/GCA_000242255.3_ASM24225v3.fa defaults/genomes/GCA_000242255.3_ASM24225v3.fa evolved.dat evolved_seqs.fas >sgEvolver.out 2>sgEvolver.err
Could you make sure that all the files referenced in this command are present and that it runs in a vacuum? That is, the genome file seems to be one of the genomes that ship with CAMISIM by default, while the gff file comes from you?
Yes, I obtained a .gff file from NCBI for each of the default genomes that come with CAMISIM, as you had suggested, to give it as additional input for strain simulation. So how do I provide the .gff file then? Also, sorry, I don't understand the statement "it runs in a vacuum".
The line you sent from the log is the call to sgEvolver that CAMISIM creates internally. It should be possible to run this command without CAMISIM, i.e. running
/home/mpg01/MBPC/yazhini.a01/software/CAMISIM/scripts/StrainSimulationWrapper/sgEvolver/sgEvolver --stop-codon-bias=0.98 --ancestral-gff=defaults/gffs/Genome17.0-GCF_000242255.2_genomic.gff --accessory-gff=defaults/gffs/Genome17.0-GCF_000242255.2_genomic.gff --indel-size=1 --indel-freq=0.05 --small-ht-freq=0.05 --small-ht-size=200 --large-ht-freq=0.005 --inversion-freq=0.005 --large-ht-min=10000 --large-ht-max=60000 --random-seed=358327309434766444 --inversion-size=50000 template.tree defaults/genomes/GCA_000242255.3_ASM24225v3.fa defaults/genomes/GCA_000242255.3_ASM24225v3.fa evolved.dat evolved_seqs.fas >sgEvolver.out 2>sgEvolver.err
in your console/bash from the CAMISIM directory should work if all the files are present (it is strange, though, that template.tree, which is in the scripts/StrainSimulationWrapper/sgEvolver/simulation_dir folder, does not have a prefix). This makes me think that you should probably use absolute paths for all your genomes in the genome_to_id.tsv and genome_to_gff.tsv files.
If it does not run, then the problem is in sgEvolver or in one of the files provided to this command. If it does run, then the problem lies within CAMISIM.
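Before running the command by hand, a quick check of which referenced files can actually be found from a given directory might help. The sketch below is only a rough aid and rests on several assumptions: that the values of options ending in `-gff` are input files, that the last two positional arguments (evolved.dat and evolved_seqs.fas in the command above) are outputs, and that everything before them is an input. The function name `missing_inputs` is hypothetical.

```python
import os
import shlex


def missing_inputs(command, workdir, n_outputs=2):
    """List input files named in `command` that are not found from `workdir`.

    Assumptions (not taken from sgEvolver documentation): options ending
    in "-gff" carry input paths, and the last `n_outputs` positional
    arguments are output files that need not exist yet.
    """
    tokens = shlex.split(command)
    # Drop shell redirections such as ">sgEvolver.out" and "2>sgEvolver.err"
    for i, tok in enumerate(tokens):
        if ">" in tok:
            tokens = tokens[:i]
            break
    candidates, positional = [], []
    for tok in tokens[1:]:  # tokens[0] is the executable itself
        if tok.startswith("--"):
            key, _, value = tok.partition("=")
            if key.endswith("-gff") and value:
                candidates.append(value)
        else:
            positional.append(tok)
    inputs = positional[:-n_outputs] if n_outputs else positional
    candidates.extend(inputs)
    unique = list(dict.fromkeys(candidates))  # de-duplicate, keep order
    return [p for p in unique if not os.path.exists(os.path.join(workdir, p))]
```

Calling it once with the CAMISIM directory and once with the temporary `GenomeXX.strains` directory as `workdir` shows whether the relative paths resolve in both places. The log earlier in this thread shows the task doing a `cd` into the temporary directory before running, which is where relative paths would stop resolving.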
Thank you for the pointer. So in the end, absolute paths have to be given in genome_to_id.tsv and genome_to_gff.tsv. It works normally now. It turns out to be the same solution you mentioned before, but I only understood it now.
Thanks very much.
Yes, sorry, that should be documented better (since it works if no strains are simulated). I will add something to the documentation and hope that these kinds of errors disappear in our 2.0 version, which is coming soon™
I'd like to use CAMISIM to simulate a metagenome with a lot of strain level variation, but I am having some trouble. The documentation states "Artificial strains evolved from real genomes are added to the community genome collection until the difference between genomes total and num real genomes has been reached." This suggests e.g. setting num_real_genomes=5 and genomes_total=10 (and max_strains_per_otu > 1) to include 5 strains simulated from the real genomes.
In this case, if my metadata contains 10 real genomes, then these all appear in the resulting metagenome, and there are no simulated strains. If there are fewer than 10 genomes in the metadata then I get the following error:
ERROR: [DefaultLogging] Not enough data to draw.
ERROR: [MetagenomeSimulationPipeline] Not enough data to draw. in line 83
Are there some extra parameters that need to be set? Or maybe I have misunderstood how CAMISIM works? Please help!