Hello,
Thanks for a great tool! I have been using it to randomize the positions of mutations in samples while controlling for mutational signatures. However, the simulated files do not contain the same number of mutations as the input, which is what I would expect.
I am running the tool like this, simulating only within specific regions (based on coverage):

```python
#!/usr/bin/env python
import sys

from SigProfilerSimulator import SigProfilerSimulator as sigSim

name = sys.argv[1]
vcf_dir = sys.argv[2]
bed = sys.argv[3].strip()

sigSim.SigProfilerSimulator(name,
                            vcf_dir,
                            "GRCh37",
                            contexts=["96", "ID"],
                            exome=None,
                            simulations=10,
                            updating=False,
                            bed_file=bed,
                            overlap=False,
                            gender='female',
                            chrom_based=False,
                            seed_file=None,
                            noisePoisson=False,
                            cushion=100,
                            region=None,
                            vcf=True)
```
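To compare counts between the input and the simulations, here is a minimal sketch that tallies the data lines in each file. It assumes the input and simulated files are plain-text VCFs (or a similar format) whose header lines start with `#`. The function names and the default `*.vcf` glob are hypothetical. Adjust the glob to match the actual names of the simulated output files.

```python
from pathlib import Path


def count_vcf_records(path):
    """Count data lines in a VCF-style text file, skipping '#' header lines and blank lines."""
    with open(path) as handle:
        return sum(1 for line in handle if line.strip() and not line.startswith("#"))


def count_per_file(directory, pattern="*.vcf"):
    """Map every file under `directory` (searched recursively) that matches `pattern`
    to its number of data lines.

    The default pattern is an assumption, not SigProfilerSimulator's documented naming.
    """
    return {
        str(path): count_vcf_records(path)
        for path in sorted(Path(directory).rglob(pattern))
    }
```

Running `count_per_file` on the input VCF directory and on the simulation output directory gives per-file totals that can be compared side by side.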
Here is the output of the log file:

```
======================================
SigProfilerSimulator
Checking for all reference files and relevant matrices...
Creating a chromosome proportion file for the given BED file ranges...Completed!
SigProfilerSimulator_cell_output/C1_05_24/output/ID/C1_05_24.ID83.region does not exist. Creating the matrix file now.
Starting matrix generation for SNVs and DINUCs...Completed! Elapsed time: 116.12 seconds.
Starting matrix generation for INDELs...Completed! Elapsed time: 106.96 seconds.
Matrices generated for 1 samples with 0 errors. Total of 597 SNVs, 1 DINUCs, and 15 INDELs were successfully analyzed.
The context distribution file does not exist. This file needs to be created before simulating. This may take several hours...
Chromosome X done
Chromosome 1 done
Chromosome 2 done
Chromosome 3 done
Chromosome 4 done
Chromosome 5 done
Chromosome 6 done
Chromosome 7 done
Chromosome 8 done
Chromosome 9 done
Chromosome 10 done
Chromosome 11 done
Chromosome 12 done
Chromosome 13 done
Chromosome 14 done
Chromosome 15 done
Chromosome 16 done
Chromosome 17 done
Chromosome 18 done
Chromosome 19 done
Chromosome 20 done
Chromosome 21 done
The context distribution file has been created!
The context distribution file does not exist. This file needs to be created before simulating. This may take several hours...
Chromosome X done
Chromosome 1 done
Chromosome 2 done
Chromosome 3 done
Chromosome 4 done
Chromosome 5 done
Chromosome 6 done
Chromosome 7 done
Chromosome 8 done
Chromosome 9 done
Chromosome 10 done
Chromosome 11 done
Chromosome 12 done
Chromosome 13 done
Chromosome 14 done
Chromosome 15 done
Chromosome 16 done
Chromosome 17 done
Chromosome 18 done
Chromosome 19 done
Chromosome 20 done
Chromosome 21 done
The context distribution file has been created!
Files successfully read and mutations collected.
Mutation assignment starting now.
Mutations have been distributed.
Starting simulation now...
Chromosome 21 done
Chromosome 22 done
Chromosome 19 done
Chromosome 20 done
Chromosome 18 done
Chromosome 13 done
Chromosome 17 done
Chromosome 15 done
Chromosome 16 done
Chromosome 14 done
Chromosome 9 done
Chromosome 11 done
Chromosome 10 done
Chromosome 12 done
Chromosome X done
Chromosome 8 done
Chromosome 7 done
Chromosome 6 done
Chromosome 4 done
Chromosome 5 done
Chromosome 3 done
Chromosome 1 done
Chromosome 2 done
Simulation completed
Job took 2381.3919444084167 seconds
```
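The summary line in the log accounts for every input mutation: 597 SNVs + 1 DINUC + 15 INDELs = 613. The sketch below extracts those per-class counts from a log so they can be checked against the simulated files. The regular expression is written against the exact wording of the summary line shown above. Other versions of the tool may word the line differently. The function name `parse_summary` is hypothetical.

```python
import re

# Matches the wording of the matrix-generation summary line in the log above.
SUMMARY_PATTERN = re.compile(
    r"Total of (\d+) SNVs, (\d+) DINUCs, and (\d+) INDELs were successfully analyzed"
)


def parse_summary(log_text):
    """Return per-class mutation counts from the summary line, or None if the line is absent."""
    match = SUMMARY_PATTERN.search(log_text)
    if match is None:
        return None
    snvs, dinucs, indels = (int(group) for group in match.groups())
    return {"SNV": snvs, "DINUC": dinucs, "INDEL": indels}
```

For the log above, `parse_summary` returns `{"SNV": 597, "DINUC": 1, "INDEL": 15}`, and the values sum to 613.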
My input file has 613 mutations (597 SNVs, 1 DINUC and 15 INDELs according to the log), but the simulated output contains only 313. This is not specific to this file. Any help with this? As a workaround, I have been generating many more simulations than I need and sampling from them to match the number of mutations in my real sample.
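The workaround of pooling several simulations and subsampling them can be sketched as follows. This sketch assumes each simulation has already been parsed into a list of mutation records, for example the data lines of each simulated VCF. The function `subsample_to_target` is hypothetical. The draw is uniform and not stratified by mutation class or context. As a result, it only approximately preserves the SBS/ID mix of the simulations.

```python
import random


def subsample_to_target(simulated_records, target, seed=None):
    """Draw `target` records without replacement from several pooled simulations.

    simulated_records: one list of mutation records per simulation.
    Raises ValueError if the pool holds fewer than `target` records.
    """
    # Flatten all simulations into a single pool of candidate records.
    pool = [record for simulation in simulated_records for record in simulation]
    if len(pool) < target:
        raise ValueError(f"only {len(pool)} simulated records available, need {target}")
    # A dedicated Random instance keeps the draw reproducible when a seed is given.
    rng = random.Random(seed)
    return rng.sample(pool, target)
```

Three simulations of 313 mutations each give a pool of 939, enough to draw the 613 needed here.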
Thanks, Ronnie