AlexandrovLab / SigProfilerSimulatorR

An R wrapper for running the SigProfilerSimulator framework
BSD 2-Clause "Simplified" License

Lower mutation number in simulation #5

Closed: cutleraging closed this issue 10 months ago

cutleraging commented 12 months ago

Hello,

Thanks for a great tool! I have been using it to randomize the position of mutations in samples while controlling for mutational signatures. However, I expected the simulated files to contain the same number of mutations as the input, and they do not.

I am running the tool like this, including some specific regions to simulate in (based on coverage):

```python
#!/usr/bin/env python

import sys

from SigProfilerSimulator import SigProfilerSimulator as sigSim

# Command-line arguments: project name, directory of input VCFs, BED file of regions
name = sys.argv[1]
vcf_dir = sys.argv[2]
bed = sys.argv[3].strip()

sigSim.SigProfilerSimulator(
    name,
    vcf_dir,
    "GRCh37",
    contexts=["96", "ID"],
    exome=None,
    simulations=10,
    updating=False,
    bed_file=bed,
    overlap=False,
    gender='female',
    chrom_based=False,
    seed_file=None,
    noisePoisson=False,
    cushion=100,
    region=None,
    vcf=True,
)
```

Here is the output of the log file:

```
======================================
SigProfilerSimulator

Checking for all reference files and relevant matrices...
Creating a chromosome proportion file for the given BED file ranges...Completed!
SigProfilerSimulator_cell_output/C1_05_24/output/ID/C1_05_24.ID83.region does not exist. Creating the matrix file now.
Starting matrix generation for SNVs and DINUCs...Completed! Elapsed time: 116.12 seconds.
Starting matrix generation for INDELs...Completed! Elapsed time: 106.96 seconds.
Matrices generated for 1 samples with 0 errors. Total of 597 SNVs, 1 DINUCs, and 15 INDELs were successfully analyzed.
The context distribution file does not exist. This file needs to be created before simulating. This may take several hours...
Chromosome X done Chromosome 1 done Chromosome 2 done Chromosome 3 done Chromosome 4 done Chromosome 5 done Chromosome 6 done Chromosome 7 done Chromosome 8 done Chromosome 9 done Chromosome 10 done Chromosome 11 done Chromosome 12 done Chromosome 13 done Chromosome 14 done Chromosome 15 done Chromosome 16 done Chromosome 17 done Chromosome 18 done Chromosome 19 done Chromosome 20 done Chromosome 21 done
The context distribution file has been created!
The context distribution file does not exist. This file needs to be created before simulating. This may take several hours...
Chromosome X done Chromosome 1 done Chromosome 2 done Chromosome 3 done Chromosome 4 done Chromosome 5 done Chromosome 6 done Chromosome 7 done Chromosome 8 done Chromosome 9 done Chromosome 10 done Chromosome 11 done Chromosome 12 done Chromosome 13 done Chromosome 14 done Chromosome 15 done Chromosome 16 done Chromosome 17 done Chromosome 18 done Chromosome 19 done Chromosome 20 done Chromosome 21 done
The context distribution file has been created!

Files successfully read and mutations collected. Mutation assignment starting now.
Mutations have been distributed. Starting simulation now...
Chromosome 21 done Chromosome 22 done Chromosome 19 done Chromosome 20 done Chromosome 18 done Chromosome 13 done Chromosome 17 done Chromosome 15 done Chromosome 16 done Chromosome 14 done Chromosome 9 done Chromosome 11 done Chromosome 10 done Chromosome 12 done Chromosome X done Chromosome 8 done Chromosome 7 done Chromosome 6 done Chromosome 4 done Chromosome 5 done Chromosome 3 done Chromosome 1 done Chromosome 2 done
Simulation completed
Job took 2381.3919444084167 seconds
```
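
As a quick sanity check, the counts on the "Total of ..." line of the log can be summed and compared with the number of records in the input. The sketch below is only illustrative: the regular expression is written against the wording of that one log line, and `total_analyzed` is a hypothetical helper, not part of SigProfilerSimulator.

```python
import re

# Line copied verbatim from the log output.
LOG_LINE = (
    "Matrices generated for 1 samples with 0 errors. "
    "Total of 597 SNVs, 1 DINUCs, and 15 INDELs were successfully analyzed."
)


def total_analyzed(log_text):
    """Sum the SNV, DINUC, and INDEL counts from the 'Total of ...' log line.

    Hypothetical helper; the pattern assumes the exact wording of the log line.
    """
    match = re.search(
        r"Total of (\d+) SNVs, (\d+) DINUCs, and (\d+) INDELs", log_text
    )
    if match is None:
        raise ValueError("no 'Total of ...' line found in log text")
    return sum(int(group) for group in match.groups())


print(total_analyzed(LOG_LINE))  # 597 + 1 + 15 = 613
```

For this log the sum is 613, so the matrix generation step appears to have counted every mutation in the input file.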

In my input file I have 613 mutations, but in the simulated output I only get 313 mutations. This issue is not specific to this file. Any help with this? I have tried to circumvent this by creating many more simulations than I need and sampling them to get the matching number of mutations as in my real sample.
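
The workaround can be sketched roughly as follows. This is only an illustration under assumptions: it treats every non-empty line that does not start with `#` as one mutation record, which may not match the exact layout of the simulated output files, and `read_records`, `pool_records`, and `subsample` are hypothetical helpers rather than SigProfilerSimulator functions.

```python
import random


def read_records(path):
    """Return mutation records from a VCF-like text file.

    Assumption: one mutation per non-empty line; lines starting with '#'
    are treated as header lines and skipped.
    """
    with open(path) as handle:
        return [
            line.rstrip("\n")
            for line in handle
            if line.strip() and not line.startswith("#")
        ]


def pool_records(paths):
    """Concatenate the records of several simulated files into one list."""
    pooled = []
    for path in paths:
        pooled.extend(read_records(path))
    return pooled


def subsample(records, target, seed=0):
    """Draw `target` records without replacement, reproducibly for a given seed."""
    if target > len(records):
        raise ValueError(
            f"need {target} records but only {len(records)} are available"
        )
    return random.Random(seed).sample(records, target)
```

For example, `subsample(pool_records(simulated_paths), len(read_records(input_path)))` would return as many simulated records as the input contains, where `simulated_paths` and `input_path` are placeholders. Uniform subsampling of pooled records does not by itself guarantee that the per-context counts match the input exactly.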

Thanks, Ronnie

mdbarnesUCSD commented 10 months ago

Hi,

Thanks for reaching out about this. We are currently looking into it, and progress can be tracked here.

Thanks!