Closed cutleraging closed 10 months ago
Dear Ronnie,
Thank you very much for using SigProfilerTopography (SPT). SigProfilerSimulator can randomly generate mutations from a given subset of genomic regions.
SPT intersects the given genomic regions (e.g., mutations) and their close vicinity with the genomic features in the library files. Since you can already change the input genomic regions, what do you expect to gain by focusing only on a subset of genomic regions? Please let me know.
Thanks, Burcak
Hi Burcak,
Thanks for your quick response. My concern is that if the sequencing coverage differs between samples, then the simulation will place mutations in regions where we have no information about whether mutations actually occurred.
This would then seem to be a problem when calculating the expected vs. observed ratios: if a particular region was not covered in the observed sample but was covered in the simulation, one would get inaccurate ratios.
Does this make sense?
Best, Ronnie
Dear Ronnie,
If I understood you correctly, you want to provide parameters to tune the simulations (SPS) while running SPT. Is that correct?
Best, Burcak
Yes, I would like to provide a BED file that records which regions were covered in a sample. Specifically, the bed_file parameter in the SigProfilerSimulator function.
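To make "regions covered in a sample" concrete, here is a minimal sketch of how the covered fraction of a region of interest could be computed from BED intervals. The helper names (parse_bed, covered_fraction) and the toy intervals are hypothetical and are not part of SPT or SigProfilerSimulator; the only assumption taken from the BED format is that coordinates are 0-based and half-open.

```python
def parse_bed(lines):
    """Parse BED text lines into (chrom, start, end) tuples (0-based, half-open)."""
    intervals = []
    for line in lines:
        line = line.strip()
        # Skip blank lines, comments, and optional BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        intervals.append((chrom, int(start), int(end)))
    return intervals


def covered_fraction(intervals, chrom, start, end):
    """Fraction of [start, end) on chrom that is covered by the intervals."""
    # Keep only overlapping intervals, clipped to the region of interest.
    clipped = sorted(
        (max(s, start), min(e, end))
        for c, s, e in intervals
        if c == chrom and s < end and e > start
    )
    covered = 0
    cursor = start  # everything left of cursor has already been counted
    for s, e in clipped:
        s = max(s, cursor)
        if e > s:
            covered += e - s
            cursor = e
    return covered / (end - start)


# Hypothetical callable regions for one sample (toy coordinates).
bed_text = "chr1\t0\t500\nchr1\t400\t600\nchr1\t900\t1000\nchr2\t0\t1000\n"
intervals = parse_bed(bed_text.splitlines())
fraction = covered_fraction(intervals, "chr1", 0, 1000)
print(fraction)
```

Overlapping intervals are counted once, so the result is the share of bases in the region where mutations could have been called at all.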
In fact, it is doable. I can update the SPT and let you know in a few days. You can test it and let me know. Then we can close this issue.
Best, Burcak
Thanks, does it make sense to do though? Would you expect the results to be different for samples with variable amounts of genomic coverage?
Best, Ronnie
In your case, it may not be necessary. If sequencing coverage/depth is low, then your confidence regarding the conclusions/mutations drawn from them will be low. However, it doesn't require restricting the genomic regions for simulations. But for other cases, it might be needed.
I think there is a misunderstanding. Mutations are only called in regions that meet certain coverage/quality criteria, so no mutations are called where coverage is low. When I speak about coverage, I mean regions that met the coverage/quality criteria (where mutations can be called) versus regions that did not. This can vary from sample to sample. For example, in single-cell whole-genome sequencing data, coverage is usually only ~50%, so it would seem to matter more there.
Then, let's say we are trying to determine the observed/expected ratio for a region which is only 50% covered in the observed sample. My worry is that the number you get for the expected, from the simulations, will not account for the fact that only 50% of the region we are interested in is covered in the sample.
Does this make sense?
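A toy calculation may help illustrate the concern. It assumes simulated mutations are placed uniformly at random, which is a simplification (the real simulations preserve mutational context), and all numbers and variable names are made up for illustration.

```python
# Toy genome: 1000 bases in total, of which 500 are callable in this sample.
genome_length = 1000
callable_total = 500

# Region of interest: 200 bases long, but only 50 of them are callable.
region_length = 200
callable_in_region = 50

n_mutations = 100        # mutations called in the sample (all in callable bases)
observed_in_region = 10  # of those, the number that fall in the region

# Expected count if simulated mutations may land anywhere in the genome.
expected_unrestricted = n_mutations * region_length / genome_length

# Expected count if simulated mutations are restricted to callable bases.
expected_restricted = n_mutations * callable_in_region / callable_total

ratio_unrestricted = observed_in_region / expected_unrestricted
ratio_restricted = observed_in_region / expected_restricted

print(expected_unrestricted, ratio_unrestricted)
print(expected_restricted, ratio_restricted)
```

In this sketch the unrestricted simulation suggests a depletion (ratio 0.5) that disappears once the simulation is limited to callable bases (ratio 1.0). Note that the two ratios only differ because the region is covered less well (25%) than the genome as a whole (50%); if the region were covered at the same rate as the rest of the genome, both calculations would give the same expected count.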
Hi Ronnie,
SPT version 1.0.80 supports bed_file for simulations.
When you restrict the simulations to the genomic regions with certain coverage/quality criteria, it would be similar to shuffling the input, I guess.
I will close the issue. If you have any problems, please feel free to reopen it.
Hello,
Thanks for a great tool. I was wondering if there was an option to only focus the analysis on a subset of genomic regions, as one can do using SigProfilerSimulator? This is to account for coverage differences in samples.
Best, Ronnie