AlexandrovLab / SigProfilerTopography

SigProfilerTopography allows evaluating the effect of chromatin organization, histone modifications, transcription factor binding, DNA replication, and DNA transcription on the activities of different mutational processes. SigProfilerTopography elucidates the unique topographical characteristics of mutational signatures.

BSD 2-Clause "Simplified" License

Genomic Coverage #4

Closed cutleraging closed 10 months ago

cutleraging commented 10 months ago

Hello,

Thanks for a great tool. I was wondering if there was an option to only focus the analysis on a subset of genomic regions, as one can do using SigProfilerSimulator? This is to account for coverage differences in samples.

Best, Ronnie

burcakotlu commented 10 months ago

Dear Ronnie,

Thank you very much for using SigProfilerTopography (SPT). SigProfilerSimulator can randomly generate mutations from a given subset of genomic regions.

SPT intersects the given genomic regions (e.g., mutations) and their close vicinity with the genomic features in the library files. What do you expect to gain from focusing on only a subset of genomic regions, given that you can already change the input genomic regions? Please let me know.

Thanks, Burcak

cutleraging commented 10 months ago

Hi Burcak,

Thanks for your quick response. My concern is that if the sequencing coverage differs between samples, then the simulation will place mutations in regions where we have no information about whether mutations actually occurred.

This would seem to be a problem when calculating the expected vs. observed ratios: if a particular region was not covered in the observed sample but was covered in the simulation, one would get inaccurate ratios.

Does this make sense?

Best, Ronnie

burcakotlu commented 10 months ago

Dear Ronnie,

If I understood you correctly, you want to provide the parameters that tune the simulations (SPS) while running SPT. Is that correct?

Best, Burcak

cutleraging commented 10 months ago

Yes, I would like to provide a BED file that describes which regions were covered in a sample. Specifically, I mean the bed_file parameter of the SigProfilerSimulator function.
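To make the request concrete, here is a minimal sketch of what "restricting to covered regions" means at the level of a single position. It is not SPT or SigProfilerSimulator code; the helper names `load_bed` and `is_covered` are hypothetical, and it assumes a standard BED layout (chromosome, 0-based start, exclusive end in the first three columns).

```python
from bisect import bisect_right
from collections import defaultdict


def load_bed(lines):
    """Parse BED lines into sorted, merged (start, end) intervals per chromosome.

    Hypothetical helper: assumes 0-based, half-open BED coordinates.
    """
    raw = defaultdict(list)
    for line in lines:
        line = line.strip()
        # Skip blank lines and common BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        raw[chrom].append((int(start), int(end)))

    merged = {}
    for chrom, intervals in raw.items():
        intervals.sort()
        out = [intervals[0]]
        for start, end in intervals[1:]:
            last_start, last_end = out[-1]
            if start <= last_end:
                # Overlapping or adjacent: extend the previous interval.
                out[-1] = (last_start, max(last_end, end))
            else:
                out.append((start, end))
        merged[chrom] = out
    return merged


def is_covered(regions, chrom, pos):
    """Return True if 0-based position `pos` falls inside a covered interval."""
    intervals = regions.get(chrom, [])
    # Index of the last interval whose start is <= pos.
    i = bisect_right(intervals, (pos, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= pos < intervals[i][1]
```

A simulated mutation at `("chr1", 12345)` would then be kept only if `is_covered(regions, "chr1", 12345)` is true, where `regions` comes from the per-sample coverage BED file.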

burcakotlu commented 10 months ago

In fact, it is doable. I can update SPT and let you know in a few days. You can then test it and let me know, and we can close this issue.

Best, Burcak

cutleraging commented 10 months ago

Thanks. Does it make sense to do, though? Would you expect the results to differ for samples with variable amounts of genomic coverage?

Best, Ronnie

burcakotlu commented 10 months ago

In your case, it may not be necessary. If sequencing coverage/depth is low, then your confidence in the called mutations, and in the conclusions drawn from them, will be low. However, that by itself doesn't require restricting the genomic regions for simulations. For other cases, it might be needed.

cutleraging commented 10 months ago

I think there is a misunderstanding. The regions from which mutations are called must meet certain coverage/quality criteria, so mutations are not called where coverage is low. When I speak about coverage, I mean regions that met the coverage/quality criteria (where mutations can be called) vs. regions that did not. This can vary from sample to sample. For example, in single-cell whole-genome sequencing data, coverage is usually only ~50%, so this would seem to matter more there.

Then, let's say we are trying to determine the observed/expected ratio for a region which is only 50% covered in the observed sample. My worry is that the number you get for the expected, from the simulations, will not account for the fact that only 50% of the region we are interested in is covered in the sample.
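A back-of-the-envelope sketch of this worry, under strong simplifying assumptions that are mine and not a description of how SPT or SigProfilerSimulator compute anything: mutations fall uniformly across callable bases, there is no real enrichment, and the simulation places the same total number of mutations as were observed. The function name and arguments are hypothetical.

```python
def null_counts(n_mutations, region_len, genome_len, region_cov, genome_cov):
    """Counts in a region of interest under a uniform null model.

    region_cov / genome_cov are the callable fractions (0..1) of the
    region and of the whole genome. Returns a tuple:
    (observed, expected_unrestricted, expected_restricted).
    """
    callable_region = region_len * region_cov
    callable_genome = genome_len * genome_cov

    # Observed mutations can only be called on callable bases.
    observed = n_mutations * callable_region / callable_genome
    # Unrestricted simulation spreads mutations over every base.
    expected_unrestricted = n_mutations * region_len / genome_len
    # Simulation limited to callable bases mirrors the observed setup.
    expected_restricted = n_mutations * callable_region / callable_genome
    return observed, expected_unrestricted, expected_restricted


# Example: 50% of the genome is callable, but only 25% of the region is.
obs, exp_all, exp_cov = null_counts(
    n_mutations=10_000,
    region_len=1_000_000,
    genome_len=100_000_000,
    region_cov=0.25,
    genome_cov=0.5,
)
print("observed/expected, unrestricted simulation:", obs / exp_all)
print("observed/expected, restricted simulation:  ", obs / exp_cov)
```

Under these assumptions the unrestricted ratio equals `region_cov / genome_cov` (0.5 in the example) even though nothing biological is going on, while the restricted ratio is 1. The same arithmetic shows that if the region is covered at the same fraction as the genome as a whole, the two effects cancel on average, so the distortion comes from coverage being uneven across regions rather than from low coverage alone.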

Does this make sense?

burcakotlu commented 10 months ago

Hi Ronnie,

SPT version 1.0.80 supports bed_file for simulations. When you restrict the simulations to the genomic regions that meet certain coverage/quality criteria, it would be similar to shuffling the input, I guess. I will close the issue. If you have any problems, please feel free to reopen it.