AlexandrovLab / SigProfilerAssignment

Assignment of known mutational signatures to individual samples and individual somatic mutations
BSD 2-Clause "Simplified" License

Exome renormalization seems to miss mutations #119

Closed pushpa-itagi closed 7 months ago

pushpa-itagi commented 7 months ago

Hi,

We are using SigProfilerAssignment (COSMIC v3.2) to fit signatures to WES samples. When we set exome=True, the final Activities.txt file appears to be missing some of the mutations from the input VCF file. For instance, if the input VCF file had 500 SNVs, the Activities.txt file reports fewer than 500 mutations. We are not sure what is causing these mutations to be dropped. Is it possible that exome=True removes certain mutations, or that the renormalization cannot assign some set of mutations? Please let me know if there is a parameter or something else that needs to be changed.

Thanks, Pushpa Itagi

mdbarnesUCSD commented 7 months ago

Hi @pushpa-itagi,

Could you please confirm whether you are providing VCF files as input? If VCFs are provided and exome=True, then the mutations are downsampled to the exome regions of the genome (see the exome parameter in SigProfilerMatrixGenerator's README).
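To make the effect of downsampling concrete, here is a minimal sketch of how one might count the SNVs in a VCF that fall inside a set of target regions. It is not SigProfilerMatrixGenerator's implementation: the helper names, the BED-style region format (0-based start, exclusive end), and the toy records are all assumptions made for illustration, and chromosome names are assumed to match between the two inputs.

```python
from bisect import bisect_right


def load_intervals(bed_lines):
    """Parse BED-style lines (chrom, 0-based start, exclusive end) into
    sorted, merged interval lists keyed by chromosome."""
    raw = {}
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        raw.setdefault(chrom, []).append((int(start), int(end)))
    merged = {}
    for chrom, regions in raw.items():
        out = []
        for start, end in sorted(regions):
            if out and start <= out[-1][1]:
                # Overlapping or adjacent: extend the previous interval.
                out[-1] = (out[-1][0], max(out[-1][1], end))
            else:
                out.append((start, end))
        merged[chrom] = out
    return merged


def in_regions(intervals, chrom, pos):
    """Return True if a 1-based VCF position lies inside any interval."""
    regions = intervals.get(chrom, [])
    zero_based = pos - 1
    # Index of the last interval whose start is <= the position.
    i = bisect_right(regions, (zero_based, float("inf"))) - 1
    return i >= 0 and regions[i][0] <= zero_based < regions[i][1]


def count_snvs(vcf_lines, intervals):
    """Return (total SNVs, SNVs inside the regions); non-SNV records are skipped."""
    total = kept = 0
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, _id, ref, alt = line.split("\t")[:5]
        if len(ref) != 1 or len(alt) != 1:
            continue  # indels and multi-allelic records are not SNVs
        total += 1
        kept += in_regions(intervals, chrom, int(pos))
    return total, kept


# Toy, made-up inputs purely for illustration.
bed = ["1\t100\t200", "1\t150\t300", "2\t0\t50"]
vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT",
    "1\t101\t.\tC\tT",
    "1\t100\t.\tC\tT",
    "1\t300\t.\tG\tA",
    "1\t301\t.\tG\tA",
    "2\t50\t.\tT\tC",
    "3\t10\t.\tA\tG",
    "1\t120\t.\tCA\tC",
]
total, kept = count_snvs(vcf, load_intervals(bed))
print(f"{kept} of {total} SNVs fall inside the target regions")
```

Running the same kind of count on a real VCF against the exome region file for your genome build would show how many SNVs are expected to survive when exome=True.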

pushpa-itagi commented 7 months ago

Hi, yes, we used VCF files as input. Ah, I see the matrix generator module explains it. Thanks for the quick reply. I also have two follow-up questions:

  1. Is there a file or database that you use to define which mutations are exonic? When I check my sample annotations, only a very small fraction are exonic, so I am not sure which mutations are being tagged as exonic.
  2. I also see that a BED file can be given as an option. Can we use that if we want to retain all SNVs (not just exonic ones) but still use exome=True? Will that work? Please let me know. Thanks!

mdbarnesUCSD commented 7 months ago

  1. The exome regions are defined in the exome directory for the corresponding reference genome: https://github.com/AlexandrovLab/SigProfilerMatrixGenerator/tree/master/SigProfilerMatrixGenerator/references/chromosomes/exome. These regions are from SureSelect v7.

  2. If you want to retain all SNVs, run the matrix generator with BED=None and exome=False. This will disable downsampling to the regions specified in SureSelect.
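As a rough sketch of the second point, the settings could be passed to the matrix generator as shown below. This assumes the SigProfilerMatrixGeneratorFunc entry point and its exome and bed_file keyword arguments as I understand them from the SigProfilerMatrixGenerator README (the comment above refers to the latter as BED); please check the README for your installed version. The helper names, project name, genome build, and path are hypothetical placeholders.

```python
def matrix_generator_options(keep_all_snvs):
    """Keyword arguments for the matrix generator call (illustrative helper).

    keep_all_snvs=True  -> no exome downsampling and no BED restriction.
    keep_all_snvs=False -> downsample to the built-in exome regions.
    """
    return {"exome": not keep_all_snvs, "bed_file": None}


def build_matrices(project, genome, vcf_dir, keep_all_snvs=True):
    """Run the matrix generator on a directory of VCF files.

    The import is done lazily so this sketch can be loaded without the
    package installed. The keyword names are an assumption based on the
    README, not verified against every release.
    """
    from SigProfilerMatrixGenerator.scripts import (
        SigProfilerMatrixGeneratorFunc as matGen,
    )

    return matGen.SigProfilerMatrixGeneratorFunc(
        project, genome, vcf_dir, **matrix_generator_options(keep_all_snvs)
    )


# Hypothetical usage (placeholder project name, genome build, and path):
# matrices = build_matrices("my_project", "GRCh37", "/path/to/vcf_dir/")
```

With keep_all_snvs=True every SNV in the input VCFs should be counted in the generated matrices, so the totals should match the input again.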

mdbarnesUCSD commented 7 months ago

Please reach out if you have any additional questions.