AlexandrovLab / SigProfilerExtractor

SigProfilerExtractor allows de novo extraction of mutational signatures from data generated in a matrix format. The tool identifies the number of operative mutational signatures, their activities in each sample, and the probability for each signature to cause a specific mutation type in a cancer sample. The tool makes use of SigProfilerMatrixGenerator and SigProfilerPlotting.
BSD 2-Clause "Simplified" License

De novo extracted signatures reconstruction seems off #193

Closed skeitaa closed 1 year ago

skeitaa commented 1 year ago

Dear all,

I'm trying to use SigProfilerExtractor on PCAWG data for an ongoing project. I managed to extract de novo signatures, but something seems wrong with the reconstruction step. I have attached the decomposition log file and the script I used to run SPE below. I also tried the decomposition with the new SigProfilerAssignment tool, but the results were the same. Could you please let me know if there is any way to fix this? I know the samples are very heterogeneous and the dataset is quite large, so I thought I might have to try reduced datasets, but maybe the issue lies elsewhere. Thanks a lot for your help!

Attachments: Cosmic_SBS96_Decomposition_Log.txt, run_SPE.py.txt

marcos-diazg commented 1 year ago

Dear @skeitaa,

Thanks for your interest! As far as I understand, the tool is running successfully on your end, but you are not obtaining the results you expected after the decomposition of de novo signatures to COSMIC reference signatures. With the information provided, it's quite difficult to debug your issue, but you should make sure that you are generating the input mutational matrix correctly, in particular by using the reference genome build that matches your input data (generally GRCh37 for PCAWG).
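To illustrate the genome build check suggested above, here is a minimal sketch, assuming a pre-generated matrix as input and placeholder paths. The helper names `check_genome_build` and `run_extraction` are hypothetical and not part of SigProfilerExtractor. The `sigProfilerExtractor` call uses the arguments as described in the tool's README, so please verify them against your installed version.

```python
# Minimal sketch, not the maintainers' pipeline.
# Helper names, paths, and the signature range are hypothetical.

# Assumption taken from the reply above: PCAWG data is generally on GRCh37.
PCAWG_GENOME_BUILD = "GRCh37"


def check_genome_build(requested, expected=PCAWG_GENOME_BUILD):
    """Return `requested` if it matches the build of the input data, else raise."""
    if requested != expected:
        raise ValueError(
            f"Genome build mismatch: got {requested!r}, "
            f"but the input data is expected to be on {expected!r}"
        )
    return requested


def run_extraction(matrix_path, output_dir, genome_build):
    """Validate the genome build, then run the extraction with that build."""
    build = check_genome_build(genome_build)

    # Imported lazily so the check above works without the package installed.
    from SigProfilerExtractor import sigpro as sig

    sig.sigProfilerExtractor(
        "matrix",                  # input_type: a pre-generated mutational matrix
        output_dir,                # output folder
        matrix_path,               # input_data: path to the matrix file
        reference_genome=build,
        opportunity_genome=build,
        minimum_signatures=1,
        maximum_signatures=10,     # illustrative range, not a recommendation
    )
```

With this sketch, `run_extraction("path/to/matrix.txt", "spe_output", "GRCh38")` would stop with a `ValueError` before any extraction starts, whereas passing `"GRCh37"` would go on to call the extractor.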

Also, we have recently reanalyzed all PCAWG data with SigProfilerExtractor in our most recent publication. You can find the de novo extracted signatures for each cancer type here: https://figshare.com/articles/dataset/Supplementary_data_for_Islam_et_al_2022_-_PCAWG_reanalysis/20406279.

I'll proceed to close this ticket as it does not seem related to an issue with the code, but I'm happy to follow up over email (mdiazgay@health.ucsd.edu) if you have any additional questions about the analysis.

Marcos

skeitaa commented 1 year ago

I was using GRCh38 indeed. Thanks for pointing me to the de novo signatures!

Stephane