AlexandrovLab / SigProfilerExtractor

SigProfilerExtractor allows de novo extraction of mutational signatures from data generated in a matrix format. The tool identifies the number of operative mutational signatures, their activities in each sample, and the probability for each signature to cause a specific mutation type in a cancer sample. The tool makes use of SigProfilerMatrixGenerator and SigProfilerPlotting.
BSD 2-Clause "Simplified" License

Reproducibility of results #72

Closed (Marozi2 closed this issue 3 years ago)

Marozi2 commented 3 years ago

Hi,

I'm trying to reproduce some results with SigProfiler, but I haven't succeeded. For each cancer type, I built mutational catalogs from the file WGS_PCAWG.96.csv, downloaded from https://dcc.icgc.org/releases/PCAWG/mutational_signatures/Input_Data_PCAWG7_23K_Spectra_DB/Mutation_Catalogs_--_Spectra_of_Individual_Tumours/WGS_PCAWG_2018_02_09.zip

I ran SigProfilerExtractor on each cancer type with a minimum of 1 and a maximum of 7 signatures, with a refit to COSMIC signatures version 3.2. I was expecting results very close to SigProfilier_PCAWG_WGS_probabilities_SBS.csv, downloaded from https://dcc.icgc.org/releases/PCAWG/mutational_signatures/Attributions_to_Each_Mutational_Class/SP_Attributions_to_Each_Mutational_Class/SigProfilier_PCAWG_WGS_probabilities_SBS.csv. Command used:

sig.sigProfilerExtractor("matrix", outputfile, inputcatalog, seeds="random", reference_genome="GRCh37", opportunity_genome="GRCh37", matrix_normalization="gmm", cosmic_version=3.2, resample = True, context_type="SBS96", exome=False, minimum_signatures=1, maximum_signatures=7, nmf_test_conv=1000, nmf_replicates=10, clustering_distance="cosine", min_nmf_iterations=3000, refit_denovo_signatures=True, nmf_init="random", nnls_add_penalty=0.05, nnls_remove_penalty=0.01, initial_remove_penalty=0.05, make_decomposition_plots=True, get_all_signature_matrices=False, cpu=cpu)

I compared the output file Decomposed_Mutation_Probabilities.txt from my run with SigProfilier_PCAWG_WGS_probabilities_SBS.csv, but unfortunately the results differ (different percentages assigned to different signatures). Another issue is that 3 or 4 of the 7 signatures are often not refitted to COSMIC signatures, and these "new" signatures usually account for most of the mutation probabilities. I also ran SigProfilerSingleSample on the same data to avoid the problem of unrefitted signatures, still hoping to get results close to SigProfilier_PCAWG_WGS_probabilities_SBS.csv. Again, the results are very different from that file.
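One way to put a number on "different results" is a cosine similarity between the signature probabilities from the two files for the same sample and mutation type. The following is a minimal sketch, not part of SigProfilerExtractor: the function name, the dictionary layout, and the example values (including the de novo signature name SBS96A) are hypothetical, and loading the two files into such dictionaries is left out.

```python
import math

def cosine_similarity(p, q):
    """Cosine similarity between two {signature: probability} mappings.

    Signatures present in only one mapping are treated as probability 0.
    """
    keys = sorted(set(p) | set(q))
    a = [p.get(k, 0.0) for k in keys]
    b = [q.get(k, 0.0) for k in keys]
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Two all-zero vectors have no defined angle; report 0.0 in that case.
    return dot / norm if norm else 0.0

# Hypothetical probabilities for one sample and one mutation type.
mine = {"SBS1": 0.20, "SBS5": 0.30, "SBS96A": 0.50}
published = {"SBS1": 0.25, "SBS5": 0.60, "SBS40": 0.15}
score = cosine_similarity(mine, published)
print(f"cosine similarity: {score:.3f}")
```

A score near 1 means the two runs assign the mutation to the same signatures in similar proportions. A de novo signature that was not refitted to COSMIC has no counterpart in the published table, so it lowers the score on its own.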

Could you please explain where this non-reproducibility comes from? Does SigProfilier_PCAWG_WGS_probabilities_SBS.csv correspond to the SigProfilerExtractor output for WGS_PCAWG.96.csv? Are any other steps applied between the SigProfilerExtractor output and SigProfilier_PCAWG_WGS_probabilities_SBS.csv?

Thank you.

mishugeb commented 3 years ago

Hi, thanks for your question. I am not familiar with the data and results uploaded to the ICGC portal. However, I have recent results from analyzing the PCAWG dataset with our current tool. I have uploaded the results extracted from Biliary-AdenoCA. Please check whether they match the results you extracted.
Biliary-AdenoCA.zip

Thanks, Mishu

Marozi2 commented 3 years ago

Hi,

Thank you very much. Could you please send me the input file and the exact SigProfilerExtractor command you used, with all options, so that I can try to reproduce your result? My result is not the same as yours. Here it is: Biliary.AdenoCA.zip

Thank you

mishugeb commented 3 years ago

JOB_METADATA.txt

I have attached the parameters I used. You will find them in the JOB_METADATA.txt file. Alternatively, you can send me your JOB_METADATA.txt file.

Thanks, Mishu
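Comparing two JOB_METADATA.txt files by eye is error-prone, so a line-level diff can help find the parameters that differ between runs. This is a minimal sketch using Python's standard difflib. It assumes nothing about the layout of JOB_METADATA.txt beyond it being plain text; the function name and the parameter lines in the example are hypothetical and are not taken from either attached file.

```python
import difflib

def diff_metadata(text_a, text_b, name_a="mine", name_b="theirs"):
    """Return unified-diff lines for two metadata texts, ignoring blank lines."""
    a = [line.strip() for line in text_a.splitlines() if line.strip()]
    b = [line.strip() for line in text_b.splitlines() if line.strip()]
    # n=0 drops context lines so only the differing lines are reported.
    return list(difflib.unified_diff(a, b, fromfile=name_a, tofile=name_b,
                                     lineterm="", n=0))

# Hypothetical metadata contents, for illustration only.
run_a = "context_type: SBS96\nmaximum_signatures: 7\nnmf_replicates: 10\n"
run_b = "context_type: SBS96\nmaximum_signatures: 25\nnmf_replicates: 100\n"

for line in diff_metadata(run_a, run_b):
    print(line)
```

With real files, the two texts can be read with open(path).read() and passed in the same way. Lines that record timestamps or run times will naturally show up as differences and can be ignored.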

ghost commented 2 years ago

I know this issue is closed, but just throwing this out: I think the number of signatures you are running is likely to be part of the problem.

I have not done exactly this kind of analysis, but I have done a lot of related ones. Changing the number of signatures will definitely change the loadings a fair amount.