AlexandrovLab / SigProfilerExtractor

SigProfilerExtractor allows de novo extraction of mutational signatures from data generated in a matrix format. The tool identifies the number of operative mutational signatures, their activities in each sample, and the probability for each signature to cause a specific mutation type in a cancer sample. The tool makes use of SigProfilerMatrixGenerator and SigProfilerPlotting.
BSD 2-Clause "Simplified" License

Creating custom signature databases #91

Closed bryce-turner closed 3 years ago

bryce-turner commented 3 years ago

Apologies if this is already documented elsewhere, but how would one go about creating a custom signature database? More specifically, can we apply the known signatures from COSMIC to a different species?

For example, SigProfilerMatrixGenerator supports dog (CanFam3.1), but downstream tools like SigProfilerExtractor do not fully support it. The limitation appears to be in the step that decomposes the de novo signatures, which relies on data files in the install directory. This makes sense given what is available on COSMIC.

Noting a relevant code block that may be applying some dangerous defaults for contexts other than SBS96: https://github.com/AlexandrovLab/SigProfilerExtractor/blob/b33d9d7101302eafb7d44379694979fa3e2fa7c4/SigProfilerExtractor/subroutines.py#L1095-L1140


Since the mutational signature profiles should be more or less the same across species, is there a script or tool for lifting these profiles across different genomes?

mdbarnesUCSD commented 3 years ago

Hi @bryce-turner, thanks for reaching out. I am looping in @lalexandrov1018 to respond to this one.

lalexandrov1018 commented 3 years ago

Currently, all mutational signatures on COSMIC (and in SigProfiler tools) have been derived from human cancers or human normal somatic tissues. These human signatures have also been lifted to several other species (mice, rats, etc.) and can be used with SigProfilerExtractor. However, it is important to note that these are NOT mouse mutational signatures but rather human signatures lifted to the mouse genome. For example, signatures SBS2 and SBS13 are commonly attributed to APOBEC3A/B, which do not exist in mice. As such, it is highly unlikely that signatures SBS2 and SBS13 will be observed in mouse genomes. The reference set of mutational signatures can be downloaded from https://cancer.sanger.ac.uk/signatures/downloads/

We do not have a version for dog, but you can renormalize the human signatures to the dog genome's trinucleotide frequency. Again, note that these will not be dog signatures but renormalized human signatures. SigProfilerExtractor has a decomposition function that accepts a custom reference matrix for decomposition.
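
The renormalization described above might be sketched roughly as follows. This is an illustration under stated assumptions, not code from SigProfilerExtractor: the function names `channel_context` and `renormalize_signature` are hypothetical, the channel labels are assumed to follow the `A[C>T]G` convention, and the trinucleotide frequencies are made-up toy numbers rather than real human or dog values. Each channel's probability is scaled by the ratio of the target genome's trinucleotide frequency to the source genome's, and the signature is then rescaled to sum to 1.

```python
import numpy as np


def channel_context(channel: str) -> str:
    """Map an SBS96 channel label such as 'A[C>T]G' to its trinucleotide 'ACG'."""
    # Label positions: 0 = 5' base, 2 = reference base, 6 = 3' base.
    return channel[0] + channel[2] + channel[6]


def renormalize_signature(signature, channels, source_freq, target_freq):
    """Rescale one signature from a source genome to a target genome.

    signature   : per-channel probabilities (sums to 1)
    channels    : channel labels, in the same order as `signature`
    source_freq : dict mapping trinucleotide -> frequency in the source genome
    target_freq : dict mapping trinucleotide -> frequency in the target genome

    Assumption: both frequency tables are counted the same way, e.g. in
    pyrimidine context with reverse complements pooled.
    """
    contexts = [channel_context(c) for c in channels]
    ratio = np.array([target_freq[t] / source_freq[t] for t in contexts])
    scaled = np.asarray(signature, dtype=float) * ratio
    total = scaled.sum()
    if total == 0:
        raise ValueError("signature has no mass after rescaling")
    return scaled / total


# Toy two-channel example; all numbers below are illustrative only.
channels = ["A[C>T]G", "T[C>T]A"]
signature = [0.5, 0.5]
source_freq = {"ACG": 0.01, "TCA": 0.02}  # hypothetical source-genome frequencies
target_freq = {"ACG": 0.02, "TCA": 0.02}  # hypothetical target-genome frequencies

renormalized = renormalize_signature(signature, channels, source_freq, target_freq)
print(dict(zip(channels, renormalized)))
```

In practice the same scaling would be applied to every column of the full 96-channel reference table, using trinucleotide counts computed from the human and dog reference genomes, and the resulting matrix could then be supplied as the custom reference matrix for decomposition.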

Lastly, your point about the code above being dangerous is well taken. We will work on fixing this in the next release of the tool.

bryce-turner commented 3 years ago

Thank you @lalexandrov1018 for the explanation!