Closed by bryce-turner 3 years ago
Hi @bryce-turner, thanks for reaching out. I am looping in @lalexandrov1018 to respond to this one.
Currently, all mutational signatures on COSMIC (and in the SigProfiler tools) have been derived from human cancers or human normal somatic tissues. These human signatures have also been lifted to the genomes of several other species (mouse, rat, etc.) and can be used with SigProfilerExtractor. However, it is important to note that these are NOT mouse mutational signatures but rather human signatures lifted to the mouse genome. For example, signatures SBS2 and SBS13 are commonly attributed to APOBEC3A/B, which do not exist in mice. As such, it is highly unlikely that signatures SBS2 and SBS13 will be observed in mouse genomes. The reference set of mutational signatures can be downloaded from https://cancer.sanger.ac.uk/signatures/downloads/
We do not have a version for dog, but you can renormalize the human signatures to the dog genome's trinucleotide frequencies. Again, note that these will not be dog signatures but renormalized human signatures. SigProfilerExtractor has a decomposition function that accepts a reference matrix to decompose against.
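For illustration only, here is a minimal sketch of what such a renormalization could look like for SBS96-style signatures. It is not the code used to produce the lifted mouse or rat sets. The function name `renormalize_signature`, the dictionary-based inputs, and the toy counts are all assumptions made for this example; real trinucleotide counts would have to be computed from the human and CanFam3.1 reference genomes.

```python
def renormalize_signature(signature, source_counts, target_counts):
    """Rescale one signature from a source genome to a target genome.

    signature:      dict mapping a channel such as 'A[C>A]G' to its probability
    source_counts:  dict mapping a trinucleotide such as 'ACG' to its count
                    (or frequency) in the source genome (e.g. human)
    target_counts:  same, for the target genome (e.g. dog)

    All names and input layouts here are hypothetical, not a SigProfiler API.
    """
    src_total = sum(source_counts.values())
    tgt_total = sum(target_counts.values())

    scaled = {}
    for channel, prob in signature.items():
        # 'A[C>A]G' -> 'ACG': 5' base, reference base, 3' base
        tri = channel[0] + channel[2] + channel[6]
        src_frac = source_counts[tri] / src_total
        tgt_frac = target_counts[tri] / tgt_total
        # Weight each channel by how much more (or less) common its
        # trinucleotide context is in the target genome.
        scaled[channel] = prob * tgt_frac / src_frac

    total = sum(scaled.values())
    if total == 0:
        raise ValueError("signature has no mass after rescaling")
    # Renormalize so the signature sums to 1 again.
    return {channel: value / total for channel, value in scaled.items()}


# Toy two-channel example with made-up counts (not real genome data).
toy_signature = {"A[C>A]A": 0.5, "T[C>T]G": 0.5}
toy_source = {"ACA": 100, "TCG": 100}
toy_target = {"ACA": 300, "TCG": 100}

rescaled = renormalize_signature(toy_signature, toy_source, toy_target)
print(rescaled)
```

In the toy example, the `ACA` context is three times as common as `TCG` in the target, so the `A[C>A]A` channel gains weight relative to `T[C>T]G`. The same function would be applied to each signature column of the reference matrix before passing it to the decomposition step.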
Lastly, your point about the code above being dangerous is well taken. We will work on fixing this in the next release of the tool.
Thank you @lalexandrov1018 for the explanation!
Apologies if this is defined elsewhere already, but how would one go about creating a custom signature database? Or, more specifically, can we apply the known values from COSMIC to a different species?
For example, SigProfilerMatrixGenerator supports dog (CanFam3.1); however, downstream tools like SigProfilerExtractor do not fully support it. The limitation appears to be in decomposing the de novo signatures, which relies on data files in the install directory. This makes sense based on what is available on COSMIC.
Noting a relevant code block that may apply some dangerous defaults for contexts other than SBS96: https://github.com/AlexandrovLab/SigProfilerExtractor/blob/b33d9d7101302eafb7d44379694979fa3e2fa7c4/SigProfilerExtractor/subroutines.py#L1095-L1140
Since the mutational signature profiles should be more or less the same across species, is there a script or tool for lifting these profiles across different genomes?