AlexsLemonade / OpenPBTA-analysis

The analysis repository for the Open Pediatric Brain Tumor Atlas Project

De novo mutational signature extraction: next steps #818

Closed (closed by jaclyn-taroni 2 years ago)

jaclyn-taroni commented 4 years ago

I am filing this issue to replace #636 with more detailed steps required for completing the de novo mutational signatures analysis, and to surface some discussion from #799 (and other, already merged PRs). Note that this issue is too broad in scope to be completed with a single pull request. It likely should be broken up into smaller, more detailed issues, but to reduce the cognitive burden of tracking it exclusively in my head and to get some feedback, I am starting with this one large issue.

The current state of mutational signatures

Right now, the de novo part of the mutational-signatures module extracts signatures from the WGS samples only, for a range of values for the number of signatures (k), and uses a low number of iterations. The script analyses/mutational-signatures/scripts/de_novo_signature_extraction.R has command-line options for the value(s) of k to use during extraction and for the number of iterations.
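To make "extract signatures for a range of k" concrete, here is a minimal sketch. It is an illustration only: sigfit fits a Bayesian model, whereas this sketch swaps in classical nonnegative matrix factorization (Lee and Seung multiplicative updates). The mutational catalog V is assumed to be a contexts x samples count matrix (96 trinucleotide contexts for single-base substitutions; the toy example below uses 6). The functions nmf and extract_over_k are hypothetical names, and n_iter only loosely plays the role of the module's iteration count.

```python
"""Sketch: de novo signature extraction by NMF over a range of k.

Assumption: classical multiplicative-update NMF stands in for sigfit's
Bayesian model; only the "try several k" loop is being illustrated.
"""
import random


def transpose(A):
    return [list(row) for row in zip(*A)]


def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def frobenius_error(V, R):
    """Frobenius norm of V - R."""
    return sum((v - r) ** 2 for vr, rr in zip(V, R) for v, r in zip(vr, rr)) ** 0.5


def nmf(V, k, n_iter=200, seed=0, eps=1e-9):
    """Factor nonnegative V (contexts x samples) as W @ H.

    W (contexts x k) holds signatures, H (k x samples) holds exposures.
    Returns W, H and the reconstruction error after each iteration.
    """
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(k)] for _ in range(m)]
    H = [[rng.random() + eps for _ in range(n)] for _ in range(k)]
    errors = []
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H); eps guards against division by zero.
        WtV = matmul(transpose(W), V)
        WtWH = matmul(matmul(transpose(W), W), H)
        H = [[H[a][j] * WtV[a][j] / (WtWH[a][j] + eps) for j in range(n)]
             for a in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        VHt = matmul(V, transpose(H))
        WHHt = matmul(W, matmul(H, transpose(H)))
        W = [[W[i][a] * VHt[i][a] / (WHHt[i][a] + eps) for a in range(k)]
             for i in range(m)]
        errors.append(frobenius_error(V, matmul(W, H)))
    return W, H, errors


def extract_over_k(V, ks, n_iter=200):
    """Run NMF once per candidate k; return {k: final reconstruction error}."""
    return {k: nmf(V, k, n_iter=n_iter)[2][-1] for k in ks}


# Toy catalog: 6 contexts x 5 samples built from 2 nonnegative signatures.
sigs = [[1, 0], [2, 0], [0, 1], [0, 3], [1, 1], [0, 0.5]]
expo = [[5, 1, 0, 2, 3], [0, 4, 2, 1, 1]]
V = matmul(sigs, expo)
print(extract_over_k(V, ks=[1, 2, 3]))
```

Reconstruction error tends to keep falling as k grows, so raw error alone is not a good way to choose k; picking k is a model-selection question in its own right.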

What needs to happen next

Of the things I am currently aware of 😅

sjspielman commented 3 years ago

References useful for this analysis:

jaclyn-taroni commented 3 years ago

We used deconstructSigs in our initial approach, which is now analyses/mutational-signatures/01-known_signatures.Rmd (and is also what we described in the README, because I neglected to update it 😬). The published signatures we used are the COSMIC signatures and the Alexandrov et al., 2013 signatures.
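deconstructSigs works from known signatures rather than discovering new ones: for each sample it estimates how much each published signature contributes to that sample's catalog. The sketch below illustrates the refitting idea only. It swaps in nonnegative least squares solved with multiplicative updates, which is not deconstructSigs' own algorithm. fit_exposures is a hypothetical name, and the toy signatures are made up for illustration.

```python
"""Sketch: refit one sample's catalog to a fixed set of known signatures.

Assumption: nonnegative least squares via multiplicative updates stands in
for deconstructSigs' own fitting procedure.
"""


def fit_exposures(catalog, signatures, n_iter=500, eps=1e-12):
    """Estimate nonnegative exposures h so that signatures @ h ~= catalog.

    catalog: mutation counts per context for one sample.
    signatures: one row per context, one column per known signature.
    Returns exposures rescaled to fractions that sum to 1.
    """
    m, k = len(signatures), len(signatures[0])
    h = [1.0] * k
    # W^T v and W^T W are fixed because the signatures (W) are not updated.
    Wtv = [sum(signatures[i][a] * catalog[i] for i in range(m)) for a in range(k)]
    WtW = [[sum(signatures[i][a] * signatures[i][b] for i in range(m))
            for b in range(k)] for a in range(k)]
    for _ in range(n_iter):
        # h <- h * (W^T v) / (W^T W h)
        WtWh = [sum(WtW[a][b] * h[b] for b in range(k)) for a in range(k)]
        h = [h[a] * Wtv[a] / (WtWh[a] + eps) for a in range(k)]
    total = sum(h)
    return [x / total for x in h] if total > 0 else h


# Two overlapping toy signatures over 4 contexts; true exposures are 20 and 10.
sigs = [[0.5, 0.0], [0.5, 0.5], [0.0, 0.5], [0.0, 0.0]]
catalog = [10, 15, 5, 0]
print(fit_exposures(catalog, sigs))
```

Because the signatures stay fixed, only the exposure vector is updated. This is what separates refitting known signatures from the de novo extraction the rest of this issue is about.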

sjspielman commented 3 years ago

@jaclyn-taroni After some lit review, it seems to me that we really need to compare this approach with some of the newer probabilistic methods that are more suitable for small sample sizes, i.e., anything fewer than 1,000 specimens. This approach is the gold standard, but it may not be right for us, so I will explore!

jaclyn-taroni commented 3 years ago

"It seems to me after some lit review that we really need to compare this approach with some of the newer probabilistic methods that are more suitable for small sample sizes, aka anything less than 1000 specimens."

When you say "this approach," which approach are you referring to? deconstructSigs, or the method we currently use for de novo mutational signature extraction (sigfit)?

Here's where sigfit is currently applied: analyses/mutational-signatures/scripts/de_novo_signature_extraction.R. The sigfit reference, which I neglected to link in this issue: Gori and Baez-Ortega, bioRxiv.

sjspielman commented 3 years ago

"This approach" = a probabilistic approach. I haven't yet dug through the code associated with this analysis, so it sounds like sigfit is one of those! It sounds like we're already doing it; very nice to see my thoughts match up with what we're doing. I may end up comparing it to signeR once I start really digging into the analysis.

sjspielman commented 3 years ago

Update on analysis, very much before filing any PRs:

So, how to proceed?

sjspielman commented 2 years ago

Closed with PRs: #974, #1018, #1100, #1220, #1227, #1248