Closed: mkinnaman closed this issue 3 years ago
Hi Michael--
Thank you for your interest! Any tree inference method can be used with our method, including Treeomics. We used SNV trees that had been published in prior work -- one dataset used CITUP and the other used its authors' own probabilistic phylogenetic method. I added a link to our paper in the description so you can see more of the methodological details, including the citations for these datasets. Let us know if you have any additional questions as you explore our tool!
Best, Sarah
Thanks Sarah!
Similar to the example figure provided, I am getting the same breakdown of signatures for each clone. I'm not sure that makes sense; I would expect different clones of the tree to have different signature breakdowns. What is the minimum number of mutations per clone needed to accurately identify signatures? I have seen 200 cited in other studies. I'd appreciate your thoughts!
Michael
Sure thing! Here are some of my thoughts --
First, I just want to make sure you noticed that you can control variation in signature breakdowns across clones by changing the parameter k. This parameter controls the number of clusters of nodes in the tree which have the same exposure profile. It can range from 1 to the number of nodes in the tree. For instance, in our tutorial we show results for k=1, which forces all nodes to have the same exposures. If you use k={# nodes in tree} then you would see different exposures at each node.
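To make the role of k concrete, here is a toy base-R sketch. It assumes that clusters are connected groups of nodes obtained by removing k-1 edges from the tree; that is an assumption made for illustration, not something stated in this thread. The helper `cluster_labels`, the `parent` vector, and the four-node tree are all hypothetical and are not part of PhySigs.

```r
# Toy tree with 4 nodes: 1 -> 2, 1 -> 3, 3 -> 4.
# parent[i] is the parent of node i; the root has parent NA.
parent <- c(NA, 1, 1, 3)

# Hypothetical helper (not a PhySigs function): label each node by the top
# node of its cluster after removing the incoming edges of the nodes in `cut`.
cluster_labels <- function(parent, cut) {
  n <- length(parent)
  labels <- numeric(n)
  for (v in seq_len(n)) {
    u <- v
    # Walk up until reaching the root or a node whose incoming edge was removed
    while (!is.na(parent[u]) && !(u %in% cut)) {
      u <- parent[u]
    }
    labels[v] <- u
  }
  labels
}

cluster_labels(parent, integer(0))   # k = 1: 1 1 1 1 (one shared exposure profile)
cluster_labels(parent, 3)            # k = 2: 1 1 3 3 (nodes 3 and 4 form a second cluster)
cluster_labels(parent, c(2, 3, 4))   # k = 4: 1 2 3 4 (every node has its own profile)
```

Nodes that share a label would share one exposure profile, which is why k=1 produces identical signature breakdowns at every clone.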
We also provide a model selection function, getBIC, which helps the user select k without overfitting. If you are getting the same signatures for each clone based on this model selection function, it may be related to cancer type. In our experience, PhySigs does not select k>1 in ovarian cancer patients, which may be because this cancer is predominantly driven by structural variants and copy number aberrations. On the other hand, we saw PhySigs select k>1 in a lung cancer patient cohort.
Finally, the matrix factorization tool we use inside of PhySigs is deconstructSigs. In the deconstructSigs paper, the authors cite fewer than 50 mutations as triggering a warning in the tool. Based on that, I would think that estimates from fewer than 50 mutations are not reliable, and estimates from fewer than 200 might still not be reliable. This will also partially depend on the number of mutational signatures you allow for. It is in part because of this concern that we built PhySigs to cluster nodes, which increases the sample size compared to estimating each node separately.
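A quick way to see where each clone falls relative to these numbers is to count its mutations before fitting. The sketch below uses base R only and a made-up count matrix; the 50 and 200 cut-offs are the ones mentioned in this thread, not hard limits enforced by PhySigs, and the three context columns stand in for the 96 categories a real matrix would have.

```r
# Toy count matrix: rows are clones (tree nodes), columns are mutation categories.
# All values are invented for illustration.
counts <- matrix(
  c( 20,  10,  5,
     90,  60, 40,
    150, 120, 80),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("clone1", "clone2", "clone3"), c("ctxA", "ctxB", "ctxC"))
)

# Total mutations per clone
n_mut <- rowSums(counts)

# Bin the totals using the thresholds discussed above
status <- cut(n_mut, breaks = c(-Inf, 49, 199, Inf),
              labels = c("unreliable (<50)", "borderline (50-199)", "ok (>=200)"))
data.frame(clone = names(n_mut), n_mut = n_mut, status = status)

# Pooling two clones into one cluster raises the count available for fitting
pooled <- colSums(counts[c("clone1", "clone2"), , drop = FALSE])
sum(pooled)   # 225
```

Here clone1 (35 mutations) and clone2 (190) are each below 200 on their own, but together they reach 225, which is the motivation for clustering nodes.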
Sarah
Thanks so much for your thorough response. Just to confirm my understanding: you can run getBIC for different values of k, and the k that gives the lowest BIC is the recommended value to avoid overfitting?
No problem! Yes, that is correct.
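As a minimal sketch of that selection step, assume the BIC values have already been collected into a numeric vector named by k. The numbers below are invented, and the exact shape of what getBIC returns is not shown in this thread, so adapt the indexing to your actual output.

```r
# Hypothetical BIC values for k = 1..5 (made-up numbers, not PhySigs output)
bic <- c(`1` = 1520.4, `2` = 1498.7, `3` = 1503.2, `4` = 1511.9, `5` = 1530.0)

# Pick the k with the lowest BIC
best_k <- as.integer(names(bic)[which.min(bic)])
best_k   # 2
```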
Hey, sorry, one last question: if I wanted to update the COSMIC signatures to include signature 35 (SBS35) from COSMIC SBS v3.2, is there a way to do that?
Thanks!
As a follow-up to this, I tried the following:
Installed deconstructSigs from the GitHub repo.
Edited the physigs.R file to use "signatures.genome.cosmic.v3.may2019" as the signatures.ref file.
Edited the signatures of interest to match how they are listed in signatures.genome.cosmic.v3.may2019, i.e. SBS1, SBS3...
When running: E_list <- allTreeExposures(T, P_Norm, S), I am now getting the following error:
Error in if (error < best_error) { : missing value where TRUE/FALSE needed
I wish I could just add a row to the original signatures.cosmic RDA file, but I can't get the formatting right when I try to import an edited version.
Let me know what you think!
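The cause of this particular error is not established in the thread. In general, R raises it whenever an `if` condition evaluates to NA, so one plausible thing to rule out is a mismatch between the signature names or mutation-context columns in the reference and those in your inputs. The base-R sketch below shows the failure mode and a simple name check; `check_signature_inputs` is a hypothetical helper and the tiny reference table is invented.

```r
# Comparing against NA inside `if` reproduces the same error message
msg <- tryCatch(if (NA_real_ < 1) "ok", error = function(e) conditionMessage(e))
msg

# Hypothetical helper (not part of PhySigs or deconstructSigs): report names
# that are requested but absent from the reference, and whether it contains NAs.
check_signature_inputs <- function(sig_ref, wanted, feature_cols) {
  list(
    missing_signatures = setdiff(wanted, rownames(sig_ref)),
    missing_contexts   = setdiff(feature_cols, colnames(sig_ref)),
    has_na             = anyNA(as.matrix(sig_ref))
  )
}

# Toy reference using v3-style row names (values invented)
sig_ref <- data.frame(
  "A[C>A]A" = c(0.6, 0.1),
  "A[C>A]C" = c(0.4, 0.9),
  row.names = c("SBS1", "SBS3"),
  check.names = FALSE
)

# One requested name still uses the older "Signature.N" style and is flagged
report <- check_signature_inputs(
  sig_ref,
  wanted = c("SBS1", "Signature.3"),
  feature_cols = c("A[C>A]A", "A[C>A]C")
)
report$missing_signatures   # "Signature.3"
```

If all three checks come back clean on the real objects, the NA is being introduced somewhere else and this check will not explain it.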
OK, I will close this out. I figured out how to edit and import an updated signatures file. I downloaded the COSMIC 3.2 signatures and formatted them to be identical to the "signatures.cosmic.rda" file in the deconstructSigs repo. I forked that repo to my personal account, uploaded the new signatures.cosmic.rda file, and reinstalled deconstructSigs from my repo. Everything now works with the updated signatures.
So if anyone wants to run this with COSMIC 3.2 signatures, feel free to download deconstructSigs from my repo: https://github.com/mkinnaman/deconstructSigs/
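For anyone attempting the same reformatting, here is a rough base-R sketch of the reshaping step. It rests on several assumptions: that the downloaded table has one row per mutation context with a `Type` column followed by one column per signature, that the target layout is a data frame with one row per signature and one column per context, and that the saved object is named `signatures.cosmic`. The three-context table is invented; compare against the real file in the deconstructSigs repo before relying on this.

```r
# Toy stand-in for a downloaded signature table (contexts in rows; values invented)
cosmic_raw <- data.frame(
  Type  = c("A[C>A]A", "A[C>A]C", "A[C>A]G"),
  SBS1  = c(0.5, 0.3, 0.2),
  SBS35 = c(0.1, 0.2, 0.7),
  stringsAsFactors = FALSE
)

# Transpose so that signatures become rows and contexts become columns
sig_mat <- t(as.matrix(cosmic_raw[, -1, drop = FALSE]))
colnames(sig_mat) <- cosmic_raw$Type
signatures.cosmic <- as.data.frame(sig_mat)

# Sanity check: each signature should sum to (approximately) 1
stopifnot(all(abs(rowSums(signatures.cosmic) - 1) < 1e-8))

# Save under the assumed object and file name (written to a temp dir here)
out <- file.path(tempdir(), "signatures.cosmic.rda")
save(signatures.cosmic, file = out)
```

Whatever row names end up in the file (for example `SBS35`) must match the signature names passed to PhySigs.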
Hi!
Looking forward to trying this tool. What did you use to derive the initial SNV trees? Would Treeomics be adequate?
Thanks, Michael