Using revolver with phylowgs outputs

elifirem commented 6 years ago

Hi, if I'm not mistaken, it seems that we can use PhyloWGS trees as inputs for REVOLVER. I went through the wiki page but I'm still not sure which parts of the PhyloWGS output can be used, and how. Could you please provide guidance on this? Also, my samples are not multi-region; I have multiple samples from different time points. Would that be a problem? Thanks a lot! Irem

caravagn commented 6 years ago

Hi @elifirem,

REVOLVER can use a custom set of input trees that you have computed with PhyloWGS or any other tree-inference tool. You just need to use the function revolver_compute_phylogeny

https://github.com/caravagn/revolver/blob/master/R/rev_cohort.R

which takes as input a list of adjacency matrices and a list of scores (precomputed.trees and precomputed.scores).

If I remember correctly, PhyloWGS computes a posterior over trees using a tree-structured stick-breaking process, right? If so, I would use the top K trees ranked by posterior likelihood (non-negative); this seems sensible to me.
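A minimal sketch of that selection step, in Python. It assumes you already hold one log-likelihood per sampled tree in a dict (the name `tree_llh` and the numbers are made up for illustration), keeps the K best, and turns the log-likelihoods into non-negative weights by exponentiating relative to the maximum. Whether REVOLVER expects normalised weights or some other kind of score is something to check, so treat the rescaling as an assumption.

```python
import heapq
import math


def top_k_scores(tree_llh, k):
    """Return [(tree_id, score)] for the k trees with the highest log-likelihood.

    Scores are exp(llh - max_llh), normalised to sum to 1 over the kept trees,
    so they are non-negative and a higher score means a better tree.
    """
    best = heapq.nlargest(k, tree_llh.items(), key=lambda item: item[1])
    if not best:
        return []
    max_llh = best[0][1]  # nlargest returns items sorted from best to worst
    weights = [math.exp(llh - max_llh) for _, llh in best]
    total = sum(weights)
    return [(tree_id, w / total) for (tree_id, _), w in zip(best, weights)]


# Hypothetical log-likelihoods for four sampled trees of one patient.
tree_llh = {"0": -1052.3, "1": -1050.1, "2": -1049.8, "3": -1060.0}
top = top_k_scores(tree_llh, k=3)
```

Subtracting the maximum before exponentiating avoids underflow, since raw log-likelihoods of this size would all exponentiate to zero.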

Concerning longitudinal vs cross-sectional sampling, I've seen several papers doing that kind of analysis (subclonal deconvolution plus tree inference), so I'd say the same assumptions as in those analyses apply here. This does not seem to me an issue with the tool; I think the real question is whether it is best to assume the data are iid when working with longitudinal biopsies.

Please let me know how it goes. I have never tested this feature extensively and I am curious to know whether it works smoothly. If it doesn't, I will help you.

Best

G

elifirem commented 6 years ago

Hi @caravagn, thanks for the prompt reply. PhyloWGS reports the possible trees and their posterior likelihoods, but they are all in JSON format for viewing in HTML. There is SMC-Het challenge code that takes PhyloWGS outputs and writes the best tree's structure to a .txt file, but I'm guessing that won't be enough for REVOLVER and I actually need to provide the other possible trees too, right? I can write a script to extract this information, but could you give a brief example of what the precomputed.trees and precomputed.scores files should look like (column names etc.) so I can start working on this? I'll let you know how it goes. Thanks! Irem

caravagn commented 6 years ago

Hi,

yes, exactly: you want several trees for each of your patients so that the tool can decide which ones are best across the whole cohort. I'd write a simple JSON parser, or adapt the one you mentioned to dump more than just the best tree.
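As a rough sketch of such a parser in Python: the functions below read a tree summary file and return every tree with its score, not only the best one. The JSON layout (a top-level `trees` object whose entries carry `llh` and `structure`) is an assumption that should be checked against your own PhyloWGS output, and the function names are just illustrative.

```python
import gzip
import json


def parse_trees(summary):
    """Return [(tree_id, llh, structure)] for every tree in a parsed summary.

    ASSUMED layout (verify against your files):
    {"trees": {tree_id: {"llh": float, "structure": {parent: [children]}}}}
    """
    return [
        (tree_id, tree["llh"], tree["structure"])
        for tree_id, tree in summary["trees"].items()
    ]


def load_trees(path):
    """Read a JSON summary from disk (gzipped if the name ends in .gz)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return parse_trees(json.load(handle))


# Tiny made-up summary with two candidate trees for one patient.
example_text = (
    '{"trees": {"0": {"llh": -1052.3, "structure": {"0": [1], "1": [2]}},'
    ' "1": {"llh": -1050.1, "structure": {"0": [1, 2]}}}}'
)
trees = parse_trees(json.loads(example_text))
```

Keeping `parse_trees` separate from the file handling makes it easy to try the parser on a small hand-written example before pointing `load_trees` at real files.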

The format for passing the tree structure is just an adjacency matrix, and the score is a real value (higher score, better tree). The adjacency matrix M is an N x N matrix representing N nodes, one for each of the N clones estimated by PhyloWGS. Columns and rows are named after the clones' ids, and 1s represent edges in the tree.
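To make the format concrete, here is a small Python sketch that builds such a matrix from a parent-to-children mapping. The mapping layout, the integer-like clone ids, and the orientation (row = parent, column = child) are assumptions made for illustration; check the orientation against what revolver_compute_phylogeny expects before using it.

```python
def adjacency_matrix(structure):
    """Build (names, matrix) from a parent -> children mapping.

    matrix[i][j] == 1 means an edge from names[i] (parent) to names[j] (child).
    The row = parent orientation is an assumption of this sketch.
    """
    nodes = set()
    for parent, children in structure.items():
        nodes.add(str(parent))
        nodes.update(str(child) for child in children)
    names = sorted(nodes, key=int)  # clone ids are assumed to be integer-like
    index = {name: i for i, name in enumerate(names)}
    matrix = [[0] * len(names) for _ in names]
    for parent, children in structure.items():
        for child in children:
            matrix[index[str(parent)]][index[str(child)]] = 1
    return names, matrix


# Made-up tree: clone 0 -> clone 1, and clone 1 -> clones 2 and 3.
structure = {"0": [1], "1": [2, 3]}
names, matrix = adjacency_matrix(structure)
```

For this toy tree, `names` gives the row and column labels and the matrix has one 1 per edge, so a tree over N clones has N - 1 ones in total.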

This means that in your data, the column cluster is filled in with the corresponding PhyloWGS clustering assignments. Drivers must be annotated as usual, independently of whether you use PhyloWGS to create the input trees.
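A short Python sketch of filling that column, assuming the clustering assignments are available as a mapping from cluster id to the list of mutation ids it contains. That layout, the mutation ids, and the function name are hypothetical, so adapt them to however you export the assignments.

```python
def cluster_column(mutation_ids, assignments):
    """Return the cluster id for each mutation, in the order of mutation_ids.

    assignments maps cluster id -> list of mutation ids (hypothetical layout).
    A mutation found in no cluster raises KeyError, so gaps are noticed early.
    """
    mutation_to_cluster = {}
    for cluster_id, members in assignments.items():
        for mutation in members:
            if mutation in mutation_to_cluster:
                raise ValueError(f"{mutation} assigned to more than one cluster")
            mutation_to_cluster[mutation] = cluster_id
    return [mutation_to_cluster[m] for m in mutation_ids]


# Made-up assignments: cluster 1 holds s0 and s2, cluster 2 holds s1.
assignments = {"1": ["s0", "s2"], "2": ["s1"]}
column = cluster_column(["s0", "s1", "s2"], assignments)
```

The cluster ids used here should be the same ids that name the rows and columns of the adjacency matrices, otherwise the trees and the data will not line up.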

caravagnalab / revolver

Using revolver with phylowgs outputs #20