NPSDC / beaveR

Clarification of beaveR output #1

Open linranzhou opened 7 months ago

linranzhou commented 7 months ago

After running TreeTerminus on the samples in my RNA-seq experiment and obtaining the consensus tree, I am using beaveR to parse that output. I was hoping to clarify some details about the output of beaveR and the groups in the cuts that it provides when you have it solve for cuts.

As a first pass, I followed the example in the documentation to find the groups for the objective function that minimizes mean infRV and height. objS[["cut"]] gives me a list of node indices. Many of these, when I look them up using findNodeInformation(), are leaves, with output that looks like this:

                  nodeInd meanInfRV        genes nodeType
ENST00000665671.1   56351      0.01 ENSG0000....     Leaf

However, some, when I look up, produce results like the following:

                  nodeInd meanInfRV        genes  nodeType
Node109999         109999 0.1168962 ENSG0000.... InnerNode
Node110000         110000 0.2080091 ENSG0000.... InnerNode
ENST00000565112.1   56355 0.5801088 ENSG0000....      Leaf

In the genes column of this data frame, I see the gene ENSG00000155330.10, of which ENST00000565112.1 is a transcript. If I look at assays[["counts"]] for the object made through buildTSE, I can see counts for Node109999, Node110000, and ENST00000565112.1. Using the groups produced by solveForOptimalCut, one inferential unit for differential expression analysis would be ENST00000665671.1 and another would be Node109999. That seems fine, and I can trim the count matrix down to the rows corresponding to ENST00000665671.1 and Node109999.

However, using Node109999 as an example, are there other transcripts aggregated together other than ENST00000565112.1, the transcript explicitly listed as a leaf? If so, how can I see which other transcripts are aggregated together? Thank you so much for your time.

NPSDC commented 7 months ago

I apologize for the late response; I just saw this. I am unsure what you mean by "when I look up". As the help suggests, findNodeInformation provides information about a node. If you want to see all the underlying transcripts of a given node, set type = "tips". If you want to see only its child nodes, set type = "children".

So in your example, for Node109999, setting type = "tips" would give you all the underlying transcripts that form that node.
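
To make the difference between "children" and "tips" concrete, here is a minimal toy sketch in Python. It is not beaveR code and does not reflect how beaveR is implemented; the tree, the node labels (NodeA, NodeB, tx1, ...), and both helper functions are hypothetical and only illustrate the idea that an inner node's children can themselves be inner nodes, while its tips are always the transcripts (leaves) underneath it.

```python
# Toy tree, NOT beaveR output: each inner node maps to its direct children.
# All labels here are hypothetical.
CHILDREN = {
    "NodeA": ["NodeB", "tx3"],
    "NodeB": ["tx1", "tx2"],
}


def children(node):
    """Direct children only (similar in spirit to type = "children")."""
    return list(CHILDREN.get(node, []))


def tips(node):
    """All leaves underneath the node (similar in spirit to type = "tips")."""
    kids = CHILDREN.get(node)
    if not kids:
        # A node with no children is a leaf, so it is its own only tip.
        return [node]
    leaves = []
    for kid in kids:
        leaves.extend(tips(kid))
    return leaves


print(children("NodeA"))  # ['NodeB', 'tx3']
print(tips("NodeA"))      # ['tx1', 'tx2', 'tx3']
```

In this toy tree, NodeA has one inner-node child and one leaf child, but aggregates three transcripts in total. By analogy, a node in the cut may combine more transcripts than the single leaf that appears among its direct children.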