evoinfo / miapa

Minimum Information About a Phylogenetic Analysis (MIAPA) vocabularies and tools
http://www.evoio.org/wiki/MIAPA
Creative Commons Zero v1.0 Universal

The question about rooting needs to be more explicit. #23

Open jar398 opened 10 years ago

jar398 commented 10 years ago

The checklist says "Is topology rooted or not?" but what one really needs to know is whether the root implied by the representation is known (or believed) to be the biologically meaningful root. That is, some representations are either unrooted or are documented to give no meaning to the representation-level root, so if that root is the "real" one this fact needs to be signalled. Similarly, @root=true on a node in NeXML does not imply that the specified node is a biological root.

The true root might even be known, and known not to be the root of the tree in the representation, so the real question ought to be, "If known, which node is the biological root?" Or, if you want to rule out that possibility, the question ought to be "Is the representation-level root the biological root?" (yes/no/unknown). Or something like that.
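The two candidate questions can be captured in one record. The following is only a minimal sketch: `RootStatus`, `RootingAnnotation`, and the field names are hypothetical and are not part of MIAPA, NeXML, or any existing tool. It stores the answer to "If known, which node is the biological root?" and derives the yes/no/unknown answer from it.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class RootStatus(Enum):
    """Answer to: is the representation-level root the biological root?"""
    YES = "yes"
    NO = "no"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class RootingAnnotation:
    # Node id that the serialization treats as the root;
    # None if the representation is unrooted.
    representation_root: Optional[str]
    # Node id believed to be the biologically meaningful root;
    # None if it is not known.
    biological_root: Optional[str] = None

    @property
    def status(self) -> RootStatus:
        if self.biological_root is None:
            return RootStatus.UNKNOWN
        if self.biological_root == self.representation_root:
            return RootStatus.YES
        return RootStatus.NO


# The biological root is known but differs from the representation-level root.
print(RootingAnnotation("n1", biological_root="n2").status)
```

Storing the node rather than only the yes/no/unknown flag keeps the case where the true root is known but is not the root of the serialized tree.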

hlapp commented 10 years ago

@jar398 could you add here, too, what Open Tree has tentatively chosen to do to support this use case?

jar398 commented 10 years ago

I think we're labeling a node as the "designated root" (meaning semantic root), if we know it, and it may or may not be the root at the representation level. If you find a tree with a node labeled in this way the first thing you'd probably do is reroot it (at the representation level) so that this node is the root. If it's not there all bets are off. @root is of no value because it's so often meaningless - really it just means (I suspect) that there are no edges going into it, but that's redundant since you could just look at the topology to figure that out.
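As an illustration of rerooting at a designated root, here is a minimal sketch. It assumes the tree is given as a plain list of directed (parent, child) edges; the function names are hypothetical, and this is not Open Tree's actual code. It also shows that the representation-level root can be read off the topology as the one node with no incoming edge.

```python
from collections import defaultdict


def representation_root(edges):
    """Return the single node that has no incoming edge."""
    parents = {p for p, _ in edges}
    children = {c for _, c in edges}
    roots = parents - children
    if len(roots) != 1:
        raise ValueError("expected exactly one node without an incoming edge")
    return next(iter(roots))


def reroot(edges, designated_root):
    """Re-orient all edges so that they point away from designated_root."""
    neighbors = defaultdict(set)
    for parent, child in edges:
        neighbors[parent].add(child)
        neighbors[child].add(parent)
    if designated_root not in neighbors:
        raise ValueError("designated root is not a node of the tree")

    rerooted = []
    seen = {designated_root}
    stack = [designated_root]
    while stack:
        node = stack.pop()
        for other in sorted(neighbors[node]):
            if other not in seen:
                seen.add(other)
                rerooted.append((node, other))  # edge now points away from the new root
                stack.append(other)
    return rerooted


# The representation-level root is n1, but n2 is labeled as the designated root.
edges = [("n1", "n2"), ("n1", "C"), ("n2", "A"), ("n2", "B")]
new_edges = reroot(edges, "n2")
print(representation_root(edges), representation_root(new_edges))
```

In this sketch the old root `n1` is left in place as an internal node with a single child; whether to suppress such nodes is a separate decision.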

arlin commented 10 years ago

On Dec 16, 2013, at 10:54 AM, Jonathan A Rees wrote:

The checklist says "Is topology rooted or not?" but what one really needs to know is whether the root implied by the representation is known (or believed) to be the biologically meaningful root. That is, some representations are either unrooted or are documented to give no meaning to the representation-level root, so if that root is the "real" one this fact needs to be signalled. Similarly, @root=true on a node in NeXML does not imply that the specified node is a biological root

As someone who was present (virtually) at the meeting, I'm pretty sure that this is exactly what is intended by the question of whether the tree is rooted.

The true root might even be known, and known not to be the root of the tree in the representation, so the real question ought to be, "If known, which node is the biological root?" Or, if you want to rule out that possibility, the question ought to be "Is the representation-level root the biological root?" (yes/no/unknown). Or something like that.

I think this is heading in the right direction. In effect, you have split the concept of root into 2 concepts, "biological root" and the apparent or "representation-level" root implied in a particular rendering of a tree-concept as a graphic image or a Newick string. One problem with this is that "biological root" is a neologism.

This doesn't help make the situation any easier, but I would suggest that we are always juggling with three different concepts of tree -- (1) an underlying reality of evolutionary history (in more technical language, the time-evolution of a system) that typically is only inferred; (2) a tree-concept that is a model of the course of history; and (3) a rendering of the tree-concept as an image or a Newick string or a serialization such as NEXUS. We are always looking at a tree rendering, not a tree-concept, but we can translate between renderings because we have a common tree-concept. This tree-concept is not the same as the actual history (and "phylogeny" originally implied, not just a tracing of relationships, but a complete story of the development of a taxon). In most cases, it is an estimate of the history.

Arlin

Arlin Stoltzfus (arlin@umd.edu) Fellow, IBBR; Adj. Assoc. Prof., UMCP; Research Biologist, NIST IBBR, 9600 Gudelsky Drive, Rockville, MD, 20850 tel: 240 314 6208; web: www.molevol.org

jar398 commented 10 years ago

On Dec 17, 2013, at 5:03 PM, Arlin Stoltzfus wrote:

On Dec 16, 2013, at 10:54 AM, Jonathan A Rees wrote:

The checklist says "Is topology rooted or not?" but what one really needs to know is whether the root implied by the representation is known (or believed) to be the biologically meaningful root. That is, some representations are either unrooted or are documented to give no meaning to the representation-level root, so if that root is the "real" one this fact needs to be signalled. Similarly, @root=true on a node in NeXML does not imply that the specified node is a biological root

As someone who was present (virtually) at the meeting, I'm pretty sure that this is exactly what is intended by the question of whether the tree is rooted.

Probably so, but (a) the NeXML documentation isn't clear and (b) intent doesn't matter, because I am informed that this is not how TreeBASE uses it.

Jonathan
