cmungall opened this issue 9 years ago
cc @ANiknejad
> I think all NOTs should be pairs of taxa.
It would be incorrect. Consider, for instance, the tapetum lucidum or the nictitating membrane, which appeared independently in more than two lineages.
> The crucial information here is the taxon of the studied organism, Ciona, vs (presumably) vertebrates
If you want to determine in which taxa a structure appeared independently (and therefore between which taxa the homology hypothesis is rejected), then, as you said, you must look at the positive annotations mapped to sub-taxa of the NOT annotation. There can be more than two.
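A minimal Python sketch of that lookup, assuming a hypothetical, simplified child-to-parent taxonomy (intermediate ranks omitted) and a hypothetical structure whose positive annotations are given as a set of taxon names. Only the outermost positive annotations below the NOT taxon are kept, since a positive annotation nested inside another one is not a separate origin.

```python
# Hypothetical, simplified taxonomy (child -> parent); intermediate ranks omitted.
PARENT = {
    "Tunicata": "Chordata",
    "Vertebrata": "Chordata",
    "Actinopterygii": "Vertebrata",
    "Sarcopterygii": "Vertebrata",
    "Coelacanthimorpha": "Sarcopterygii",
    "Rhipidistia": "Sarcopterygii",
    "Dipnoi": "Rhipidistia",
    "Tetrapoda": "Rhipidistia",
    "Mammalia": "Tetrapoda",
    "Aves": "Tetrapoda",
}


def ancestors(taxon):
    """Return [taxon, parent, grandparent, ..., root]."""
    chain = [taxon]
    while taxon in PARENT:
        taxon = PARENT[taxon]
        chain.append(taxon)
    return chain


def independent_origins(not_taxon, positive_taxa):
    """Outermost positive annotations strictly below the NOT taxon.

    Each returned taxon is a candidate independent origin of the structure.
    """
    origins = []
    for taxon in positive_taxa:
        chain = ancestors(taxon)
        if taxon == not_taxon or not_taxon not in chain:
            continue  # not a sub-taxon of the NOT annotation
        # Ancestors of `taxon` lying strictly between it and the NOT taxon.
        between = chain[1:chain.index(not_taxon)]
        if not any(a in positive_taxa for a in between):
            origins.append(taxon)
    return sorted(origins)


# Hypothetical annotations for a structure rejected at the Vertebrata level.
print(independent_origins("Vertebrata", {"Actinopterygii", "Mammalia", "Aves", "Tunicata"}))
# prints ['Actinopterygii', 'Aves', 'Mammalia']
```

With these made-up annotations, three origins come back, so the rejected hypothesis involves more than a pair of taxa. Adding "Tetrapoda" to the positive set would fold Mammalia and Aves into a single Tetrapoda origin.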
It is true that in your autopod example we only captured the positive annotation at the Tetrapoda level. This should be corrected; see issue #5.
> Summing these up to a taxon that subsumes the pair is not wrong, it's just less useful.
Well, that is the intent of the NOT annotation: to reject a hypothesis that could otherwise seem plausible. By naively looking at the phylogenetic distribution, you would infer a homology hypothesis for the taxon that subsumes all taxa with the structure.
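The naive inference amounts to taking the lowest common ancestor of every taxon in which the structure is found; a NOT annotation placed on that taxon is what rejects it. A minimal sketch, assuming a hypothetical, simplified taxonomy; `naive_homology_taxon` and `naive_hypothesis_rejected` are illustrative names, not part of any existing tool.

```python
# Hypothetical, simplified taxonomy (child -> parent); intermediate ranks omitted.
PARENT = {
    "Tunicata": "Chordata",
    "Vertebrata": "Chordata",
    "Actinopterygii": "Vertebrata",
    "Sarcopterygii": "Vertebrata",
    "Coelacanthimorpha": "Sarcopterygii",
    "Rhipidistia": "Sarcopterygii",
    "Tetrapoda": "Rhipidistia",
    "Mammalia": "Tetrapoda",
    "Aves": "Tetrapoda",
}


def ancestors(taxon):
    """Return [taxon, parent, grandparent, ..., root]."""
    chain = [taxon]
    while taxon in PARENT:
        taxon = PARENT[taxon]
        chain.append(taxon)
    return chain


def naive_homology_taxon(taxa_with_structure):
    """Lowest common ancestor of all taxa that have the structure.

    This is the taxon a naive reading of the phylogenetic distribution
    would propose as the origin of a single, homologous structure.
    """
    taxa = list(taxa_with_structure)
    if not taxa:
        return None
    common = set(ancestors(taxa[0]))
    for taxon in taxa[1:]:
        common &= set(ancestors(taxon))
    # Walking up from any member, the first shared ancestor is the lowest one.
    for ancestor in ancestors(taxa[0]):
        if ancestor in common:
            return ancestor
    return None


def naive_hypothesis_rejected(taxa_with_structure, not_taxa):
    """True if a NOT annotation sits exactly on the naively inferred taxon."""
    return naive_homology_taxon(taxa_with_structure) in not_taxa


print(naive_homology_taxon(["Actinopterygii", "Mammalia", "Aves"]))  # prints Vertebrata
```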
> For example, if we have expression data about coelacanths and mammals we can test the hypothesis. But it would be wrong to test the hypothesis by comparing mouse and human.
When comparing species, the following is needed:
> A compromise would be to narrow the bracket. E.g. for the first example, using Sarcopterygii would be better
Fixing issue #5 will solve this problem.
So, I think the NOT annotations are correctly designed, unless you see another problem. Maybe I could generate some derived files that could help you?
I also realize that sometimes there is no positive annotation at a sub-taxon. For instance, if a structure was originally thought to originate in Vertebrata but was then shown to originate in Tetrapoda, we add a NOT annotation at the Vertebrata level. There is no pair of taxa, nor any sub-taxon, to consider.
There can be several sub-taxon annotations only in cases of independent evolution. Otherwise, the NOT annotation is only used to capture the rejection of a previous hypothesis.
I think you're using NOT annotations in a way they are not meant to be used. Isn't what you want simply to retrieve the common ancestral taxon (or taxa) of each structure?
Currently, NOT annotations use a broad taxonomic grouping but don't actually indicate which groups the negative statement pertains to. For example:
Naively, we might assume that the autopod is never homologous within vertebrates, but this would be wrong. What the authors are saying, IMO, is that they do not believe there is homology between the tetrapod autopod and the autopods of other Sarcopterygii.
Similarly:
The crucial information here is the taxon of the studied organism, Ciona, vs (presumably) vertebrates. This could be broadened to all tunicates.
I think all NOTs should be pairs of taxa, e.g. Coelacanthimorpha|Tetrapoda for the first example and Tunicata|Vertebrata for the second.
Summing these up to a taxon that subsumes the pair is not wrong; it's just less useful. For example, if we have expression data about coelacanths and mammals, we can test the hypothesis. But it would be wrong to test the hypothesis by comparing mouse and human.
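One way to make the pair proposal concrete: store the NOT annotation as two taxa, and accept a species comparison only if the two species fall on opposite sides of the pair. This is a sketch with a hypothetical `NotPair` type and a hypothetical, simplified taxonomy that includes a few species as leaves.

```python
from typing import NamedTuple

# Hypothetical, simplified taxonomy (child -> parent); intermediate ranks omitted.
PARENT = {
    "Vertebrata": "Chordata",
    "Actinopterygii": "Vertebrata",
    "Sarcopterygii": "Vertebrata",
    "Coelacanthimorpha": "Sarcopterygii",
    "Rhipidistia": "Sarcopterygii",
    "Tetrapoda": "Rhipidistia",
    "Mammalia": "Tetrapoda",
    "Latimeria chalumnae": "Coelacanthimorpha",
    "Mus musculus": "Mammalia",
    "Homo sapiens": "Mammalia",
    "Danio rerio": "Actinopterygii",
}


def ancestors(taxon):
    """Return the set containing the taxon and all of its ancestors."""
    result = {taxon}
    while taxon in PARENT:
        taxon = PARENT[taxon]
        result.add(taxon)
    return result


class NotPair(NamedTuple):
    """Hypothetical pair-based NOT annotation: no homology between these two taxa."""
    taxon_a: str
    taxon_b: str


def can_test(not_pair, species_1, species_2):
    """True if the two species fall on opposite sides of the NOT pair."""
    up_1, up_2 = ancestors(species_1), ancestors(species_2)
    return ((not_pair.taxon_a in up_1 and not_pair.taxon_b in up_2)
            or (not_pair.taxon_b in up_1 and not_pair.taxon_a in up_2))


autopod_not = NotPair("Coelacanthimorpha", "Tetrapoda")
print(can_test(autopod_not, "Latimeria chalumnae", "Mus musculus"))  # True
print(can_test(autopod_not, "Mus musculus", "Homo sapiens"))  # False
```

The order of the two species does not matter, and a zebrafish/mouse comparison is rejected too, since neither side of the pair covers Actinopterygii.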
A compromise would be to narrow the bracket. For the first example, using Sarcopterygii would be better, as we can test the negative hypothesis by examining pairs of taxa immediately under this taxon. But it would still be better to reflect the authors' statement more directly.
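With a single narrowed taxon, the check becomes: the two species must descend from different taxa immediately under it, i.e. their lineages split exactly at that taxon. A sketch under a hypothetical, simplified taxonomy in which Sarcopterygii has only Coelacanthimorpha and Rhipidistia as children; `child_under` and `splits_at` are illustrative names.

```python
# Hypothetical, simplified taxonomy (child -> parent); intermediate ranks omitted.
PARENT = {
    "Sarcopterygii": "Vertebrata",
    "Actinopterygii": "Vertebrata",
    "Coelacanthimorpha": "Sarcopterygii",
    "Rhipidistia": "Sarcopterygii",
    "Dipnoi": "Rhipidistia",
    "Tetrapoda": "Rhipidistia",
    "Mammalia": "Tetrapoda",
    "Latimeria chalumnae": "Coelacanthimorpha",
    "Protopterus annectens": "Dipnoi",
    "Mus musculus": "Mammalia",
    "Homo sapiens": "Mammalia",
}


def lineage(taxon):
    """Return [taxon, parent, ..., root]."""
    chain = [taxon]
    while taxon in PARENT:
        taxon = PARENT[taxon]
        chain.append(taxon)
    return chain


def child_under(taxon, species):
    """The child of `taxon` on the species' lineage, or None if not strictly below it."""
    chain = lineage(species)
    if taxon not in chain or chain[0] == taxon:
        return None
    return chain[chain.index(taxon) - 1]


def splits_at(taxon, species_1, species_2):
    """True if the two species sit under different immediate children of `taxon`."""
    child_1 = child_under(taxon, species_1)
    child_2 = child_under(taxon, species_2)
    return child_1 is not None and child_2 is not None and child_1 != child_2


print(splits_at("Sarcopterygii", "Latimeria chalumnae", "Mus musculus"))  # True
print(splits_at("Sarcopterygii", "Mus musculus", "Homo sapiens"))  # False
print(splits_at("Sarcopterygii", "Latimeria chalumnae", "Protopterus annectens"))  # True
```

In this toy tree, a coelacanth/lungfish comparison passes the check even though neither species is a tetrapod, which illustrates how a single taxon reflects the authors' statement less directly than a pair does.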