OpenTreeOfLife / treemachine

Source tree graph database

incorrect association of a node in the synthetic tree: wrong taxonomic node #154

Closed mtholder closed 8 years ago

mtholder commented 9 years ago

The node at https://tree.opentreeoflife.org/opentree/argus/otol.draft.22@860705/Halicnemia is labelled Halicnemia_ott776728, but it actually has the taxonomic composition of its taxonomic parent, Heteroxyidae_ott668403. See the curl calls below.

requesting the genus:

$ curl -X POST -H "Content-Type":"application/json" -H "Accept":"application/json" http://api.opentreeoflife.org/v2/taxonomy/subtree --data '{"ott_id": 776728}' 

gives a subset:

{
  "subtree" : "(Halicnemia_sp._BELUM<GBR>_Mc5427_ott4939607,Halicnemia_sp._BELUM<GBR>_Mc4307_ott4939608,Halicnemia_papillosa_ott2835824,Halicnemia_diazae_ott2835823,Halicnemia_geniculata_ott2835822,Halicnemia_salomonensis_ott2835821,Halicnemia_arcuata_ott2835820,Halicnemia_patera_ott776726,Halicnemia_sp._A_CM-2010_ott145568)Halicnemia_ott776728;"
}

of what you get when you ask for the synth tree:

$ curl -X POST -H "Content-Type":"application/json" -H "Accept":"application/json" http://api.opentreeoflife.org/v2/tree_of_life/subtree --data '{"node_id": "860705", "tree_id": "otol.draft.22"}'

Note the newick field and its root label:

{
  "newick" : "((Parahigginsia_phakelloides_ott2835918,Parahigginsia_phakellioides_ott2835865)Parahigginsia_ott2835866,(Alloscleria_tenuispinosa_ott2835914)Alloscleria_ott2835915,(Negombo_kellyae_ott2835911,Negombo_jogashimensis_ott2835936,Negombo_acanthosanidastera_ott2835862,Negombo_tenuistellata_ott2835858)Negombo_ott2835859,(Acanthoclada_prostrata_ott2835939)Acanthoclada_ott2835931,(Julavis_jamaicensis_ott2835881,Julavis_levis_ott2835880)Julavis_ott2835854,(Microxistyla_petrina_ott2835896)Microxistyla_ott2835867,(Heteroxya_corticata_ott2835861,Heteroxya_sp_1_CM-2013_ott5223187)Heteroxya_ott2835860,(Desmoxya_lunata_ott2835932,Desmoxya_pelagiae_ott2835917)Desmoxya_ott2835871,Halicnemia_arcuata_ott2835820,Halicnemia_sp_A_CM-2010_ott145568,Halicnemia_salomonensis_ott2835821,Halicnemia_sp_BELUM_GBR_Mc4307_ott4939608,Halicnemia_sp_BELUM_GBR_Mc5427_ott4939607,Halicnemia_geniculata_ott2835822,Halicnemia_patera_ott776726,Halicnemia_papillosa_ott2835824,Halicnemia_diazae_ott2835823)Halicnemia_ott776728;",
  "tree_id" : "otol.draft.22"
}

That tree has the same taxonomic composition as the parent taxon:

$ curl -X POST -H "Content-Type":"application/json" -H "Accept":"application/json" http://api.opentreeoflife.org/v2/taxonomy/subtree --data '{"ott_id": 668403}'

which returns:

{
  "subtree" : "((Heteroxya_sp._1_CM-2013_ott5223187,Heteroxya_corticata_ott2835861)Heteroxya_ott2835860,(Negombo_acanthosanidastera_ott2835862,Negombo_tenuistellata_ott2835858,Negombo_jogashimensis_ott2835936,Negombo_kellyae_ott2835911)Negombo_ott2835859,(Desmoxya_lunata_ott2835932,Desmoxya_pelagiae_ott2835917)Desmoxya_ott2835871,(Microxistyla_petrina_ott2835896)Microxistyla_ott2835867,(Parahigginsia_phakellioides_ott2835865,Parahigginsia_phakelloides_ott2835918)Parahigginsia_ott2835866,(Julavis_levis_ott2835880,Julavis_jamaicensis_ott2835881)Julavis_ott2835854,(Acanthoclada_prostrata_ott2835939)Acanthoclada_ott2835931,(Alloscleria_tenuispinosa_ott2835914)Alloscleria_ott2835915,(Halicnemia_sp._BELUM<GBR>_Mc5427_ott4939607,Halicnemia_sp._BELUM<GBR>_Mc4307_ott4939608,Halicnemia_papillosa_ott2835824,Halicnemia_diazae_ott2835823,Halicnemia_geniculata_ott2835822,Halicnemia_salomonensis_ott2835821,Halicnemia_arcuata_ott2835820,Halicnemia_patera_ott776726,Halicnemia_sp._A_CM-2010_ott145568)Halicnemia_ott776728)Heteroxyidae_ott668403;"
  }
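The comparison above can be checked mechanically. The following is a minimal Python sketch, not part of treemachine or the API: it pulls the leaf OTT ids out of a Newick string and compares them as sets. Matching on the numeric id avoids the label differences visible above (the taxonomy call returns `sp._BELUM<GBR>` where the synth call returns `sp_BELUM_GBR`). The function names are hypothetical, and the three strings are trimmed to a few leaves from the responses above for space; they are not the full API output. It assumes Newick without branch lengths or quoted labels.

```python
import re

# A leaf label directly follows "(" or ","; internal labels follow ")".
LEAF = re.compile(r"[(,]([^(),:;]+)")
OTT_ID = re.compile(r"ott(\d+)$")


def leaf_ott_ids(newick):
    """Return the set of OTT ids found on leaf labels of a Newick string."""
    ids = set()
    for label in LEAF.findall(newick):
        match = OTT_ID.search(label.strip())
        if match:
            ids.add(int(match.group(1)))
    return ids


def root_label(newick):
    """Return the label written after the last closing parenthesis."""
    return newick.rsplit(")", 1)[1].rstrip(";").strip()


# Trimmed-down stand-ins for the three responses in this thread (assumption:
# only a few leaves kept per string).
taxonomy_genus = ("(Halicnemia_patera_ott776726,Halicnemia_arcuata_ott2835820)"
                  "Halicnemia_ott776728;")
taxonomy_parent = ("((Heteroxya_corticata_ott2835861)Heteroxya_ott2835860,"
                   "(Halicnemia_patera_ott776726,Halicnemia_arcuata_ott2835820)"
                   "Halicnemia_ott776728)Heteroxyidae_ott668403;")
synth_subtree = ("((Heteroxya_corticata_ott2835861)Heteroxya_ott2835860,"
                 "Halicnemia_arcuata_ott2835820,Halicnemia_patera_ott776726)"
                 "Halicnemia_ott776728;")

same_as_genus = leaf_ott_ids(synth_subtree) == leaf_ott_ids(taxonomy_genus)
same_as_parent = leaf_ott_ids(synth_subtree) == leaf_ott_ids(taxonomy_parent)
print(root_label(synth_subtree), same_as_genus, same_as_parent)
```

With these trimmed inputs the synth subtree carries the genus label but matches the leaf set of the parent, which is the symptom reported here.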

mtholder commented 9 years ago

From a new pull from NCL and then running:

$ example/check-taxo-nodes/checktaxonnodes -frelaxedphyliptree ../../draftversion2.tre ../Taxonomy.tre >out 2>err

I find 22 cases of named internal nodes in the synthetic tree with leaf sets that differ from the definition of that name in OTT:

ott1026107
ott103935
ott1051412
ott172860
ott197414
ott2840942
ott327448
ott389506
ott411973
ott411975
ott411977
ott438716
ott4795965
ott5247549
ott5424357
ott605182
ott776728
ott803358
ott890366
ott938413
ott966429
ott99242

I have not checked these manually (yet).

mtholder commented 9 years ago

Looks like 19 are cases where the synthetic-tree node has the taxonomic content of a node in OTT but carries the wrong name:

    Found identical leaf sets for the synthetic tree "Chlamydiales ott966429" and the taxonomic node "Chlamydiae ott370886".
    Found identical leaf sets for the synthetic tree "Acidobacteria ott1051412" and the taxonomic node "Acidobacteria sup ott952528".
    Found identical leaf sets for the synthetic tree "Dehalococcoidaceae ott438716" and the taxonomic node "Dehalococcoidia ott346927".
    Found identical leaf sets for the synthetic tree "Aegilops speltoides ott605182" and the taxonomic node "Aegilops ott267024".
    Found identical leaf sets for the synthetic tree "Aegilops speltoides subsp speltoides ott327448" and the taxonomic node "Aegilops ott267024".
    Found identical leaf sets for the synthetic tree "Cycadaceae ott99242" and the taxonomic node "Cycadales ott614464".
    Found identical leaf sets for the synthetic tree "Bacillariophytina ott5247549" and the taxonomic node "Bacillariophyta ott5342311".
    Found identical leaf sets for the synthetic tree "Halicnemia ott776728" and the taxonomic node "Heteroxyidae ott668403".
    Found identical leaf sets for the synthetic tree "Tetrapocillon ott2840942" and the taxonomic node "Guitarridae ott2840923".
    Found identical leaf sets for the synthetic tree "Trikentrion ott172860" and the taxonomic node "Cyamoninae ott172859".
    Found identical leaf sets for the synthetic tree "Columbidae ott938413" and the taxonomic node "Columbiformes ott363030".
    Found identical leaf sets for the synthetic tree "Kurtoidei ott411975" and the taxonomic node "Gobiomorpharia ott5553755".
    Found identical leaf sets for the synthetic tree "Kurtidae ott411977" and the taxonomic node "Gobiomorpharia ott5553755".
    Found identical leaf sets for the synthetic tree "Kurtus ott411973" and the taxonomic node "Gobiomorpharia ott5553755".
    Found identical leaf sets for the synthetic tree "Peristediidae ott803358" and the taxonomic node "Triglioidei ott5557288".
    Found identical leaf sets for the synthetic tree "Rathbunella ott197414" and the taxonomic node "Bathymasteridae ott300544".
    Found identical leaf sets for the synthetic tree "Diapriidae ott890366" and the taxonomic node "Proctotrupoidea ott483914".
    Found identical leaf sets for the synthetic tree "Eumunida ott389506" and the taxonomic node "Chirostylidae ott389507".
    Found identical leaf sets for the synthetic tree "Marine Group I ott4795965" and the taxonomic node "Thaumarchaeota ott102415".

The other 3 are:

    Could not find this set of leaves in the synth "Mycale ott1026107" in any taxonomic node.
    Could not find this set of leaves in the synth "Higginsia ott103935" in any taxonomic node.
    Could not find this set of leaves in the synth "Dorylaimia ott5424357" in any taxonomic node.

(Once again, I have not manually checked these yet.)
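For readers who want to see the shape of this check, here is a rough Python sketch of the idea. It is not the checktaxonnodes code and makes no claim about how that tool works internally. It collects the leaf set under every labelled internal node of both trees, then reports each synth node whose leaf set differs from the taxonomy's definition of that OTT id, together with the taxonomic node (if any) that has the identical leaf set. All function names are hypothetical. It assumes Newick without branch lengths or quoted labels, and that both trees use the same tips; when several taxa share a leaf set (monotypic chains), only one of them is reported.

```python
import re

TOKEN = re.compile(r"[(),;]|[^(),;]+")
OTT_ID = re.compile(r"ott(\d+)$")


def ott_id(label):
    """Extract the trailing OTT id from a label, or None if absent."""
    match = OTT_ID.search(label)
    return int(match.group(1)) if match else None


def internal_leaf_sets(newick):
    """Map OTT id of each labelled internal node -> frozenset of leaf ids."""
    result = {}
    stack = []          # one set of leaf ids per currently open clade
    last_closed = set()
    prev = None
    for tok in TOKEN.findall(newick):
        tok = tok.strip()
        if not tok:
            continue
        if tok == "(":
            stack.append(set())
        elif tok == ")":
            last_closed = stack.pop()
            if stack:
                stack[-1] |= last_closed   # leaves also belong to the parent
        elif tok not in ",;":
            oid = ott_id(tok)
            if oid is not None:
                if prev == ")":            # label of the clade just closed
                    result[oid] = frozenset(last_closed)
                elif stack:                # otherwise it is a leaf
                    stack[-1].add(oid)
        prev = tok
    return result


def classify(synth_newick, taxonomy_newick):
    """Map each mismatched synth node id -> matching taxon id, or None."""
    synth = internal_leaf_sets(synth_newick)
    taxo = internal_leaf_sets(taxonomy_newick)
    by_leaves = {leaves: oid for oid, leaves in taxo.items()}
    report = {}
    for oid, leaves in synth.items():
        if taxo.get(oid) != leaves:
            report[oid] = by_leaves.get(leaves)
    return report


# Trimmed-down illustration based on the Halicnemia case (assumption: only a
# few leaves kept).
taxonomy = ("((Heteroxya_corticata_ott2835861)Heteroxya_ott2835860,"
            "(Halicnemia_patera_ott776726,Halicnemia_arcuata_ott2835820)"
            "Halicnemia_ott776728)Heteroxyidae_ott668403;")
synth = ("((Heteroxya_corticata_ott2835861)Heteroxya_ott2835860,"
         "Halicnemia_arcuata_ott2835820,Halicnemia_patera_ott776726)"
         "Halicnemia_ott776728;")
print(classify(synth, taxonomy))
```

A value of `None` in the report corresponds to the "Could not find this set of leaves" lines, and an integer corresponds to the "Found identical leaf sets" lines.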

mtholder commented 9 years ago

I just confirmed that Mycale_titubans_ott403492 is not in https://tree.opentreeoflife.org/opentree/argus/ottol@1026107/Mycale-genus-ncbi-86015-in-family-Mycalidae-

The taxonomy has no name for a subgroup of Mycale_ott1026107 that excludes only Mycale_titubans_ott403492.

So this could be a different bug than the one described earlier in the thread involving Heteroxyidae_ott668403.

We may want to separate this into 2 different issues.

mtholder commented 9 years ago

Referenced in https://groups.google.com/d/msg/opentreeoflife/UIjmJpmYWzw/dXd-EeZ-SLwJ. See the more recent (v3 of synth) output at http://phylo.bio.ku.edu/ot/opentree3.0-check-taxonomic-nodes-output.txt

josephwb commented 9 years ago

Should be fixed as of 3cba417a8f97f86de5224ad16b8c2bcad527acb9. Will know for sure shortly :wink:

chinchliff commented 9 years ago

Any update on this? It is referenced in the doc, and I am curious whether we have confirmation of Joseph's statement that it should be fixed.

mtholder commented 9 years ago

His fix dealt with all but 2 cases (either that, or there is a bug in the otcetera code that identifies problems).

josephwb commented 9 years ago

I haven't chased down what's what with this. We turned down verbosity recently in the synth log; I'll see if the info is in there. Weirdly, there are no problems when run with individual clades (Embryophytes, Metazoa, etc.), but they appear in the big analysis.

mtholder commented 9 years ago

Version 3 of synth is down to 2 cases. One is at the species level: https://devtree.opentreeoflife.org/opentree/argus/opentree3.0@3678897 which excludes: https://devtree.opentreeoflife.org/opentree/opentree3.0@3854155/Arthronema-gygaxiana--Limnothrix-redekei-NIVA-CYA-227/1

That latter grouping is odd. If you back up to https://devtree.opentreeoflife.org/opentree/opentree3.0@3855022 you see that it currently lists no studies supporting it. (I know that https://github.com/OpenTreeOfLife/treemachine/issues/180 is probably more relevant to that facet of this bug.)

mtholder commented 9 years ago

The other mislabeled node is: https://devtree.opentreeoflife.org/opentree/argus/opentree3.0@927983 which is an ancestor of https://devtree.opentreeoflife.org/opentree/argus/opentree3.0@934222 and: https://devtree.opentreeoflife.org/opentree/argus/opentree3.0@934227

which together make the top clade of https://devtree.opentreeoflife.org/opentree/opentree3.0@3870811/Cyclopteridae--Trachinoidei--

josephwb commented 9 years ago

I implemented my missing-children fix for "Limnothrix redekei ott704052" and re-ran synthesis (i.e. the full analysis). The problem went away (and didn't generate new ones!). The "Cottioidei ott237343" issue remains (as do the 9 unsupported nodes).

chinchliff commented 9 years ago

Oh. Well that's good. So maybe we can resolve this issue and remove the associated disclaimer from the manuscript?

chinchliff commented 9 years ago

Actually, I suppose not, since the issue with Cottioidei remains. We know it isn't missing children. I wonder what else could be happening that would affect taxonomic composition like that?

josephwb commented 9 years ago

I don't know about the unsupported nodes. Could come from adding missing children. Unfortunately, I cannot do the unsupported test with the partial synth tree, partial taxonomy tree, and pruned inputs: something goes wrong with pruning. Maybe @mtholder can take a look?