OpenTreeOfLife / feedback

No code -- just an issue tracker for general feedback (sent here via GitHub's issues API)

Bug with node id for fossil taxa? - doc needed #63

Closed by fmichonneau 8 years ago

fmichonneau commented 9 years ago

If I run tnrs/match_names on a fossil taxon, the node id it returns refers to a bacterial taxon when I query that node with graph/node_info. Is this normal?

I have tried a few fossils and it seems to always be the case.

curl -X POST http://api.opentreeoflife.org/v2/tnrs/match_names \
  -H "content-type:application/json" -d \
  '{"names": ["Tyrannosaurus rex"]}'
{
  "governing_code" : "ICZN",
  "unambiguous_name_ids" : [ "Tyrannosaurus rex" ],
  "unmatched_name_ids" : [ ],
  "matched_name_ids" : [ "Tyrannosaurus rex" ],
  "context" : "Tetrapods",
  "includes_deprecated_taxa" : false,
  "includes_dubious_names" : false,
  "includes_approximate_matches" : true,
  "taxonomy" : {
    "weburl" : "https://github.com/OpenTreeOfLife/opentree/wiki/Open-Tree-Taxonomy",
    "author" : "open tree of life project",
    "source" : "ott2.8"
  },
  "results" : [ {
    "id" : "Tyrannosaurus rex",
    "matches" : [ {
      "matched_node_id" : 3499538,
      "synonyms" : [ "Tyrannosaurus rex" ],
      "flags" : [ "EXTINCT_DIRECT", "EXTINCT_INHERITED" ],
      "ot:ottTaxonName" : "Tyrannosaurus rex",
      "search_string" : "tyrannosaurus rex",
      "matched_name" : "Tyrannosaurus rex",
      "is_synonym" : false,
      "score" : 1.0,
      "unique_name" : "Tyrannosaurus rex",
      "ot:ottId" : 664349,
      "is_deprecated" : false,
      "nomenclature_code" : "ICZN",
      "is_approximate_match" : false,
      "rank" : "",
      "is_dubious" : false
    } ]
  } ]
}

Using node 3499538

curl -X POST http://api.opentreeoflife.org/v2/graph/node_info -H "content-type:application/json" -d '{"node_id": 3499538}'
{
  "in_graph" : true,
  "tree_id" : "opentree3.0",
  "name" : "Cupriavidus sp. RMp3122",
  "rank" : "species",
  "ott_id" : 5237036,
  "num_tips" : 1,
  "tree_sources" : [ ],
  "tax_source" : "ncbi:1235258",
  "synth_sources" : [ {
    "git_sha" : "",
    "tree_id" : "",
    "study_id" : "taxonomy"
  } ],
  "node_id" : 3499538,
  "in_synth_tree" : true,
  "num_synth_children" : 1
}

jar398 commented 9 years ago

I know it is confusing but we have two different kinds of ids, node ids and taxon ids, and they are completely unrelated. The node ids are temporary tags tied to data structures inside the treemachine and taxomachine neo4j databases; the node ids are not stable from one synthetic tree build to the next, and are not even the same between the two databases. It is an architectural mistake that these are visible to users at all. The taxon ids (ottids) refer to taxa (or to ideas of same) and are stable from one taxonomy release to the next. You want to take the number 664349 in the TNRS result and feed that as the 'ott_id' (not 'node_id') parameter to the node_info API call.
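
To make the fix concrete, here is a minimal Python sketch of that advice. It assumes the v2 response shape shown in the TNRS output earlier in this thread. The helper names (`ott_ids_from_tnrs`, `node_info_payload`) are hypothetical and not part of any Open Tree client. The sketch only builds the request body and does not send anything to the API.

```python
import json


def ott_ids_from_tnrs(response):
    """Map each queried name to the OTT ids ("ot:ottId") of its matches.

    Deliberately ignores "matched_node_id", which is a database-internal tag.
    """
    return {
        result["id"]: [match["ot:ottId"] for match in result["matches"]]
        for result in response["results"]
    }


def node_info_payload(ott_id):
    """Build the JSON body for graph/node_info keyed by 'ott_id', not 'node_id'."""
    return json.dumps({"ott_id": ott_id})


# Trimmed copy of the TNRS response quoted in this thread.
tnrs_response = {
    "results": [
        {
            "id": "Tyrannosaurus rex",
            "matches": [{"matched_node_id": 3499538, "ot:ottId": 664349}],
        }
    ]
}

ids = ott_ids_from_tnrs(tnrs_response)
payload = node_info_payload(ids["Tyrannosaurus rex"][0])
print(payload)
```

The resulting string can be passed as the `-d` argument of the graph/node_info curl call in place of the `node_id` body.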

fmichonneau commented 9 years ago

OK, I figured it was something like this.

Is there documentation of node ids somewhere? We are almost done with the R package that interfaces with the API and it would be great to point users to this kind of information.

I will also make sure to hide node ids from the users as much as possible so they are not tempted to use them.

jar398 commented 9 years ago

Agree, this should be documented, since fixing it will take a while. Will try to get to it soon.

jar398 commented 8 years ago

I think that with the switch to synthesis with propinquity, the difference between OTT ids and node ids will be self-evident. (Node ids for taxonomy nodes are 'ott' followed by an OTT id; other node ids begin with 'mrca'; OTT ids are numbers.)
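
For client code that wants to check which kind of identifier it is holding, the naming convention in the comment above could be sketched in Python roughly as follows. `classify_id` and its return labels are hypothetical names. The 'mrca' check looks only at the prefix, because the rest of that id format is not described in this thread.

```python
import re


def classify_id(value):
    """Classify an identifier using the prefixes described in this thread.

    Returns one of "ott_id", "taxonomy_node_id", "mrca_node_id", "unknown".
    """
    text = str(value)
    if re.fullmatch(r"\d+", text):
        return "ott_id"            # OTT ids are plain numbers
    if re.fullmatch(r"ott\d+", text):
        return "taxonomy_node_id"  # 'ott' followed by an OTT id
    if text.startswith("mrca"):
        return "mrca_node_id"      # only the prefix is checked (assumption)
    return "unknown"


print(classify_id(664349))
print(classify_id("ott664349"))
# "mrca-placeholder" is a made-up string used only to exercise the prefix check.
print(classify_id("mrca-placeholder"))
```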