OpenTreeOfLife / taxomachine

Show appropriate score when matching synonyms #61

Closed josephwb closed 10 years ago

josephwb commented 10 years ago

As it stands, the score reported for a query that matches a synonym is calculated against the valid taxon name, not against the synonym name in the DB. The example we are working with is "Sarcophaga jonesi", which is a synonym of "Fletcherimyia jonesi". Querying a typo of the synonym, "Sarcophaga jonesii", returns the following (pruned here):

curl -X POST http://devapi.opentreeoflife.org/taxomachine/v1/contextQueryForNames -H "content-type:application/json" -d '{"queryString":"Sarcophaga jonesii","contextName":"All life"}' 
{
  "governing_code" : "undefined",
  "unambiguous_name_ids" : [ ],
  "unmatched_name_ids" : [ ],
  "matched_name_ids" : [ "Sarcophaga jonesii" ],
  "context" : "All life",
  "includes_deprecated_ids" : false,
  "includes_dubious_names" : false,
  "includes_approximate_matches" : true,
  "taxonomy" : {
    "author" : "open tree of life project",
    "weburl" : "https://github.com/OpenTreeOfLife/opentree/wiki/Open-Tree-Taxonomy",
    "source" : "ott2.8"
  },
  "results" : [ {
    "id" : "Sarcophaga jonesii",
    "matches" : [ {
      "is_deprecated" : false,
      "dubious_name" : false,
      "flags" : [ ],
      "is_perfect_match" : false,
      "search_string" : "sarcophaga jonesii",
      "score" : 0.3333333333333333,
      "is_approximate_match" : true,
      "matched_ott_id" : 4370895,
      "matched_node_id" : 1774980,
      "rank" : "",
      "matched_name" : "Fletcherimyia jonesi",
      "unique_name" : "Fletcherimyia jonesi",
      "nomenclature_code" : "ICZN",
      "synonym_or_homonym_status" : "uncertain"
    }, {
      "is_deprecated" : false,
      "dubious_name" : false,
      "flags" : [ "SIBLING_HIGHER" ],
      "is_perfect_match" : false,
      "search_string" : "sarcophaga jonesii",
      "score" : 0.7647058823529411,
      "is_approximate_match" : true,
      "matched_ott_id" : 4370699,
      "matched_node_id" : 1777240,
      "rank" : "",
      "matched_name" : "Sarcophaga jamesi",
      "unique_name" : "Sarcophaga jamesi",
      "nomenclature_code" : "ICZN",
      "synonym_or_homonym_status" : "uncertain"
    }, {...(etc.)

The typo query "Sarcophaga jonesii" has a Levenshtein distance of 1 from the synonym "Sarcophaga jonesi". The reported score should probably be based on that distance, not on the Levenshtein distance of 12 between the query "Sarcophaga jonesii" and the valid taxon name "Fletcherimyia jonesi".
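
To make the comparison concrete, here is a minimal Python sketch of the idea: compute the edit distance from the query to every name attached to a taxon (valid name plus synonyms) and report against the closest one. The helper `closest_name` is hypothetical and only illustrates the proposal; it is not taxomachine's actual scoring code, and how the distance is turned into the reported `score` is not shown here.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string, keep rows short
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def closest_name(query, valid_name, synonyms):
    """Hypothetical helper: return (name, distance) for the nearest candidate.

    Comparison is case-insensitive, mirroring the lowercased
    "search_string" seen in the response.
    """
    q = query.lower()
    candidates = [valid_name] + list(synonyms)
    scored = [(name, levenshtein(q, name.lower())) for name in candidates]
    return min(scored, key=lambda pair: pair[1])


name, distance = closest_name(
    "Sarcophaga jonesii", "Fletcherimyia jonesi", ["Sarcophaga jonesi"]
)
print(name, distance)  # Sarcophaga jonesi 1
```

With this approach the synonym wins because it is a single deletion away from the query, so any score derived from the distance would reflect the name the user most likely meant.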

chinchliff commented 10 years ago

This should now be working on devapi (ot10):

curl -X POST http://devapi.opentreeoflife.org/taxomachine/v1/contextQueryForNames -H "content-type:application/json" -d '{"queryString":"Sarcophaga jonesii","contextName":"All life"}'

[clipped]
{
      "is_deprecated" : false,
      "dubious_name" : false,
      "is_synonym" : true,
      "flags" : [ ],
      "search_string" : "sarcophaga jonesii",
      "score" : 0.8823529411764706,
      "is_approximate_match" : true,
      "ot:ottId" : 4370895,
      "matched_node_id" : 1774980,
      "rank" : "",
      "matched_name" : "Sarcophaga jonesi",
      "unique_name" : "Fletcherimyia jonesi",
      "nomenclature_code" : "ICZN",
      "ot:ottTaxonName" : "Fletcherimyia jonesi"
    }
[clipped]
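
For anyone consuming this response, a small Python sketch of how the synonym matches could be picked out. It assumes the clipped parts of the new response keep the same `results` / `matches` nesting as the earlier one, which is not shown above; `synonym_matches` is a hypothetical helper, and the sample data only repeats the fields quoted in this thread.

```python
def synonym_matches(response):
    """Hypothetical helper: list matches that hit a synonym, not the taxon name.

    Assumes the response nests matches under "results" -> "matches".
    """
    rows = []
    for result in response.get("results", []):
        for match in result.get("matches", []):
            if match.get("is_synonym"):
                rows.append({
                    "query": result.get("id"),
                    "matched_name": match.get("matched_name"),
                    "taxon_name": match.get("ot:ottTaxonName"),
                    "score": match.get("score"),
                })
    return rows


# Sample built only from the fields quoted above; the wrapper is assumed.
sample = {
    "results": [{
        "id": "Sarcophaga jonesii",
        "matches": [{
            "is_synonym": True,
            "score": 0.8823529411764706,
            "is_approximate_match": True,
            "ot:ottId": 4370895,
            "matched_name": "Sarcophaga jonesi",
            "unique_name": "Fletcherimyia jonesi",
            "ot:ottTaxonName": "Fletcherimyia jonesi",
        }],
    }],
}

for row in synonym_matches(sample):
    print(row["query"], "->", row["matched_name"], "(synonym of", row["taxon_name"] + ")")
```

The point of the sketch is that `matched_name` now carries the synonym the score was computed against, while `ot:ottTaxonName` still identifies the valid taxon.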