OpenTreeOfLife / taxomachine

Weird return from contextQueryForNames #57

Closed josephwb closed 10 years ago

josephwb commented 10 years ago

I searched for "Pitta nipalensis" (which is a good taxon, but that is another issue) with this curl call:

curl -X POST http://api.opentreeoflife.org/taxomachine/v1/contextQueryForNames -H "content-type:application/json" -d '{"names":["Pitta nipalensis"], "contextName":"Birds"}'
{
  "governing_code" : "ICZN",
  "unambiguous_name_ids" : [ ],
  "unmatched_name_ids" : [ ],
  "matched_name_ids" : [ "Pitta nipalensis" ],
  "context" : "Birds",
  "includes_deprecated_ids" : false,
  "includes_dubious_names" : false,
  "taxonomy" : {
    "author" : "open tree of life project",
    "weburl" : "https://github.com/OpenTreeOfLife/opentree/wiki/Open-Tree-Taxonomy",
    "source" : "ott2.8draft5"
  },
  "results" : [ {
    "id" : "Pitta nipalensis",
    "matches" : [ {
      "is_deprecated" : false,
      "dubious_name" : false,
      "flags" : [ ],
      "is_perfect_match" : false,
      "search_string" : "pitta nipalensis",
      "score" : 0.8125,
      "is_approximate_match" : true,
      "matched_ott_id" : 977743,
      "matched_node_id" : 2568920,
      "rank" : "",
      "matched_name" : "Cutia nipalensis",
      "unique_name" : "Cutia nipalensis",
      "nomenclature_code" : "ICZN",
      "synonym_or_homonym_status" : "uncertain"
    } ]
  } ]
}

I get the same response whether or not I specify a context. "Cutia nipalensis" is at least a bird, which is good (i.e. when no context is specified), but it is unrelated to my query taxon apart from sharing the specific epithet. This matters because the query taxon does appear in the OTT synonyms:

$ grep 'Pitta nipalensis' synonyms.tsv 
Pitta nipalensis    |   660685  |       |   Pitta nipalensis (synonym for Hydrornis nipalensis) |

Moreover, I can find no way that the two taxa might be confused:

$ grep 'Cutia nipalensis' synonyms.tsv 
cutia   |   977743  |       |   cutia (synonym for Cutia nipalensis)    |

Is taxomachine being too clever?

["Hydrornis nipalensis" is actually the synonym for the good taxon "Pitta nipalensis", but I'll file that elsewhere]
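The two grep checks above can be sketched in Python as an exact, case-insensitive lookup on the name column of synonyms.tsv. This is only a sketch: the column layout (name, OTT id, an empty field, descriptive unique name) is an assumption inferred from the two rows quoted in this comment, and the function names are hypothetical rather than part of taxomachine.

```python
def parse_synonym_line(line):
    """Split one pipe-delimited row into (name, ott_id, description).

    Assumed layout, inferred from the grep output quoted above:
    name | OTT id | (empty) | descriptive unique name |
    Returns None for rows that do not fit (for example a header row).
    """
    fields = [field.strip() for field in line.split("|")]
    if len(fields) < 4 or not fields[1].isdigit():
        return None
    return fields[0], int(fields[1]), fields[3]


def find_synonyms(lines, query):
    """Return every row whose name column equals the query, ignoring case."""
    wanted = query.strip().lower()
    hits = []
    for line in lines:
        row = parse_synonym_line(line)
        if row is not None and row[0].lower() == wanted:
            hits.append(row)
    return hits


# The two rows quoted in this comment, used as sample data.
SAMPLE = [
    "Pitta nipalensis    |   660685  |       |   Pitta nipalensis (synonym for Hydrornis nipalensis) |",
    "cutia   |   977743  |       |   cutia (synonym for Cutia nipalensis)    |",
]

if __name__ == "__main__":
    print(find_synonyms(SAMPLE, "Pitta nipalensis"))
    print(find_synonyms(SAMPLE, "Cutia nipalensis"))
```

With the sample rows, the first lookup finds the synonym entry pointing at OTT id 660685, while the second finds nothing, since the only row for 977743 is listed under the name "cutia".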

chinchliff commented 10 years ago

Yeah, this is probably related to a couple other bugs I've been trying to squash. Will add it to the list of test cases.

chinchliff commented 10 years ago

OK, I think this is fixed on the development server. (Sorry, I used the production URL in a previous comment, now deleted.) In this case, the problem was just that the synonyms were missing from the database.

curl -X POST http://devapi.opentreeoflife.org/taxomachine/v1/contextQueryForNames -H "content-type:application/json" -d '{"names":["Pitta nipalensis"], "contextName":"Birds"}'
{
  "governing_code" : "ICZN",
  "unambiguous_name_ids" : [ ],
  "unmatched_name_ids" : [ ],
  "matched_name_ids" : [ "Pitta nipalensis" ],
  "context" : "Birds",
  "includes_deprecated_ids" : false,
  "includes_dubious_names" : false,
  "taxonomy" : {
    "author" : "open tree of life project",
    "weburl" : "https://github.com/OpenTreeOfLife/opentree/wiki/Open-Tree-Taxonomy",
    "source" : "ott2.8"
  },
  "results" : [ {
    "id" : "Pitta nipalensis",
    "matches" : [ {
      "is_deprecated" : false,
      "dubious_name" : false,
      "is_synonym" : true,
      "flags" : [ ],
      "is_perfect_match" : false,
      "search_string" : "pitta nipalensis",
      "score" : 1.0,
      "is_approximate_match" : false,
      "is_homonym" : false,
      "matched_ott_id" : 660685,
      "matched_node_id" : 3515400,
      "rank" : "",
      "matched_name" : "Hydrornis nipalensis",
      "unique_name" : "Hydrornis nipalensis",
      "nomenclature_code" : "ICZN",
      "synonym_or_homonym_status" : "known"
    } ]
  } ]
}
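
One rough way to catch the kind of result reported in this issue is to flag approximate matches whose genus differs from the genus of the query. The sketch below uses only field names that appear in the responses quoted in this thread; the helper names and the genus heuristic are assumptions for illustration, not taxomachine behavior.

```python
def genus(name):
    """Return the first word of a binomial, lowercased ('' if empty)."""
    parts = name.split()
    return parts[0].lower() if parts else ""


def suspicious_matches(response):
    """List (query, matched_name, score) for approximate cross-genus matches."""
    flagged = []
    for result in response.get("results", []):
        query = result["id"]
        for match in result.get("matches", []):
            approximate = match.get("is_approximate_match", False)
            if approximate and genus(match["matched_name"]) != genus(query):
                flagged.append((query, match["matched_name"], match["score"]))
    return flagged


# Trimmed copies of the two responses quoted in this thread.
BEFORE = {"results": [{"id": "Pitta nipalensis", "matches": [{
    "is_approximate_match": True, "score": 0.8125,
    "matched_name": "Cutia nipalensis"}]}]}
AFTER = {"results": [{"id": "Pitta nipalensis", "matches": [{
    "is_approximate_match": False, "score": 1.0, "is_synonym": True,
    "matched_name": "Hydrornis nipalensis"}]}]}

if __name__ == "__main__":
    print(suspicious_matches(BEFORE))
    print(suspicious_matches(AFTER))
```

The original response is flagged because the match is approximate and lands in a different genus. The fixed response is not flagged even though the genus also differs, because it is an exact synonym match rather than an approximate one.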

jar398 commented 10 years ago

Cody, could you do me a favor and add (maybe in brackets) the right synonym loading command to the server setup instructions we used last time:

https://docs.google.com/document/d/1JU3kXu7zQHG0ZriwkUIbZuR9QhTitykRKeV8L1491IU/edit

Thanks

chinchliff commented 10 years ago

Hey Jonathan, I'm not sure I understand. The synonyms are loaded during the initial taxomachine database building, which is done externally to server setup... At least, that was my understanding. The command is:

java -jar $TAXOMACHINE_JAR loadtaxsyn $OTT_SOURCENAME taxonomy.tsv synonyms.tsv $TAXOMACHINE_DB
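
Since the root cause here was synonyms missing from the database, a small pre-flight check before running the command above may help. The following is a sketch only: it assembles the same argument list from the same three variables and reports input files that are absent or empty. The helper names and the example values in the demo are hypothetical, and the sketch does not run the load itself.

```python
import os
import shlex

# Input files named in the loadtaxsyn command above.
INPUT_FILES = ("taxonomy.tsv", "synonyms.tsv")


def missing_inputs(directory):
    """Return the input files that are absent or empty in `directory`."""
    missing = []
    for name in INPUT_FILES:
        path = os.path.join(directory, name)
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            missing.append(name)
    return missing


def build_loadtaxsyn_command(env):
    """Assemble the argument list for the command quoted above.

    `env` is any mapping providing TAXOMACHINE_JAR, OTT_SOURCENAME and
    TAXOMACHINE_DB, for example os.environ.
    """
    return [
        "java", "-jar", env["TAXOMACHINE_JAR"],
        "loadtaxsyn", env["OTT_SOURCENAME"],
        "taxonomy.tsv", "synonyms.tsv",
        env["TAXOMACHINE_DB"],
    ]


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    example_env = {
        "TAXOMACHINE_JAR": "taxomachine.jar",
        "OTT_SOURCENAME": "ott",
        "TAXOMACHINE_DB": "taxomachine.db",
    }
    problems = missing_inputs(".")
    if problems:
        print("missing or empty inputs:", ", ".join(problems))
    command = build_loadtaxsyn_command(example_env)
    print(" ".join(shlex.quote(arg) for arg in command))
```

The resulting list could be passed to `subprocess.run` from the directory that holds the two .tsv files, once the check reports nothing missing.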

jar398 commented 10 years ago

Oh right. So did the database get rebuilt and redeployed? The last modified date in the downloads directory is July 23. Whose responsibility is this?

Jonathan

chinchliff commented 10 years ago

That was likely me; I did rebuild/redeploy on ot10 around that date.

jar398 commented 10 years ago

OK, so the problem is not fixed in production. You updated the database on devapi on August 6, but api has not been updated. This would suggest that work on this study is still blocked. Someone still needs to copy the database from devapi to api.

Jonathan
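
To check whether api and devapi are still out of sync, the same query can be sent to both and the answers compared. The sketch below assumes the two base URLs quoted in this thread and the response fields shown above; the function names are hypothetical, the `fetch` helper is defined but not called here, and the endpoints may no longer respond as they did in 2014.

```python
import json
import urllib.request

# Base URLs as quoted in this thread.
PROD = "http://api.opentreeoflife.org/taxomachine/v1"
DEV = "http://devapi.opentreeoflife.org/taxomachine/v1"


def build_request(base_url, names, context_name=None):
    """Build the POST request for contextQueryForNames."""
    payload = {"names": names}
    if context_name is not None:
        payload["contextName"] = context_name
    return urllib.request.Request(
        base_url + "/contextQueryForNames",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch(base_url, names, context_name=None):
    """Send the query and return the decoded JSON response."""
    request = build_request(base_url, names, context_name)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def summarize(response):
    """Map each query id to (taxonomy source, first matched name, OTT id)."""
    source = response.get("taxonomy", {}).get("source")
    summary = {}
    for result in response.get("results", []):
        matches = result.get("matches", [])
        first = matches[0] if matches else {}
        summary[result["id"]] = (
            source, first.get("matched_name"), first.get("matched_ott_id"))
    return summary


def differences(prod_response, dev_response):
    """Return {query: (prod summary, dev summary)} where the two disagree."""
    prod, dev = summarize(prod_response), summarize(dev_response)
    return {
        query: (prod.get(query), dev.get(query))
        for query in sorted(set(prod) | set(dev))
        if prod.get(query) != dev.get(query)
    }
```

Usage would be along the lines of `differences(fetch(PROD, ["Pitta nipalensis"], "Birds"), fetch(DEV, ["Pitta nipalensis"], "Birds"))`. An empty result would suggest the two servers agree on both the taxonomy source and the top match.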
