gaurav / taxrefine

TaxRefine: OpenRefine utilities for taxon name validation

Odd suggestions + no GBIF backbone taxonomy #17

Status: open. peterdesmet opened this issue 9 years ago.

peterdesmet commented 9 years ago

Example

  1. Reconcile Aeshna mixta Latreille, 1805 with http://refine.taxonomics.org/gbifchecklists/reconcile
  2. We get 3 suggestions:

    [Screenshot taken 2015-11-09 at 12:11:32 showing the three suggestions]

  3. The first suggestion (supported by 8 checklists) links to http://www.gbif.org/species/113856503, whose page lists NO supporting checklists.
  4. The second suggestion (supported by 5 checklists) links to http://www.gbif.org/species/102005563, whose page lists 13 supporting checklists. The page also links to the species in the GBIF backbone taxonomy: http://www.gbif.org/species/1425177
  5. The third suggestion (supported by 1 checklist) links to http://www.gbif.org/species/103188393, whose page lists 16 supporting checklists but no higher classification. The page also links to the species in the GBIF backbone taxonomy: http://www.gbif.org/species/1425177

Questions

  1. Why is the number of supporting checklists shown in a suggestion different from the number of checklists in which the name actually appears?
  2. Why is the species from the GBIF backbone taxonomy not suggested? It has a higher classification and 16 supporting datasets, and it probably provides the most useful cell.recon.match.id to use for adding higher classification later.
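
To illustrate why a backbone ID is useful for question 2: once a cell is matched to an ID such as 1425177, its higher classification could be looked up from GBIF. The sketch below assumes the public GBIF API v1 endpoint `https://api.gbif.org/v1/species/{key}` and that its JSON record carries rank fields named `kingdom`, `phylum`, `class`, `order`, `family` and `genus`. The helper names are hypothetical and not part of TaxRefine.

```python
import json
from urllib.request import urlopen

# Assumed GBIF API v1 endpoint for a single name usage (not part of TaxRefine).
GBIF_SPECIES_URL = "https://api.gbif.org/v1/species/{key}"

# Rank fields assumed to be present on a GBIF name-usage record.
HIGHER_RANKS = ("kingdom", "phylum", "class", "order", "family", "genus")


def higher_classification(record: dict) -> dict:
    """Pick the higher-rank fields out of a GBIF name-usage record.

    Ranks that are missing or empty (e.g. a checklist entry with no
    higher classification) are left out of the result.
    """
    return {rank: record[rank] for rank in HIGHER_RANKS if record.get(rank)}


def fetch_higher_classification(key: int) -> dict:
    """Hypothetical helper: fetch one record by key and extract its ranks."""
    with urlopen(GBIF_SPECIES_URL.format(key=key)) as response:
        return higher_classification(json.load(response))
```

A record without any classification, like the third suggestion, would simply yield an empty dict, which is why a match on such a record is less useful later on.
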
gaurav commented 9 years ago

Thanks for reporting this, @peterdesmet! I'm in a bit of a time crunch at the moment, but I should be able to have a look at this before the weekend.

For future me: here is the query that produces these suggestions: http://refine.taxonomics.org/gbifchecklists/reconcile?query=Aeshna%20mixta%20Latreille,%201805
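
The URL above uses the single-query form of the reconciliation endpoint. Below is a minimal sketch of building it in Python, plus the batch `queries` form (a JSON object of `{"q0": {"query": ...}}`) used by the OpenRefine reconciliation API. Support for the batch form on this particular service is an assumption, not verified here, and the function names are hypothetical.

```python
import json
from urllib.parse import quote, urlencode

# Reconciliation endpoint quoted in this thread.
ENDPOINT = "http://refine.taxonomics.org/gbifchecklists/reconcile"


def single_query_url(name: str) -> str:
    """Single-query form, matching the URL above.

    Spaces become %20; the comma is left unescaped (safe=",") only so the
    result matches the URL in this comment. %2C would work equally well.
    """
    return f"{ENDPOINT}?query={quote(name, safe=',')}"


def batch_query_url(names: list) -> str:
    """Batch form: one JSON-encoded `queries` parameter.

    Assumes the service accepts the standard `queries` parameter of the
    OpenRefine reconciliation API.
    """
    queries = {f"q{i}": {"query": name} for i, name in enumerate(names)}
    return f"{ENDPOINT}?{urlencode({'queries': json.dumps(queries)})}"
```
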

gaurav commented 8 years ago

I think I've figured this out:

  1. TaxRefine first looks for an exact name match (using /species?name=$name), which it can't find because the query contains the full scientific name with its authority (Latreille, 1805) rather than just the canonical name. Since that fails, it switches to a full-text search (using /species/search?name=$name), which retrieves thirteen matches in total. It then summarizes these down to the 3 matches you've seen by grouping on four keys: canonical name, accepted name, authority and kingdom. I'm not sure how the lists of supporting checklists on the GBIF website are built (I suspect GBIF searches for all taxon concepts with the same canonical name), but there's no reason to expect TaxRefine to group records the same way unless every record in a group shares the same values for all four keys. So I don't think this is a bug, but if you know how to access GBIF's same-taxon-concept-as data, I'd be happy to try to build that into TaxRefine!
  2. It actually is suggested, but TaxRefine was picking the newest GBIF ID to represent the entire summarized entry. I've modified it in @bb296057f to pick the oldest GBIF ID instead, which is likely to be the GBIF backbone taxonomy entry (this works correctly for your example). Let me know if that works better for you on other searches; otherwise, I'll just bite the bullet and modify it to prefer the GBIF backbone taxonomy (checklist d7dddbf4-2cf0-4f39-9b2a-bb099caae36c) over any other.
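
A minimal Python sketch of the behaviour described in points 1 and 2: an exact lookup with a full-text fallback, grouping on the four keys, and picking one GBIF ID per group. This is not TaxRefine's code. The lookup functions are passed in, so no GBIF calls are made; the GBIF field names, the "lowest key = oldest" rule and counting checklists as distinct `datasetKey` values are all assumptions. The `prefer_backbone` flag illustrates the backbone-first alternative using the dataset key quoted above.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# GBIF backbone taxonomy dataset key, as quoted above.
BACKBONE_DATASET = "d7dddbf4-2cf0-4f39-9b2a-bb099caae36c"

# The four summarizing keys, spelled as GBIF JSON field names.
# These spellings are an assumption, not taken from TaxRefine.
GROUP_FIELDS = ("canonicalName", "accepted", "authorship", "kingdom")

Record = Dict[str, object]


def lookup(name: str,
           exact: Callable[[str], List[Record]],
           full_text: Callable[[str], List[Record]]) -> List[Record]:
    """Exact-name lookup first; fall back to full-text search if it finds nothing."""
    return exact(name) or full_text(name)


def summarize(records: List[Record], prefer_backbone: bool = False) -> List[dict]:
    """Group records on the four keys and pick one GBIF ID per group."""
    groups: Dict[Tuple, List[Record]] = defaultdict(list)
    for rec in records:
        groups[tuple(rec.get(field) for field in GROUP_FIELDS)].append(rec)

    summaries = []
    for group_key, members in groups.items():
        # "Oldest" approximated as the lowest numeric key (an assumption).
        representative = min(members, key=lambda r: r["key"])
        if prefer_backbone:
            # Alternative: always use a backbone record if the group has one.
            backbone = [r for r in members if r.get("datasetKey") == BACKBONE_DATASET]
            if backbone:
                representative = backbone[0]
        summaries.append({
            "id": representative["key"],
            # Counted as distinct source datasets (an assumption).
            "checklists": len({r.get("datasetKey") for r in members}),
            "group": group_key,
        })
    # Most widely supported group first.
    summaries.sort(key=lambda s: s["checklists"], reverse=True)
    return summaries
```

Passing `exact` and `full_text` in as callables keeps the fallback logic testable without network access. Note that two records differing only in how the authority is written end up in separate groups, which is one way a single species can produce several suggestions.
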

I haven't deployed these changes yet: I'm currently on a university guest wireless network that won't let me use SSH. Once I have SSH access again, I'll update TaxRefine and add a comment here. Let me know whether that improves things for you, and if not, how I could make TaxRefine better!

gaurav commented 8 years ago

TaxRefine has been updated to the latest version!

peterdesmet commented 8 years ago

Thanks! Not sure when I'll be able to test this again.

I guess you are aggregating the checklists to avoid a long list of suggestions, but it would give the user more control if they could reconcile against a preferred checklist (e.g. by selecting it beforehand) and fall back to other checklists only for names not found there. That way, one could state in the methods that taxa were reconciled against checklist x, then y, then z.
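
A minimal sketch of the proposed cascade, assuming a hypothetical `search(name, checklist)` callable that returns matches for a name within a single checklist (for GBIF this might be a search restricted to one dataset key; that is an assumption, not an existing TaxRefine feature). Recording which checklist each match came from is what would support a methods statement like "reconciled against x, then y, then z".

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Record = Dict[str, object]
# Hypothetical search: (name, checklist id) -> matches within that checklist only.
Search = Callable[[str, str], List[Record]]


def reconcile_in_order(name: str, checklists: Sequence[str],
                       search: Search) -> Optional[Tuple[str, Record]]:
    """Return (checklist, first match) from the first checklist with a match."""
    for checklist in checklists:
        matches = search(name, checklist)
        if matches:
            return checklist, matches[0]
    return None  # not found in any of the preferred checklists


def reconcile_all(names: Sequence[str], checklists: Sequence[str],
                  search: Search) -> Tuple[Dict[str, Optional[Tuple[str, Record]]], Counter]:
    """Reconcile every name and tally which checklist each match came from."""
    results = {name: reconcile_in_order(name, checklists, search) for name in names}
    tally = Counter(hit[0] if hit else "unmatched" for hit in results.values())
    return results, tally
```

The tally is the kind of summary that could go straight into a methods section (N names matched in x, M in y, and so on).
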