gbif / checklistbank

GBIF Checklist Bank
Apache License 2.0

Regression for incertae sedis (Carex vulpina) #274

Closed ahahn-gbif closed 1 year ago

ahahn-gbif commented 1 year ago

Similar to #272: a plant (Carex vulpina agg.) was mapped into butterflies due to name similarity (Carea vulpina Warren, 1912) plus missing higher taxonomy.

{
  "count": 4800,
  "verbatim_kingdom": "null",
  "verbatim_phylum": "null",
  "verbatim_class": "null",
  "verbatim_order": "null",
  "verbatim_family": "null",
  "verbatim_genus": "null",
  "verbatim_species": "null",
  "verbatim_infra": "null",
  "verbatim_rank": "null",
  "verbatim_verbatimRank": "null",
  "verbatim_scientificName": "Carex vulpina agg.",
  "verbatim_generic": "null",
  "verbatim_author": "null",
  "current_kingdom": "incertae sedis",
  "current_phylum": "null",
  "current_class": "null",
  "current_order": "null",
  "current_family": "null",
  "current_genus": "null",
  "current_subGenus": "null",
  "current_species": "null",
  "current_scientificName": "incertae sedis",
  "current_acceptedScientificName": "null",
  "current_kingdomKey": 0,
  "current_phylumKey": "null",
  "current_classKey": "null",
  "current_orderKey": "null",
  "current_familyKey": "null",
  "current_genusKey": "null",
  "current_subGenusKey": "null",
  "current_speciesKey": "null",
  "current_taxonKey": 0,
  "current_acceptedTaxonKey": "null",
  "proposed_kingdom": "Animalia",
  "proposed_phylum": "Arthropoda",
  "proposed_class": "Insecta",
  "proposed_order": "Lepidoptera",
  "proposed_family": "Nolidae",
  "proposed_genus": "Calymera",
  "proposed_subGenus": "null",
  "proposed_species": "Calymera albimargo",
  "proposed_scientificName": "Carea vulpina Warren, 1912",
  "proposed_acceptedScientificName": "Calymera albimargo (Warren, 1912)",
  "proposed_kingdomKey": 1,
  "proposed_phylumKey": 54,
  "proposed_classKey": 216,
  "proposed_orderKey": 797,
  "proposed_familyKey": 9717,
  "proposed_genusKey": 4688701,
  "proposed_subGenusKey": "null",
  "proposed_speciesKey": 11523845,
  "proposed_taxonKey": 1800751,
  "proposed_acceptedTaxonKey": 11523845,
  "_key": 4904,
  "changes": {
    "kingdom": true,
    "kingdomKey": true,
    "phylum": true,
    "phylumKey": true,
    "class": true,
    "classKey": true,
    "order": true,
    "orderKey": true,
    "family": true,
    "familyKey": true,
    "genus": true,
    "genusKey": true,
    "species": true,
    "speciesKey": true,
    "scientificName": true,
    "acceptedScientificName": true,
    "taxonKey": true
  },
  "reviewed": false
}
mdoering commented 1 year ago

Agree this is unfortunate: http://backbonebuild-vh.gbif.org:9000/species/match2?verbose=true&name=Carex%20vulpina%20agg.

Something is odd with the aggregate matching, which is supposed to ignore the species and try the next higher level, Carex in this case. That should yield a plant genus match: http://backbonebuild-vh.gbif.org:9000/species/match2?verbose=true&name=Carex

Can't say what's going wrong here.
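
For illustration, the intended aggregate behaviour described above (skip the species and try the genus) could be sketched roughly as follows. This is not the checklistbank code: the function name `name_to_match` and the marker pattern are assumptions made for this example, and the real matcher may recognise other aggregate notations.

```python
import re

# Hypothetical marker pattern (assumption): a trailing "agg" or "agg.".
AGGREGATE_MARKER = re.compile(r"\s+agg\.?$", re.IGNORECASE)


def name_to_match(scientific_name: str) -> str:
    """Return the name that should be looked up for a verbatim name."""
    name = scientific_name.strip()
    if AGGREGATE_MARKER.search(name):
        # Aggregate: ignore the species epithet and fall back to the genus.
        return name.split()[0]
    # Not an aggregate: match the name as given.
    return name


print(name_to_match("Carex vulpina agg."))
print(name_to_match("Carex vulpina"))
```

Under these assumptions the aggregate would be looked up as the genus Carex, while a plain binomial is left untouched.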

ahahn-gbif commented 1 year ago

As would http://backbonebuild-vh.gbif.org:9000/species/match2?verbose=true&name=Carex%20vulpina - is "agg." recognized as a term to omit when matching (or would that even make sense)?

mdoering commented 1 year ago

We explicitly did not want to match to the species in case of aggregates and had users complaining about that: https://github.com/gbif/portal-feedback/issues/4459

mdoering commented 1 year ago

This happens because we remove the exact matches due to the aggregate, but leave in the fuzzy ones. I will change the code so that we also ignore any fuzzy matches whenever we removed exact matches due to the aggregate rank.
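
A minimal sketch of that rule, assuming a simplified candidate model. The `Candidate` class, the `filter_for_aggregate` function and the rank and match-type strings are hypothetical and chosen for this example only; the actual fix in the matcher may be structured differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    name: str
    rank: str        # e.g. "GENUS" or "SPECIES" (assumed labels)
    match_type: str  # "EXACT" or "FUZZY" (assumed labels)


def filter_for_aggregate(candidates: List[Candidate], is_aggregate: bool) -> List[Candidate]:
    """Drop species-level candidates that an aggregate name must not match."""
    if not is_aggregate:
        return list(candidates)
    # Step 1: remove exact species matches, since an aggregate is not the species.
    kept = [c for c in candidates
            if not (c.match_type == "EXACT" and c.rank == "SPECIES")]
    # Step 2: if any exact match was removed, also ignore the fuzzy matches,
    # so a similar-looking name cannot win by default.
    if len(kept) < len(candidates):
        kept = [c for c in kept if c.match_type != "FUZZY"]
    return kept


candidates = [
    Candidate("Carex vulpina", "SPECIES", "EXACT"),
    Candidate("Carea vulpina Warren, 1912", "SPECIES", "FUZZY"),
]
print(filter_for_aggregate(candidates, is_aggregate=True))
```

With both candidates gone, nothing is left at species level and the matcher can move on to the next higher level, here the genus Carex.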

ahahn-gbif commented 1 year ago

Now matching at genus level with full higher taxonomy of the correct group (Cyperaceae), http://backbonebuild-vh.gbif.org:9000/species/match2?verbose=true&name=Carex%20vulpina%20agg.

mdoering commented 1 year ago

Now matches correctly to the genus: http://backbonebuild-vh.gbif.org:9000/species/match2?verbose=true&name=Carex%20vulpina%20agg.