gbif / checklistbank

GBIF Checklist Bank
Apache License 2.0

Regression for Saltator Vieillot, 1816 #286

Open ahahn-gbif opened 1 year ago

ahahn-gbif commented 1 year ago

A similar case to #280, logged mostly to double-check the fix proposed there once it has been applied. A bird whose higher taxonomy is mapped to a beetle group because of a mis-mapped specific epithet / scientific name.

{
  "count": 275,
  "verbatim_kingdom": "Animalia",
  "verbatim_phylum": "Chordata",
  "verbatim_class": "Aves",
  "verbatim_order": "Passeriformes",
  "verbatim_family": "Thraupidae",
  "verbatim_genus": "Saltator",
  "verbatim_species": "maximus",
  "verbatim_infra": "null",
  "verbatim_rank": "null",
  "verbatim_verbatimRank": "null",
  "verbatim_scientificName": "maximus",
  "verbatim_generic": "null",
  "verbatim_author": "null",
  "current_kingdom": "Animalia",
  "current_phylum": "Chordata",
  "current_class": "Aves",
  "current_order": "Passeriformes",
  "current_family": "Cardinalidae",
  "current_genus": "Saltator",
  "current_subGenus": "null",
  "current_species": "null",
  "current_scientificName": "Saltator Vieillot, 1816",
  "current_acceptedScientificName": "Saltator Vieillot, 1816",
  "current_kingdomKey": 1,
  "current_phylumKey": 44,
  "current_classKey": 212,
  "current_orderKey": 729,
  "current_familyKey": 9285,
  "current_genusKey": 5428759,
  "current_subGenusKey": "null",
  "current_speciesKey": "null",
  "current_taxonKey": 5428759,
  "current_acceptedTaxonKey": 5428759,
  "proposed_kingdom": "Animalia",
  "proposed_phylum": "Arthropoda",
  "proposed_class": "Insecta",
  "proposed_order": "Coleoptera",
  "proposed_family": "Curculionidae",
  "proposed_genus": "Maximus",
  "proposed_subGenus": "null",
  "proposed_species": "null",
  "proposed_scientificName": "Maximus Alonso-Zarazaga & Lyal, 2009",
  "proposed_acceptedScientificName": "Maximus Alonso-Zarazaga & Lyal, 2009",
  "proposed_kingdomKey": 1,
  "proposed_phylumKey": 54,
  "proposed_classKey": 216,
  "proposed_orderKey": 1470,
  "proposed_familyKey": 4239,
  "proposed_genusKey": 8722336,
  "proposed_subGenusKey": "null",
  "proposed_speciesKey": "null",
  "proposed_taxonKey": 8722336,
  "proposed_acceptedTaxonKey229": 8722336,
  "_key": 4602,
  "changes": {
    "phylum": true,
    "phylumKey": true,
    "class": true,
    "classKey": true,
    "order": true,
    "orderKey": true,
    "family": true,
    "familyKey": true,
    "genus": true,
    "genusKey": true,
    "scientificName": true,
    "acceptedScientificName": true,
    "taxonKey": true
  },
  "reviewed": false
}
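For double-checking records like the one above, here is a minimal sketch of how the `changes` flags could be recomputed by comparing each `current_*` field with its `proposed_*` counterpart. The function name `diff_fields` is hypothetical and this is not the review tool's actual logic; it only looks at fields that exist under both prefixes.

```python
def diff_fields(record):
    """Return {field: True} for fields whose current_ and proposed_ values differ."""
    changes = {}
    for key, current in record.items():
        if not key.startswith("current_"):
            continue
        field = key[len("current_"):]
        proposed_key = "proposed_" + field
        # Fields without a proposed_ counterpart are skipped, not flagged.
        if proposed_key in record and record[proposed_key] != current:
            changes[field] = True
    return changes


# Reduced example using values from the record above.
record = {
    "current_kingdom": "Animalia",
    "current_phylum": "Chordata",
    "current_genus": "Saltator",
    "proposed_kingdom": "Animalia",
    "proposed_phylum": "Arthropoda",
    "proposed_genus": "Maximus",
}
print(diff_fields(record))
```

In this reduced record only `phylum` and `genus` differ, so only those two are flagged; `kingdom` is identical on both sides and is left out.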
mdoering commented 1 year ago

Now matches to the species: http://backbonebuild-vh.gbif.org:9000/species/match2?verbose=true&family=Thraupidae&genus=Saltator&species=maximus&name=maximus

ahahn-gbif commented 1 year ago

I still find this in the review sheet, though (search for "Maximus" with a capital M under scientificName): [image]

ahahn-gbif commented 1 year ago

reopening, as it seems to persist (see above)

mdoering commented 1 year ago

this is weird. I did see the wrong match when I first hit this URL, but after reloading it returned the right species result: http://backbonebuild-vh.gbif.org:9000/species/match2?verbose=true&kingdom=Animalia&phylum=Chordata&class=Aves&order=Passeriformes&family=Thraupidae&genus=Saltator&species=maximus&name=maximus

@ahahn-gbif does the matching link look good to you?

ahahn-gbif commented 1 year ago

Your link looks good (correctly assigned to the genus and recognized as a bird species). In the taxonomy release review map, searching under scientific name for "maximus" gives all good results, while "Maximus" still finds the mismatched insect assignment of the 275 records.

mdoering commented 1 year ago

Yes, I have no explanation, because it should have called the above link, which also returned the wrong result for me when I first opened it. @timrobertson100 there may be some odd caching somewhere; I can't explain this.

timrobertson100 commented 1 year ago

...genus=Saltator&name=maximus

vs

...genus=Saltator&species=maximus&name=maximus

perhaps?
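To make the difference between the two requests concrete, here is a small sketch that builds both query strings. The helper `match_url` is hypothetical; the host and parameter names are taken from the links earlier in this thread, and no request is actually sent (the build server may not be reachable).

```python
from urllib.parse import urlencode

# Endpoint as quoted earlier in this thread.
BASE = "http://backbonebuild-vh.gbif.org:9000/species/match2"


def match_url(**params):
    """Build a match URL, leaving out parameters that were not supplied."""
    query = {k: v for k, v in params.items() if v is not None}
    return BASE + "?" + urlencode(query)


# Variant 1: the epithet is only passed as the name.
without_species = match_url(
    verbose="true", family="Thraupidae", genus="Saltator", name="maximus"
)

# Variant 2: the epithet is also passed as the species parameter.
with_species = match_url(
    verbose="true", family="Thraupidae", genus="Saltator",
    species="maximus", name="maximus",
)

print(without_species)
print(with_species)
```

If the client drops `species` before calling the service, the two URLs differ, which could explain why the same record matches differently in the browser and in the review sheet.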

timrobertson100 commented 1 year ago

Pfff.. the clients are doing more interpretation. We've removed rank guessing @mdoering but see here:

Note that it is also doing funky stuff with authorship. Is that authorship stuff also handled behind the service now? Should we just rip out all interpretations beyond trim() and pass it over?

mdoering commented 1 year ago

Yes I would think so. Authorship is handled and all kinds of variations in supplying dwc style data.

Cleaning of data strings is currently limited to trimming and case changes; there is no removal of quotes, verbatim NULL strings, UTF garbage or other oddities. I would not mind if that is still done on the pipelines side.
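As an illustration of what such client-side cleaning might look like, here is a rough sketch covering trimming, surrounding quotes and verbatim NULL strings. The function name `clean_term` and the list of NULL spellings are assumptions made for this example, not the pipelines' actual code, and it makes no attempt to handle UTF garbage.

```python
# Assumed spellings of "no value"; the real list would need to be agreed on.
NULL_STRINGS = {"null", "none", "n/a"}


def clean_term(value):
    """Trim a verbatim value, strip surrounding quotes, map NULL strings to None."""
    if value is None:
        return None
    text = value.strip()
    # Remove one pair of matching quotes around the whole value.
    if len(text) >= 2 and text[0] == text[-1] and text[0] in "\"'":
        text = text[1:-1].strip()
    if text == "" or text.lower() in NULL_STRINGS:
        return None
    return text


print(clean_term('  "maximus" '))
print(clean_term("null"))
```

With something like this, the "null" values in the verbatim fields of the record above would be passed to the matching service as missing rather than as literal strings.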

mdoering commented 1 year ago

well, I reckon we can move that also if wanted

mdoering commented 1 year ago

I spot dwc:genericName in the client code. It is not part of the matching parameters, and I was wondering whether we want to add it too. It is not exactly the same as dwc:genus.