gbif / pipelines

Pipelines for data processing (GBIF and LivingAtlases)
Apache License 2.0

Possible missing rank in name lookup #956

Closed by timrobertson100 1 year ago

timrobertson100 commented 1 year ago

This record flags that a name-based lookup differs from the ID-based lookup.

Note that the rank does not appear in the key in the HBase cache:

 2urn:lsid:marinespecies.org:taxname:136025|||Animalia|Annelida||Sipuncula|Golfingiidae|Phascolion|Phascolion||||| column=v:j, timestamp=1695220322831, value={"synonym":false,"usage":{"key":2509428,"name":"Phascolion Th\xC3\xA9el, 1875","rank":"GENUS"},"classification":[{"key":1,"name":"Animalia","rank":"KINGDOM"},{"key":74,"name":"Sipuncula","rank":"PHYLUM"},{"key":159,"name":"Sipunculidea","rank":"CLASS"},{"key":539,"name":"Golfingiiformes","rank":"ORDER"},{"key":7340,"name":"Phascoliidae","rank":"FAMILY"},{"key":2509428,"name":"Phascolion","rank":"GENUS"}],"diagnostics":{"matchType":"EXACT","confidence":100,"status":"ACCEPTED","lineage":[],"alternatives":[],"note":"All provided names were ignored since the usageKey was provided"},"issues":["TAXON_MATCH_NAME_AND_ID_AMBIGUOUS"]}
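A quick way to confirm the observation is to split the row key on `|` and look for any rank token. This is only a sketch: the set of rank words is an assumption taken from the ranks visible in the cached value, and the meaning of each key position is not assumed.

```python
# Row key copied from the HBase shell output above.
row_key = (
    "2urn:lsid:marinespecies.org:taxname:136025|||Animalia|Annelida||"
    "Sipuncula|Golfingiidae|Phascolion|Phascolion|||||"
)

# Assumed rank vocabulary, based on the ranks that appear in the cached JSON value.
RANKS = {"KINGDOM", "PHYLUM", "CLASS", "ORDER", "FAMILY", "GENUS", "SPECIES"}

# Split into fields without assuming what each position means.
fields = row_key.split("|")

# Collect any field that is a rank word; an empty list means no rank in the key.
rank_fields = [f for f in fields if f.upper() in RANKS]

print(rank_fields)
```

An empty result means that no field of the key carries a rank, even though the cached usage has rank `GENUS`.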

Supplying the rank changes the API lookup result compared with omitting it.

Recent changes in KVS and pipelines should already be passing in the rank. Need to verify that it is actually being passed.

timrobertson100 commented 1 year ago

KVS was incorrectly applying the name deny list to the rank value. That list included the rank names themselves, so the rank was filtered out.
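The failure mode described above can be illustrated with a minimal sketch. It is not the KVS code: `DENY_LIST`, `clean_name`, `build_key`, and the two-field key layout are all hypothetical, chosen only to show how running a rank through a name deny list that contains rank words empties it.

```python
from typing import Optional

# Hypothetical deny list: placeholder names plus rank words (assumed contents).
DENY_LIST = {"unknown", "incertae sedis", "genus", "species"}


def clean_name(value: Optional[str]) -> Optional[str]:
    """Return None for blank or deny-listed values, otherwise the trimmed value."""
    if value is None:
        return None
    trimmed = value.strip()
    if not trimmed or trimmed.lower() in DENY_LIST:
        return None
    return trimmed


def build_key(genus: Optional[str], rank: Optional[str], filter_rank: bool) -> str:
    """Build a simplified 'genus|rank' key (hypothetical layout)."""
    name_part = clean_name(genus) or ""
    if filter_rank:
        # Buggy behaviour: the rank goes through the name deny list.
        rank_part = clean_name(rank) or ""
    else:
        # Fixed behaviour: the rank is only normalised, never deny-listed.
        rank_part = (rank or "").strip().upper()
    return f"{name_part}|{rank_part}"


buggy = build_key("Phascolion", "GENUS", filter_rank=True)
fixed = build_key("Phascolion", "GENUS", filter_rank=False)
print(buggy)  # Phascolion|
print(fixed)  # Phascolion|GENUS
```

In this sketch the fix is simply to keep the rank out of the name filter while still filtering the name fields.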