gbif / pipelines

Pipelines for data processing (GBIF and LivingAtlases)
Apache License 2.0

Generic name set for higher ranks #945

Closed by timrobertson100 1 year ago

timrobertson100 commented 1 year ago

Following the species match step, we are incorrectly setting a genericName on records that have been matched to ranks higher than a genus. An example record has:

"taxonKey": 7881,
"kingdomKey": 1,
"phylumKey": 54,
"classKey": 216,
"orderKey": 809,
"familyKey": 7881,
"acceptedTaxonKey": 7881,
"scientificName": "Gerridae",
"acceptedScientificName": "Gerridae",
"kingdom": "Animalia",
"phylum": "Arthropoda",
"order": "Hemiptera",
"family": "Gerridae",
"genericName": "Gerridae",
"taxonRank": "FAMILY",
"taxonomicStatus": "ACCEPTED",

genericName should be null here, because the record was matched at FAMILY rank, which is higher than genus.
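To make the symptom easy to spot in other records, here is a minimal sketch of a check that flags a genericName on a record matched above genus. It is not code from the pipelines repository: the class GenericNameCheck, the method hasUnexpectedGenericName, the flat map representation of the record, and the (non-exhaustive) list of genus-or-lower ranks are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of gbif/pipelines.
final class GenericNameCheck {

    // Assumption: ranks at or below genus, where a genericName is meaningful.
    // This list is illustrative and not exhaustive.
    private static final Set<String> GENUS_OR_LOWER =
            Set.of("GENUS", "SUBGENUS", "SPECIES", "SUBSPECIES", "VARIETY", "FORM");

    // Returns true when genericName is set but the matched rank is above genus.
    static boolean hasUnexpectedGenericName(Map<String, String> record) {
        String rank = record.get("taxonRank");
        String genericName = record.get("genericName");
        return genericName != null && rank != null && !GENUS_OR_LOWER.contains(rank);
    }

    public static void main(String[] args) {
        // Fields copied from the example record in this issue.
        Map<String, String> record = Map.of(
                "scientificName", "Gerridae",
                "genericName", "Gerridae",
                "taxonRank", "FAMILY");
        System.out.println(hasUnexpectedGenericName(record)); // true
    }
}
```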

timrobertson100 commented 1 year ago

I suspect this might be happening here, where I think we need rank checks before setting these 3 predicates.

This is called OccurrenceHdfsRecordConverter. Does that also affect the JSON posted to ES, please, @fmendezh?

Edited to add: I think we might also need a rank check here, right?

fmendezh commented 1 year ago

@timrobertson100 It looks like those are the only two places where we change the generic name. It may be better to change the function convertGenericName in the JsonConverter class.
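For illustration only, a rank check of the kind discussed in this thread could look roughly like the sketch below. This is not the actual fix and not the real convertGenericName signature: the class GenericNameGuard, the method rankCheckedGenericName, and the simplified Rank enum are hypothetical stand-ins, and the real rank enumeration contains many more values.

```java
import java.util.Optional;

// Hypothetical sketch, not the code in JsonConverter.
final class GenericNameGuard {

    // Assumption: a simplified rank enum, declared from highest to lowest rank.
    enum Rank { KINGDOM, PHYLUM, CLASS, ORDER, FAMILY, GENUS, SPECIES, SUBSPECIES }

    // Returns the generic name only when the matched rank is genus or lower.
    static Optional<String> rankCheckedGenericName(Rank matchedRank, String genericName) {
        if (matchedRank == null || genericName == null) {
            return Optional.empty();
        }
        // Enum comparison follows declaration order, so >= 0 means GENUS or lower.
        boolean genusOrLower = matchedRank.compareTo(Rank.GENUS) >= 0;
        return genusOrLower ? Optional.of(genericName) : Optional.empty();
    }

    public static void main(String[] args) {
        // The family-level match from this issue no longer yields a generic name.
        System.out.println(rankCheckedGenericName(Rank.FAMILY, "Gerridae")); // Optional.empty
        System.out.println(rankCheckedGenericName(Rank.GENUS, "Gerris"));    // Optional[Gerris]
    }
}
```

Putting the check in one function means every caller that writes the generic name gets the same behaviour, which is the appeal of changing a single conversion function rather than each call site.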

timrobertson100 commented 1 year ago

Fixed in dev