Open ManonGros opened 3 years ago
Thanks @ManonGros
Background
In the previous generation of indexing we had a messy codebase where some parsing and assembling of scientificName was done in the "pipeline" before it was passed to the /species/match service. In the refactoring we decided to clearly separate concerns: the pipeline client simply extracts the verbatim fields and passes them to the service, which is the better place to hold the logic that assembles scientificName.
I believe the correct place to fix this is within the species/match service. Rather than moving this issue, I'll link a new one so we preserve this history here for the future.
Note before we close this: if the service is deployed with changes, we need to flush the HBase table backing the lookup cache. For that reason, it may be worthwhile to deploy this at the same time as the incoming backbone.
Right now, it seems that if the scientific name is missing, we either get unexpected taxon interpretations or none at all, even if genus and specificEpithet are filled. It would be good to restore what we used to have: inferring the scientific name from genus, specificEpithet, scientificAuthorship, etc.
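To make the requested fallback concrete, here is a minimal sketch of how a name could be inferred when scientificName is blank. It is not the service's actual code: the function name `assemble_scientific_name`, its parameters, and the rule "genus is required, epithet and authorship are appended when present" are all assumptions for illustration.

```python
from typing import Optional


def _clean(value: Optional[str]) -> Optional[str]:
    """Return the stripped value, or None if it is missing or blank."""
    if value is None:
        return None
    value = value.strip()
    return value or None


def assemble_scientific_name(
    scientific_name: Optional[str] = None,
    genus: Optional[str] = None,
    specific_epithet: Optional[str] = None,
    authorship: Optional[str] = None,
) -> Optional[str]:
    """Hypothetical fallback: infer a name from atomised fields.

    Assumed rules (not confirmed by the service):
    - a non-blank scientificName is used as supplied;
    - otherwise genus is required, and specificEpithet and
      authorship are appended when present.
    """
    name = _clean(scientific_name)
    if name is not None:
        return name

    genus = _clean(genus)
    if genus is None:
        # Without a genus there is nothing sensible to assemble.
        return None

    parts = [genus]
    for extra in (_clean(specific_epithet), _clean(authorship)):
        if extra is not None:
            parts.append(extra)
    return " ".join(parts)


if __name__ == "__main__":
    print(assemble_scientific_name(
        genus="Puma",
        specific_epithet="concolor",
        authorship="(Linnaeus, 1771)",
    ))
```

Under these assumptions, a record with only genus and specificEpithet would still reach the matcher with a usable name instead of an empty one. Whether a supplied scientificName should always win over the atomised fields is a design choice the real fix would need to settle.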