gbif / pipelines

Pipelines for data processing (GBIF and LivingAtlases)
Apache License 2.0

Evaluate indexing strategies in Elasticsearch for multiple classifications #1046

Open · fmendezh opened this issue 8 months ago

fmendezh commented 8 months ago

Evaluate different approaches to indexing multiple taxonomies: multiple indices vs. adding nested elements for multiple classifications.
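To make the "multiple indices" option concrete, here is a minimal sketch assuming one occurrence index per checklist. The index prefix `occurrence-`, the helper names and the `taxonKey` field are hypothetical, not the pipelines' real schema. Because each document in such an index carries only one classification, a plain `term` query is enough. The likely cost is that every occurrence is stored once per checklist.

```python
import json

# Hypothetical naming scheme: one occurrence index per checklist.
INDEX_PREFIX = "occurrence-"  # assumption, not the project's real index name


def index_for_checklist(checklist_key: str) -> str:
    """Return the (hypothetical) name of the index holding one checklist's classification."""
    # Elasticsearch index names must be lowercase.
    return INDEX_PREFIX + checklist_key.lower()


def taxon_search(checklist_key: str, taxon_key: str) -> tuple:
    """Build the request path and body for a taxonKey search in one checklist's index.

    Each document in a per-checklist index has a single classification,
    so no nested query is needed to keep checklistKey and taxonKey together.
    """
    path = f"/{index_for_checklist(checklist_key)}/_search"
    body = {"query": {"term": {"taxonKey": taxon_key}}}
    return path, body


if __name__ == "__main__":
    path, body = taxon_search("Example-Checklist", "42")
    print(path)
    print(json.dumps(body))
```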

djtfmartin commented 2 months ago

So far, work has focussed only on adding nested elements to the Elasticsearch index for multiple classifications.
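A minimal sketch of what "nested elements" means in mapping terms, assuming a hypothetical `classifications` field whose entries each hold a `checklistKey` and a `taxonKeys` array (assumed here to contain the matched taxon plus its ancestors). These names are illustrative only. Mapping the field as `nested` makes Elasticsearch index each array element as its own hidden document, so values from one classification stay associated with each other.

```python
import json

# Hypothetical mapping: field names are illustrative, not the pipelines schema.
NESTED_CLASSIFICATION_MAPPING = {
    "mappings": {
        "properties": {
            "classifications": {
                # "nested" indexes each array element as a separate hidden
                # document, so fields within one classification stay together.
                "type": "nested",
                "properties": {
                    "checklistKey": {"type": "keyword"},
                    # Assumption: taxon key plus ancestor keys for this checklist.
                    "taxonKeys": {"type": "keyword"},
                },
            }
        }
    }
}

# Example source document: one occurrence, two classifications.
EXAMPLE_DOC = {
    "classifications": [
        {"checklistKey": "checklist-a", "taxonKeys": ["1", "6", "100"]},
        {"checklistKey": "checklist-b", "taxonKeys": ["2", "7", "200"]},
    ]
}

if __name__ == "__main__":
    print(json.dumps(NESTED_CLASSIFICATION_MAPPING, indent=2))
```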

An important requirement is to be able to search by taxonKey for a specified checklistKey. This requires a compound query, and it must be precise: it must not match a document where the taxonKey is in one subobject (classification) and the checklistKey is in another subobject (classification) of the same indexed Elasticsearch document.
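The false-match problem can be illustrated with a small sketch, again using hypothetical field names (`classifications`, `checklistKey`, `taxonKeys`). With a plain `object` mapping, Elasticsearch flattens an array of objects into multi-valued fields, so the two terms can be satisfied by different classifications. A `nested` query requires both terms to match inside the same nested object. The two `*_match` functions below are in-memory stand-ins for those semantics, not calls to Elasticsearch; `nested_taxon_query` only builds a request body.

```python
# Hypothetical field names, not taken from the pipelines schema.
PATH = "classifications"


def nested_taxon_query(checklist_key: str, taxon_key: str) -> dict:
    """Elasticsearch query body: both terms must match inside the SAME nested object."""
    return {
        "query": {
            "nested": {
                "path": PATH,
                "query": {
                    "bool": {
                        "filter": [
                            {"term": {f"{PATH}.checklistKey": checklist_key}},
                            {"term": {f"{PATH}.taxonKeys": taxon_key}},
                        ]
                    }
                },
            }
        }
    }


def flattened_match(doc: dict, checklist_key: str, taxon_key: str) -> bool:
    """Mimic a plain 'object' mapping: values are pooled across classifications."""
    classes = doc[PATH]
    all_checklists = {c["checklistKey"] for c in classes}
    all_taxa = {t for c in classes for t in c["taxonKeys"]}
    return checklist_key in all_checklists and taxon_key in all_taxa


def nested_match(doc: dict, checklist_key: str, taxon_key: str) -> bool:
    """Mimic a 'nested' mapping: both terms must hold within one classification."""
    return any(
        c["checklistKey"] == checklist_key and taxon_key in c["taxonKeys"]
        for c in doc[PATH]
    )


doc = {
    PATH: [
        {"checklistKey": "checklist-a", "taxonKeys": ["1", "100"]},
        {"checklistKey": "checklist-b", "taxonKeys": ["2", "200"]},
    ]
}

# taxonKey 200 occurs only in checklist-b, so a search scoped to checklist-a must not match.
print(flattened_match(doc, "checklist-a", "200"))  # True  (false positive)
print(nested_match(doc, "checklist-a", "200"))     # False (correct)
```

Using `filter` rather than `must` inside the `bool` clause skips relevance scoring, which this kind of exact key lookup does not need.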

Currently considering 4 potential strategies: