gbif / pipelines

Pipelines for data processing (GBIF and LivingAtlases)
Apache License 2.0

failing suggest for long values #1041

Open MortenHofft opened 4 months ago

MortenHofft commented 4 months ago

Is this a bug, or is the suggest index just out of sync with the data? The occurrence suggest API gives me the suggestion "A  agy Tuoidach Creek, Lena River section, opposit" for https://api.gbif.org/v1/occurrence/search/datasetName?limit=50&q=A%20%20agy%20Tuoidach%20Creek,%20Lena%20River%20section,%20opposit but searching for that value returns no results: https://api.gbif.org/v1/occurrence/search?dataset_name=A%20%20agy%20Tuoidach%20Creek,%20Lena%20River%20section,%20opposit

UPDATE: I think I found the record: https://www.gbif.org/occurrence/1698895443. It looks like the issue is that the suggest endpoint truncates the value?
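One quick way to test the truncation hypothesis is to decode the dataset_name value from the search URL above and measure its length. The sketch below uses only the Python standard library and makes no network calls; SEARCH_URL, query_param and searched_value are illustrative names, not part of any GBIF API.

```python
from urllib.parse import parse_qs, urlsplit

# The (empty-result) search URL from the report above.
SEARCH_URL = (
    "https://api.gbif.org/v1/occurrence/search"
    "?dataset_name=A%20%20agy%20Tuoidach%20Creek,%20Lena%20River%20section,%20opposit"
)


def query_param(url: str, name: str) -> str:
    """Return the percent-decoded value of one query parameter."""
    # parse_qs turns %20 into spaces, so the double space after "A" survives.
    return parse_qs(urlsplit(url).query)[name][0]


searched_value = query_param(SEARCH_URL, "dataset_name")
print(repr(searched_value))
print(len(searched_value))  # 50
```

The decoded value is exactly 50 characters long and stops mid-word ("opposit"), which fits a fixed-length cut rather than an out-of-sync index.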

MattBlissett commented 3 months ago

The 50-character limit is set by "max_input_length" on the datasetName.suggest completion field in the ES index mapping:

      "datasetName": {
        "type": "text",
        "fields": {
          "suggest": {"type": "completion", "analyzer": "lowercase_analyzer", "preserve_separators": true, "preserve_position_increments": true, "max_input_length": 50},
          "keyword": {"type": "keyword", "normalizer": "lowercase_normalizer", "ignore_above": 1024},
          "verbatim": {"type": "keyword", "ignore_above": 1024}
        }
      },
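The mismatch can be modelled in a few lines of Python. This is a simplified sketch, not Elasticsearch's implementation. It assumes the completion field keeps only the first max_input_length characters of the input (Elasticsearch documents this limit in UTF-16 code units, which matches Python's len() for this ASCII value). It also assumes the occurrence search's dataset_name filter does an exact, case-insensitive match on datasetName.keyword. The empty search result is consistent with that, but the thread does not confirm it. The full stored value is unknown here, so full_value uses a clearly hypothetical remainder.

```python
MAX_INPUT_LENGTH = 50  # "max_input_length" on datasetName.suggest in the mapping above


def completion_input(value: str, max_len: int = MAX_INPUT_LENGTH) -> str:
    """Simplified model: the completion field only keeps the first max_len chars."""
    return value[:max_len]


def keyword_term(value: str) -> str:
    """Simplified model of lowercase_normalizer on datasetName.keyword."""
    return value.lower()


# HYPOTHETICAL: the real stored value is longer, but its remainder is not shown here.
full_value = "A  agy Tuoidach Creek, Lena River section, opposit" + "<HYPOTHETICAL REMAINDER>"

suggested = completion_input(full_value)

# Exact term match compares the whole normalised value, so the truncated
# suggestion cannot match the stored document.
exact_hit = keyword_term(suggested) == keyword_term(full_value)

# A prefix comparison on the same normalised values would still match.
prefix_hit = keyword_term(full_value).startswith(keyword_term(suggested))

print(len(suggested), exact_hit, prefix_hit)  # 50 False True
```

Under these assumptions, any datasetName longer than 50 characters yields a suggestion that an exact-match search cannot find. Raising "max_input_length" or matching suggestions by prefix are possible directions, but neither is confirmed in the thread.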