elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Manual GeoIP database updates require undocumented restart #99679

Open jpittiglio opened 1 year ago

jpittiglio commented 1 year ago

Elasticsearch Version

8.10.1

Installed Plugins

No response

Java Version

bundled

OS Version

Official Docker images

Problem Description

In an air-gapped environment, Elasticsearch provides multiple mechanisms for GeoIP database updates, one of which is 'manual' updates as described here: https://www.elastic.co/guide/en/elasticsearch/reference/8.10/geoip-processor.html#manually-update-geoip-databases

When performing the steps above, Elasticsearch recognizes the database has been loaded both according to the logs and the GeoIP stats API endpoint, but active data ingest does not properly enrich the data - it continues to tag new data with "_geoip_database_unavailable_GeoLite2-City.mmdb"

After restarting Elasticsearch, this works as expected. However, the alternative mechanism (using a custom endpoint - https://www.elastic.co/guide/en/elasticsearch/reference/8.10/geoip-processor.html#use-custom-geoip-endpoint) properly loads and recognizes the database, and begins enriching new data immediately with no restart required.

I believe this is either a bug, or a documentation update is required to highlight the expected behavior and any additional steps required.

Steps to Reproduce

This was initially discovered in a multi-node 8.8.2 cluster running on bare metal, and confirmed in a lab environment running 8.10.1 in a single node Docker container. For simplicity, the docker-elk repository was used: https://github.com/deviantony/docker-elk

A handful of changes were made to the default Compose file: config directories were mounted properly (so the GeoLite2 database could be loaded later), some port mappings were changed to avoid conflicts with another local cluster, and one setting was added to elasticsearch.yml:

ingest.geoip.downloader.endpoint: http://kibana:8000/overview.json

This setting was later used to test the alternate 'custom endpoint' mechanism. Otherwise, this is effectively a generic / basic installation of Elasticsearch and Kibana.
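For reference, a custom endpoint like the one configured above can be any plain HTTP server that serves an overview.json manifest and the database archives. The sketch below is a minimal, hypothetical stand-in using Python's built-in HTTP server; the single-field manifest entry is a placeholder, and the real manifest served by the GeoIP endpoint contains additional fields (checksums, URLs).

```python
# Hypothetical sketch of a minimal custom GeoIP endpoint: serve a
# directory containing overview.json (and, in practice, the .mmdb.gz
# archives) over HTTP. File contents here are placeholders.
import http.server
import socketserver
import tempfile
import threading
import urllib.request
from functools import partial
from pathlib import Path

# Stand-in for a directory holding overview.json and the database archives
serve_dir = Path(tempfile.mkdtemp())
(serve_dir / "overview.json").write_text('[{"name": "GeoLite2-City.mmdb.gz"}]')

handler = partial(http.server.SimpleHTTPRequestHandler, directory=str(serve_dir))
with socketserver.TCPServer(("127.0.0.1", 0), handler) as httpd:
    port = httpd.server_address[1]
    threading.Thread(target=httpd.serve_forever, daemon=True).start()
    # Elasticsearch would fetch this URL per the downloader endpoint setting
    body = urllib.request.urlopen(f"http://127.0.0.1:{port}/overview.json").read()
    httpd.shutdown()

print(body.decode())
```

In the reproduction here, Kibana's host was reused for the same purpose via `ingest.geoip.downloader.endpoint`.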


A very basic Python script using the official Elasticsearch Python library was written to repeatedly attempt to ingest data, simulating the production environment as closely as possible - this step may or may not be required, but it helps validate results:

import elasticsearch
import time
from elasticsearch.helpers import bulk
from datetime import datetime

if __name__ == "__main__":
    es = elasticsearch.Elasticsearch(hosts=["http://localhost:9229"], basic_auth=('elastic', 'changeme'))
    while True:
        try:
            # Index one document per iteration via the bulk helper
            now = datetime.utcnow().strftime("%Y-%m-%dT%H:%M:%S.%fZ")
            record = {"_index": "tester", "_source": {"@timestamp": now, "source": {"ip": "8.8.8.8"}}}
            r = bulk(es, [record], request_timeout=1)
            print(r)
        except KeyboardInterrupt:
            print("Stopping")
            break
        except Exception as e:
            # Recreate the client on failure (e.g. while the node restarts)
            print(f"Failed - {e}")
            es = elasticsearch.Elasticsearch(hosts=["http://localhost:9229"], basic_auth=('elastic', 'changeme'))
        finally:
            time.sleep(0.5)

Note this is a throwaway cluster with default settings from docker-elk, and thus the hardcoded credentials.
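For anyone reproducing without the client library: under the hood, the bulk helper above sends an NDJSON body to the `_bulk` API - one action line followed by one source line per document, terminated by a newline. A sketch of that payload (the index name and document mirror the script above):

```python
# Illustrative: build the NDJSON body that a bulk index request sends.
# One action metadata line, then one source line, each newline-terminated.
import json
from datetime import datetime, timezone

now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")
action = {"index": {"_index": "tester"}}
source = {"@timestamp": now, "source": {"ip": "8.8.8.8"}}

payload = json.dumps(action) + "\n" + json.dumps(source) + "\n"
print(payload)
```

This payload could be POSTed to `http://localhost:9229/_bulk` with `Content-Type: application/x-ndjson` in place of running the script.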


  1. After starting the cluster, disable the GeoIP downloader
    PUT _cluster/settings
    {"persistent":{"ingest.geoip.downloader.enabled":false}}

Validate change in logs

{"@timestamp":"2023-09-19T15:15:55.602Z", "log.level": "INFO", "message":"updating [ingest.geoip.downloader.enabled] from [true] to [false]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[elasticsearch][clusterApplierService#updateTask][T#1]","log.logger":"org.elasticsearch.common.settings.ClusterSettings","trace.id":"ac0779d4735a34bd519cdd9a6f80cfa4","elasticsearch.cluster.uuid":"pduNeznmRs2q2LpMFm_NwQ","elasticsearch.node.id":"vAtPHZH6RPSpgVSoHBT_Jg","elasticsearch.node.name":"elasticsearch","elasticsearch.cluster.name":"docker-cluster"}
  2. Add a tester ingest pipeline and index template - used Dev Tools for simplicity

    PUT _ingest/pipeline/tester
    {
      "processors": [
        {
          "geoip": {
            "field": "source.ip",
            "target_field": "source.geo"
          }
        }
      ]
    }

    PUT _template/tester
    {
      "index_patterns": ["tester"],
      "settings": {
        "index.default_pipeline": "tester"
      },
      "mappings": {
        "properties": {
          "@timestamp": { "type": "date" },
          "source": {
            "properties": {
              "ip": { "type": "keyword" },
              "geo": {
                "properties": {
                  "city_name": { "type": "keyword", "ignore_above": 1024 },
                  "continent_code": { "type": "keyword", "ignore_above": 1024 },
                  "continent_name": { "type": "keyword", "ignore_above": 1024 },
                  "country_iso_code": { "type": "keyword", "ignore_above": 1024 },
                  "country_name": { "type": "keyword", "ignore_above": 1024 },
                  "location": { "type": "geo_point" },
                  "name": { "type": "keyword", "ignore_above": 1024 },
                  "postal_code": { "type": "keyword", "ignore_above": 1024 },
                  "region_iso_code": { "type": "keyword", "ignore_above": 1024 },
                  "region_name": { "type": "keyword", "ignore_above": 1024 },
                  "timezone": { "type": "keyword", "ignore_above": 1024 }
                }
              }
            }
          },
          "tags": {
            "type": "text",
            "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
          }
        }
      }
    }


  3. Start indexing data using the noted Python script - a simple POST request would likely work here instead

    python ingest.py

  4. Validate results are tagged (as expected) with the database unavailable message

    GET tester/_search?sort=@timestamp:desc
    {
    "_index": "tester",
    "_id": "RRAFrooB8OOUXd3cQCs6",
    "_score": null,
    "_source": {
      "@timestamp": "2023-09-19T15:18:14.328535Z",
      "source": {
        "ip": "8.8.8.8"
      },
      "tags": [
        "_geoip_database_unavailable_GeoLite2-City.mmdb"
      ]
    },
    "sort": [
      1695136694328
    ]
    }
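Spotting the failure programmatically is straightforward; the processor tags affected documents with a `_geoip_database_unavailable_<database>` tag. A small helper (not part of Elasticsearch, written here for this reproduction) to flag such hits:

```python
# Helper to check whether a search hit was tagged by the geoip processor
# as having no database available.
def geoip_unavailable(hit: dict) -> bool:
    tags = hit.get("_source", {}).get("tags", [])
    return any(t.startswith("_geoip_database_unavailable_") for t in tags)

# Hit shaped like the search response above
hit = {
    "_index": "tester",
    "_source": {
        "@timestamp": "2023-09-19T15:18:14.328535Z",
        "source": {"ip": "8.8.8.8"},
        "tags": ["_geoip_database_unavailable_GeoLite2-City.mmdb"],
    },
}
print(geoip_unavailable(hit))  # True
```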
  5. Follow relevant steps to add a local GeoIP database to a local ingest-geoip directory - ref: https://www.elastic.co/guide/en/elasticsearch/reference/8.10/geoip-processor.html#manually-update-geoip-databases

    # Steps referenced are after exec into running Elasticsearch container
    mkdir config/ingest-geoip
    cd config/ingest-geoip
    curl -o GeoLite2-City.tgz "<appropriate URL>"
    # Used link from: https://geoip.elastic.co/v1/database?elastic_geoip_service_tos=agree
    tar -xf GeoLite2-City.tgz
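The same unpack step can be scripted if curl/tar are unavailable in the container. The sketch below mirrors `tar -xf` using Python's tarfile module; for a self-contained demo it builds a dummy archive in memory, whereas in practice the .tgz would come from the GeoIP endpoint and be extracted into config/ingest-geoip.

```python
# Sketch: extract a GeoLite2-City.tgz into a config/ingest-geoip
# stand-in directory. The in-memory archive is a placeholder for the
# real download.
import io
import tarfile
import tempfile
from pathlib import Path

ingest_dir = Path(tempfile.mkdtemp())  # stand-in for config/ingest-geoip

# Build a dummy GeoLite2-City.tgz containing an (empty) .mmdb file
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tar:
    info = tarfile.TarInfo("GeoLite2-City.mmdb")
    info.size = 0
    tar.addfile(info, io.BytesIO(b""))
buf.seek(0)

# Equivalent of `tar -xf GeoLite2-City.tgz` into the config directory
with tarfile.open(fileobj=buf, mode="r:gz") as tar:
    tar.extractall(ingest_dir)

print(sorted(p.name for p in ingest_dir.iterdir()))  # ['GeoLite2-City.mmdb']
```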
  6. Confirm logs indicate the database file loaded correctly

    {"@timestamp":"2023-09-19T15:23:10.149Z", "log.level": "INFO", "message":"database file changed [/usr/share/elasticsearch/config/ingest-geoip/GeoLite2-City.mmdb], reload database...", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[elasticsearch][scheduler][T#1]","log.logger":"org.elasticsearch.ingest.geoip.ConfigDatabases","elasticsearch.cluster.uuid":"pduNeznmRs2q2LpMFm_NwQ","elasticsearch.node.id":"vAtPHZH6RPSpgVSoHBT_Jg","elasticsearch.node.name":"elasticsearch","elasticsearch.cluster.name":"docker-cluster"}
    {"@timestamp":"2023-09-19T15:23:15.155Z", "log.level": "INFO", "message":"database file changed [/usr/share/elasticsearch/config/ingest-geoip/GeoLite2-City.mmdb], reload database...", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[elasticsearch][scheduler][T#1]","log.logger":"org.elasticsearch.ingest.geoip.ConfigDatabases","elasticsearch.cluster.uuid":"pduNeznmRs2q2LpMFm_NwQ","elasticsearch.node.id":"vAtPHZH6RPSpgVSoHBT_Jg","elasticsearch.node.name":"elasticsearch","elasticsearch.cluster.name":"docker-cluster"}
    {"@timestamp":"2023-09-19T15:23:15.155Z", "log.level": "INFO", "message":"evicted [0] entries from cache after reloading database [/usr/share/elasticsearch/config/ingest-geoip/GeoLite2-City.mmdb]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[elasticsearch][scheduler][T#1]","log.logger":"org.elasticsearch.ingest.geoip.DatabaseReaderLazyLoader","elasticsearch.cluster.uuid":"pduNeznmRs2q2LpMFm_NwQ","elasticsearch.node.id":"vAtPHZH6RPSpgVSoHBT_Jg","elasticsearch.node.name":"elasticsearch","elasticsearch.cluster.name":"docker-cluster"}
  7. Confirm the API shows the database is recognized

    GET _ingest/geoip/stats
    {
    "stats": {
    "successful_downloads": 0,
    "failed_downloads": 0,
    "total_download_time": 0,
    "databases_count": 0,
    "skipped_updates": 0,
    "expired_databases": 0
    },
    "nodes": {
    "vAtPHZH6RPSpgVSoHBT_Jg": {
      "config_databases": [
        "GeoLite2-City.mmdb"
      ]
    }
    }
    }
  8. A search of actively ingested documents still shows the GeoIP processor failing due to an unavailable database (GET tester/_search?sort=@timestamp:desc). This is unexpected behavior.

  9. Add a new, identical pipeline

    PUT _ingest/pipeline/tester2
    {
    "processors": [
    {
      "geoip": {
        "field": "source.ip",
        "target_field": "source.geo"
      }
    }
    ]
    }
  10. Run a _simulate and validate the data is enriched properly by the GeoIP processor

    POST _ingest/pipeline/tester2/_simulate
    {
    "docs": [
    {
      "_source": {
        "source": {
          "ip": "8.8.8.8"
        }
      }
    }
    ]
    }

Response:
    {
    "docs": [
    {
      "doc": {
        "_index": "_index",
        "_id": "_id",
        "_version": "-3",
        "_source": {
          "source": {
            "geo": {
              "continent_name": "North America",
              "country_iso_code": "US",
              "country_name": "United States",
              "location": {
                "lon": -97.822,
                "lat": 37.751
              }
            },
            "ip": "8.8.8.8"
          }
        },
        "_ingest": {
          "timestamp": "2023-09-19T15:29:00.804002648Z"
        }
      }
    }
    ]
    }

Live ingest will similarly work in a new index with this new pipeline. Everything works as expected after restarting the cluster.


As noted, in a fresh cluster with the custom endpoint configured (ref: https://www.elastic.co/guide/en/elasticsearch/reference/8.10/geoip-processor.html#use-custom-geoip-endpoint), no restart of Elasticsearch is required. Rather than document all of those steps (though I am happy to if desired), the only notable difference observed was that the logs output slightly different messages compared to the local-file mechanism described above:

{"@timestamp":"2023-09-19T15:45:04.836Z", "log.level": "INFO", "message":"successfully downloaded geoip database [GeoLite2-City.mmdb]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[elasticsearch][generic][T#2]","log.logger":"org.elasticsearch.ingest.geoip.GeoIpDownloader","trace.id":"2341607fa16121b111b1ca4bd525b89f","elasticsearch.cluster.uuid":"OGD0pXrWSyayjJC1uGYoHA","elasticsearch.node.id":"vXzubLv9StuCBdb5BSPJpw","elasticsearch.node.name":"elasticsearch","elasticsearch.cluster.name":"docker-cluster"}
{"@timestamp":"2023-09-19T15:45:05.380Z", "log.level": "INFO", "message":"successfully loaded geoip database file [GeoLite2-City.mmdb]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[elasticsearch][generic][T#8]","log.logger":"org.elasticsearch.ingest.geoip.DatabaseNodeService","trace.id":"2341607fa16121b111b1ca4bd525b89f","elasticsearch.cluster.uuid":"OGD0pXrWSyayjJC1uGYoHA","elasticsearch.node.id":"vXzubLv9StuCBdb5BSPJpw","elasticsearch.node.name":"elasticsearch","elasticsearch.cluster.name":"docker-cluster"}

Logs (if relevant)

Relevant logs were captured in the steps above; however, for easy comparison:

Logs using the 'local file' mechanism:

{"@timestamp":"2023-09-19T15:23:10.149Z", "log.level": "INFO", "message":"database file changed [/usr/share/elasticsearch/config/ingest-geoip/GeoLite2-City.mmdb], reload database...", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[elasticsearch][scheduler][T#1]","log.logger":"org.elasticsearch.ingest.geoip.ConfigDatabases","elasticsearch.cluster.uuid":"pduNeznmRs2q2LpMFm_NwQ","elasticsearch.node.id":"vAtPHZH6RPSpgVSoHBT_Jg","elasticsearch.node.name":"elasticsearch","elasticsearch.cluster.name":"docker-cluster"}
{"@timestamp":"2023-09-19T15:23:15.155Z", "log.level": "INFO", "message":"database file changed [/usr/share/elasticsearch/config/ingest-geoip/GeoLite2-City.mmdb], reload database...", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[elasticsearch][scheduler][T#1]","log.logger":"org.elasticsearch.ingest.geoip.ConfigDatabases","elasticsearch.cluster.uuid":"pduNeznmRs2q2LpMFm_NwQ","elasticsearch.node.id":"vAtPHZH6RPSpgVSoHBT_Jg","elasticsearch.node.name":"elasticsearch","elasticsearch.cluster.name":"docker-cluster"}
{"@timestamp":"2023-09-19T15:23:15.155Z", "log.level": "INFO", "message":"evicted [0] entries from cache after reloading database [/usr/share/elasticsearch/config/ingest-geoip/GeoLite2-City.mmdb]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[elasticsearch][scheduler][T#1]","log.logger":"org.elasticsearch.ingest.geoip.DatabaseReaderLazyLoader","elasticsearch.cluster.uuid":"pduNeznmRs2q2LpMFm_NwQ","elasticsearch.node.id":"vAtPHZH6RPSpgVSoHBT_Jg","elasticsearch.node.name":"elasticsearch","elasticsearch.cluster.name":"docker-cluster"}

Logs using the 'custom endpoint' web server mechanism:

{"@timestamp":"2023-09-19T15:45:04.836Z", "log.level": "INFO", "message":"successfully downloaded geoip database [GeoLite2-City.mmdb]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[elasticsearch][generic][T#2]","log.logger":"org.elasticsearch.ingest.geoip.GeoIpDownloader","trace.id":"2341607fa16121b111b1ca4bd525b89f","elasticsearch.cluster.uuid":"OGD0pXrWSyayjJC1uGYoHA","elasticsearch.node.id":"vXzubLv9StuCBdb5BSPJpw","elasticsearch.node.name":"elasticsearch","elasticsearch.cluster.name":"docker-cluster"}
{"@timestamp":"2023-09-19T15:45:05.380Z", "log.level": "INFO", "message":"successfully loaded geoip database file [GeoLite2-City.mmdb]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[elasticsearch][generic][T#8]","log.logger":"org.elasticsearch.ingest.geoip.DatabaseNodeService","trace.id":"2341607fa16121b111b1ca4bd525b89f","elasticsearch.cluster.uuid":"OGD0pXrWSyayjJC1uGYoHA","elasticsearch.node.id":"vXzubLv9StuCBdb5BSPJpw","elasticsearch.node.name":"elasticsearch","elasticsearch.cluster.name":"docker-cluster"}
elasticsearchmachine commented 1 year ago

Pinging @elastic/es-data-management (Team:Data Management)

joegallo commented 6 months ago

While a restart is a workaround, a much lighter-weight workaround is making any change to the geoip processor in the pipeline -- for example, I added a "description": "foo" to the processor and re-PUT it, and then immediately removed the "description".
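An offline sketch of that workaround, building the two request bodies only (sending them, e.g. `PUT _ingest/pipeline/tester` via curl or a client, is omitted). The `"foo"` description value mirrors the comment above; the point is that any change to the processor definition forces a reload, and the second PUT restores the original pipeline:

```python
# Sketch of the re-PUT workaround: touch the geoip processor with a
# trivial "description", then re-PUT the original definition.
import copy
import json

pipeline = {
    "processors": [
        {"geoip": {"field": "source.ip", "target_field": "source.geo"}}
    ]
}

# First PUT body: add a throwaway description to the geoip processor
touched = copy.deepcopy(pipeline)
touched["processors"][0]["geoip"]["description"] = "foo"

# Second PUT body: remove it again, restoring the original pipeline
restored = copy.deepcopy(touched)
restored["processors"][0]["geoip"].pop("description")

assert restored == pipeline  # net change to the pipeline is nil
print(json.dumps(touched))
```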

Regardless, though, I agree that this is a bug.

edit: a note to myself: ConfigDatabases#updateDatabase doesn't reload pipelines for which the database was not available, unlike DatabaseNodeService#updateDatabase, which does. It should learn to do that.