elastic / elasticsearch

Free and Open, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Reduce default geoip logging at startup #81356

Open martijnvg opened 2 years ago

martijnvg commented 2 years ago

Currently, when the geoip processor infrastructure starts up, a lot of logging is printed to the console by default:

[2021-12-06T09:08:22,402][INFO ][o.e.i.g.GeoIpDownloader  ] [runTask-0] updating geoip databases
[2021-12-06T09:08:22,403][INFO ][o.e.i.g.GeoIpDownloader  ] [runTask-0] fetching geoip databases overview from [https://geoip.elastic.co/v1/database?elastic_geoip_service_tos=agree]
[2021-12-06T09:08:23,089][INFO ][o.e.i.g.GeoIpDownloader  ] [runTask-0] updating geoip database [GeoLite2-ASN.mmdb]
[2021-12-06T09:08:25,044][INFO ][o.e.i.g.DatabaseNodeService] [runTask-0] downloading geoip database [GeoLite2-ASN.mmdb] to [/Users/mvg/dev/code/elasticsearch/master/build/testclusters/runTask-0/tmp/geoip-databases/vqtZxDPLTX-q6U9tDue1Vw/GeoLite2-ASN.mmdb.tmp.gz]
[2021-12-06T09:08:25,050][INFO ][o.e.i.g.GeoIpDownloader  ] [runTask-0] updated geoip database [GeoLite2-ASN.mmdb]
[2021-12-06T09:08:25,074][INFO ][o.e.i.g.GeoIpDownloader  ] [runTask-0] updating geoip database [GeoLite2-City.mmdb]
[2021-12-06T09:08:25,238][INFO ][o.e.i.g.DatabaseNodeService] [runTask-0] successfully reloaded changed geoip database file [/Users/mvg/dev/code/elasticsearch/master/build/testclusters/runTask-0/tmp/geoip-databases/vqtZxDPLTX-q6U9tDue1Vw/GeoLite2-ASN.mmdb]
[2021-12-06T09:08:28,805][INFO ][o.e.i.g.DatabaseNodeService] [runTask-0] downloading geoip database [GeoLite2-City.mmdb] to [/Users/mvg/dev/code/elasticsearch/master/build/testclusters/runTask-0/tmp/geoip-databases/vqtZxDPLTX-q6U9tDue1Vw/GeoLite2-City.mmdb.tmp.gz]
[2021-12-06T09:08:28,807][INFO ][o.e.i.g.GeoIpDownloader  ] [runTask-0] updated geoip database [GeoLite2-City.mmdb]
[2021-12-06T09:08:28,808][INFO ][o.e.i.g.GeoIpDownloader  ] [runTask-0] updating geoip database [GeoLite2-Country.mmdb]
[2021-12-06T09:08:30,307][INFO ][o.e.i.g.DatabaseNodeService] [runTask-0] downloading geoip database [GeoLite2-Country.mmdb] to [/Users/mvg/dev/code/elasticsearch/master/build/testclusters/runTask-0/tmp/geoip-databases/vqtZxDPLTX-q6U9tDue1Vw/GeoLite2-Country.mmdb.tmp.gz]
[2021-12-06T09:08:30,309][INFO ][o.e.i.g.GeoIpDownloader  ] [runTask-0] updated geoip database [GeoLite2-Country.mmdb]
[2021-12-06T09:08:30,391][INFO ][o.e.i.g.DatabaseNodeService] [runTask-0] successfully reloaded changed geoip database file [/Users/mvg/dev/code/elasticsearch/master/build/testclusters/runTask-0/tmp/geoip-databases/vqtZxDPLTX-q6U9tDue1Vw/GeoLite2-Country.mmdb]
[2021-12-06T09:08:30,764][INFO ][o.e.i.g.DatabaseNodeService] [runTask-0] successfully reloaded changed geoip database file [/Users/mvg/dev/code/elasticsearch/master/build/testclusters/runTask-0/tmp/geoip-databases/vqtZxDPLTX-q6U9tDue1Vw/GeoLite2-City.mmdb]

This is too verbose and makes it more difficult to spot other important messages. Most of this logging should be turned into debug logging, and ideally the geoip infrastructure should print only 1 or 2 info messages.

Something like this:

[2021-12-06T09:08:30,309][INFO ][o.e.i.g.GeoIpDownloader  ] [runTask-0] successfully downloaded geoip database files [GeoLite2-Country.mmdb,GeoLite2-City.mmdb,GeoLite2-ASN.mmdb]
[2021-12-06T09:08:30,391][INFO ][o.e.i.g.DatabaseNodeService] [runTask-0] successfully loaded geoip database files [GeoLite2-Country.mmdb, GeoLite2-City.mmdb, GeoLite2-ASN.mmdb]

The first message reports that the geoip downloader successfully downloaded the database files from the external download service into the special geoip database system index. It will only be printed on the node that performed the geoip download task.

The second message reports that an ingest node successfully downloaded the database files from the geoip system index into a temp directory, so that the geoip processor can load them for geoip enrichment. It will be printed on each ingest node.
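To illustrate the idea, here is a minimal sketch of how per-database events could be collected and reported as one info line. It is not taken from the Elasticsearch code base: the class `GeoIpLogSummary` and its methods are hypothetical, and it uses `java.util.logging` only to stay self-contained (Elasticsearch itself logs through Log4j).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper, not part of Elasticsearch: batches per-database
// events so that only one INFO line is printed per batch.
class GeoIpLogSummary {
    private static final Logger LOGGER = Logger.getLogger(GeoIpLogSummary.class.getName());

    private final String action; // e.g. "downloaded" or "loaded"
    private final List<String> databases = new ArrayList<>();

    GeoIpLogSummary(String action) {
        this.action = action;
    }

    // Records one database. The per-database detail is logged at FINE
    // (the java.util.logging counterpart of debug), which is hidden by default.
    void record(String databaseName) {
        LOGGER.log(Level.FINE, "{0} geoip database [{1}]", new Object[] { action, databaseName });
        databases.add(databaseName);
    }

    // Builds the single summary line for everything recorded so far.
    String summary() {
        return "successfully " + action + " geoip database files [" + String.join(", ", databases) + "]";
    }

    // Emits the summary once at INFO and resets the batch.
    void flush() {
        if (!databases.isEmpty()) {
            LOGGER.info(summary());
            databases.clear();
        }
    }

    public static void main(String[] args) {
        GeoIpLogSummary downloads = new GeoIpLogSummary("downloaded");
        downloads.record("GeoLite2-Country.mmdb");
        downloads.record("GeoLite2-City.mmdb");
        downloads.record("GeoLite2-ASN.mmdb");
        downloads.flush(); // one INFO line instead of one per database
    }
}
```

The open design question in a real implementation is when to flush, since the downloads and reloads complete at different times. This sketch leaves that decision to the caller.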

elasticmachine commented 2 years ago

Pinging @elastic/es-data-management (Team:Data Management)

bytebilly commented 2 years ago

Thanks @martijnvg! This would be very helpful for 8.0, where the Security ON by default flow will print passwords and other relevant information in the startup message. Reducing the footprint of GeoIP logging could greatly improve the discoverability of that information.

Is that something that you would see doable in that timeframe?

cc @elastic/es-security

martijnvg commented 2 years ago

I would need to discuss this with the @elastic/es-data-management team. I think as a first step we can reduce most info logging to debug statements. Grouping multiple successful reload / download messages into a single log message, so that at most two info logs are printed, may require some more work.

But I think that the first step can be done for 8.0. It should result in at most 6 info log lines if a node is both an ingest node and the node on which the geoip download persistent task runs.

jakelandis commented 2 years ago

related: #81397 and #81398