elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

GeoIP processor support for ISP database #71718

Closed wasserman closed 4 months ago

wasserman commented 3 years ago

The GeoIP processor supports database_file for using an alternative MaxMind database. It would be nice to be able to use the ISP database from https://www.maxmind.com/en/geoip2-isp-database.

I prepared a bundle following https://www.elastic.co/guide/en/cloud/current/ec-custom-bundles.html#ec-prepare-custom-bundles, using the sample database from https://github.com/maxmind/MaxMind-DB/blob/main/test-data/GeoIP2-ISP-Test.mmdb. A JSON representation of that file is available for reference at https://github.com/maxmind/MaxMind-DB/blob/main/source-data/GeoIP2-ISP-Test.json

When I tried to use this database_file, I got the following error:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "parse_exception",
        "reason" : "[database_file] Unsupported database type [GeoIP2-ISP]",
        "property_name" : "database_file",
        "processor_type" : "geoip"
      }
    ],
    "type" : "parse_exception",
    "reason" : "[database_file] Unsupported database type [GeoIP2-ISP]",
    "property_name" : "database_file",
    "processor_type" : "geoip"
  },
  "status" : 400
}

The section of code that shows this limitation is here: https://github.com/elastic/elasticsearch/blob/425ed4cbc1f3f2bd2ca82091bc357f263687b149/modules/ingest-geoip/src/main/java/org/elasticsearch/ingest/geoip/GeoIpProcessor.java

    private Map<String, Object> getGeoData(String ip) throws IOException {
        // Database type as read from the database's embedded metadata (e.g. "GeoLite2-City")
        String databaseType = lazyLoader.getDatabaseType();
        final InetAddress ipAddress = InetAddresses.forString(ip);
        Map<String, Object> geoData;
        // Only City, Country, and ASN databases are recognized, matched by type-name suffix
        if (databaseType.endsWith(CITY_DB_SUFFIX)) {
            try {
                geoData = retrieveCityGeoData(ipAddress);
            } catch (AddressNotFoundRuntimeException e) {
                // IP not present in the database: return no fields instead of failing
                geoData = Collections.emptyMap();
            }
        } else if (databaseType.endsWith(COUNTRY_DB_SUFFIX)) {
            try {
                geoData = retrieveCountryGeoData(ipAddress);
            } catch (AddressNotFoundRuntimeException e) {
                geoData = Collections.emptyMap();
            }
        } else if (databaseType.endsWith(ASN_DB_SUFFIX)) {
            try {
                geoData = retrieveAsnGeoData(ipAddress);
            } catch (AddressNotFoundRuntimeException e) {
                geoData = Collections.emptyMap();
            }
        } else {
            // Any other type (e.g. GeoIP2-ISP) is rejected
            throw new ElasticsearchParseException("Unsupported database type [" + databaseType
                + "]", new IllegalStateException());
        }
        return geoData;
    }

I hope it is as easy as implementing retrieveISPGeoData and then whitelisting the ISP database filename.
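A minimal, self-contained sketch of that idea, assuming the shape of the method above: instead of a growing if/else chain, a suffix-to-retriever table in which ISP support is one extra entry. Everything here is hypothetical: the suffix values, retrieveIspGeoData, and the field names are stand-ins that return canned data rather than calling the MaxMind library, and IllegalArgumentException stands in for ElasticsearchParseException.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: suffix-based dispatch with an added ISP branch.
// Retrievers return canned data; no MaxMind library calls are made.
class GeoDataDispatchSketch {
    // Assumed suffix values; "-ISP" is chosen to match the "GeoIP2-ISP" type in the error.
    static final String CITY_DB_SUFFIX = "-City";
    static final String COUNTRY_DB_SUFFIX = "-Country";
    static final String ASN_DB_SUFFIX = "-ASN";
    static final String ISP_DB_SUFFIX = "-ISP";

    // Ordered table of suffix -> retriever; supporting a new type means adding one entry.
    private static final Map<String, Function<String, Map<String, Object>>> RETRIEVERS =
        new LinkedHashMap<>();

    static {
        RETRIEVERS.put(CITY_DB_SUFFIX, ip -> Map.of("city_name", "stub-city"));
        RETRIEVERS.put(COUNTRY_DB_SUFFIX, ip -> Map.of("country_iso_code", "ZZ"));
        RETRIEVERS.put(ASN_DB_SUFFIX, ip -> Map.of("asn", 64512L));
        RETRIEVERS.put(ISP_DB_SUFFIX, GeoDataDispatchSketch::retrieveIspGeoData);
    }

    // Hypothetical stand-in for retrieveISPGeoData; a real version would read the ISP record.
    static Map<String, Object> retrieveIspGeoData(String ip) {
        Map<String, Object> data = new LinkedHashMap<>();
        data.put("isp", "stub-isp");
        data.put("organization", "stub-org");
        return data;
    }

    // Uses the first retriever whose suffix matches the database type, else rejects the type.
    static Map<String, Object> getGeoData(String databaseType, String ip) {
        for (Map.Entry<String, Function<String, Map<String, Object>>> entry : RETRIEVERS.entrySet()) {
            if (databaseType.endsWith(entry.getKey())) {
                return entry.getValue().apply(ip);
            }
        }
        throw new IllegalArgumentException("Unsupported database type [" + databaseType + "]");
    }
}
```

Note that the error above carries "property_name": "database_file", which suggests it is raised while the processor configuration is validated rather than in getGeoData itself, so that validation step would presumably need the same change.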

Thanks!

elasticmachine commented 3 years ago

Pinging @elastic/es-core-features (Team:Core/Features)

dcode commented 3 years ago

I'd like to add that it would be better to support, at minimum, the official MaxMind GeoIP2 database types:

I think the ingest processor's loading code could be simplified by leveraging the DatabaseReader.getDatabaseType() method, which returns an int made of OR'd enum flags. This way, the available fields are dictated by the database's embedded metadata rather than by an arbitrary filename.
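A minimal sketch of what an "OR'd enum" database type could look like: one bit per record type, with the metadata type string mapped to a combined int. The Capability enum, flagsFor, and especially the string-matching rules (for example, that an Enterprise database also answers City and Country lookups) are simplified assumptions for illustration, not the MaxMind library's actual logic.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch of an "OR'd enum" database type: one bit per record type.
// The string-matching rules in flagsFor are simplified assumptions, not MaxMind's logic.
class DatabaseTypeFlagsSketch {
    enum Capability {
        ANONYMOUS_IP, ASN, CITY, COUNTRY, ENTERPRISE, ISP;

        // Each constant owns a single bit derived from its ordinal.
        int bit() {
            return 1 << ordinal();
        }
    }

    // Maps a metadata type string such as "GeoIP2-ISP" to OR'd capability bits.
    static int flagsFor(String metadataType) {
        int flags = 0;
        if (metadataType.contains("Anonymous-IP")) {
            flags |= Capability.ANONYMOUS_IP.bit();
        }
        if (metadataType.endsWith("-ASN")) {
            flags |= Capability.ASN.bit();
        }
        if (metadataType.endsWith("-ISP")) {
            flags |= Capability.ISP.bit();
        }
        if (metadataType.contains("Enterprise")) {
            // Assumption: Enterprise records also carry City and Country data.
            flags |= Capability.ENTERPRISE.bit() | Capability.CITY.bit() | Capability.COUNTRY.bit();
        }
        if (metadataType.contains("City")) {
            // Assumption: City records also carry Country data.
            flags |= Capability.CITY.bit() | Capability.COUNTRY.bit();
        }
        if (metadataType.contains("Country")) {
            flags |= Capability.COUNTRY.bit();
        }
        return flags;
    }

    // True if the given capability's bit is set in flags.
    static boolean supports(int flags, Capability capability) {
        return (flags & capability.bit()) != 0;
    }

    // Expands the bit field back into the set of supported capabilities.
    static Set<Capability> decode(int flags) {
        Set<Capability> result = EnumSet.noneOf(Capability.class);
        for (Capability c : Capability.values()) {
            if (supports(flags, c)) {
                result.add(c);
            }
        }
        return result;
    }
}
```

A bit field lets one int describe a database that answers several kinds of lookups, which is what makes the "superset" databases discussed next expressible at all.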

Supporting the Enterprise and ISP databases would essentially provide a superset of all standard database fields. It's not clear to me how the Java bindings allow access to custom attributes, but that would be a "nice to have" as well.

Enhancing this ingest processor this way could add immense value for corporate users who would like to enrich data with internal IP geolocation information and possibly subnet names. For my use case, I am trying to use the ingest-geoip processor to enrich known-bad malware C2 endpoints. Since I'm limited to either a City or an ASN database per processor, I have to use two distinct databases. With the getDatabaseType() approach suggested above, I think it would be possible to load the City (or Enterprise) fields and then also the ASN fields by simply looping over all interfaces supported by the declared database type.
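A sketch of the looping idea, assuming a single database can declare several record types: each supported type contributes its fields to one merged document, so a single lookup could yield both location and ASN fields. RecordType, extract, lookupAll, and all field names and values are hypothetical placeholders returning canned data.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: merge fields from every record type a database declares.
class MergedLookupSketch {
    enum RecordType { CITY, ASN, ISP }

    // Hypothetical per-type extractors with canned values; a real one would read the .mmdb record.
    static Map<String, Object> extract(RecordType type, String ip) {
        Map<String, Object> fields = new LinkedHashMap<>();
        switch (type) {
            case CITY:
                fields.put("city_name", "stub-city");
                fields.put("country_iso_code", "ZZ");
                break;
            case ASN:
                fields.put("asn", 64512L);
                fields.put("organization_name", "stub-asn-org");
                break;
            case ISP:
                fields.put("isp", "stub-isp");
                fields.put("asn", 64512L);
                break;
        }
        return fields;
    }

    // Loops over every supported type and merges the results into one document.
    // putIfAbsent keeps the first value when two types share a field (e.g. "asn").
    static Map<String, Object> lookupAll(Set<RecordType> supported, String ip) {
        Map<String, Object> merged = new LinkedHashMap<>();
        for (RecordType type : supported) {
            extract(type, ip).forEach(merged::putIfAbsent);
        }
        return merged;
    }
}
```

putIfAbsent is only one possible policy for fields shared by several types; letting later types override earlier ones would be an equally valid choice.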

jakelandis commented 2 years ago

This would be a great enhancement. We will need to reach out to MaxMind to see if they offer sample/test databases we could use for testing.

related: https://github.com/elastic/elasticsearch/issues/80748

athanatos64 commented 1 year ago

+1 to supporting more commercial MaxMind databases in the geoip processor

truong-hua commented 1 year ago

Please support this; it would help trace the ISP of origin for requests logged by nginx.

tylerperk commented 4 months ago

Hi @dcode @athanatos64 @truong-hua, we are working on adding support for the GeoIP2 Enterprise database and the GeoIP2 Anonymous IP database to Elasticsearch ingest pipelines.

These files contain different and additional fields compared with the free GeoLite2 files we currently support. The properties parameter of a geoip processor can be used to specify which fields to return, in case you want a larger, smaller, or different subset than the default. We're trying to decide which fields to return in the target_field by default. For the Anonymous IP file it's a relatively short list, so we plan to return most of them by default. The Enterprise file has quite a few fields, so we're seeking community feedback on that one.

Can you please reply with the fields you would typically want by default? The available fields are:

GeoIP2 Enterprise Database: "city.name", "continent.name", "country.isoCode", "country.name", "location.latitude", "location.longitude", "location.timeZone", "mostSpecificSubdivision.isoCode", "mostSpecificSubdivision.name", "traits.anonymous", "traits.anonymousVpn", "traits.autonomousSystemNumber", "traits.autonomousSystemOrganization", "traits.hostingProvider", "traits.network", "traits.publicProxy", "traits.residentialProxy", "traits.torExitNode", "city.confidence", "city.geoNameId", "city.names", "continent.code", "continent.geoNameId", "continent.names", "country.confidence", "country.geoNameId", "country.inEuropeanUnion", "country.names", "leastSpecificSubdivision.confidence", "leastSpecificSubdivision.geoNameId", "leastSpecificSubdivision.isoCode", "leastSpecificSubdivision.name", "leastSpecificSubdivision.names", "location.accuracyRadius", "location.averageIncome", "location.metroCode", "location.populationDensity", "maxMind", "mostSpecificSubdivision.confidence", "mostSpecificSubdivision.geoNameId", "mostSpecificSubdivision.names", "postal.code", "postal.confidence", "registeredCountry.confidence", "registeredCountry.geoNameId", "registeredCountry.inEuropeanUnion", "registeredCountry.isoCode", "registeredCountry.name", "registeredCountry.names", "representedCountry.confidence", "representedCountry.geoNameId", "representedCountry.inEuropeanUnion", "representedCountry.isoCode", "representedCountry.name", "representedCountry.names", "representedCountry.type", "subdivisions.confidence", "subdivisions.geoNameId", "subdivisions.isoCode", "subdivisions.name", "subdivisions.names", "traits.anonymousProxy", "traits.anycast", "traits.connectionType", "traits.domain", "traits.ipAddress", "traits.isp", "traits.legitimateProxy", "traits.mobileCountryCode", "traits.mobileNetworkCode", "traits.organization", "traits.satelliteProvider", "traits.staticIpScore", "traits.userCount", "traits.userType"
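A minimal sketch of how a properties list could select a subset of these fields, falling back to a default subset when none is given. The DEFAULT_PROPERTIES value is purely hypothetical (the defaults are exactly what is being asked about here), and select is an illustrative helper, not the processor's actual code.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: choose which fields of a full record to return.
class PropertiesFilterSketch {
    // Hypothetical default subset, using field names from the list above.
    static final List<String> DEFAULT_PROPERTIES =
        List.of("country.isoCode", "city.name", "location.latitude", "location.longitude");

    // Keeps only the requested properties (or the defaults when none are given), in request order.
    // Properties absent from the record are skipped rather than emitted as nulls.
    static Map<String, Object> select(Map<String, Object> fullRecord, List<String> requested) {
        List<String> wanted = (requested == null || requested.isEmpty()) ? DEFAULT_PROPERTIES : requested;
        Map<String, Object> out = new LinkedHashMap<>();
        for (String property : wanted) {
            if (fullRecord.containsKey(property)) {
                out.put(property, fullRecord.get(property));
            }
        }
        return out;
    }
}
```

With a large field list like the Enterprise one, a short default plus an explicit opt-in via properties keeps typical documents small while still exposing everything on request.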

cc @joegallo

joegallo commented 4 months ago

Closed by https://github.com/elastic/elasticsearch/pull/108651