elastic / integrations

Verify mapping problems after migrating to ecs@mappings #10848

Open zmoog opened 3 weeks ago

zmoog commented 3 weeks ago

A few users reported mapping errors on a few integrations. We suspect these problems may be related to integrations that migrated to ecs@mappings with recent updates.

Here is the list of fields with mapping issues:

| Field | Previous mapping | Current mapping | Data Stream | Status | Root Cause | Notes | Issues / PRs |
| --- | --- | --- | --- | --- | --- | --- | --- |
| client.geo.location | geo_point | object | logs-azure.graphactivitylogs-* | Reproduced | integration-mapping-problem | I believe this is an integration-mapping-problem | https://github.com/elastic/integrations/pull/11102 |
| source.geo.location | geo_point | object | logs-azure.graphactivitylogs-* | Reproduced | integration-mapping-problem | I believe this is an integration-mapping-problem | https://github.com/elastic/integrations/pull/11102 |
| destination.port | long | keyword | logs-cisco_aironet.log-* | Needs sample docs | ecs@mappings+type-coercion | Probably an ecs@mappings+type-coercion issue, but I don't have a sample document to double-check; probable, but still a hypothesis. | https://github.com/elastic/integrations/pull/11103 |
| event.duration | long | keyword | logs-azure.activitylogs.log-* | Needs sample docs | ecs@mappings+type-coercion | Probably an ecs@mappings+type-coercion issue, but I don't have a sample document to double-check; probable, but still a hypothesis. | https://github.com/elastic/integrations/pull/11104 |
| dns.authorities | | | | | | | |
| dns.id | long | keyword | logs-logstash.tpot-* | | | | |
| error.code | keyword | long | logs-system.security-* | Unclear | integration-update | PR changed the value type to string with the PR https://github.com/elastic/integrations/pull/10529/files ; expected field mapping in ECS is keyword https://www.elastic.co/guide/en/ecs/current/ecs-error.html#field-error-code | |
| event.severity | long | keyword | logs-cisco_aironet.log-* | Reproduced | ecs@mappings+type-coercion | ecs@mappings+type-coercion, I found the following sample doc in the integration test files: {"event.severity": "4"}; ECS expected type is long https://www.elastic.co/guide/en/ecs/current/ecs-event.html#field-event-severity | https://github.com/elastic/integrations/pull/11105 |
| http.request.body | object | flattened | logs-apm.error-* | | | | |
| http.request.headers | flattened | object | logs-apm.error-* | | | | |
| http.response.headers | flattened | object | logs-apm.error-* | | | | |
| input | object | keyword | logs-logstash.tpot-* | | | | |
| log.offset | long | keyword | logs-microsoft_exchange_server.httpproxy-* | Reproduced | integration-update | PR https://github.com/elastic/integrations/pull/9560/files added an explicit mapping to keyword | |
| observer.ip | ip | keyword | logs-ti_abusech_latest.dest_malware-* | | | | |
| request | text | object | logs-logstash.tpot-* | | | | |
| response | text | object | logs-logstash.tpot-* | | | | |
| session | | | | | | | |
| sip.uri | | | | | | | |
| status | keyword | long | logs-logstash.tpot-* | | | | |
| threat.indicator.first_seen | date | keyword | logs-ti_abusech.malware-* | Reproduced | ecs@mappings+date_detection:false | | https://github.com/elastic/elasticsearch/pull/112444 |
| threat.indicator.last_seen | date | keyword | logs-ti_abusech.malwarebazaar-* | Reproduced | ecs@mappings+date_detection:false | | https://github.com/elastic/elasticsearch/pull/112444 |
| timestamp | | | | | | | |
| user_agent | object | keyword | logs-cisco_asa.log-* | Unclear | | object is the expected mapping for user_agent; see https://www.elastic.co/guide/en/ecs/current/ecs-user_agent.html | |

Root Causes

| Cause | Summary | Solution |
| --- | --- | --- |
| ecs@mappings+type-coercion | Mapping changed because ecs@mappings does not perform type coercion | Set the right value type in the input/pipeline, or restore the explicit mapping in the fields/ecs.yml file |
| ecs@mappings+date_detection:false | Setting date_detection: false causes a few fields not to be mapped as date | Set date_detection: true or update ecs@mappings |
| integration-mapping-problem | Incorrect mapping in the integration | Review the change and fix the mapping, if necessary |
| integration-update | Explicit change in the integration | Probably deal with this breaking change if the outcome is in line with ECS |

ecs@mappings+type-coercion

The ecs@mappings component template does not perform type coercion, so if the value is a string, ES maps it as a keyword.

Here is an example. If I run the following requests in Dev Tools:

DELETE _data_stream/logs-whatever-sdh5075
POST logs-whatever-sdh5075/_doc
{
  "@timestamp": "2024-08-20T16:58:01+02:00",
  "destination": {
    "port": "8080"
  }
}
GET logs-whatever-sdh5075/_mapping/field/destination.port

I get the following result:

{
  ".ds-logs-whatever-sdh5075-2024.08.22-000001": {
    "mappings": {
      "destination.port": {
        "full_name": "destination.port",
        "mapping": {
          "port": {
            "type": "keyword",
            "ignore_above": 1024
          }
        }
      }
    }
  }
}
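One way to "set the right value type in the input/pipeline" is to cast known fields before the document is indexed. Here is a minimal Python sketch of that idea. The `EXPECTED_TYPES` table and the `coerce_ecs_types` helper are hypothetical names made up for illustration, not part of any Elastic library, and the sketch assumes the document uses nested objects rather than dotted keys. In an integration, the same cast would more typically live in the ingest pipeline (for example with a convert processor).

```python
import copy

# Hypothetical subset of ECS fields whose expected type is long,
# taken from the rows in the table above; extend as needed.
EXPECTED_TYPES = {
    "destination.port": int,
    "event.severity": int,
    "event.duration": int,
}


def coerce_ecs_types(doc, expected=EXPECTED_TYPES):
    """Return a copy of doc with the listed dotted fields cast to their expected type."""
    result = copy.deepcopy(doc)
    for path, cast in expected.items():
        *parents, leaf = path.split(".")
        node = result
        # Walk down the nested objects; stop if any parent is missing.
        for key in parents:
            node = node.get(key) if isinstance(node, dict) else None
            if node is None:
                break
        if isinstance(node, dict) and leaf in node:
            try:
                node[leaf] = cast(node[leaf])
            except (TypeError, ValueError):
                pass  # leave values that cannot be parsed untouched
    return result


doc = {
    "@timestamp": "2024-08-20T16:58:01+02:00",
    "destination": {"port": "8080"},
}
fixed = coerce_ecs_types(doc)
```

With the sample document from the request above, `fixed["destination"]["port"]` is the integer 8080, so the value no longer reaches Elasticsearch as a string.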

ecs@mappings+date_detection:false

When date_detection is disabled, the following fields are not mapped as date:

threat.indicator.first_seen
threat.indicator.modified_at
threat.enrichments.indicator.modified_at
threat.enrichments.matched.occurred
threat.enrichments.indicator.first_seen 
threat.enrichments.indicator.last_seen
threat.indicator.last_seen 

integration-mapping-problem

We probably need to change the mappings in the integration to something like the following (as most other integrations do):

- name: client.geo.location
  external: ecs
- name: source.geo.location
  external: ecs

Or remove these mappings and only use ecs@mappings.

integration-update

Mapping changed due to integration updates.

zmoog commented 3 weeks ago

Checking the first field, client.geo.location in the logs-azure.graphactivitylogs-* data stream. This field has an explicit mapping in the integration:

# packages/azure/data_stream/graphactivitylogs/fields/ecs.yml
- name: client.geo.location.lat
  external: ecs
- name: client.geo.location.lon
  external: ecs
- name: source.geo.location.lat
  external: ecs
- name: source.geo.location.lon
  external: ecs

This leads to the following mapping:

"client": {
  "properties": {
    "geo": {
      "properties": {
        "continent_name": {
          "type": "keyword",
          "ignore_above": 1024
        },
        "country_iso_code": {
          "type": "keyword",
          "ignore_above": 1024
        },
        "country_name": {
          "type": "keyword",
          "ignore_above": 1024
        },
        "location": {
          "properties": {
            "lat": {
              "type": "geo_point"
            },
            "lon": {
              "type": "geo_point"
            }
          }
        }
      }
    },
    "ip": {
      "type": "ip"
    }
  }
}

So client.geo.location here is an object.
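To make the effect of the dotted names concrete, here is a small Python sketch that expands field definitions into a nested `properties` tree, roughly the way the mapping above is laid out. The `expand_fields` helper is hypothetical and only illustrates the nesting; it is not how the package tooling or Elasticsearch is implemented, and the `geo_point` types for `lat` and `lon` simply mirror the mapping shown above.

```python
def expand_fields(fields):
    """Build a nested mapping 'properties' tree from {dotted_name: type}."""
    root = {}
    for name, field_type in fields.items():
        node = root
        parts = name.split(".")
        # Every intermediate path segment becomes an object with its own properties.
        for part in parts[:-1]:
            node = node.setdefault(part, {}).setdefault("properties", {})
        node[parts[-1]] = {"type": field_type}
    return root


# Definitions with .lat/.lon suffixes, as in graphactivitylogs/fields/ecs.yml
with_subfields = expand_fields({
    "client.geo.location.lat": "geo_point",
    "client.geo.location.lon": "geo_point",
})

# A single definition for the whole field
single_field = expand_fields({"client.geo.location": "geo_point"})

location_a = with_subfields["client"]["properties"]["geo"]["properties"]["location"]
location_b = single_field["client"]["properties"]["geo"]["properties"]["location"]
```

In the first case `location_a` has no `type` of its own, only `properties` for `lat` and `lon`, which is why the field shows up as an object. In the second case `location_b` is `{"type": "geo_point"}`.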

Paradoxically, if I index the same document using a logs-*-* data stream, I get the correct mapping from ecs@mappings:

GET logs-whatever-sdh5075/_mapping/field/client.geo.location
{
  ".ds-logs-whatever-sdh5075-2024.08.22-000001": {
    "mappings": {
      "client.geo.location": {
        "full_name": "client.geo.location",
        "mapping": {
          "location": {
            "type": "geo_point"
          }
        }
      }
    }
  }
}
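When checking many fields, reading these responses by eye gets tedious. Here is a minimal Python sketch that pulls the mapped type out of a get-field-mapping response shaped like the one above. The `field_types` helper is a hypothetical name, and the sketch assumes the response has already been fetched and parsed into a dict.

```python
def field_types(response, field):
    """Map each backing index to the mapped type of `field` (None if unmapped)."""
    leaf = field.split(".")[-1]  # the inner "mapping" object is keyed by the leaf name
    types = {}
    for index, body in response.items():
        entry = body.get("mappings", {}).get(field, {})
        types[index] = entry.get("mapping", {}).get(leaf, {}).get("type")
    return types


# The response shown above, as a Python dict
response = {
    ".ds-logs-whatever-sdh5075-2024.08.22-000001": {
        "mappings": {
            "client.geo.location": {
                "full_name": "client.geo.location",
                "mapping": {"location": {"type": "geo_point"}},
            }
        }
    }
}

found = field_types(response, "client.geo.location")
expected = "geo_point"
mismatched = [index for index, t in found.items() if t != expected]
```

Here `mismatched` is empty because the only backing index maps the field as `geo_point`. A backing index where the field has no leaf mapping (for example because it is an object) would be reported with type `None`.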

This seems to be a choice in the logs-azure.graphactivitylogs-* data stream that does not align with ECS or with other data streams.

lucabelluccini commented 2 weeks ago

We're seeing an issue with host.os.version, which is no longer defined in the System integration / processor dataset.

In most cases it is mapped as keyword (as it was in the past), but some users sporadically see it mapped as float. Quoted values such as "7.9" and "7.9 (Maipo)" are correctly mapped as keyword, but that is not the case if Beats sends an unquoted 7.9.

We have a sample document where we clearly see that Beats / Elastic Agent can send "version": 7.2 (the version is not wrapped in quotes, so it is mapped as float instead of keyword).
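The difference between the two forms is visible as soon as the JSON is parsed. Here is a short Python sketch, using only the standard library, that shows the quoted and unquoted variants and one possible normalization on the sending side. The `normalize_version` helper is hypothetical and is not something Beats or Elastic Agent provides.

```python
import json

# The same field, quoted and unquoted, as it may arrive from the shipper
quoted = json.loads('{"host": {"os": {"version": "7.2"}}}')
unquoted = json.loads('{"host": {"os": {"version": 7.2}}}')

quoted_type = type(quoted["host"]["os"]["version"])      # str
unquoted_type = type(unquoted["host"]["os"]["version"])  # float


def normalize_version(doc):
    """Cast host.os.version to a string in place, if present, and return the doc."""
    os_info = doc.get("host", {}).get("os", {})
    if "version" in os_info:
        os_info["version"] = str(os_info["version"])
    return doc
```

Calling `normalize_version(unquoted)` turns the value into the string "7.2", so both variants would reach Elasticsearch as strings. Restoring an explicit keyword mapping for host.os.version in the integration would be the server-side alternative.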