elastic / ecs

Elastic Common Schema
https://www.elastic.co/what-is/ecs
Apache License 2.0
987 stars 410 forks source link

ECS can no longer map all components out of the box #2326

Open Mpdreamz opened 3 months ago

Mpdreamz commented 3 months ago

As of 8.11 applying all the ECS component templates is no longer possible.

{
  "_meta": {
    "description": "Template installed by ECS.NET 8.11.0 (https://github.com/elastic/ecs-dotnet)",
    "ecs_version": "8.11.0"
  },
  "composed_of": [
    "ecs_8.11.0_base",
    "ecs_8.11.0_agent",
    "ecs_8.11.0_client",
    "ecs_8.11.0_cloud",
    "ecs_8.11.0_container",
    "ecs_8.11.0_data_stream",
    "ecs_8.11.0_destination",
    "ecs_8.11.0_device",
    "ecs_8.11.0_dll",
    "ecs_8.11.0_dns",
    "ecs_8.11.0_ecs",
    "ecs_8.11.0_email",
    "ecs_8.11.0_error",
    "ecs_8.11.0_event",
    "ecs_8.11.0_faas",
    "ecs_8.11.0_file",
    "ecs_8.11.0_group",
    "ecs_8.11.0_host",
    "ecs_8.11.0_http",
    "ecs_8.11.0_log",
    "ecs_8.11.0_network",
    "ecs_8.11.0_observer",
    "ecs_8.11.0_orchestrator",
    "ecs_8.11.0_organization",
    "ecs_8.11.0_package",
    "ecs_8.11.0_process",
    "ecs_8.11.0_registry",
    "ecs_8.11.0_related",
    "ecs_8.11.0_rule",
    "ecs_8.11.0_server",
    "ecs_8.11.0_service",
    "ecs_8.11.0_source",
    "ecs_8.11.0_threat",
    "ecs_8.11.0_tls",
    "ecs_8.11.0_tracing",
    "ecs_8.11.0_url",
    "ecs_8.11.0_user_agent",
    "ecs_8.11.0_user",
    "ecs_8.11.0_vulnerability", "data-streams-mappings", "timeseriesdocs-dotnet-settings"
  ],
  "index_patterns": [
    "timeseriesdocs-dotnet-*"
  ],
  "priority": 527104,
  "data_stream": {},
  "template": {
    "mappings": {
      "date_detection": false,
      "dynamic_templates": [
        {
          "strings_as_keyword": {
            "mapping": {
              "ignore_above": 1024,
              "type": "keyword"
            },
            "match_mapping_type": "string"
          }
        }
      ]
    },
    "settings": {
      "index": {
        "codec": "best_compression",
        "mapping": {
          "total_fields": {
            "limit": 2000
          }
        }
      }
    }
  }
}

Running this against Elasticsearch 8.11 will yield:

"type" : "illegal_argument_exception",
    "reason" : "composable template [timeseriesdocs-dotnet] template after composition with component templates [ecs_8.11.0_base, ecs_8.11.0_agent, ecs_8.11.0_client, ecs_8.11.0_cloud, ecs_8.11.0_container, ecs_8.11.0_data_stream, ecs_8.11.0_destination, ecs_8.11.0_device, ecs_8.11.0_dll, ecs_8.11.0_dns, ecs_8.11.0_ecs, ecs_8.11.0_email, ecs_8.11.0_error, ecs_8.11.0_event, ecs_8.11.0_faas, ecs_8.11.0_file, ecs_8.11.0_group, ecs_8.11.0_host, ecs_8.11.0_http, ecs_8.11.0_log, ecs_8.11.0_network, ecs_8.11.0_observer, ecs_8.11.0_orchestrator, ecs_8.11.0_organization, ecs_8.11.0_package, ecs_8.11.0_process, ecs_8.11.0_registry, ecs_8.11.0_related, ecs_8.11.0_rule, ecs_8.11.0_server, ecs_8.11.0_service, ecs_8.11.0_source, ecs_8.11.0_threat, ecs_8.11.0_tls, ecs_8.11.0_tracing, ecs_8.11.0_url, ecs_8.11.0_user_agent, ecs_8.11.0_user, ecs_8.11.0_vulnerability, data-streams-mappings, timeseriesdocs-dotnet-settings] is invalid",
    "caused_by" : {
      "type" : "illegal_argument_exception",
      "reason" : "invalid composite mappings for [timeseriesdocs-dotnet]",
      "caused_by" : {
        "type" : "illegal_argument_exception",
        "reason" : "Limit of total fields [2000] has been exceeded",

We should be able to grow with our users and allow them to use any ECS field ahead of mapping changes.

In newer Elasticsearch version we ship with an ecs@mappings OOTB however this is not exhaustive and can lead to confusion with our users see e.g: https://github.com/elastic/elasticsearch/issues/85146#issuecomment-1993888589

How do ensure the full ECS specification (and/or its merger with OpenTelemetry) is fully natively supported by Elasticsearch?

Should the components move to dynamic_templates for all fields?

ruflin commented 3 months ago

ECS / Semconv will keep growing but I don't think every user should have to include all 2000+ fields by default. ECS is split up into Core and Extended field and by now I think we should split it up further. What we did with the dynamic mappings is partially convert what we have in ECS as specifying each field, moving to ECS being based on patterns. For example *.name is always the same, not matter where it is used. This means someone consuming foo.name can have the same expectation for the field like host.name which I consider a feature. But this is not something that made it yet into ECS.

For the full template you showed above, how is this generated? I guess short term fix for it is it to increase the 2000 limit.

Should we move fully to dynamic templates? My opinion is yes. But as mentioned above, I would like to keep the ecs@mappings simple and rather have an additional ecs-extended@mappings for more fields. My understanding is that status_code is not a long in the scenario mentioned by the user is because it likely is shipped as string. Otherwise I would expect Elasticsearch to directly apply the correct mapping (long).

ruflin commented 3 months ago

@eyalkoren @felixbarny For awareness.

eyalkoren commented 3 months ago

Should the components move to dynamic_templates for all fields?

Are there any shortcomings to dynamic templates that overcome their advantages? If not, then I'd say yes but as @ruflin proposes, if most use cases cannot benefit from these extras, they should come from a different component template.

felixbarny commented 3 months ago

I think the fix for this specific case (when you want to eagerly map all ECS fields) is to increase the field limit. The default is 1000 so I assume that you've already increased the default from 1000 to 2000.

I've added a comment to https://github.com/elastic/elasticsearch/issues/85146#issuecomment-2031285084 to discuss the tradeoffs of mapping all fields vs only mapping fields that differ from the default mapping.

Mpdreamz commented 3 months ago

I can raise the default field size but of course I rather not map all these fields to begin with.

ECS is split up into Core and Extended field and by now I think we should split it up further.

I don't think we should, even the core/extended distinction is already rather ambiguous to users.

Ideally we can message to our users a datastream is ECS/SemConv 'ready' for any field defined in the specification.

Whether we should revisit if 'extended' fields should be part of that specification to begin with I'll leave open for discussion at a later point.

What we did with the dynamic mappings is partially convert what we have in ECS as specifying each field, moving to ECS being based on patterns.

This does in effect create a new segmentation, a datastream is not fully ECS/SemConv ready. Only an undocumented subset is. This is cognitive overhead we push to our users.

For example *.name is always the same, not matter where it is used. This means someone consuming foo.name can have the same expectation for the field like host.name which I consider a feature. But this is not something that made it yet into ECS.

If we can get to a place where the specification encodes these conventions and we can generate the current ecs@mappings with condensed templates for all the defaults. Only further specializations for the exceptions we get a best of both worlds scenario.

I just hope we get to a place where the templates we ship are exhaustive and spec driven

Exhaustive can take many forms:

Whether exhaustive means enforcing field types or not can be a follow up discussion too:

My understanding is that status_code is not a long in the scenario mentioned by the user is because it likely is shipped as string.

I personally feel strongly we should do type checks/coercion and enforce structure since this is explicitly part of the ECS spec. Although how move along forward with this is still open too: https://github.com/elastic/opentelemetry-dev/issues/23.

Ideally we are lenient what we accept, as per logs-db efforts, but strict in how we return data. Again Elasticsearch could take advantage of the ECS spec to know how to return ECS data structurally even with synthetic source enabled.

Mpdreamz commented 3 months ago

Just to be explcit the 2000 field limit comes from the generated index template we advise users to install: https://github.com/elastic/ecs/blob/main/generated/elasticsearch/composable/template.json#L72

ruflin commented 3 months ago

This does in effect create a new segmentation, a datastream is not fully ECS/SemConv ready. Only an undocumented subset is. This is cognitive overhead we push to our users.

@eyalkoren Can you comment on this? I thought we actually have tests in place in Elasticsearch to ensure this is not the case.

felixbarny commented 2 months ago

We do have tests that ensure that every field that is defined in ECS is mapped correctly, assuming the field is sent in the correct JSON type.

Ideally we can message to our users a datastream is ECS/SemConv 'ready' for any field defined in the specification.

Because we have tests that all ECS fields are mapped correctly, I think we can do that.

Whether exhaustive means enforcing field types or not can be a follow up discussion too

The ecs@mappings component template is exhaustive but doesn't enforce/coerce field types. If a mapping for a field is a default mapping (such as string -> keyword), it's not added to ecs@mappings.

I personally feel strongly we should do type checks/coercion and enforce structure since this is explicitly part of the ECS spec.

Fair ask but note that this also comes with tradeoffs, as noted in https://github.com/elastic/elasticsearch/issues/85146#issuecomment-2031285084. As ecs@mappings is used for all logs data streams by default, it would also increase the risk of rejecting documents or ignoring fields that happen to be named like ECS fields but aren't intended to follow ECS.

eyalkoren commented 2 months ago

For reference, said tests are in EcsDynamicTemplatesIT. @Mpdreamz let us know if you think something is missing or not properly enforced.

eyalkoren commented 2 months ago

The ecs@mappings component template is exhaustive but doesn't enforce/coerce field types. If a mapping for a field is a default mapping (such as string -> keyword), it's not added to ecs@mappings.

@felixbarny I think this is where we may have a real issue here to fix. Correct me if I am wrong, but our default string -> keyword dynamic template in effect breaks automatic date and numeric detections of incoming string fields. So if a shipper/integration is used to getting its input status_code properly mapped to long even if it is shipped as a string, when it starts using ecs@mappings it will break that, effectively making this field non-ECS... So maybe we do need to add dynamic templates to cover date and long fields with "match_mapping_type": ["long", "string"]

ruflin commented 2 months ago

So maybe we do need to add dynamic templates to cover date and long fields with "match_mapping_type": ["long", "string"]

Would this work? Will it have an impact on performance? We should definitely test if this works.

felixbarny commented 2 months ago

The ecs@mappings component template isn't coercing to the types as defined by ECS. But it's not obvious to me that we should do that. It can also lead to issues when users don't strictly follow ECS and add custom attributes, such as "status_code": "success". If we enforce the type according to ECS here, this can also lead to issues.

That's one of the reasons why I think that by default, it may be reasonable to expect shippers to send data in the correct data type.

Mpdreamz commented 2 months ago

I want to stress that not enforcing object/nested and field data types means that shippers that send data in the correct format might not be able to read back the data using the same structures that produced them. Due to reads coming from either mixed sources or due to synthetic source breaking the schema.

e.g folks can still send a string to a field that should be an object.

In other words we break the robustness principle: https://en.wikipedia.org/wiki/Robustness_principle by being both lenient in what we accept and what we emit. https://github.com/elastic/opentelemetry-dev/issues/23

This does not mean to give up on e.g _synthetic source but hopefully _synthetic source enabled indices can take advantage of the knowledge they hold ECS data to return that data in accordance with the schema.

, it would also increase the risk of rejecting documents or ignoring fields that happen to be named like ECS fields but aren't intended to follow ECS.

Rejecting documents is a nonstarter agreed. Ignoring fields is one option or rerouting them to labels.* or conflict.* would be another.

There's a hidden superpower lost if we can no longer generate structs in several different languages so we can read each others events (security/observability/stack/etc) with a certain degree of confidence because we adhere to a schema.