elastic / integrations

Elastic Integrations
https://www.elastic.co/integrations
Other
187 stars 392 forks source link

[System] remove duplicated fields #6412

Open tetianakravchenko opened 1 year ago

tetianakravchenko commented 1 year ago

It is possible to define the same field multiple times for the same data_stream, in system package there are lots of duplications, example: https://github.com/elastic/integrations/pull/6118#discussion_r1196332959

see https://github.com/elastic/elastic-package/issues/1287 - there is already available a check in elastic-package for the duplicated fields to avoid it in future, it is only needed to update the format_version

elasticmachine commented 1 year ago

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

andrewkroh commented 1 year ago

It looks like there are two fields, host.disk.read.bytes and host.disk.write.bytes, that both have a scaled_float and long definition. Which definition should be retained, scaled_float or long?

[
  {
    "name": "host.disk.read.bytes",
    "types": [
      "scaled_float",
      "long"
    ],
    "conflicts": [
      {
        "file": "system/data_stream/diskio/fields/fields.yml",
        "line": 126,
        "type": "scaled_float"
      },
      {
        "file": "system/data_stream/diskio/fields/agent.yml",
        "line": 195,
        "type": "long"
      }
    ]
  },
  {
    "name": "host.disk.write.bytes",
    "types": [
      "long",
      "scaled_float"
    ],
    "conflicts": [
      {
        "file": "system/data_stream/diskio/fields/fields.yml",
        "line": 132,
        "type": "scaled_float"
      },
      {
        "file": "system/data_stream/diskio/fields/agent.yml",
        "line": 201,
        "type": "long"
      }
    ]
  }
]
cmacknz commented 1 year ago

The system integration in Metricbeat that pre-dated this uses long for both of these:

https://github.com/elastic/beats/blob/62979b558ccbd6b221dcb2268c5e73665ba653d9/metricbeat/module/system/diskio/_meta/fields.yml#L29-L42

The scaled fields came from https://github.com/elastic/integrations/pull/301, reading the issue description in https://github.com/elastic/beats/issues/19757 it seems like the intent was to standardize on those as the definition.

However it looks like these fields in ECS ended up as long https://www.elastic.co/guide/en/ecs/current/ecs-host.html#field-host-disk-read-bytes.

Probably long is the one to keep, since the fields represent bytes read/written since the last metric collection period I doubt this will matter. I can't image us having to represent 2^63 - 1 bytes (~8192 Petabytes) since the last collection period.

andrewkroh commented 1 year ago

I didn't realize those were in ECS. So I agree, long is the way to go.

botelastic[bot] commented 2 weeks ago

Hi! We just realized that we haven't looked into this issue in a while. We're sorry! We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1. Thank you for your contribution!