Common Fields for Container Inventory Schema

jsoriano commented 3 years ago

This issue tracks work related to the definition of common fields for container inventory schema.

The output will be a set of recommended or required fields to be added to any event related to containers.

The purpose of these fields is to have a minimal set of valuable data that can be used for inventory. The focus will be in metadata and metrics fields.

Integrations possibly affected:

Docker
Kubernetes
Cloudfoundry
Nomad (WIP #16853)
Fargate (WIP #22034)
Cgroups-related data in system module?
Container metrics in cloud-specific monitors (CloudWatch/Stackdriver/Azure)?

Related issues

19757 - Host Inventory Schema

elasticmachine commented 3 years ago

Pinging @elastic/integrations-platforms (Team:Platforms)

exekias commented 3 years ago

We may also want to check what Cloudwatch/Stackdriver/Azure monitor provide ootb

kaiyan-sheng commented 3 years ago

@jsoriano @ChrsMark Question about docker network metrics: I see we have both docker.network.in.bytes and docker.network.inbound.bytes. My understanding is docker.network.in.bytes is a gauge and docker.network.inbound.bytes is a counter?

Also do you think it's useful to calculate one value container.network.ingress.bytes to represent an aggregated value across all network interfaces?

jsoriano commented 3 years ago

My understanding is docker.network.in.bytes is a gauge and docker.network.inbound.bytes is a counter?

Yes, docker.network.in.bytes are actually bytes per second during the last collection period. Calculated using current and previous docker.network.inbound.bytes. The idea was to deprecate docker.network.in.bytes, but this could be reconsidered.

Also do you think it's useful to calculate one value container.network.ingress.bytes to represent an aggregated value across all network interfaces?

Yes, this can be interesting, specially for inventory UIs.

kaiyan-sheng commented 3 years ago

@simianhacker Does UI do any aggregation right now for docker network metrics?

sorantis commented 3 years ago

@neptunian FYI

simianhacker commented 3 years ago

@kaiyan-sheng Yes... we use docker.network.inbound.bytes and docker.network.outbound.bytes and we use them as counters.

kaiyan-sheng commented 3 years ago

@simianhacker Thanks! Is Kibana doing aggregation across all network interfaces for these two counters?

simianhacker commented 3 years ago

Yes, we take the derivative of the max of each interface and sum the rates together.

kaiyan-sheng commented 3 years ago

Proposing new container fields to add into ECS:

container.cpu.usage: Percent CPU normalized by the number of CPU cores and it ranges from 0 to 1. (same as docker.cpu.total.norm.pct)
container.memory.usage: Memory usage percentage. (same as docker.memory.usage.pct)
container.disk.read.bytes: Bytes read during this collection period. (derived from docker.diskio.read.bytes)
container.disk.write.bytes: Bytes written during this collection period. (derived from docker.diskio.read.bytes)
container.network.ingress.bytes: The number of bytes (gauge) received on all network interfaces(aggregated) by the container in a given period of time. (derived from docker.network.inbound.bytes)
container.network.egress.bytes: The number of bytes (gauge) sent out on all network interfaces(aggregated) by the container in a given period of time. (derived from docker.network.outbound.bytes)

@simianhacker Will container.network.ingress.bytes and container.network.egress.bytes be useful to the UI?

@ChrsMark @jsoriano @sorantis I would like to hear your opinion on this before starting RFC in ECS. Thanks!!

sorantis commented 3 years ago

Aligning container and hosts ECS metric fields sounds like a good start. I'm wondering if it's possible extend the list in a generic way with

container.state (up/down)
container.uptime

Question about container.memory.usage. What memory metric will this field be derived from? For both kubernetes and docker we collect memory usage as well as memory rss metrics. Based on the proposed mapping we will only be exposing memory usage at the ECS level. Should we also consider adding memory rss to the list? Both metrics are reported by our Kubernetes/Docker/cgroup integrations.

jsoriano commented 3 years ago

I would like to hear your opinion on this before starting RFC in ECS.

LGTM, thanks!

I'm wondering if it's possible extend the list in a generic way with

container.state (up/down)

container.uptime

It can be interesting to add them, but we would need to define them well. For the state we would need to define valid values, that can be different between platforms, and if healthchecks should be considered. Uptime can be also different in different platforms, perhaps we should report creation and/or start times instead.

Question about container.memory.usage. What memory metric will this field be derived from? For both kubernetes and docker we collect memory usage as well as memory rss metrics. Based on the proposed mapping we will only be exposing memory usage at the ECS level. Should we also consider adding memory rss to the list? Both metrics are reported by our Kubernetes/Docker/cgroup integrations.

As these metrics are going to be used for UIs and inventory purposes I think we should keep it simple and report a single value for memory usage. I guess that other more specific metrics will still be available in reported events (depending on the availability in the platform), so users can still check them if needed.

Regarding the memory metric to derive the common field from, I think it should be derived from the same metric the platform uses to enforce memory limits. This way what users see is consistent with how the platform behaves. For example in https://github.com/elastic/beats/pull/25428 we saw that the only metric kubernetes reports for Windows containers is workingSetBytes, and we decided to use this to calculate the memory percentage usage.

sorantis commented 3 years ago

+1 on well defining the state/update metrics before adding them to ECS.

The reason I was asking about including RSS is because I'm seeing it being used as an equally important metric for monitoring container memory. We can review this once we'll have more customer feedback.

ChrsMark commented 3 years ago

@kaiyan-sheng @sorantis I guess this will help us solving https://github.com/elastic/kibana/issues/100229, right?

kaiyan-sheng commented 3 years ago

@ChrsMark Yes I think this is a perfect use case for defining and adopting inventory schema.

kaiyan-sheng commented 2 years ago

Hi @akshay-saraswat and @ChrsMark, here are the two issues I created: Testing new container fields in docker: https://github.com/elastic/integrations/issues/2119 Testing new container fields in kubernetes: https://github.com/elastic/integrations/issues/2120

ChrsMark commented 2 years ago

@MichaelKatsoulis since you are planning to working on this do you think we can take the ownership? @jsoriano @kaiyan-sheng any objections on this?

jsoriano commented 2 years ago

No objectios :slightly_smiling_face: Thanks a lot!

simianhacker commented 2 years ago

Will container.network.ingress.bytes and container.network.egress.bytes be useful to the UI?

Yes

MichaelKatsoulis commented 2 years ago

Continuing the discussion on this.
Currently we have suggested the following ECS fields

container.cpu.usage
container.memory.usage
container.network.ingress.bytes
container.network.egress.bytes
container.disk.read.bytes
container.disk.write.bytes

After looking at Metrics UI Inventory page , I can see that in the dropdown list there are kubernetes pods hardcoded where fields coming from kubernetes.pod.* are used and also Docker Containers. In Docker containers, fields from docker.* are used which are populated by the docker module. More specifically the fields used are:

docker.cpu.total.pct
docker.memory.usage.pct
docker.network.inbound.bytes
docker.network.interface
docker.network.outbound.bytes
docker.diskio.read.bytes
docker.diskio.write.bytes
docker.diskio.read.ops
docker.diskio.write.ops

Comparing the cpu percentages of docker containers and kubernetes pods inventory I noticed wide differences for the same containers. Investigating this I figured out that in case of kubernetes pods, we are using a normalised (per node cpus) value for the pod cpu percentage (kubernetes.pod.cpu.usage.node.pct). While for docker the non normalised values are used(docker.cpu.total.pct), although we also calculate the docker.cpu.total.norm.pct. The difference is that the non normalised equals to normalised * node_cpus.

The best step forward would be to make the docker containers inventory generic. We could rename it into Containers and use only ECS fields.

Those fields will be populated by either docker in case docker runtime is used, kubelet in case we are on k8s, containerd in case containerd runtime is used.

As different modules can populated same fields at the same time it is vital all the different modules (currently kubernetes and docker. In the future containerd as well) will report same values for those fields.

So I suggest that container.cpu.usage ECS field to be renamed into container.cpu.usage.pct to be more clear what it is and it will be the normalised cpu percentage. Also it would be nice to add some more ECS fields so we can use the Docker Containers inventory only with ECS.

Those additional fields are:

container.memory.usage.pct
container.disk.write.ops
container.disk.read.ops
container.network.interface

ChrsMark commented 2 years ago

Good job in breaking this down @MichaelKatsoulis !

Regarding the new fields you propose, can you also share the types for them? We can upvote for them in this issue and then I guess we can go ahead and open a PR to ECS.

Regarding the views. I find your approach valid. The Docker view could be renamed to Containers and use container ECS fields, which will be populated by docker and containerd modules mainly. In addition this view can be populated by k8s module also but we need to think if k8s module will be reporting kubernetes.* fields or container.* ECS fields or both. I think we would need both since the Kubernetes view should be populated only when we have actual k8s metrics and not just metrics about containers which could be coming from only the runtime. Also what will happen if both k8s and docker modules are running, which one would populate the Containers view?

@jasonrhodes I think that the UI team need to pair with us on this decision making. Anyone available to work on this?

MichaelKatsoulis commented 2 years ago

Regarding the new fields you propose, can you also share the types for them? We can upvote for them in this issue and then I guess we can go ahead and open a PR to ECS.

@ChrsMark, The types of the new fields are:

container.memory.usage.pct        type: scaled_float        format: percent
container.disk.write.ops          type: long
container.disk.read.ops.          type: long
container.network.interface       type: keyword

and the updated container.cpu.usage.pct is type: scaled_float

I also believe that k8s module should populate both fields (kubernetes.* and container.* ). The Kubernetes view will remain as is and use the values from kubernetes.*. Regarding what will happen in case both docker and k8s module populate the same filed, I think we will be ok as long as the values match. Or is there any way in Kibana to specify priorities like from which dataset the fields are coming from? That way if docker or containerd modules are running the view will use them . If not it will use the ones from kubernetes. Maybe @jasonrhodes could answer this.

elastic / beats

Common Fields for Container Inventory Schema #22179

Related issues

19757 - Host Inventory Schema