elastic / beats

:tropical_fish: Beats - Lightweight shippers for Elasticsearch & Logstash
https://www.elastic.co/products/beats

[QASource] Test collection of system process metrics under docker #39900

Open fearful-symmetry opened 5 months ago

fearful-symmetry commented 5 months ago

In the past few months, we've run into a considerable number of bugs when monitoring host metrics while running under docker. I'm writing up these test steps in the hope that they can become a regular set of tests that are run with every release.

Steps to test

1) Run metricbeat via docker with the following:

docker run --label co.elastic.metrics/module=system \
--mount type=bind,source=/proc,target=/hostfs/proc,readonly \
--mount type=bind,source=/sys/fs/cgroup,target=/hostfs/sys/fs/cgroup,readonly \
--mount type=bind,source=/,target=/hostfs,readonly \
--mount type=bind,source=/var/run/dbus/system_bus_socket,target=/hostfs/var/run/dbus/system_bus_socket,readonly \
--env DBUS_SYSTEM_BUS_ADDRESS='unix:path=/hostfs/var/run/dbus/system_bus_socket' \
--net=host --cgroupns=host docker.elastic.co/beats/metricbeat:VERSION_TO_TEST metricbeat -e -E output.elasticsearch.hosts='[ES_ENDPOINT]' -d '*'

2) In elasticsearch, ensure that there are documents with metricset.name matching process

3) In the debug logs, ensure that there are no log lines that contain the strings:

Test Targets

This should be run under docker on linux, and preferably tested across a range of linux distros from our support matrix, at least:

elasticmachine commented 5 months ago

Pinging @elastic/fleet-qasource-external (Team:Fleet-QA)

amolnater-qasource commented 5 months ago

Hi @fearful-symmetry

We have tested this feature on the latest 8.15.0 SNAPSHOT Kibana cloud environment and have the observations below:

Observation Table:

| S.no. | Host OS | Data under metricbeat-* | Data under metricbeat-* without --cgroupns=host | Non fatal error fetching PID some info | Error fetching PID info for | GetInfoForPid: |
|---|---|---|---|---|---|---|
| 1 | Ubuntu 16.04 | Available | Available | No Errors observed | No Errors observed | No Errors observed |
| 2 | Ubuntu 20.04 | Available | Available | No Errors observed | No Errors observed | No Errors observed |
| 3 | Ubuntu 24.04 | Available | Available | No Errors observed | No Errors observed | No Errors observed |
| 4 | Rhel 7 | AWS Template not Working | AWS Template not Working | NA | NA | NA |
| 5 | Rhel 8 | Available | Available | No Errors observed | No Errors observed | No Errors observed |
| 6 | Rhel 9 | Available | Available | No Errors observed | No Errors observed | No Errors observed |

Artifact used: docker.elastic.co/beats/metricbeat:8.15.0-ee48b214-SNAPSHOT metricbeat

Further, we were getting authentication errors, so we added authentication to the install command:

sudo docker run --label co.elastic.metrics/module=system \
--mount type=bind,source=/proc,target=/hostfs/proc,readonly \
--mount type=bind,source=/sys/fs/cgroup,target=/hostfs/sys/fs/cgroup,readonly \
--mount type=bind,source=/,target=/hostfs,readonly \
--mount type=bind,source=/var/run/dbus/system_bus_socket,target=/hostfs/var/run/dbus/system_bus_socket,readonly \
--env DBUS_SYSTEM_BUS_ADDRESS='unix:path=/hostfs/var/run/dbus/system_bus_socket' \
--net=host --cgroupns=host docker.elastic.co/beats/metricbeat:8.15.0-ee48b214-SNAPSHOT metricbeat -e -E output.elasticsearch.hosts='https://host-url:443' \
-E output.elasticsearch.username='elastic' \
-E output.elasticsearch.password='password' \
-d '*'

2) In elasticsearch, ensure that there are documents with metricset.name matching process

For this, we checked metricbeat-* under the Discover tab.

image

3) In the debug logs, ensure that there are no log lines that contain the strings:

For this, we searched the CLI logs where metricbeat is running.

image

Logs with cgroups: with cg.txt

Logs without cgroups: without cg.txt
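The log check can also be scripted rather than done by eye. The following is a minimal sketch, not part of the original test steps: `scan_log` is a hypothetical helper, and the three search strings are an assumption read off the column headings of the observation table, since the issue description does not list them here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report every given fixed string that occurs in a log file.
# Returns 0 when none of the strings is present, 1 when at least one is found.
scan_log() {
  local log_file=$1
  shift
  local found=0 pattern
  for pattern in "$@"; do
    # -F treats the pattern as a literal string, -q suppresses grep's own output.
    if grep -F -q -- "$pattern" "$log_file"; then
      echo "FOUND: $pattern"
      found=1
    fi
  done
  return "$found"
}

# Assumed search strings, taken from the observation table headings.
ERROR_STRINGS=(
  "Non fatal error fetching PID some info"
  "Error fetching PID info for"
  "GetInfoForPid:"
)
```

A possible invocation, assuming the metricbeat output was saved to a file named `metricbeat.log`, would be `scan_log metricbeat.log "${ERROR_STRINGS[@]}" && echo "log is clean"`.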

Further, could you please share a working AWS RHEL 7 template? With the AWS RHEL 7 templates we are using, the errors below are observed on running any install command.

image

Please let us know if we are missing anything here.

cc: @pierrehilbert

Thanks!!

fearful-symmetry commented 5 months ago

Looked at the logs, nothing seems suspicious.

@amolnater-qasource

Further, could you please share a working AWS RHEL 7 template? With the AWS RHEL 7 templates we are using, the errors below are observed on running any install command.

I've tested this myself entirely in local VMs, so I can't comment on any AWS-specific configs needed.

Based on the screenshot above, it looks like the elasticsearch check may be incorrect. We need to check for the presence of documents with metricset.name = process, but the screenshot above appears to show process.name = metricbeat.
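For anyone who prefers to check this outside of Discover, the same condition can be expressed against the Elasticsearch `_count` API. This is only a sketch under assumptions: `ES_URL`, `ES_USER` and `ES_PASS` are hypothetical environment variables, the helper names are made up for illustration, and it assumes `metricset.name` is mapped as a keyword field so that a `term` query matches it.

```shell
#!/usr/bin/env bash
# Build the JSON body for a term query on metricset.name.
build_count_query() {
  printf '{"query":{"term":{"metricset.name":"%s"}}}' "$1"
}

# Pull the numeric "count" value out of a compact _count response on stdin.
extract_count() {
  sed -n 's/.*"count":\([0-9]*\).*/\1/p'
}

# Ask Elasticsearch how many metricbeat-* documents have the given metricset name.
# ES_URL, ES_USER and ES_PASS are assumed to be set by the caller.
count_metricset_docs() {
  curl -s -u "$ES_USER:$ES_PASS" \
    -H 'Content-Type: application/json' \
    "$ES_URL/metricbeat-*/_count" \
    -d "$(build_count_query "$1")" | extract_count
}
```

Calling `count_metricset_docs process` should print a number greater than zero if the process metricset is reporting.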

amolnater-qasource commented 5 months ago

Hi @fearful-symmetry

Thank you for the update. We have applied the filter metricset.name : "process" under the Discover tab.

Screen Captures: image

https://github.com/elastic/beats/assets/77374876/e73653c6-e7dd-4870-af17-dd36f9c74f33

Please let us know if we are still missing anything here.

Thanks!

fearful-symmetry commented 5 months ago

The data looks correct. I guess we can make do without RHEL 7 for now.

@amolnater-qasource purely out of curiosity, can you run docker version | grep "Version" on all of the different hosts you've tested and return the result? I'd like to know if we're getting an even spread of different docker versions across the VMs. I suspect we're not, but I want to be sure.
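If the results need to be gathered across many hosts, a small script may be easier to compare than raw `grep` output. The sketch below is an illustration only: both function names are hypothetical, and it uses `docker version --format` to print just the engine (server) version instead of grepping every line that contains "Version".

```shell
#!/usr/bin/env bash
# Print only the Docker engine (server) version of the local host.
engine_version() {
  docker version --format '{{.Server.Version}}'
}

# Read "host version" pairs on stdin and print how many hosts run each version.
summarize_versions() {
  awk '{ count[$2]++ } END { for (v in count) print v, count[v] }' | sort
}
```

With one `host version` line collected per machine (for example `echo "$(hostname) $(engine_version)"` run on each host), piping the combined lines into `summarize_versions` shows at a glance how evenly the versions are spread.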

amolnater-qasource commented 5 months ago

Hi @fearful-symmetry

Thank you for the update. We had docker version 20.10.7 on Ubuntu 16, and docker version 26.1.4 on all other OSes (RHEL 8, RHEL 9, Ubuntu 20, Ubuntu 24).

Please let us know if anything else is required from our end.

Further, for the regression test cases, could you please confirm whether we should create 1 test case for any one Linux version, or test cases for all 5 OSes tested?

Thanks!

fearful-symmetry commented 5 months ago

Yeah, I'm kind of tangentially worried about the versions of docker being used, since different docker engines could impact namespace settings, etc.

Further, for the regression test cases, could you please confirm whether we should create 1 test case for any one Linux version, or test cases for all 5 OSes tested?

@amolnater-qasource not sure what you mean? By "regression testcases" do you mean running this test in future?

amolnater-qasource commented 5 months ago

Yeah, I'm kind of tangentially worried about the versions of docker being used, since different docker engines could impact namespace settings, etc.

Do you want us to test it with different docker versions or any specific versions?

@amolnater-qasource not sure what you mean? By "regression testcases" do you mean running this test in future?

Yes, as stated in the description, the tests need to be run in future, so we need to convert them into test cases in the Fleet test suite. We just wanted to confirm whether we should create test cases for just 1 platform or for all platforms tested above.

Thanks!

fearful-symmetry commented 5 months ago

Yes, as stated in the description, the tests need to be run in future, so we need to convert them into test cases in the Fleet test suite. We just wanted to confirm whether we should create test cases for just 1 platform or for all platforms tested above.

Ah, alright. Yes, this should be run under multiple Linux distros for future tests.

As for Docker versions, I'm going to have to do some research and figure out if there are any particular Docker engine changes we'd be interested in.

amolnater-qasource commented 4 months ago

Hi @fearful-symmetry

We have created 5 test cases in Testmo for this feature under the Fleet test suite at links:

Please let us know if any other scenario needs to be added from our end.

Thanks!

amolnater-qasource commented 3 months ago

Hi Team,

We have executed 5 test cases under the Feature test run for the 8.15.0 release at the link:

Status:

PASS: 05

Build details:
VERSION: 8.15.0 BC4
BUILD: 76261
COMMIT: 9d62937675e62265342e86d8f0db601dc75498b8
Artifact Link: docker.elastic.co/staging/metricbeat:8.15.0-a7432175

As the testing is completed on this feature, we are marking this as QA:Validated.

Please let us know if anything else is required from our end. Thanks