Open rodehoed opened 5 months ago
Hello! Thanks for reporting this issue. Would you mind sharing:
Thanks a lot in advance
Hi @paulcacheux ,
Sure np.
The config comes from datadog-agent configcheck:
Configuration provider: file
Configuration source: file:/etc/datadog-agent/conf.d/container_image.d/conf.yaml.default
Config for instance ID: container_image:2ac6bde1700038e4
{}
~
Auto-discovery IDs:
* _container_image
===
=== container_lifecycle check ===
Configuration provider: file
Configuration source: file:/etc/datadog-agent/conf.d/container_lifecycle.d/conf.yaml.default
Config for instance ID: container_lifecycle:b628cf9ded5c9324
{}
~
Auto-discovery IDs:
* _container_lifecycle
===
=== cpu check ===
Configuration provider: file
Configuration source: file:/etc/datadog-agent/conf.d/cpu.d/conf.yaml.default
Config for instance ID: cpu:e331d61ed1323219
{}
~
===
=== disk check ===
Configuration provider: file
Configuration source: file:/etc/datadog-agent/conf.d/disk.d/conf.yaml.default
Config for instance ID: disk:67cc0574430a16ba
use_mount: false
~
===
=== file_handle check ===
Configuration provider: file
Configuration source: file:/etc/datadog-agent/conf.d/file_handle.d/conf.yaml.default
Config for instance ID: file_handle:381b8b6ca58d37b0
{}
~
===
=== io check ===
Configuration provider: file
Configuration source: file:/etc/datadog-agent/conf.d/io.d/conf.yaml.default
Config for instance ID: io:541b60d158de04a7
{}
~
===
=== load check ===
Configuration provider: file
Configuration source: file:/etc/datadog-agent/conf.d/load.d/conf.yaml.default
Config for instance ID: load:bf7cea93fb3aa780
{}
~
===
=== memory check ===
Configuration provider: file
Configuration source: file:/etc/datadog-agent/conf.d/memory.d/conf.yaml.default
Config for instance ID: memory:3f1f6288b95b9979
{}
~
===
=== mysql check ===
Configuration provider: file
Configuration source: file:/etc/datadog-agent/conf.d/mysql.d/conf.yaml
Config for instance ID: mysql:75cd0f7a0853706d
options:
  disable_innodb_metrics: false
  extra_innodb_metrics: true
  extra_performance_metrics: true
  extra_status_metrics: true
  galera_cluster: true
  replication: 0
  schema_size_metrics: false
pass: "********"
port: 3306
server: 127.0.0.1
user: datadog
~
===
=== network check ===
Configuration provider: file
Configuration source: file:/etc/datadog-agent/conf.d/network.d/conf.yaml.default
Config for instance ID: network:4b0649b7e11f0772
{}
~
===
=== nginx check ===
Configuration provider: file
Configuration source: file:/etc/datadog-agent/conf.d/nginx.d/conf.yaml
Config for instance ID: nginx:3833f3b9ceb3e496
nginx_status_url: http://not-my-host/nginx-status
~
Log Config:
logs:
- path: bogus/access.log
  service: staging.bogus.com
  source: nginx
  sourcecategory: http_web_access
  type: file
===
=== ntp check ===
Configuration provider: file
Configuration source: file:/etc/datadog-agent/conf.d/ntp.d/conf.yaml.default
Config for instance ID: ntp:3c427a42a70bbf8
{}
~
===
=== php_fpm check ===
Configuration provider: file
Configuration source: file:/etc/datadog-agent/conf.d/php_fpm.d/conf.yaml
Config for instance ID: php_fpm:5726203bab636eaa
http_host: bogus-host
ping_reply: pong
ping_url: http://127.0.0.1/ping
status_url: http://127.0.0.1/fpmstatus
use_fastcgi: false
~
===
=== telemetry check ===
Configuration provider: file
Configuration source: file:/etc/datadog-agent/conf.d/telemetry.d/conf.yaml.default
Config for instance ID: telemetry:4d459fc318a47aa4
{}
~
===
=== uptime check ===
Configuration provider: file
Configuration source: file:/etc/datadog-agent/conf.d/uptime.d/conf.yaml.default
Config for instance ID: uptime:c72f390abdefdf1a
{}
~
===
Could you share the following files if present:
/etc/datadog-agent/datadog.yaml
/etc/datadog-agent/system-probe.yaml
/etc/datadog-agent/security-agent.yaml
Thanks a lot!
sure:
### MANAGED BY PUPPET
---
api_key: xxxxxxxxxxxxxx
dd_url: ''
site: datadoghq.eu
cmd_port: 5001
hostname_fqdn: false
collect_ec2_tags: false
collect_gce_tags: false
confd_path: "/etc/datadog-agent/conf.d"
enable_metadata_collection: true
dogstatsd_port: 8125
dogstatsd_socket: ''
dogstatsd_non_local_traffic: false
log_file: "/var/log/datadog/agent.log"
log_level: info
tags: []
apm_config:
  enabled: true
  env: none
  apm_non_local_traffic: false
process_config:
  enabled: 'true'
  scrub_args: true
  custom_sensitive_words: []
logs_enabled: true
logs_config:
  container_collect_all: false
The system-probe and security agent config are not active.
Hello, the latest Agent version comes with new telemetry that reads data from rpm. To see whether it is the culprit, could you please disable it by setting enable_signing_metadata_collection: false in your datadog.yaml configuration and restarting the Agent? Then fix the DB corruption and see whether that stops it from happening again.
Thanks in advance
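For reference, a minimal sketch of how the suggested setting might sit in the datadog.yaml shared above. It assumes enable_signing_metadata_collection is a top-level key, which is how the request reads. The neighbouring keys are copied from the posted file only to show placement.

```yaml
# Excerpt of /etc/datadog-agent/datadog.yaml; all other keys stay unchanged.
site: datadoghq.eu
log_level: info
# Assumption: top-level key, as the request above implies.
# Disables the new telemetry that reads data from rpm.
enable_signing_metadata_collection: false
```

Since the posted file is marked as managed by Puppet, the change probably needs to go through the Puppet configuration rather than a manual edit, or it may be overwritten on the next run.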
Hi All,
As of today, this config is set. I will keep you posted.
Hi, just a quick follow-up: do you have any updates after setting the config option? Does the DB corruption still happen? Thanks in advance
Hi @Pythyu
Well, no updates actually :-) I mean we haven't seen this message again over the last few weeks. So one might think the problem is "fixed".
Thank you for all the answers. Could you contact our support so we can get more information about your environment outside of GitHub? It would help us a lot to reproduce the issue and potentially test the bug fix. You can share the support ticket ID here, and we'll follow it up.
Hi @rodehoed, please let us know if you got in touch with our support. Thanks
Hi All,
Sorry for being late! I have just opened a ticket with DD, ticket ID #689248
Description OK, I'm not 100% confident that this is a Datadog issue, but it's the only clue I have right now. Since March 22nd we have seen (10) servers getting their RPM DB corrupted. The facts:
Fixing the DB corruption will not prevent it from happening again. We have servers which have had this corruption multiple times now.
Agent Environment The agent is running 7.52.0-1 on RHEL 8.9
Describe what happened: The RPM database gets corrupted, and calling the rpm/dnf command shows:
Describe what you expected: Database not getting corrupted
Steps to reproduce the issue: Upgrading is enough, but I don't know what triggers it.
Additional environment details (Operating System, Cloud provider, etc):