elastic / integrations

Elastic Integrations
https://www.elastic.co/integrations

[MySQL] [Bug] Record the schema name additionally to prevent data loss and digest_text identification. #9239

Open agithomas opened 4 months ago

agithomas commented 4 months ago

The MySQL performance data stream uses the following query:

          SELECT digest_text, count_star, avg_timer_wait, max_timer_wait, last_seen, quantile_95
            FROM performance_schema.events_statements_summary_by_digest
            ORDER BY avg_timer_wait DESC
            LIMIT 10

During testing, we noticed that when the same SQL statements are run across multiple database schemas, the query returns duplicate values for digest_text. This is especially likely when database traffic is low.

Reference : https://dev.mysql.com/doc/refman/8.0/en/performance-schema-statement-summary-tables.html

events_statements_summary_by_digest has SCHEMA_NAME and DIGEST columns. Each row summarizes events per schema and digest value. (The DIGEST_TEXT column contains the corresponding normalized statement digest text, but is neither a grouping nor a summary column.)

Once TSDB is enabled, this leads to data loss, because the documents carry no schema identifier to tell these rows apart.
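
To make the duplication concrete, here is a minimal sketch that reproduces it with Python's built-in sqlite3 module. The `summary` table is a hypothetical stand-in for performance_schema.events_statements_summary_by_digest, reduced to four columns. The schema names, digests, and timings are made up for illustration and do not come from a real server.

```python
import sqlite3

# Hypothetical stand-in for events_statements_summary_by_digest.
# One row per (schema_name, digest), as the MySQL documentation describes.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE summary ("
    "schema_name TEXT, digest TEXT, digest_text TEXT, avg_timer_wait INTEGER)"
)
conn.executemany(
    "INSERT INTO summary VALUES (?, ?, ?, ?)",
    [
        # Same statement run in two schemas: same digest, two rows.
        ("shop_a", "d1", "SELECT * FROM `orders` WHERE `id` = ?", 900),
        ("shop_b", "d1", "SELECT * FROM `orders` WHERE `id` = ?", 700),
        ("shop_a", "d2", "SELECT COUNT ( * ) FROM `users`", 300),
    ],
)

# Shape of the current query: no schema column is selected.
rows = conn.execute(
    "SELECT digest_text, avg_timer_wait FROM summary "
    "ORDER BY avg_timer_wait DESC LIMIT 10"
).fetchall()
digest_texts = [r[0] for r in rows]
duplicated = sorted({t for t in digest_texts if digest_texts.count(t) > 1})

# Shape of the proposed query: schema_name is selected as well.
rows_with_schema = conn.execute(
    "SELECT schema_name, digest_text, avg_timer_wait FROM summary "
    "ORDER BY avg_timer_wait DESC LIMIT 10"
).fetchall()
keys = [(r[0], r[1]) for r in rows_with_schema]
unique_with_schema = len(set(keys)) == len(keys)

print("duplicated digest_text values:", duplicated)
print("unique once schema_name is included:", unique_with_schema)
```

In this sketch the two `orders` rows cannot be told apart by digest_text alone, while the (schema_name, digest_text) pair stays unique.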

To fix the issue, the following changes are proposed:

  1. Modify the Metricbeat code to include the schema name in the query (the column is SCHEMA_NAME, per the MySQL reference above):

          SELECT schema_name, digest_text, count_star, avg_timer_wait, max_timer_wait, last_seen, quantile_95
            FROM performance_schema.events_statements_summary_by_digest
            ORDER BY avg_timer_wait DESC
            LIMIT 10

  2. Modify the integration ingest pipeline to also use the schema name when computing the fingerprint. The processor currently fingerprints only the query field:

- fingerprint:
    fields: ["mysql.performance.events_statements.query"]
    target_field: mysql.performance.events_statements.query_id
    ignore_failure: true
    ignore_missing: true
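
The effect of adding the schema to the fingerprint can be sketched in Python. This is a simplified stand-in for the ingest fingerprint processor, not its exact algorithm. The field name `mysql.performance.events_statements.schema` is a hypothetical placeholder, since the final field name is not settled in this issue.

```python
import hashlib

QUERY_FIELD = "mysql.performance.events_statements.query"
# Hypothetical field name for the schema; the real name may differ.
SCHEMA_FIELD = "mysql.performance.events_statements.schema"


def fingerprint(event: dict, fields: list) -> str:
    """Hash the selected fields of an event (simplified illustration only)."""
    h = hashlib.sha1()
    for name in sorted(fields):
        if name in event:  # skip absent fields, similar in spirit to ignore_missing
            h.update(name.encode("utf-8"))
            h.update(b"\x00")
            h.update(str(event[name]).encode("utf-8"))
            h.update(b"\x00")
    return h.hexdigest()


# Two events for the same normalized statement, run in different schemas.
event_a = {QUERY_FIELD: "SELECT * FROM `orders` WHERE `id` = ?", SCHEMA_FIELD: "shop_a"}
event_b = {QUERY_FIELD: "SELECT * FROM `orders` WHERE `id` = ?", SCHEMA_FIELD: "shop_b"}

# Query-only fingerprint: both events collide on the same query_id.
collide = fingerprint(event_a, [QUERY_FIELD]) == fingerprint(event_b, [QUERY_FIELD])

# Query plus schema: the two events get distinct ids.
distinct = (
    fingerprint(event_a, [QUERY_FIELD, SCHEMA_FIELD])
    != fingerprint(event_b, [QUERY_FIELD, SCHEMA_FIELD])
)

print("query-only ids collide:", collide)
print("ids differ once schema is included:", distinct)
```

Under these assumptions, the pipeline change would amount to adding the schema field to the `fields` list of the fingerprint processor.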

harnish-elastic commented 4 months ago

Raised a PR to update the query in the MySQL Metricbeat module: https://github.com/elastic/beats/pull/38363

harnish-elastic commented 1 month ago

Once a Beats release includes the changes from this PR, the integration package needs to be updated.