elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Get snapshot API returns duplicate information for `.fleet-actions-results` system data stream #111146

Open romain-chanu opened 4 months ago

romain-chanu commented 4 months ago

Elasticsearch Version

8.14.3

Installed Plugins

No response

Java Version

bundled

OS Version

N.A

Problem Description

Get snapshot API returns duplicate information for .fleet-actions-results system data stream.

See the JSON response below:

{
  "total": 1,
  "remaining": 0,
  "snapshots": [
    {
      "include_global_state": true,
      "uuid": "PSIEWRmOR1m5EzTiCZJANg",
      "repository": "found-snapshots",
      "duration_in_millis": 8821,
      "start_time": "2024-07-22T02:29:59.815Z",
      "shards": {
        "successful": 71,
        "failed": 0,
        "total": 71
      },
      "version_id": 8505000,
      "end_time_in_millis": 1721615408636,
      "state": "SUCCESS",
      "version": "8.14.0-8.14.2",
      "snapshot": "cloud-snapshot-2024.07.22-wpnxv4hmqzqb3vewczcjvq",
      "end_time": "2024-07-22T02:30:08.636Z",
      "feature_states": [
        {
          "indices": [
            ".security-tokens-7",
            ".security-7",
            ".security-profile-8"
          ],
          "feature_name": "security"
        },
        {
          "indices": [
            ".kibana_8.14.3_001",
            ".kibana_security_solution_8.14.3_001",
            ".apm-custom-link",
            ".kibana_ingest_8.14.3_001",
            ".apm-agent-configuration",
            ".kibana_analytics_8.14.3_001",
            ".kibana_security_session_1",
            ".kibana_alerting_cases_8.14.3_001",
            ".kibana_task_manager_8.14.3_001"
          ],
          "feature_name": "kibana"
        },
        {
          "indices": [
            ".geoip_databases"
          ],
          "feature_name": "geoip"
        },
        {
          "indices": [
            ".transform-internal-007"
          ],
          "feature_name": "transform"
        },
        {
          "indices": [
            ".fleet-agents-7",
            ".fleet-enrollment-api-keys-7",
            ".fleet-actions-7",
            ".fleet-policies-7",
            ".fleet-servers-7",
            ".fleet-policies-leader-7"
          ],
          "feature_name": "fleet"
        }
      ],
      "indices": [
        ".ds-metrics-system.memory-default-2024.07.21-000001",
        ".apm-agent-configuration",
        ".ds-metrics-system.socket_summary-default-2024.07.21-000001",
        ".ds-metrics-elastic_agent.metricbeat-default-2024.07.21-000001",
        ".ds-logs-osquery_manager.result-default-2024.07.21-000001",
        ".kibana_task_manager_8.14.3_001",
        ".fleet-servers-7",
        ".kibana_ingest_8.14.3_001",
        ".ds-ilm-history-7-2024.07.21-000001",
        ".internal.alerts-ml.anomaly-detection-health.alerts-default-000001",
        ".slo-observability.summary-v3.2",
        ".ds-metrics-elastic_agent.elastic_agent-default-2024.07.21-000001",
        ".internal.alerts-observability.metrics.alerts-default-000001",
        ".internal.alerts-ml.anomaly-detection.alerts-default-000001",
        ".internal.alerts-security.alerts-default-000001",
        ".apm-source-map",
        ".logs-osquery_manager.action.responses-default",
        ".kibana_security_solution_8.14.3_001",
        ".ds-.kibana-event-log-ds-2024.07.21-000001",
        ".ds-metrics-fleet_server.agent_status-default-2024.07.21-000001",
        ".fleet-actions-7",
        ".fleet-enrollment-api-keys-7",
        ".internal.alerts-observability.apm.alerts-default-000001",
        ".ds-metrics-fleet_server.agent_versions-default-2024.07.21-000001",
        ".ds-logs-elastic_agent.metricbeat-default-2024.07.21-000001",
        ".kibana-observability-ai-assistant-conversations-000001",
        ".internal.alerts-observability.threshold.alerts-default-000001",
        ".kibana_alerting_cases_8.14.3_001",
        ".fleet-policies-leader-7",
        ".ds-metrics-system.network-default-2024.07.21-000001",
        ".ds-logs-system.syslog-default-2024.07.21-000001",
        ".ds-.slm-history-7-2024.07.21-000001",
        ".logs-osquery_manager.actions-default",
        ".geoip_databases",
        ".ds-metrics-system.load-default-2024.07.21-000001",
        ".ds-.logs-deprecation.elasticsearch-default-2024.07.21-000001",
        ".internal.alerts-observability.slo.alerts-default-000001",
        ".ds-metrics-system.cpu-default-2024.07.21-000001",
        ".internal.alerts-observability.uptime.alerts-default-000001",
        ".fleet-agents-7",
        ".security-7",
        ".ds-metrics-elastic_agent.osquerybeat-default-2024.07.21-000001",
        ".transform-notifications-000002",
        ".kibana_security_session_1",
        ".internal.alerts-stack.alerts-default-000001",
        ".internal.alerts-default.alerts-default-000001",
        ".ds-metrics-elastic_agent.filebeat-default-2024.07.21-000001",
        ".slo-observability.summary-v3.2.temp",
        ".ds-metrics-system.uptime-default-2024.07.21-000001",
        ".security-tokens-7",
        ".ds-metrics-elastic_agent.filebeat_input-default-2024.07.21-000001",
        ".slo-observability.sli-v3.2",
        ".ds-logs-elastic_agent.filebeat-default-2024.07.21-000001",
        ".fleet-policies-7",
        ".apm-custom-link",
        ".kibana_analytics_8.14.3_001",
        ".internal.alerts-transform.health.alerts-default-000001",
        ".ds-logs-elastic_agent.osquerybeat-default-2024.07.21-000001",
        ".ds-logs-system.auth-default-2024.07.21-000001",
        ".ds-metrics-system.process-default-2024.07.21-000001",
        ".ds-metrics-system.process.summary-default-2024.07.21-000001",
        ".ds-metrics-system.fsstat-default-2024.07.21-000001",
        ".security-profile-8",
        ".ds-logs-elastic_agent-default-2024.07.21-000001",
        ".internal.alerts-observability.logs.alerts-default-000001",
        ".ds-metrics-system.diskio-default-2024.07.21-000001",
        ".ds-metrics-system.filesystem-default-2024.07.21-000001",
        ".transform-internal-007",
        ".ds-.fleet-actions-results-2024.07.21-000001",
        ".kibana-observability-ai-assistant-kb-000001",
        ".kibana_8.14.3_001"
      ],
      "failures": [],
      "data_streams": [
        ".fleet-actions-results", <----- FIRST ENTRY
        ".logs-deprecation.elasticsearch-default",
        "ilm-history-7",
        "logs-elastic_agent.osquerybeat-default",
        "metrics-system.process-default",
        "metrics-elastic_agent.filebeat-default",
        "metrics-elastic_agent.metricbeat-default",
        "metrics-elastic_agent.filebeat_input-default",
        "metrics-system.process.summary-default",
        "metrics-system.network-default",
        "logs-system.auth-default",
        "metrics-system.load-default",
        "logs-elastic_agent.metricbeat-default",
        "logs-osquery_manager.result-default",
        "metrics-system.fsstat-default",
        "logs-system.syslog-default",
        "metrics-system.cpu-default",
        "logs-elastic_agent.filebeat-default",
        "metrics-system.memory-default",
        "metrics-elastic_agent.elastic_agent-default",
        "logs-elastic_agent-default",
        "metrics-elastic_agent.osquerybeat-default",
        ".kibana-event-log-ds",
        ".slm-history-7",
        "metrics-system.diskio-default",
        "metrics-system.filesystem-default",
        "metrics-fleet_server.agent_status-default",
        "metrics-system.uptime-default",
        "metrics-fleet_server.agent_versions-default",
        "metrics-system.socket_summary-default",
        ".fleet-actions-results" <----- SECOND ENTRY
      ],
      "start_time_in_millis": 1721615399815,
      "metadata": {
        "policy": "cloud-snapshot-policy"
      }
    }
  ]
}

Could be related to https://github.com/elastic/elasticsearch/issues/89261 and https://github.com/elastic/elasticsearch/pull/71667

Steps to Reproduce

Logs (if relevant)

No response

elasticsearchmachine commented 4 months ago

Pinging @elastic/es-data-management (Team:Data Management)

PeteGillinElastic commented 1 month ago

It doesn't look like the duplicates are due to a bug in the get snapshot API. It looks like the duplicate info is persisted in the repository.

I started ES locally with

gradlew run -Dtests.es.path.repo=/Users/petegillin/my-snapshot-repository -Dtests.es.xpack.security.enabled=false

I triggered the creation of that system data stream with

curl  -H "Content-Type: application/json; charset=UTF-8" -H "X-elastic-product-origin: fleet" -XPUT "http://localhost:9200/_data_stream/.fleet-actions-results"

I created a local FS snapshot repository with

curl  -H "Content-Type: application/json; charset=UTF-8" -XPUT "http://localhost:9200/_snapshot/my_fs_backup?pretty=true" -d'{
  "type": "fs",
  "settings": {
    "location": "/Users/petegillin/my-snapshot-repository"
  }
}
'

I triggered a snapshot with

curl  -H "Content-Type: application/json; charset=UTF-8" -XPUT "http://localhost:9200/_snapshot/my_fs_backup/my_snapshot?wait_for_completion=true&pretty=true"

N.B. You can already see the duplicate data stream names in the response:

    "data_streams" : [
      ".fleet-actions-results",
      "ilm-history-7",
      ".fleet-actions-results"
    ],

I called the get snapshot API with

curl  -H "Content-Type: application/json; charset=UTF-8" -XGET "http://localhost:9200/_snapshot/_all/_all?pretty=true"

You can see the duplicate data stream names in the response, as above.
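
To check a response for this kind of duplication without reading it by eye, a small helper can report every name that appears more than once. This is only a sketch: the class and method names (`DuplicateDataStreams`, `findDuplicates`) are hypothetical and not part of Elasticsearch, and the input is the `data_streams` array from the reproduction above, hard-coded rather than parsed from the live response.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not Elasticsearch code.
class DuplicateDataStreams {

    /** Returns the names that occur more than once, in the order their first repeat is seen. */
    static Set<String> findDuplicates(List<String> names) {
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new LinkedHashSet<>();
        for (String name : names) {
            // Set.add returns false when the element is already present.
            if (!seen.add(name)) {
                duplicates.add(name);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // The "data_streams" array from the create snapshot response above.
        List<String> dataStreams = List.of(".fleet-actions-results", "ilm-history-7", ".fleet-actions-results");
        System.out.println(findDuplicates(dataStreams)); // prints [.fleet-actions-results]
    }
}
```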

Now, if we debug this, we see that at this call stack:

at org.elasticsearch.server@9.0.0-SNAPSHOT/org.elasticsearch.snapshots.SnapshotInfo.fromXContentInternal(SnapshotInfo.java:767)
at org.elasticsearch.server@9.0.0-SNAPSHOT/org.elasticsearch.repositories.blobstore.ChecksumBlobStoreFormat.deserialize(ChecksumBlobStoreFormat.java:183)
at org.elasticsearch.server@9.0.0-SNAPSHOT/org.elasticsearch.repositories.blobstore.ChecksumBlobStoreFormat.read(ChecksumBlobStoreFormat.java:129)
at org.elasticsearch.server@9.0.0-SNAPSHOT/org.elasticsearch.repositories.blobstore.BlobStoreRepository.lambda$getOneSnapshotInfo$27(BlobStoreRepository.java:2005)

we are doing

dataStreams = XContentParserUtils.parseList(parser, XContentParser::text);

and the value of dataStreams that we are deserializing from the blob has the duplicate:

[.fleet-actions-results, ilm-history-7, .fleet-actions-results]

PeteGillinElastic commented 1 month ago

Since it looks like this is down to what's stored in the repository rather than a hallucination of the GET API, I think that @elastic/es-distributed should take a look.

I'm going to change the labels accordingly — please change back if you disagree.

elasticsearchmachine commented 1 month ago

Pinging @elastic/es-distributed (Team:Distributed)

ywangd commented 1 month ago

Yeah, I can confirm this is a bug in snapshot creation. In the following code, we concatenate the resolved data stream names with the system data stream names without deduplicating.

https://github.com/elastic/elasticsearch/blob/92ecd36a031de094cda642f14000bea545b01740/server/src/main/java/org/elasticsearch/snapshots/SnapshotsService.java#L4147-L4149

Fortunately, this bug should not have significant consequences other than the duplicated output in the API response.
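
For illustration, concatenating two name lists while dropping repeats could look roughly like the sketch below. It is not the code in `SnapshotsService` and not necessarily how a fix would be written. The class name, method name and the contents of the two input lists are assumptions, chosen to match the reproduction earlier in this thread.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration, not the actual SnapshotsService code.
class MergeDataStreamNames {

    /** Concatenates both lists and drops repeated names, keeping first-seen order. */
    static List<String> mergeDistinct(List<String> resolved, List<String> systemDataStreams) {
        // LinkedHashSet ignores elements it already holds and preserves insertion order.
        Set<String> merged = new LinkedHashSet<>(resolved);
        merged.addAll(systemDataStreams);
        return new ArrayList<>(merged);
    }

    public static void main(String[] args) {
        // Assumed inputs: the system data stream shows up in both lists.
        List<String> resolved = List.of(".fleet-actions-results", "ilm-history-7");
        List<String> system = List.of(".fleet-actions-results");
        System.out.println(mergeDistinct(resolved, system)); // prints [.fleet-actions-results, ilm-history-7]
    }
}
```

A `LinkedHashSet` is used here so that the merged names keep a stable order. A plain `HashSet` would also remove the duplicate, but would not guarantee any order in the result.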