Open romain-chanu opened 4 months ago
Pinging @elastic/es-data-management (Team:Data Management)
It doesn't look like the duplicates are due to a bug in the get snapshot API. It looks like the duplicate info is persisted in the repository.
I started ES locally with
gradlew run -Dtests.es.path.repo=/Users/petegillin/my-snapshot-repository -Dtests.es.xpack.security.enabled=false
I triggered the creation of that system data stream with
curl -H "Content-Type: application/json; charset=UTF-8" -H "X-elastic-product-origin: fleet" -XPUT "http://localhost:9200/_data_stream/.fleet-actions-results"
I created a local FS snapshot repository with
curl -H "Content-Type: application/json; charset=UTF-8" -XPUT "http://localhost:9200/_snapshot/my_fs_backup?pretty=true" -d'{
"type": "fs",
"settings": {
"location": "/Users/petegillin/my-snapshot-repository"
}
}
'
I triggered a snapshot with
curl -H "Content-Type: application/json; charset=UTF-8" -XPUT "http://localhost:9200/_snapshot/my_fs_backup/my_snapshot?wait_for_completion=true&pretty=true"
N.B. You can already see the duplicate data stream names in the response:
"data_streams" : [
".fleet-actions-results",
"ilm-history-7",
".fleet-actions-results"
],
I called the get snapshot API with
curl -H "Content-Type: application/json; charset=UTF-8" -XGET "http://localhost:9200/_snapshot/_all/_all?pretty=true"
You can see the duplicate data stream names in the response, as above.
Now, if we debug this, we see that at this call stack:
at org.elasticsearch.server@9.0.0-SNAPSHOT/org.elasticsearch.snapshots.SnapshotInfo.fromXContentInternal(SnapshotInfo.java:767)
at org.elasticsearch.server@9.0.0-SNAPSHOT/org.elasticsearch.repositories.blobstore.ChecksumBlobStoreFormat.deserialize(ChecksumBlobStoreFormat.java:183)
at org.elasticsearch.server@9.0.0-SNAPSHOT/org.elasticsearch.repositories.blobstore.ChecksumBlobStoreFormat.read(ChecksumBlobStoreFormat.java:129)
at org.elasticsearch.server@9.0.0-SNAPSHOT/org.elasticsearch.repositories.blobstore.BlobStoreRepository.lambda$getOneSnapshotInfo$27(BlobStoreRepository.java:2005)
we are doing
dataStreams = XContentParserUtils.parseList(parser, XContentParser::text);
and the value of dataStreams that we are deserializing from the blob has the duplicate:
[.fleet-actions-results, ilm-history-7, .fleet-actions-results]
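For anyone wanting to check a deserialized list like the one above for repeats, here is a minimal sketch. The class and method names (DuplicateNames, findDuplicates) are hypothetical and are not part of Elasticsearch; the input list is copied from the debugger output above.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only; not part of Elasticsearch.
final class DuplicateNames {

    // Returns every name that occurs more than once, in the order the
    // first repeat of each name is encountered.
    static Set<String> findDuplicates(List<String> names) {
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new LinkedHashSet<>();
        for (String name : names) {
            // Set.add returns false when the element is already present.
            if (!seen.add(name)) {
                duplicates.add(name);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // Value observed when deserializing the snapshot blob.
        List<String> dataStreams =
            List.of(".fleet-actions-results", "ilm-history-7", ".fleet-actions-results");
        System.out.println(findDuplicates(dataStreams)); // prints [.fleet-actions-results]
    }
}
```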
Since it looks like this is down to what's stored in the repository rather than a hallucination of the GET API, I think that @elastic/es-distributed should take a look.
I'm going to change the labels accordingly — please change back if you disagree.
Pinging @elastic/es-distributed (Team:Distributed)
Yeah, I can confirm this is a bug in snapshot creation. In the following code, we concatenate the resolved data stream names with the system data stream names without deduplicating.
Fortunately, this bug should not have significant consequences other than the duplicated output in the API response.
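To illustrate the difference between plain concatenation and a deduplicating merge, here is a rough sketch. It is not the actual Elasticsearch code: the class and method names (DataStreamNameMerge, concatenate, mergeDistinct) are hypothetical, and the split of names between the "resolved" and "system" lists is an assumption chosen to match the order seen in the response above.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical helper for illustration only; not the actual snapshot code.
final class DataStreamNameMerge {

    // Plain concatenation: a name present in both lists appears twice.
    static List<String> concatenate(List<String> resolved, List<String> system) {
        List<String> all = new ArrayList<>(resolved);
        all.addAll(system);
        return all;
    }

    // Deduplicating merge: LinkedHashSet drops repeats and keeps first-seen order.
    static List<String> mergeDistinct(List<String> resolved, List<String> system) {
        LinkedHashSet<String> all = new LinkedHashSet<>(resolved);
        all.addAll(system);
        return new ArrayList<>(all);
    }

    public static void main(String[] args) {
        // Assumed inputs: the system data stream shows up in both lists.
        List<String> resolved = List.of(".fleet-actions-results", "ilm-history-7");
        List<String> system = List.of(".fleet-actions-results");

        // prints [.fleet-actions-results, ilm-history-7, .fleet-actions-results]
        System.out.println(concatenate(resolved, system));
        // prints [.fleet-actions-results, ilm-history-7]
        System.out.println(mergeDistinct(resolved, system));
    }
}
```

A LinkedHashSet is used here rather than a HashSet so that the output order stays stable, which keeps the API response predictable.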
Elasticsearch Version
8.14.3
Installed Plugins
No response
Java Version
bundled
OS Version
N.A
Problem Description
Get snapshot API returns duplicate information for the .fleet-actions-results system data stream. C.f. JSON response below:
Could be related to https://github.com/elastic/elasticsearch/issues/89261 and https://github.com/elastic/elasticsearch/pull/71667
Steps to Reproduce
.fleet-actions-results system data stream is created with the respective backing indices
.fleet-actions-results system data stream in Get snapshot API output
Logs (if relevant)
No response