Open angorayc opened 8 months ago
Pinging @elastic/security-threat-hunting (Team:Threat Hunting)
Pinging @elastic/security-threat-hunting-explore (Team:Threat Hunting:Explore)
Pinging @elastic/security-solution (Team: SecuritySolution)
+1 to discuss this change, as indices that back a data stream should generally not be managed directly by end users, who most likely interact with the data stream itself.
@dhru42 and @MikePaquette do you have any thoughts about this proposal?
I think datastream info should be provided with indices info in the new STATS API
From my perspective, what we want to do in this issue is update the data quality dashboard UI and group backing indices by data stream. After talking to the team, here are some questions we have:
serverless Kibana
, as the API we are going to use here provides enough information about which backing indices form a data stream, e.g.:
{
"_shards": {
"total": 4,
"successful": 2,
"failed": 0
},
"indices": [
{
"name": ".ds-my-datastream-03-02-2024-00001",
"num_docs": 3,
"size_in_bytes": 15785,
"datastream": "my-datastream" // ---> ds name
},
{
"name": "my-index-000001",
"num_docs": 2,
"size_in_bytes": 11462
}
],
"datastreams": [
{
"name": "my-datastream",
"num_docs": 6,
"size_in_bytes": 31752
}
]
}
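A minimal sketch of how the dashboard could group rows from a response shaped like the one above. The `group_by_datastream` helper is hypothetical, and it assumes each backing index carries a `datastream` field while standalone indices omit it, as in the example.

```python
from collections import defaultdict


def group_by_datastream(stats: dict):
    """Split index entries into data stream groups and standalone indices."""
    grouped = defaultdict(list)
    standalone = []
    for index in stats.get("indices", []):
        ds_name = index.get("datastream")  # assumed field, absent for plain indices
        if ds_name is None:
            standalone.append(index["name"])
        else:
            grouped[ds_name].append(index["name"])
    return dict(grouped), standalone


# Sample trimmed from the response above.
sample_stats = {
    "indices": [
        {"name": ".ds-my-datastream-03-02-2024-00001", "num_docs": 3,
         "size_in_bytes": 15785, "datastream": "my-datastream"},
        {"name": "my-index-000001", "num_docs": 2, "size_in_bytes": 11462},
    ],
    "datastreams": [
        {"name": "my-datastream", "num_docs": 6, "size_in_bytes": 31752},
    ],
}

grouped, standalone = group_by_datastream(sample_stats)
```

Here `grouped` maps each data stream name to its backing indices, and `standalone` holds the indices that would still be listed individually.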
GET auditbeat-*/_stats
API to retrieve all the indices that match the given data view. This API doesn't seem to indicate which data stream (by name) each index belongs to.
@bytebilly do you know of any API that could provide the relationship between a data stream and its backing indices?
Data streams have their own Data stream stats API, but it returns only the number of backing indices (not their names).
Note that data streams are not returned by the "classic" Index Stats API when it's used in "bulk" mode, even if their backing indices are.
I think that the only possible way to correlate a data stream and its backing indices is to rely on the data stream naming convention:
Each data stream tracks its generation: a six-digit, zero-padded integer starting at 000001.
When a backing index is created, the index is named using the following convention:
.ds-<data-stream>-<yyyy.MM.dd>-<generation>
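If we had to fall back on the naming convention, the parsing could look roughly like this. It is only a sketch: `parse_backing_index` and the regular expression are illustrative, and a name-based match is a heuristic, since nothing stops a regular index from being given a name that looks the same.

```python
import re

# .ds-<data-stream>-<yyyy.MM.dd>-<generation>
# The data stream name may itself contain hyphens, so the pattern is anchored
# on the trailing date and six-digit generation.
BACKING_INDEX_RE = re.compile(
    r"^\.ds-(?P<data_stream>.+)"
    r"-(?P<date>\d{4}\.\d{2}\.\d{2})"
    r"-(?P<generation>\d{6})$"
)


def parse_backing_index(name: str):
    """Return (data_stream, date, generation), or None if the name does not match."""
    match = BACKING_INDEX_RE.match(name)
    if match is None:
        return None
    return (
        match.group("data_stream"),
        match.group("date"),
        int(match.group("generation")),
    )


parsed = parse_backing_index(".ds-logs-winlog.winlog-default-2023.10.11-000002")
not_backing = parse_backing_index("my-index-000001")
```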
The obvious question now is why not add the relationship to the API, as we will have in the new Serverless one. I guess the main issue could be backward compatibility, but I'd like to get more details from more informed people here.
Not sure how it is achieved, but it looks like Kibana relates backing indices to their data stream:
fwiw, it is possible to know the indices backing a data stream using the index API (docs)
For example, GET logs-* (?features=aliases can be used to omit mappings and settings information) returns all indices (from a data stream or not) and provides the data_stream entry as well.
{
[...]
".ds-logs-winlog.winlog-default-2023.10.11-000002": {
"aliases": {},
"mappings": {},
"settings": {},
"data_stream": "logs-winlog.winlog-default"
},
"logs-cloud_security_posture.findings_latest-default": {
"aliases": {},
"mappings": {},
"settings": {}
},
[...]
}
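As a rough sketch, the response above could be turned into a data stream to backing indices mapping like this. The `data_streams_from_get_index` helper is hypothetical, and it assumes the response is a plain object keyed by index name in which only backing indices carry a `data_stream` entry.

```python
def data_streams_from_get_index(response: dict):
    """Map each data stream to its backing indices; collect other indices separately."""
    streams = {}
    standalone = []
    for index_name, body in response.items():
        ds_name = body.get("data_stream")
        if ds_name:
            streams.setdefault(ds_name, []).append(index_name)
        else:
            standalone.append(index_name)
    return streams, standalone


# Sample entries taken from the response above.
sample_response = {
    ".ds-logs-winlog.winlog-default-2023.10.11-000002": {
        "aliases": {},
        "data_stream": "logs-winlog.winlog-default",
    },
    "logs-cloud_security_posture.findings_latest-default": {
        "aliases": {},
    },
}

streams, plain_indices = data_streams_from_get_index(sample_response)
```

This avoids relying on the naming convention, at the cost of one extra request per index pattern.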
Describe the feature:
The current data quality dashboard lists all the indices that match the data view. However, this information about backing indices is not very helpful to users; it would be better to list the data streams they belong to.