[SecuritySolution] List backing indices as datastreams in data quality dashboard

angorayc commented 8 months ago

Describe the feature:

The current data quality dashboard lists all the indices that match the data view. However, information about individual backing indices is not very helpful to users; it would be better to list the data streams they belong to.

elasticmachine commented 8 months ago

Pinging @elastic/security-threat-hunting (Team:Threat Hunting)

elasticmachine commented 8 months ago

Pinging @elastic/security-threat-hunting-explore (Team:Threat Hunting:Explore)

elasticmachine commented 8 months ago

Pinging @elastic/security-solution (Team: SecuritySolution)

bytebilly commented 8 months ago

+1 to discuss this change, as indices that back a data stream should generally not be managed directly by end users, who most likely interact with the data stream itself.

YulNaumenko commented 8 months ago

@dhru42 and @MikePaquette do you have any thoughts about this proposal?

dhru42 commented 8 months ago

I think datastream info should be provided with indices info in the new STATS API

angorayc commented 8 months ago

From my perspective, what we want to do in this issue is update the data quality dashboard UI and group backing indices by data stream. After talking to the team, here are some questions we have:

Have we got enough information from the existing APIs?

This might be doable on serverless Kibana, as the API we are going to use there gives us enough information about which backing indices form a data stream.

e.g.:

{
    "_shards": {
        "total": 4,
        "successful": 2,
        "failed": 0
    },
    "indices": [
      {
        "name": ".ds-my-datastream-03-02-2024-00001",
        "num_docs": 3,
        "size_in_bytes": 15785,
        "datastream": "my-datastream" // ---> ds name
      },
      {
        "name": "my-index-000001",
        "num_docs": 2,
        "size_in_bytes": 11462
      }
    ],
    "datastreams": [
      {
        "name": "my-datastream",
        "num_docs": 6,
        "size_in_bytes": 31752
      }
    ]
}
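If the response really has the shape shown above, the grouping for the UI could be sketched roughly like this. The IndexStats type and the groupByDatastream helper are hypothetical names, and the field names are taken only from the example payload, not from a published API contract.

```typescript
// Shape of one entry in the "indices" array, assumed from the example payload above.
interface IndexStats {
  name: string;
  num_docs: number;
  size_in_bytes: number;
  datastream?: string; // only present on backing indices
}

// Hypothetical helper: group indices under their data stream name.
// Indices without a data stream are kept as a group of one, keyed by their own name.
function groupByDatastream(indices: IndexStats[]): Map<string, IndexStats[]> {
  const groups = new Map<string, IndexStats[]>();
  for (const index of indices) {
    const key = index.datastream ?? index.name;
    const group = groups.get(key) ?? [];
    group.push(index);
    groups.set(key, group);
  }
  return groups;
}

// The two entries from the example response.
const exampleIndices: IndexStats[] = [
  {
    name: ".ds-my-datastream-03-02-2024-00001",
    num_docs: 3,
    size_in_bytes: 15785,
    datastream: "my-datastream",
  },
  { name: "my-index-000001", num_docs: 2, size_in_bytes: 11462 },
];

const groupedExample = groupByDatastream(exampleIndices);
```

With this, the dashboard would render one row per map key and could still expand a row to show its backing indices.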

In traditional Kibana, we use the GET auditbeat-*/_stats API to retrieve all the indices that match the given data view. This API does not seem to indicate which data stream (by name) each index belongs to. @bytebilly do you know of any API that could give us the relationship between a data stream and its backing indices?

Whose mappings are we going to check?

When checking for incompatible field types, we compare the mapping of each index with the ECS mappings. If we group backing indices into a data stream, does that mean we compare the data stream's mapping template with ECS? Would that information be accurate? Is there any chance that the mappings of some backing indices change, for example through an update or a rollover?

bytebilly commented 8 months ago

do you know of any API that could give us the relationship between a data stream and its backing indices?

Data streams have their own Data stream stats API, but it returns only the number of backing indices (not their names).

Note that data streams are not returned by the "classic" Index Stats API when it's used in "bulk" mode, even if their backing indices are.

I think that the only possible way to correlate a data stream and its backing indices is to rely on the data stream naming convention:

Each data stream tracks its generation: a six-digit, zero-padded integer starting at 000001.

When a backing index is created, the index is named using the following convention: .ds-<data-stream>-<yyyy.MM.dd>-<generation>
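A minimal sketch of what relying on that naming convention could look like. The parseBackingIndexName helper and the BackingIndexParts type are hypothetical, and the regular expression assumes exactly the .ds-<data-stream>-<yyyy.MM.dd>-<generation> form quoted above. Any name that does not follow it returns null, so this is a heuristic rather than an authoritative lookup.

```typescript
// Matches .ds-<data-stream>-<yyyy.MM.dd>-<generation>, with a six-digit generation.
const BACKING_INDEX_PATTERN = /^\.ds-(.+)-(\d{4}\.\d{2}\.\d{2})-(\d{6})$/;

interface BackingIndexParts {
  dataStream: string;
  date: string; // creation date segment, e.g. "2023.10.11"
  generation: number;
}

// Hypothetical helper: extract the data stream name from a backing index name,
// or return null when the name does not follow the convention.
function parseBackingIndexName(indexName: string): BackingIndexParts | null {
  const match = BACKING_INDEX_PATTERN.exec(indexName);
  if (match === null) {
    return null;
  }
  const [, dataStream, date, generation] = match;
  return { dataStream, date, generation: Number(generation) };
}

const parsedBacking = parseBackingIndexName(
  ".ds-logs-winlog.winlog-default-2023.10.11-000002"
);
const parsedPlain = parseBackingIndexName("my-index-000001");
```

The data stream name itself may contain hyphens and dots, which is why the pattern anchors on the date and generation at the end of the name instead of splitting on a separator.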

The obvious question now is why not add the relationship to the API, as we will have in the new Serverless one. I guess the main issue could be backward compatibility, but I'd like to get more details from more informed people here.

bytebilly commented 8 months ago

Not sure how it is achieved, but it looks like Kibana relates backing indices to their data stream:

semd commented 8 months ago

FWIW, it is possible to find out which indices back a data stream using the Get index API (docs).

For example, GET logs-* returns all matching indices (whether or not they belong to a data stream) and includes a data_stream entry for each backing index. Adding ?features=aliases omits the mappings and settings information.

{
[...]
  ".ds-logs-winlog.winlog-default-2023.10.11-000002": {
    "aliases": {},
    "mappings": {},
    "settings": {},
    "data_stream": "logs-winlog.winlog-default"
  },
  "logs-cloud_security_posture.findings_latest-default": {
    "aliases": {},
    "mappings": {},
    "settings": {}
  },
[...]
}
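Given a response like the one above, the data stream to backing indices relationship could be derived along these lines. This is only a sketch: GetIndexEntry and backingIndicesByDataStream are hypothetical names, and the entry shape is assumed from the example rather than taken from the Elasticsearch client types.

```typescript
// Per-index entry, assumed from the example response above.
interface GetIndexEntry {
  aliases: Record<string, unknown>;
  mappings: Record<string, unknown>;
  settings: Record<string, unknown>;
  data_stream?: string; // only present on backing indices
}

type GetIndexResponse = Record<string, GetIndexEntry>;

// Hypothetical helper: map each data stream name to the names of its backing indices.
// Indices without a data_stream entry are skipped.
function backingIndicesByDataStream(
  response: GetIndexResponse
): Record<string, string[]> {
  const result: Record<string, string[]> = {};
  for (const indexName of Object.keys(response)) {
    const dataStream = response[indexName].data_stream;
    if (dataStream === undefined) {
      continue;
    }
    if (result[dataStream] === undefined) {
      result[dataStream] = [];
    }
    result[dataStream].push(indexName);
  }
  return result;
}

// The two entries shown in the example response.
const exampleResponse: GetIndexResponse = {
  ".ds-logs-winlog.winlog-default-2023.10.11-000002": {
    aliases: {},
    mappings: {},
    settings: {},
    data_stream: "logs-winlog.winlog-default",
  },
  "logs-cloud_security_posture.findings_latest-default": {
    aliases: {},
    mappings: {},
    settings: {},
  },
};

const backingExample = backingIndicesByDataStream(exampleResponse);
```

Unlike parsing index names, this relies on what the API reports, so it would not depend on backing indices following the naming convention.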

elastic / kibana

[SecuritySolution] List backing indices as datastreams in data quality dashboard #179050