elastic / kibana

Your window into the Elastic Stack
https://www.elastic.co/products/kibana

[Stack Management][Index Management] Allocation Explain and Snapshot Restore #157662

Open stefnestor opened 1 year ago

stefnestor commented 1 year ago

👋 Howdy, team. I like that Kibana's Stack Management lets me introspect Elasticsearch objects without having to figure out the backing Elasticsearch APIs. Thanks for building it!

Kibana's Index Management seems to be built mainly for everyday index admin tasks, with some UI errors/warnings for when Elasticsearch has fallen off the happy path. One of the most frequent troubleshooting investigations raised to our team, cluster status:{yellow,red}, seems to require pivoting to DevTools or the Elasticsearch API very quickly, even when the Kibana UI is accessible.

request

Since this recovery is so integral to ongoing database adoption, I imagine that putting some of this troubleshooting information into Stack Management has been discussed previously on GitHub or internally, but I'm not finding it, sorry, so I'm filing it. (If it already exists, will you kindly link it to me and close this one?)

Will your team kindly consider adding the following to the Kibana UI:

  1. [Index Management] IF an index is status:{yellow,red}, automatically pull an Allocation Explain report for the index's shards in descending priority {primary_N, replica_N}. Just the raw JSON, as if the user had polled this endpoint themselves, but maybe also a link to docs that help me understand example output.
  2. [Index Management] IF an index is status:red, show UI to snapshot-restore only this index from the last snapshot that successfully captured it, with a warning kinda like "we'll restore the last successful snapshot of this index, but data gathered after it may already be lost". This would automate closing the index and running the snapshot restore command, then show something like "shard is recovering, please monitor Cluster Health and CAT Recovery for its completion via DevTools".
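
To make the two asks a bit more concrete, here is a rough Python sketch of the selection logic only, not of how Kibana would implement it. The function names are hypothetical. The input shapes are assumptions: `shards` is taken to look like rows from `GET _cat/shards?format=json`, and `snapshots` like the `snapshots` array from `GET _snapshot/<repository>/_all`.

```python
from typing import Any, Optional


def explain_requests(shards: list[dict[str, Any]], index: str) -> list[dict[str, Any]]:
    """Build allocation-explain request bodies for an index's unassigned shards.

    Assumes each row has the keys index, shard, prirep ("p" or "r") and state.
    Primaries come first, then replicas, each ordered by shard number.
    """
    unassigned = [
        s for s in shards
        if s["index"] == index and s["state"] == "UNASSIGNED"
    ]
    # False sorts before True, so primaries (prirep == "p") lead the list.
    unassigned.sort(key=lambda s: (s["prirep"] != "p", int(s["shard"])))

    seen = set()
    bodies = []
    for s in unassigned:
        key = (int(s["shard"]), s["prirep"] == "p")
        if key in seen:
            # Several unassigned replicas of one shard share one request body.
            continue
        seen.add(key)
        bodies.append({"index": index, "shard": key[0], "primary": key[1]})
    return bodies


def pick_restore_snapshot(snapshots: list[dict[str, Any]], index: str) -> Optional[str]:
    """Return the name of the newest SUCCESS snapshot that contains the index."""
    candidates = [
        s for s in snapshots
        if s.get("state") == "SUCCESS" and index in s.get("indices", [])
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda s: s["end_time_in_millis"])["snapshot"]
```

Each body from `explain_requests` could be sent as-is to `POST _cluster/allocation/explain`. The snapshot picker deliberately skips PARTIAL snapshots, which is a conservative choice for this sketch rather than a requirement.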

example

I don't know how to easily induce a status:red index on Elastic Cloud, so I'm showing status:yellow:

  1. Spin up a one-node Elasticsearch cluster
  2. Under DevTools, create index:test with replicas
    PUT test
    {"settings": {"index": {"number_of_shards": 2, "number_of_replicas": 2 } } }
  3. In Index Management, see that the index reports status:yellow
  4. [enhancement] Click into the index; a sub-tab lets you see the Allocation Explain content

    >>>
    POST _cluster/allocation/explain
    {"index": "test", "shard": 0, "primary": false }
    
    <<<
    {
      "index": "test",
      "shard": 0,
      "primary": false,
      "current_state": "unassigned",
      "unassigned_info": {
        "reason": "INDEX_CREATED",
        "at": "2023-05-13T15:33:07.653Z",
        "last_allocation_status": "no_attempt"
      },
      "can_allocate": "no",
      "allocate_explanation": "Elasticsearch isn't allowed to allocate this shard to any of the nodes in the cluster. Choose a node to which you expect this shard to be allocated, find this node in the node-by-node explanation, and address the reasons which prevent Elasticsearch from allocating this shard there.",
      "node_allocation_decisions": [
        {
          "node_id": "JDBeiqrHSPmBkXJQfmdxWA",
          "node_name": "instance-0000000000",
          "transport_address": "10.42.6.175:19643",
          "node_attributes": {
            "data": "hot",
            "server_name": "instance-0000000000.6d904dfb393c4f03a8adcc8e43d89057",
            "instance_configuration": "gcp.es.datahot.n2.68x10x45",
            "region": "unknown-region",
            "availability_zone": "us-central1-c",
            "logical_availability_zone": "zone-0",
            "xpack.installed": "true"
          },
          "node_decision": "no",
          "weight_ranking": 1,
          "deciders": [
            {
              "decider": "same_shard",
              "decision": "NO",
              "explanation": "a copy of this shard is already allocated to this node [[test][0], node[JDBeiqrHSPmBkXJQfmdxWA], [P], s[STARTED], a[id=FuqB5MODTJeLF8EP61hEyw], failed_attempts[0]]"
            }
          ]
        }
      ]
    }
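
On the "help me understand the output" part: the raw JSON is long, and the useful bit is usually the per-node decider explanations. A minimal Python sketch of condensing a response like the one above into one line per decider follows. `summarize_explain` is a hypothetical helper, and it only reads fields that appear in the response shown here.

```python
from typing import Any


def summarize_explain(explain: dict[str, Any]) -> list[str]:
    """Flatten node_allocation_decisions into 'node: decider=DECISION (explanation)' lines."""
    lines = []
    for node in explain.get("node_allocation_decisions", []):
        for decider in node.get("deciders", []):
            lines.append(
                f'{node["node_name"]}: {decider["decider"]}={decider["decision"]} '
                f'({decider["explanation"]})'
            )
    return lines
```

For the response above this would yield a single line naming `instance-0000000000` and the `same_shard` decider, which matches the one-node setup: the replica cannot go on the node that already holds the primary.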

elasticmachine commented 1 year ago

Pinging @elastic/platform-deployment-management (Team:Deployment Management)

alisonelizabeth commented 1 year ago

Hi @stefnestor - thanks for opening this issue! This sounds like a helpful enhancement. We will take a look. Tagging @shubhaat on this as well.

stefnestor commented 9 months ago

Related https://github.com/elastic/kibana/issues/137368

elasticmachine commented 3 weeks ago

Pinging @elastic/kibana-management (Team:Kibana Management)