elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Add error logs when preventing index creation because of low disk space #82617

Open sebelga opened 2 years ago

sebelga commented 2 years ago

While testing a 7.17 to 8.0.0 upgrade on prem, I found that Kibana got stuck in a timeout loop during the saved object migration process.

These are the Kibana logs that were repeating in a loop:

Kibana logs:

```
[2022-01-14T12:17:58.295+00:00][INFO ][savedobjects-service] Starting saved objects migrations
[2022-01-14T12:17:58.516+00:00][INFO ][savedobjects-service] [.kibana] INIT -> WAIT_FOR_YELLOW_SOURCE. took: 184ms.
[2022-01-14T12:17:58.662+00:00][INFO ][savedobjects-service] [.kibana] WAIT_FOR_YELLOW_SOURCE -> CHECK_UNKNOWN_DOCUMENTS. took: 146ms.
[2022-01-14T12:17:58.786+00:00][INFO ][savedobjects-service] [.kibana] CHECK_UNKNOWN_DOCUMENTS -> SET_SOURCE_WRITE_BLOCK. took: 124ms.
[2022-01-14T12:17:58.787+00:00][INFO ][savedobjects-service] [.kibana_task_manager] INIT -> WAIT_FOR_YELLOW_SOURCE. took: 453ms.
[2022-01-14T12:17:58.867+00:00][INFO ][savedobjects-service] [.kibana_task_manager] WAIT_FOR_YELLOW_SOURCE -> CHECK_UNKNOWN_DOCUMENTS. took: 80ms.
[2022-01-14T12:17:58.905+00:00][INFO ][savedobjects-service] [.kibana_task_manager] CHECK_UNKNOWN_DOCUMENTS -> SET_SOURCE_WRITE_BLOCK. took: 38ms.
[2022-01-14T12:17:59.287+00:00][INFO ][savedobjects-service] [.kibana] SET_SOURCE_WRITE_BLOCK -> CALCULATE_EXCLUDE_FILTERS. took: 501ms.
[2022-01-14T12:17:59.331+00:00][INFO ][savedobjects-service] [.kibana] CALCULATE_EXCLUDE_FILTERS -> CREATE_REINDEX_TEMP. took: 44ms.
[2022-01-14T12:17:59.396+00:00][INFO ][savedobjects-service] [.kibana_task_manager] SET_SOURCE_WRITE_BLOCK -> CALCULATE_EXCLUDE_FILTERS. took: 491ms.
[2022-01-14T12:17:59.404+00:00][INFO ][savedobjects-service] [.kibana_task_manager] CALCULATE_EXCLUDE_FILTERS -> CREATE_REINDEX_TEMP. took: 8ms.
[2022-01-14T12:17:59.409+00:00][INFO ][plugins.screenshotting.chromium] Browser executable: /Users/sebastienloix/Elastic-spike-testing/kibana-8.0.0-rc1/x-pack/plugins/screenshotting/chromium/headless_shell-darwin_x64/headless_shell
[2022-01-14T12:18:01.121+00:00][INFO ][savedobjects-service] [.kibana] CREATE_REINDEX_TEMP -> REINDEX_SOURCE_TO_TEMP_OPEN_PIT. took: 1790ms.
[2022-01-14T12:18:01.133+00:00][INFO ][savedobjects-service] [.kibana] REINDEX_SOURCE_TO_TEMP_OPEN_PIT -> REINDEX_SOURCE_TO_TEMP_READ. took: 12ms.
[2022-01-14T12:18:01.210+00:00][INFO ][savedobjects-service] [.kibana] Starting to process 68 documents.
[2022-01-14T12:18:01.210+00:00][INFO ][savedobjects-service] [.kibana] REINDEX_SOURCE_TO_TEMP_READ -> REINDEX_SOURCE_TO_TEMP_TRANSFORM. took: 77ms.
[2022-01-14T12:18:01.246+00:00][INFO ][savedobjects-service] [.kibana] REINDEX_SOURCE_TO_TEMP_TRANSFORM -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 36ms.
[2022-01-14T12:18:01.346+00:00][INFO ][savedobjects-service] [.kibana_task_manager] CREATE_REINDEX_TEMP -> REINDEX_SOURCE_TO_TEMP_OPEN_PIT. took: 1942ms.
[2022-01-14T12:18:01.351+00:00][INFO ][savedobjects-service] [.kibana_task_manager] REINDEX_SOURCE_TO_TEMP_OPEN_PIT -> REINDEX_SOURCE_TO_TEMP_READ. took: 5ms.
[2022-01-14T12:18:01.365+00:00][INFO ][savedobjects-service] [.kibana_task_manager] Starting to process 18 documents.
[2022-01-14T12:18:01.365+00:00][INFO ][savedobjects-service] [.kibana_task_manager] REINDEX_SOURCE_TO_TEMP_READ -> REINDEX_SOURCE_TO_TEMP_TRANSFORM. took: 14ms.
[2022-01-14T12:18:01.370+00:00][INFO ][savedobjects-service] [.kibana_task_manager] REINDEX_SOURCE_TO_TEMP_TRANSFORM -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 5ms.
[2022-01-14T12:18:01.684+00:00][INFO ][savedobjects-service] [.kibana_task_manager] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_READ. took: 314ms.
[2022-01-14T12:18:01.694+00:00][INFO ][savedobjects-service] [.kibana_task_manager] Processed 18 documents out of 18.
[2022-01-14T12:18:01.694+00:00][INFO ][savedobjects-service] [.kibana_task_manager] REINDEX_SOURCE_TO_TEMP_READ -> REINDEX_SOURCE_TO_TEMP_CLOSE_PIT. took: 10ms.
[2022-01-14T12:18:01.703+00:00][INFO ][savedobjects-service] [.kibana_task_manager] REINDEX_SOURCE_TO_TEMP_CLOSE_PIT -> SET_TEMP_WRITE_BLOCK. took: 9ms.
[2022-01-14T12:18:02.426+00:00][INFO ][savedobjects-service] [.kibana_task_manager] SET_TEMP_WRITE_BLOCK -> CLONE_TEMP_TO_TARGET. took: 723ms.
[2022-01-14T12:18:15.007+00:00][INFO ][savedobjects-service] [.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_READ. took: 13761ms.
[2022-01-14T12:18:15.016+00:00][INFO ][savedobjects-service] [.kibana] Processed 68 documents out of 68.
[2022-01-14T12:18:15.016+00:00][INFO ][savedobjects-service] [.kibana] REINDEX_SOURCE_TO_TEMP_READ -> REINDEX_SOURCE_TO_TEMP_CLOSE_PIT. took: 9ms.
[2022-01-14T12:18:15.019+00:00][INFO ][savedobjects-service] [.kibana] REINDEX_SOURCE_TO_TEMP_CLOSE_PIT -> SET_TEMP_WRITE_BLOCK. took: 3ms.
[2022-01-14T12:18:15.335+00:00][INFO ][savedobjects-service] [.kibana] SET_TEMP_WRITE_BLOCK -> CLONE_TEMP_TO_TARGET. took: 316ms.
[2022-01-14T12:18:32.429+00:00][ERROR][savedobjects-service] [.kibana_task_manager] Action failed with 'Request timed out'. Retrying attempt 1 in 2 seconds.
[2022-01-14T12:18:32.429+00:00][INFO ][savedobjects-service] [.kibana_task_manager] CLONE_TEMP_TO_TARGET -> CLONE_TEMP_TO_TARGET. took: 30003ms.
[2022-01-14T12:18:45.338+00:00][ERROR][savedobjects-service] [.kibana] Action failed with 'Request timed out'. Retrying attempt 1 in 2 seconds.
[2022-01-14T12:18:45.338+00:00][INFO ][savedobjects-service] [.kibana] CLONE_TEMP_TO_TARGET -> CLONE_TEMP_TO_TARGET. took: 30003ms.
[2022-01-14T12:20:34.443+00:00][ERROR][savedobjects-service] [.kibana_task_manager] Action failed with 'Request timed out'. Retrying attempt 2 in 4 seconds.
[2022-01-14T12:20:34.443+00:00][INFO ][savedobjects-service] [.kibana_task_manager] CLONE_TEMP_TO_TARGET -> CLONE_TEMP_TO_TARGET. took: 122014ms.
```

These are the Elasticsearch logs from the moment Kibana was started (notice that no error is logged when Kibana tries to create the index):

ES logs:

```
[2022-01-14T12:17:58,794][INFO ][o.e.c.m.MetadataIndexStateService] [SebastiensMBP2.lan] adding block write to indices [[.kibana_7.17.0_001/WTAgWIhJSHu_gb3BG1nqNg]]
[2022-01-14T12:17:58,919][INFO ][o.e.c.m.MetadataIndexStateService] [SebastiensMBP2.lan] adding block write to indices [[.kibana_task_manager_7.17.0_001/RrgJ-tpqQTW79380BBZBlQ]]
[2022-01-14T12:17:59,125][INFO ][o.e.c.m.MetadataIndexStateService] [SebastiensMBP2.lan] completed adding block write to indices [.kibana_7.17.0_001]
[2022-01-14T12:17:59,286][INFO ][o.e.c.m.MetadataIndexStateService] [SebastiensMBP2.lan] completed adding block write to indices [.kibana_task_manager_7.17.0_001]
[2022-01-14T12:17:59,405][INFO ][o.e.c.m.MetadataCreateIndexService] [SebastiensMBP2.lan] [.kibana_8.0.0_reindex_temp] creating index, cause [api], templates [], shards [1]/[1]
[2022-01-14T12:17:59,413][INFO ][o.e.c.r.a.AllocationService] [SebastiensMBP2.lan] updating number_of_replicas to [0] for indices [.kibana_8.0.0_reindex_temp]
[2022-01-14T12:17:59,699][INFO ][o.e.c.m.MetadataCreateIndexService] [SebastiensMBP2.lan] [.kibana_task_manager_8.0.0_reindex_temp] creating index, cause [api], templates [], shards [1]/[1]
[2022-01-14T12:17:59,700][INFO ][o.e.c.r.a.AllocationService] [SebastiensMBP2.lan] updating number_of_replicas to [0] for indices [.kibana_task_manager_8.0.0_reindex_temp]
[2022-01-14T12:18:01,142][INFO ][o.e.c.r.a.AllocationService] [SebastiensMBP2.lan] current.health="GREEN" message="Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[.kibana_task_manager_8.0.0_reindex_temp][0]]])." previous.health="YELLOW" reason="shards started [[.kibana_task_manager_8.0.0_reindex_temp][0]]"
[2022-01-14T12:18:01,387][INFO ][o.e.c.m.MetadataMappingService] [SebastiensMBP2.lan] [.kibana_8.0.0_reindex_temp/XmBq6xDmQVCQCY1YtwTBgA] update_mapping [_doc]
[2022-01-14T12:18:01,391][INFO ][o.e.c.m.MetadataMappingService] [SebastiensMBP2.lan] [.kibana_task_manager_8.0.0_reindex_temp/L6pZIxXgTeWOrXYYQuDOKw] update_mapping [_doc]
[2022-01-14T12:18:01,545][INFO ][o.e.c.m.MetadataMappingService] [SebastiensMBP2.lan] [.kibana_8.0.0_reindex_temp/XmBq6xDmQVCQCY1YtwTBgA] update_mapping [_doc]
[2022-01-14T12:18:01,909][INFO ][o.e.c.m.MetadataIndexStateService] [SebastiensMBP2.lan] adding block write to indices [[.kibana_task_manager_8.0.0_reindex_temp/L6pZIxXgTeWOrXYYQuDOKw]]
[2022-01-14T12:18:02,152][INFO ][o.e.c.m.MetadataMappingService] [SebastiensMBP2.lan] [.kibana_8.0.0_reindex_temp/XmBq6xDmQVCQCY1YtwTBgA] update_mapping [_doc]
[2022-01-14T12:18:02,276][INFO ][o.e.c.m.MetadataIndexStateService] [SebastiensMBP2.lan] completed adding block write to indices [.kibana_task_manager_8.0.0_reindex_temp]
[2022-01-14T12:18:02,438][INFO ][o.e.c.m.MetadataCreateIndexService] [SebastiensMBP2.lan] applying create index request using existing index [.kibana_task_manager_8.0.0_reindex_temp] metadata
[2022-01-14T12:18:02,442][INFO ][o.e.c.m.MetadataCreateIndexService] [SebastiensMBP2.lan] [.kibana_task_manager_8.0.0_001] creating index, cause [clone_index], templates [], shards [1]/[1]
[2022-01-14T12:18:02,444][INFO ][o.e.c.r.a.AllocationService] [SebastiensMBP2.lan] updating number_of_replicas to [0] for indices [.kibana_task_manager_8.0.0_001]
[2022-01-14T12:18:02,446][INFO ][o.e.c.r.a.AllocationService] [SebastiensMBP2.lan] current.health="RED" message="Cluster health status changed from [YELLOW] to [RED] (reason: [index [.kibana_task_manager_8.0.0_001] created])." previous.health="YELLOW" reason="index [.kibana_task_manager_8.0.0_001] created"
[2022-01-14T12:18:02,555][INFO ][o.e.c.m.MetadataMappingService] [SebastiensMBP2.lan] [.kibana_8.0.0_reindex_temp/XmBq6xDmQVCQCY1YtwTBgA] update_mapping [_doc]
[2022-01-14T12:18:02,667][INFO ][o.e.c.m.MetadataMappingService] [SebastiensMBP2.lan] [.kibana_8.0.0_reindex_temp/XmBq6xDmQVCQCY1YtwTBgA] update_mapping [_doc]
[2022-01-14T12:18:02,810][INFO ][o.e.c.m.MetadataMappingService] [SebastiensMBP2.lan] [.kibana_8.0.0_reindex_temp/XmBq6xDmQVCQCY1YtwTBgA] update_mapping [_doc]
[2022-01-14T12:18:02,952][INFO ][o.e.c.m.MetadataMappingService] [SebastiensMBP2.lan] [.kibana_8.0.0_reindex_temp/XmBq6xDmQVCQCY1YtwTBgA] update_mapping [_doc]
[2022-01-14T12:18:03,080][INFO ][o.e.c.m.MetadataMappingService] [SebastiensMBP2.lan] [.kibana_8.0.0_reindex_temp/XmBq6xDmQVCQCY1YtwTBgA] update_mapping [_doc]
[2022-01-14T12:18:05,768][INFO ][o.e.c.m.MetadataMappingService] [SebastiensMBP2.lan] [.kibana_8.0.0_reindex_temp/XmBq6xDmQVCQCY1YtwTBgA] update_mapping [_doc]
[2022-01-14T12:18:08,447][INFO ][o.e.c.m.MetadataMappingService] [SebastiensMBP2.lan] [.kibana_8.0.0_reindex_temp/XmBq6xDmQVCQCY1YtwTBgA] update_mapping [_doc]
[2022-01-14T12:18:09,900][INFO ][o.e.c.m.MetadataMappingService] [SebastiensMBP2.lan] [.kibana_8.0.0_reindex_temp/XmBq6xDmQVCQCY1YtwTBgA] update_mapping [_doc]
[2022-01-14T12:18:11,977][INFO ][o.e.c.m.MetadataMappingService] [SebastiensMBP2.lan] [.kibana_8.0.0_reindex_temp/XmBq6xDmQVCQCY1YtwTBgA] update_mapping [_doc]
[2022-01-14T12:18:13,293][INFO ][o.e.c.m.MetadataMappingService] [SebastiensMBP2.lan] [.kibana_8.0.0_reindex_temp/XmBq6xDmQVCQCY1YtwTBgA] update_mapping [_doc]
[2022-01-14T12:18:14,402][INFO ][o.e.c.m.MetadataMappingService] [SebastiensMBP2.lan] [.kibana_8.0.0_reindex_temp/XmBq6xDmQVCQCY1YtwTBgA] update_mapping [_doc]
[2022-01-14T12:18:14,790][INFO ][o.e.c.m.MetadataMappingService] [SebastiensMBP2.lan] [.kibana_8.0.0_reindex_temp/XmBq6xDmQVCQCY1YtwTBgA] update_mapping [_doc]
[2022-01-14T12:18:15,021][INFO ][o.e.c.m.MetadataIndexStateService] [SebastiensMBP2.lan] adding block write to indices [[.kibana_8.0.0_reindex_temp/XmBq6xDmQVCQCY1YtwTBgA]]
[2022-01-14T12:18:15,185][INFO ][o.e.c.m.MetadataIndexStateService] [SebastiensMBP2.lan] completed adding block write to indices [.kibana_8.0.0_reindex_temp]
[2022-01-14T12:18:15,341][INFO ][o.e.c.m.MetadataCreateIndexService] [SebastiensMBP2.lan] applying create index request using existing index [.kibana_8.0.0_reindex_temp] metadata
[2022-01-14T12:18:15,344][INFO ][o.e.c.m.MetadataCreateIndexService] [SebastiensMBP2.lan] [.kibana_8.0.0_001] creating index, cause [clone_index], templates [], shards [1]/[1]
[2022-01-14T12:18:15,345][INFO ][o.e.c.r.a.AllocationService] [SebastiensMBP2.lan] updating number_of_replicas to [0] for indices [.kibana_8.0.0_001]
```

Talking with @rudolf, he mentioned that "_This migration fails on the CLONE_TEMP_TO_TARGET step; it could be because the cloned index never becomes yellow_".

Running `GET _cluster/allocation/explain` returns:

```
{
  "note" : "No shard was specified in the explain API request, so this response explains a randomly chosen unassigned shard. There may be other unassigned shards in this cluster which cannot be assigned for different reasons. It may not be possible to assign this shard until one of the other shards is assigned correctly. To explain the allocation of other shards (whether assigned or unassigned) you must specify the target shard in the request to this API.",
  "index" : ".kibana_task_manager_8.0.0_001",
  "shard" : 0,
  "primary" : true,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "INDEX_CREATED",
    "at" : "2022-01-14T12:18:02.443Z",
    "last_allocation_status" : "no"
  },
  "can_allocate" : "no",
  "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisions" : [
    {
      "node_id" : "XdDz9AgETv6_RcnTcByzuQ",
      "node_name" : "SebastiensMBP2.lan",
      "transport_address" : "127.0.0.1:9300",
      "node_attributes" : {
        "ml.machine_memory" : "34359738368",
        "xpack.installed" : "true",
        "ml.max_jvm_size" : "17179869184"
      },
      "node_decision" : "no",
      "weight_ranking" : 1,
      "deciders" : [
        {
          "decider" : "disk_threshold",
          "decision" : "NO",
          "explanation" : "the node is above the low watermark cluster setting [cluster.routing.allocation.disk.watermark.low=85%], using more disk space than the maximum allowed [85.0%], actual free: [10.09703712696148%]"
        }
      ]
    }
  ]
}
```
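
For reference, the `note` in that response says no shard was specified, so the API picked an unassigned shard at random. The same request can target this shard explicitly (`index`, `shard`, and `primary` are documented parameters of the allocation explain API):

```
GET _cluster/allocation/explain
{
  "index": ".kibana_task_manager_8.0.0_001",
  "shard": 0,
  "primary": true
}
```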

This explains why Kibana could not complete the upgrade: "the node is above the low watermark cluster setting".
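
Not part of the original report, but for anyone reproducing this locally: `GET _cat/allocation?v` shows per-node disk usage (including `disk.percent`), and the low watermark can be raised temporarily so the clone target gets allocated. The 90% value below is only an illustration, not a recommendation:

```
GET _cat/allocation?v

PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.low": "90%"
  }
}
```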

It would be good to surface that information in the Elasticsearch logs whenever an index cannot be created (or its shards cannot be allocated) because of insufficient disk space. There is an open issue in Kibana to better surface the upgrade failure, but I think ES could also improve its logs.
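
Purely as an illustration of what is being asked for (this log line does not exist in Elasticsearch today), something along the lines of the decider's explanation, logged at WARN or ERROR when the allocation decision is made, would have made the root cause obvious without calling the explain API:

```
[2022-01-14T12:18:02,446][WARN ][o.e.c.r.a.AllocationService] [SebastiensMBP2.lan] primary shard [[.kibana_task_manager_8.0.0_001][0]] cannot be allocated: the node is above the low watermark cluster setting [cluster.routing.allocation.disk.watermark.low=85%], using more disk space than the maximum allowed [85.0%]
```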

elasticmachine commented 2 years ago

Pinging @elastic/es-distributed (Team:Distributed)