elastic / kibana

Your window into the Elastic Stack
https://www.elastic.co/products/kibana

Kibana stays read-only when the ES high disk watermark has been exceeded and disk usage has later gone back below the limit #13685

Closed algestam closed 6 years ago

algestam commented 7 years ago

Kibana version: 6.0.0-beta1

Elasticsearch version: 6.0.0-beta1

Server OS version: Ubuntu 16.04.2 LTS

Browser version: Chrome 60.0.3112.90

Browser OS version: Windows 10

Original install method (e.g. download page, yum, from source, etc.): Official tar.gz packages

Description of the problem including expected versus actual behavior:

I'm running a single-node Elasticsearch instance, Logstash and Kibana. Everything runs on the same host in separate Docker containers.

If the high disk watermark is exceeded on the ES host, the following is logged in the elasticsearch log:

[2017-08-24T07:45:11,757][INFO ][o.e.c.r.a.DiskThresholdMonitor] [CSOifAr] rerouting shards: [high disk watermark exceeded on one or more nodes]
[2017-08-24T07:45:41,760][WARN ][o.e.c.r.a.DiskThresholdMonitor] [CSOifAr] flood stage disk watermark [95%] exceeded on [CSOifArqQK-7PBZM_keNoA][CSOifAr][/data/elasticsearch/nodes/0] free: 693.8mb[2.1%], all indices on this node will marked read-only
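(Side note for anyone debugging this: the watermark thresholds that trigger this behaviour can be inspected with something like the line below, assuming an unsecured node on localhost:9200; include_defaults is supported on newer ES versions and also shows the built-in defaults:)

curl -s 'http://localhost:9200/_cluster/settings?include_defaults=true&flat_settings=true&pretty' | grep watermark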

When this has occurred, changes to the .kibana index will of course fail as the index cannot be written to. This can be observed by trying to change any setting under Management->Advanced Settings, where a change to e.g. search:queryLanguage fails with the message Config: Error 403 Forbidden: blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];

(screenshot: index_read_only error in Kibana)

If more disk space is now made available, ES will log that the node has gone under the high watermark:

[2017-08-24T07:47:11,774][INFO ][o.e.c.r.a.DiskThresholdMonitor] [CSOifAr] rerouting shards: [one or more nodes has gone under the high or low watermark]

One would now assume that it would be possible to make changes to Kibana settings, but trying to make a settings change still fails with the error message:

Config: Error 403 Forbidden: blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];
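(For anyone hitting this, the block can be confirmed directly on the index settings; with an unsecured node on localhost:9200 a minimal check looks like the line below, and the output contains "index.blocks.read_only_allow_delete" : "true" while the index is locked:)

curl -s 'http://localhost:9200/.kibana/_settings?flat_settings=true&pretty'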

Steps to reproduce (a condensed command sketch follows the list):

  1. Make sure that setting changes can be performed without errors
  2. Fill up the elasticsearch data disk so that the high disk watermark is exceeded (I used fallocate -l9G largefile)
  3. Verify in the ES log that the high disk watermark has been exceeded and the indices have been marked read-only
  4. Perform a setting change and verify that it fails since writes are prohibited
  5. Resolve the high disk watermark condition (which I did with rm largefile)
  6. Verify that the ES log states that the node has gone under the high disk watermark (and thus should be possible to write to?)
  7. Perform a setting change and it will fail when it actually should succeed.
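A condensed version of the above for a single-node test setup (hypothetical paths, adjust to your data disk):

fallocate -l9G /data/largefile   # fill the data disk past the high/flood watermark
# wait for DiskThresholdMonitor to log the breach, then try a Kibana settings change (fails with 403)
rm /data/largefile               # free the space again
# wait for the "gone under the high or low watermark" log line, then retry -- the change still fails
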
scaarup commented 7 years ago

So how do I recover from this? .kibana stays read-only no matter what I do. I have tried to snapshot it, delete it and restore it from the snapshot - still read-only...

darkpixel commented 6 years ago

I just ran into this on a test machine. For the life of me I can't continue putting data into the cluster. I finally had to blow away all the involved indices.

sz3n commented 6 years ago

I resolved the issue by deleting the .kibana index (DELETE /.kibana/). I lose certain configurations/visualizations/dashboards, but it unblocked the index.
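(For reference, that delete is something like the following against an unsecured local node; note that it permanently removes saved Kibana objects:)

curl -XDELETE 'http://localhost:9200/.kibana'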

xose commented 6 years ago

I just got hit by this. It's not just Kibana, all indexes get locked when the disk threshold is reached and never get unlocked when space is freed.

To unlock all indexes manually:

curl -XPUT -H "Content-Type: application/json" https://[YOUR_ELASTICSEARCH_ENDPOINT]:9200/_all/_settings -d '{"index.blocks.read_only_allow_delete": null}'

algestam commented 6 years ago

Thanks @xose, I just got hit by this again and was able to recover by using the command you suggested :)

The problem occurred on all indices, not just the .kibana one.

According to the ES logs, the indices were set to read-only due to low disk space on the Elasticsearch host. I run a single host with Elasticsearch, Kibana and Logstash dockerized together with some other tools. As this problem affects other indices, I think this is more of an Elasticsearch problem and that the problem seen in Kibana is a symptom of another issue.

saberkun commented 6 years ago

This bug is really frustrating. Can you unbreak it for now? At the least, Kibana should display a warning and list a possible solution. It is really frustrating to have to dig into the JS error log to find this thread!

darkpixel commented 6 years ago

@saberkun You can unbreak it by following the command @xose posted:

curl -XPUT -H "Content-Type: application/json" https://[YOUR_ELASTICSEARCH_ENDPOINT]:9200/_all/_settings -d '{"index.blocks.read_only_allow_delete": null}'
saberkun commented 6 years ago

Yes, I did.

darkpixel commented 6 years ago

Can you provide additional information? Did you receive an error when running the command? Did the indices unlock and now you're getting a new error message? What error messages are you seeing in your log files now?

saberkun commented 6 years ago

Thanks. It is fixed by the command. I mean yes, I used it to fix the problem.

kesha-antonov commented 6 years ago

+1 Receiving this error after upgrade from 5.5 to 6.0

purplesrl commented 6 years ago

+1

ELK 6: I cleared half the drive but indices stayed read-only; Logstash was allowed to write again, but Kibana remained read-only.
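(To see which indices still carry the block, something like this works against an unsecured local node; locked indices show index.blocks.read_only_allow_delete set to true:)

curl -s 'http://localhost:9200/_all/_settings?flat_settings=true&pretty' | grep read_only_allow_delete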

Managed to solve the issue with the workaround provided by @xose

harmenverburg commented 6 years ago

+1, same error for me.

sangeetawakhale commented 6 years ago

Same issue for me. It got resolved by the solution given by @xose.

patodevilla commented 6 years ago

Same here. All hail @xose.

darkpixel commented 6 years ago

I just upgraded a single-node cluster from 6.0.0 to 6.1.1 (both ES and Kibana). When I started the services back up, Kibana was throwing:

blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];

Same as last time--I had to delete the .kibana index to get it back up and going. There was also the current logstash index with one of the shards listed as unallocated. I deleted it as well and then got the usual flood of alerts in.
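(For anyone hitting the unallocated-shard part of this, unassigned shards can be listed with something like the following, assuming an unsecured local node:)

curl -s 'http://localhost:9200/_cat/shards?v' | grep -i unassigned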

I didn't run out of space--there's ~92 GB out of 120 GB free on this test machine. The storage location is ZFS and a scrub didn't reveal any data corruption.

The only errors in the log appear to be irrelevant:

[2018-01-13T20:48:14,579][INFO ][o.e.n.Node               ] [ripley1] stopping ...
[2018-01-13T20:48:14,597][ERROR][i.n.u.c.D.rejectedExecution] Failed to submit a listener notification task. Event loop shut down?
java.util.concurrent.RejectedExecutionException: event executor terminated
        at io.netty.util.concurrent.SingleThreadEventExecutor.reject(SingleThreadEventExecutor.java:821) ~[netty-common-4.1.13.Final.jar:4.1.13.Final]
        at io.netty.util.concurrent.SingleThreadEventExecutor.offerTask(SingleThreadEventExecutor.java:327) ~[netty-common-4.1.13.Final.jar:4.1.13.Final]
        at io.netty.util.concurrent.SingleThreadEventExecutor.addTask(SingleThreadEventExecutor.java:320) ~[netty-common-4.1.13.Final.jar:4.1.13.Final]
        at io.netty.util.concurrent.SingleThreadEventExecutor.execute(SingleThreadEventExecutor.java:746) ~[netty-common-4.1.13.Final.jar:4.1.13.Final]
        at io.netty.util.concurrent.DefaultPromise.safeExecute(DefaultPromise.java:760) [netty-common-4.1.13.Final.jar:4.1.13.Final]
        at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:428) [netty-common-4.1.13.Final.jar:4.1.13.Final]
        at io.netty.util.concurrent.DefaultPromise.setFailure(DefaultPromise.java:113) [netty-common-4.1.13.Final.jar:4.1.13.Final]
        at io.netty.channel.DefaultChannelPromise.setFailure(DefaultChannelPromise.java:87) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
        at io.netty.channel.AbstractChannelHandlerContext.safeExecute(AbstractChannelHandlerContext.java:1010) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
        at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:825) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
        at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:794) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
        at io.netty.channel.DefaultChannelPipeline.writeAndFlush(DefaultChannelPipeline.java:1027) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
        at io.netty.channel.AbstractChannel.writeAndFlush(AbstractChannel.java:301) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
        at org.elasticsearch.http.netty4.Netty4HttpChannel.sendResponse(Netty4HttpChannel.java:146) [transport-netty4-6.0.0.jar:6.0.0]
        at org.elasticsearch.rest.RestController$ResourceHandlingHttpChannel.sendResponse(RestController.java:491) [elasticsearch-6.0.0.jar:6.0.0]
        at org.elasticsearch.rest.action.RestResponseListener.processResponse(RestResponseListener.java:37) [elasticsearch-6.0.0.jar:6.0.0]
        at org.elasticsearch.rest.action.RestActionListener.onResponse(RestActionListener.java:47) [elasticsearch-6.0.0.jar:6.0.0]
        at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:85) [elasticsearch-6.0.0.jar:6.0.0]
        at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:81) [elasticsearch-6.0.0.jar:6.0.0]
        at org.elasticsearch.action.bulk.TransportBulkAction$BulkOperation$1.finishHim(TransportBulkAction.java:380) [elasticsearch-6.0.0.jar:6.0.0]
        at org.elasticsearch.action.bulk.TransportBulkAction$BulkOperation$1.onFailure(TransportBulkAction.java:375) [elasticsearch-6.0.0.jar:6.0.0]
        at org.elasticsearch.action.support.TransportAction$1.onFailure(TransportAction.java:91) [elasticsearch-6.0.0.jar:6.0.0]
        at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase.finishAsFailed(TransportReplicationAction.java:908) [elasticsearch-6.0.0.jar:6.0.0]
        at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase$2.onClusterServiceClose(TransportReplicationAction.java:891) [elasticsearch-6.0.0.jar:6.0.0]
        at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onClusterServiceClose(ClusterStateObserver.java:310) [elasticsearch-6.0.0.jar:6.0.0]
        at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onClose(ClusterStateObserver.java:230) [elasticsearch-6.0.0.jar:6.0.0]
        at org.elasticsearch.cluster.service.ClusterApplierService.doStop(ClusterApplierService.java:168) [elasticsearch-6.0.0.jar:6.0.0]
        at org.elasticsearch.common.component.AbstractLifecycleComponent.stop(AbstractLifecycleComponent.java:85) [elasticsearch-6.0.0.jar:6.0.0]
        at org.elasticsearch.cluster.service.ClusterService.doStop(ClusterService.java:106) [elasticsearch-6.0.0.jar:6.0.0]
        at org.elasticsearch.common.component.AbstractLifecycleComponent.stop(AbstractLifecycleComponent.java:85) [elasticsearch-6.0.0.jar:6.0.0]
        at org.elasticsearch.node.Node.stop(Node.java:713) [elasticsearch-6.0.0.jar:6.0.0]
        at org.elasticsearch.node.Node.close(Node.java:735) [elasticsearch-6.0.0.jar:6.0.0]
        at org.apache.lucene.util.IOUtils.close(IOUtils.java:89) [lucene-core-7.0.1.jar:7.0.1 8d6c3889aa543954424d8ac1dbb3f03bf207140b - sarowe - 2017-10-02 14:36:35]
        at org.apache.lucene.util.IOUtils.close(IOUtils.java:76) [lucene-core-7.0.1.jar:7.0.1 8d6c3889aa543954424d8ac1dbb3f03bf207140b - sarowe - 2017-10-02 14:36:35]
        at org.elasticsearch.bootstrap.Bootstrap$4.run(Bootstrap.java:185) [elasticsearch-6.0.0.jar:6.0.0]
[2018-01-13T20:48:14,692][INFO ][o.e.n.Node               ] [ripley1] stopped
[2018-01-13T20:48:14,692][INFO ][o.e.n.Node               ] [ripley1] closing ...
[2018-01-13T20:48:14,704][INFO ][o.e.n.Node               ] [ripley1] closed
[2018-01-13T20:48:39,879][INFO ][o.e.n.Node               ] [ripley1] initializing ...
[2018-01-13T20:48:40,054][INFO ][o.e.e.NodeEnvironment    ] [ripley1] using [1] data paths, mounts [[/scratch/elasticsearch (scratch/elasticsearch)]], net usable_space [92.5gb], net total_space [93.6gb], types [zfs]
[2018-01-13T20:48:40,055][INFO ][o.e.e.NodeEnvironment    ] [ripley1] heap size [989.8mb], compressed ordinary object pointers [true]
[2018-01-13T20:48:40,119][INFO ][o.e.n.Node               ] [ripley1] node name [ripley1], node ID [TvkaGbQpR5KZ-ZScMZN6AQ]
[2018-01-13T20:48:40,119][INFO ][o.e.n.Node               ] [ripley1] version[6.1.1], pid[6942], build[bd92e7f/2017-12-17T20:23:25.338Z], OS[Linux/4.10.0-38-generic/amd64], JVM[Oracle Corporation/OpenJDK 64-Bit Server VM/1.8.0_151/25.151-b12]
[2018-01-13T20:48:40,120][INFO ][o.e.n.Node               ] [ripley1] JVM arguments [-Xms1g, -Xmx1g, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -XX:-OmitStackTraceInFastThrow, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -XX:+HeapDumpOnOutOfMemoryError, -XX:HeapDumpPath=/var/lib/elasticsearch, -Des.path.home=/usr/share/elasticsearch, -Des.path.conf=/etc/elasticsearch]
[2018-01-13T20:48:41,315][INFO ][o.e.p.PluginsService     ] [ripley1] loaded module [aggs-matrix-stats]
[2018-01-13T20:48:41,315][INFO ][o.e.p.PluginsService     ] [ripley1] loaded module [analysis-common]
[2018-01-13T20:48:41,315][INFO ][o.e.p.PluginsService     ] [ripley1] loaded module [ingest-common]
[2018-01-13T20:48:41,315][INFO ][o.e.p.PluginsService     ] [ripley1] loaded module [lang-expression]
[2018-01-13T20:48:41,315][INFO ][o.e.p.PluginsService     ] [ripley1] loaded module [lang-mustache]
[2018-01-13T20:48:41,315][INFO ][o.e.p.PluginsService     ] [ripley1] loaded module [lang-painless]
[2018-01-13T20:48:41,315][INFO ][o.e.p.PluginsService     ] [ripley1] loaded module [mapper-extras]
[2018-01-13T20:48:41,315][INFO ][o.e.p.PluginsService     ] [ripley1] loaded module [parent-join]
[2018-01-13T20:48:41,320][INFO ][o.e.p.PluginsService     ] [ripley1] loaded module [percolator]
[2018-01-13T20:48:41,320][INFO ][o.e.p.PluginsService     ] [ripley1] loaded module [reindex]
[2018-01-13T20:48:41,320][INFO ][o.e.p.PluginsService     ] [ripley1] loaded module [repository-url]
[2018-01-13T20:48:41,320][INFO ][o.e.p.PluginsService     ] [ripley1] loaded module [transport-netty4]
[2018-01-13T20:48:41,320][INFO ][o.e.p.PluginsService     ] [ripley1] loaded module [tribe]
[2018-01-13T20:48:41,321][INFO ][o.e.p.PluginsService     ] [ripley1] no plugins loaded
[2018-01-13T20:48:43,801][INFO ][o.e.d.DiscoveryModule    ] [ripley1] using discovery type [zen]
[2018-01-13T20:48:44,587][INFO ][o.e.n.Node               ] [ripley1] initialized
[2018-01-13T20:48:44,587][INFO ][o.e.n.Node               ] [ripley1] starting ...
[2018-01-13T20:48:44,759][INFO ][o.e.t.TransportService   ] [ripley1] publish_address {192.168.42.40:9300}, bound_addresses {[::]:9300}
[2018-01-13T20:48:44,792][INFO ][o.e.b.BootstrapChecks    ] [ripley1] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
[2018-01-13T20:48:47,864][INFO ][o.e.c.s.MasterService    ] [ripley1] zen-disco-elected-as-master ([0] nodes joined), reason: new_master {ripley1}{TvkaGbQpR5KZ-ZScMZN6AQ}{H39AkwwqS_i-fg3Gl5J8QQ}{192.168.42.40}{192.168.42.40:9300}
[2018-01-13T20:48:47,869][INFO ][o.e.c.s.ClusterApplierService] [ripley1] new_master {ripley1}{TvkaGbQpR5KZ-ZScMZN6AQ}{H39AkwwqS_i-fg3Gl5J8QQ}{192.168.42.40}{192.168.42.40:9300}, reason: apply cluster state (from master [master {ripley1}{TvkaGbQpR5KZ-ZScMZN6AQ}{H39AkwwqS_i-fg3Gl5J8QQ}{192.168.42.40}{192.168.42.40:9300} committed version [1] source [zen-disco-elected-as-master ([0] nodes joined)]])
[2018-01-13T20:48:47,884][INFO ][o.e.h.n.Netty4HttpServerTransport] [ripley1] publish_address {192.168.42.40:9200}, bound_addresses {[::]:9200}
[2018-01-13T20:48:47,884][INFO ][o.e.n.Node               ] [ripley1] started
[2018-01-13T20:48:48,326][INFO ][o.e.g.GatewayService     ] [ripley1] recovered [6] indices into cluster_state
[2018-01-13T20:49:01,493][INFO ][o.e.c.m.MetaDataDeleteIndexService] [ripley1] [logstash-2018.01.14/D0f_lDkSQpebPFcey6NHFw] deleting index
[2018-01-13T20:49:18,793][INFO ][o.e.c.m.MetaDataCreateIndexService] [ripley1] [logstash-2018.01.14] creating index, cause [auto(bulk api)], templates [logstash-*], shards [5]/[0], mappings []
[2018-01-13T20:49:18,937][INFO ][o.e.c.r.a.AllocationService] [ripley1] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[logstash-2018.01.14][4]] ...]).
zjhgx commented 6 years ago

+1 same error in 6.1.2

tylersmalley commented 6 years ago

This is a function of Elasticsearch. Per the Elasticsearch error, "all indices on this node will marked read-only".

To revert this for an index you can set index.blocks.read_only_allow_delete to null.

More information on this can be found here: https://www.elastic.co/guide/en/elasticsearch/reference/current/disk-allocator.html
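For example, to clear the block on a single index (a sketch using the logstash-2018.01.14 index from the log above, against an unsecured local node):

curl -XPUT -H "Content-Type: application/json" http://localhost:9200/logstash-2018.01.14/_settings -d '{"index.blocks.read_only_allow_delete": null}'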

darkpixel commented 6 years ago

FYI - for anyone still running into this, here's a quick one-liner to fix the indices: curl -s -H "Content-Type: application/json" http://localhost:9200/_cat/indices | awk '{ print $3 }' | sort | xargs -L 1 -I{} curl -s -XPUT -H "Content-Type: application/json" http://localhost:9200/{}/_settings -d '{"index.blocks.read_only_allow_delete": null}'

It grabs a list of all the indices in your cluster, then for each one it sends the command to make it not read-only.

outworlder commented 6 years ago

> FYI - for anyone still running into this, here's a quick one-liner to fix the indices: curl -s -H "Content-Type: application/json" http://localhost:9200/_cat/indices | awk '{ print $3 }' | sort | xargs -L 1 -I{} curl -s -XPUT -H "Content-Type: application/json" http://localhost:9200/{}/_settings -d '{"index.blocks.read_only_allow_delete": null}'
>
> It grabs a list of all the indices in your cluster, then for each one it sends the command to make it not read-only.

I too was doing this until I found @darkpixel's solution (https://github.com/elastic/kibana/issues/13685#issuecomment-347074533).

You can apply this setting to _all instead of going index by index. In my case it takes quite a while to do it for hundreds of indices, while setting it on _all takes only a few seconds.

curl -XPUT -H "Content-Type: application/json" https://localhost:9200/_all/_settings -d '{"index.blocks.read_only_allow_delete": null}'

Frank591 commented 5 years ago

> I resolved the issue by deleting the .kibana index (DELETE /.kibana/). I lose certain configurations/visualizations/dashboards, but it unblocked the index.

Thanks a lot for this workaround. It solved the problem for me.

Goahnary commented 5 years ago

This worked for me. Both commands were needed to get Kibana working after a fresh install:

curl -XPUT -H "Content-Type: application/json" http://localhost:9200/_cluster/settings -d '{ "transient": { "cluster.routing.allocation.disk.threshold_enabled": false } }'
curl -XPUT -H "Content-Type: application/json" http://localhost:9200/_all/_settings -d '{"index.blocks.read_only_allow_delete": null}'

This did not require deleting the .kibana index. Works perfectly now!
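(One caveat with the first command: it leaves the disk threshold protection disabled for the whole cluster. Once enough space has been freed, you may want to turn it back on by resetting the transient setting, e.g. against an unsecured local node:)

curl -XPUT -H "Content-Type: application/json" http://localhost:9200/_cluster/settings -d '{ "transient": { "cluster.routing.allocation.disk.threshold_enabled": null } }'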

Source: https://selleo.com/til/posts/esrgfyxjee-how-to-fix-elasticsearch-forbidden12index-read-only