Graylog2 / docker-compose

A set of Docker Compose files that allow you to quickly spin up a Graylog instance for testing or demo purposes.
Apache License 2.0

datanode won't restart after hitting flood-stage watermark #57

Closed: j3k0 closed this 3 months ago

j3k0 commented 5 months ago

My machine ran low on disk space and hit OpenSearch's flood-stage watermark, so OpenSearch set all indices to read-only.

Even after freeing up disk space, graylog-datanode still won't restart:

WARN ClusterBlockException[index [.opensearch-observability] blocked by: [TOO_MANY_REQUESTS/12/disk usage exceeded flood-stage watermark, index has read-only-allow-delete block];]
...
WARN  [OpensearchNodeHeartbeat] Opensearch REST api of process 679 unavailable. Cause: Unable to parse response body
WARN  [OpensearchProcessImpl] Opensearch process failed

The problem is that I need OpenSearch to be up in order to reset the read-only status of the indices (it's done through the REST API).

Since the startup script kills OpenSearch almost immediately, I don't have a window in which to do that.

Any ideas?
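For anyone landing here: the usual remedy is to clear the `index.blocks.read_only_allow_delete` block through the REST API once disk space is available again. Because the datanode supervisor kills OpenSearch within seconds, a retry loop that fires the request the moment the API answers may fit into that window. A sketch only; the host, port, and `admin:admin` credentials are assumptions and will differ in a datanode setup, which manages its own security config:

```shell
# Retry the settings PUT until the embedded OpenSearch briefly answers,
# then remove the read-only-allow-delete block from every index.
# Host, port, and credentials are assumptions - adapt them to your setup.
PAYLOAD='{"index.blocks.read_only_allow_delete": null}'
for i in $(seq 1 25); do
  # -f makes curl fail on HTTP errors so the loop keeps retrying
  if curl -skf -u admin:admin -X PUT "https://localhost:9200/_all/_settings" \
       -H 'Content-Type: application/json' -d "$PAYLOAD"; then
    echo "read-only block cleared"
    break
  fi
  sleep 0.2
done
```

Raise the retry count if the API takes longer than a few seconds to come up between supervisor restarts.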

janheise commented 5 months ago

@j3k0 Hi, sorry to hear about your problem. Can you share more of the logs? The snippet isn't enough for me to deduce where it fails or how to help you.

Also, which setup are you running?

j3k0 commented 5 months ago

docker-compose.yml (datanode section only):

  datanode:
    image: "graylog/graylog-datanode:5.2"
    hostname: "2a9e851d2339"
    environment:
      GRAYLOG_DATANODE_NODE_ID_FILE: "/var/lib/graylog-datanode/node-id"
      GRAYLOG_DATANODE_PASSWORD_SECRET: "xxx"
      GRAYLOG_DATANODE_ROOT_PASSWORD_SHA2: "xxx"
      GRAYLOG_DATANODE_MONGODB_URI: "mongodb://mongodb:27017/graylog"
    ulimits:
      memlock:
        hard: -1
        soft: -1
      nofile:
        soft: 65536
        hard: 65536
    ports:
      - "192.168.96.2:8999:8999/tcp"   # DataNode API
      - "192.168.96.2:9200:9200/tcp"
      - "192.168.96.2:9300:9300/tcp"
    volumes:
      - "/opt/graylog/data/graylog-datanode:/var/lib/graylog-datanode"
    restart: "on-failure"

Logs:

[2024-01-12T08:42:43,368][INFO ][o.o.n.Node               ] [2a9e851d2339] version[2.10.0], pid[1704], build[tar/eee49cb340edc6c4d489bcd9324dda571fc8dc03/2023-09-20T23:54:29.889267151Z], OS[Linux/5.4.0-144-generic/amd64], JVM[Eclipse Adoptium/OpenJDK 64-Bit Server VM/17.0.8/17.0.8+7]
[2024-01-12T08:42:43,370][INFO ][o.o.n.Node               ] [2a9e851d2339] JVM home [/usr/share/graylog-datanode/dist/opensearch-2.10.0-linux-x64/jdk], using bundled JDK/JRE [false]
[2024-01-12T08:42:43,371][INFO ][o.o.n.Node               ] [2a9e851d2339] JVM arguments [-Xshare:auto, -Dopensearch.networkaddress.cache.ttl=60, -Dopensearch.networkaddress.cache.negative.ttl=10, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -XX:-OmitStackTraceInFastThrow, -XX:+ShowCodeDetailsInExceptionMessages, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dio.netty.allocator.numDirectArenas=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Djava.locale.providers=SPI,COMPAT, -Xms1g, -Xmx1g, -XX:+UseG1GC, -XX:G1ReservePercent=25, -XX:InitiatingHeapOccupancyPercent=30, -Djava.io.tmpdir=/tmp/opensearch-15811002672978984410, -XX:+HeapDumpOnOutOfMemoryError, -XX:HeapDumpPath=data, -XX:ErrorFile=/tmp/hs_err_pid%p.log, -Xlog:gc*,gc+age=trace,safepoint:file=/tmp/gc.log:utctime,pid,tags:filecount=32,filesize=64m, -XX:MaxDirectMemorySize=536870912, -Dopensearch.path.home=/usr/share/graylog-datanode/dist/opensearch-2.10.0-linux-x64, -Dopensearch.path.conf=/var/lib/graylog-datanode/opensearch/config/a1089a07-d4ce-481c-82a8-b1e636b80abf/opensearch, -Dopensearch.distribution.type=tar, -Dopensearch.bundled_jdk=true]
[2024-01-12T08:42:44,209][INFO ][o.o.s.s.t.SSLConfig      ] [2a9e851d2339] SSL dual mode is disabled
[2024-01-12T08:42:44,209][INFO ][o.o.s.OpenSearchSecurityPlugin] [2a9e851d2339] OpenSearch Config path is /var/lib/graylog-datanode/opensearch/config/a1089a07-d4ce-481c-82a8-b1e636b80abf/opensearch
[2024-01-12T08:42:44,414][INFO ][o.o.s.s.DefaultSecurityKeyStore] [2a9e851d2339] JVM supports TLSv1.3
[2024-01-12T08:42:44,416][INFO ][o.o.s.s.DefaultSecurityKeyStore] [2a9e851d2339] Config directory is /var/lib/graylog-datanode/opensearch/config/a1089a07-d4ce-481c-82a8-b1e636b80abf/opensearch/, from there the key- and truststore files are resolved relatively
[2024-01-12T08:42:44,418][WARN ][o.o.s.s.SecureSSLSettings] [2a9e851d2339] Setting [plugins.security.ssl.transport.keystore_password] has a secure counterpart [plugins.security.ssl.transport.keystore_password_secure] which should be used instead - allowing for legacy SSL setups
[2024-01-12T08:42:44,426][WARN ][o.o.s.s.SecureSSLSettings] [2a9e851d2339] Setting [plugins.security.ssl.transport.truststore_password] has a secure counterpart [plugins.security.ssl.transport.truststore_password_secure] which should be used instead - allowing for legacy SSL setups
[2024-01-12T08:42:44,562][WARN ][o.o.s.s.u.SSLCertificateHelper] [2a9e851d2339] Certificate chain for alias datanode contains a root certificate
[2024-01-12T08:42:44,714][WARN ][o.o.s.s.SecureSSLSettings] [2a9e851d2339] Setting [plugins.security.ssl.http.keystore_password] has a secure counterpart [plugins.security.ssl.http.keystore_password_secure] which should be used instead - allowing for legacy SSL setups
[2024-01-12T08:42:44,715][INFO ][o.o.s.s.DefaultSecurityKeyStore] [2a9e851d2339] HTTPS client auth mode OPTIONAL
[2024-01-12T08:42:44,743][WARN ][o.o.s.s.u.SSLCertificateHelper] [2a9e851d2339] Certificate chain for alias datanode contains a root certificate
[2024-01-12T08:42:44,757][WARN ][o.o.s.s.SecureSSLSettings] [2a9e851d2339] Setting [plugins.security.ssl.http.truststore_password] has a secure counterpart [plugins.security.ssl.http.truststore_password_secure] which should be used instead - allowing for legacy SSL setups
[2024-01-12T08:42:44,805][INFO ][o.o.s.s.DefaultSecurityKeyStore] [2a9e851d2339] TLS Transport Client Provider : JDK
[2024-01-12T08:42:44,805][INFO ][o.o.s.s.DefaultSecurityKeyStore] [2a9e851d2339] TLS Transport Server Provider : JDK
[2024-01-12T08:42:44,805][INFO ][o.o.s.s.DefaultSecurityKeyStore] [2a9e851d2339] TLS HTTP Provider             : JDK
[2024-01-12T08:42:44,806][INFO ][o.o.s.s.DefaultSecurityKeyStore] [2a9e851d2339] Enabled TLS protocols for transport layer : [TLSv1.3, TLSv1.2]
[2024-01-12T08:42:44,806][INFO ][o.o.s.s.DefaultSecurityKeyStore] [2a9e851d2339] Enabled TLS protocols for HTTP layer      : [TLSv1.3, TLSv1.2]
[2024-01-12T08:42:45,001][INFO ][o.o.s.OpenSearchSecurityPlugin] [2a9e851d2339] Clustername: datanode-cluster
[2024-01-12T08:42:45,366][INFO ][o.o.i.r.ReindexPlugin    ] [2a9e851d2339] ReindexPlugin reloadSPI called
[2024-01-12T08:42:45,367][INFO ][o.o.i.r.ReindexPlugin    ] [2a9e851d2339] Unable to find any implementation for RemoteReindexExtension
[2024-01-12T08:42:45,377][INFO ][o.o.j.JobSchedulerPlugin ] [2a9e851d2339] Loaded scheduler extension: opendistro_anomaly_detector, index: .opendistro-anomaly-detector-jobs
[2024-01-12T08:42:45,380][INFO ][o.o.p.PluginsService     ] [2a9e851d2339] loaded module [aggs-matrix-stats]
[2024-01-12T08:42:45,380][INFO ][o.o.p.PluginsService     ] [2a9e851d2339] loaded module [analysis-common]
[2024-01-12T08:42:45,380][INFO ][o.o.p.PluginsService     ] [2a9e851d2339] loaded module [geo]
[2024-01-12T08:42:45,380][INFO ][o.o.p.PluginsService     ] [2a9e851d2339] loaded module [ingest-common]
[2024-01-12T08:42:45,381][INFO ][o.o.p.PluginsService     ] [2a9e851d2339] loaded module [ingest-geoip]
[2024-01-12T08:42:45,381][INFO ][o.o.p.PluginsService     ] [2a9e851d2339] loaded module [ingest-user-agent]
[2024-01-12T08:42:45,381][INFO ][o.o.p.PluginsService     ] [2a9e851d2339] loaded module [lang-expression]
[2024-01-12T08:42:45,381][INFO ][o.o.p.PluginsService     ] [2a9e851d2339] loaded module [lang-mustache]
[2024-01-12T08:42:45,382][INFO ][o.o.p.PluginsService     ] [2a9e851d2339] loaded module [lang-painless]
[2024-01-12T08:42:45,382][INFO ][o.o.p.PluginsService     ] [2a9e851d2339] loaded module [mapper-extras]
[2024-01-12T08:42:45,382][INFO ][o.o.p.PluginsService     ] [2a9e851d2339] loaded module [opensearch-dashboards]
[2024-01-12T08:42:45,382][INFO ][o.o.p.PluginsService     ] [2a9e851d2339] loaded module [parent-join]
[2024-01-12T08:42:45,383][INFO ][o.o.p.PluginsService     ] [2a9e851d2339] loaded module [percolator]
[2024-01-12T08:42:45,383][INFO ][o.o.p.PluginsService     ] [2a9e851d2339] loaded module [rank-eval]
[2024-01-12T08:42:45,383][INFO ][o.o.p.PluginsService     ] [2a9e851d2339] loaded module [reindex]
[2024-01-12T08:42:45,383][INFO ][o.o.p.PluginsService     ] [2a9e851d2339] loaded module [repository-url]
[2024-01-12T08:42:45,384][INFO ][o.o.p.PluginsService     ] [2a9e851d2339] loaded module [search-pipeline-common]
[2024-01-12T08:42:45,384][INFO ][o.o.p.PluginsService     ] [2a9e851d2339] loaded module [systemd]
[2024-01-12T08:42:45,384][INFO ][o.o.p.PluginsService     ] [2a9e851d2339] loaded module [transport-netty4]
[2024-01-12T08:42:45,385][INFO ][o.o.p.PluginsService     ] [2a9e851d2339] loaded plugin [opensearch-anomaly-detection]
[2024-01-12T08:42:45,385][INFO ][o.o.p.PluginsService     ] [2a9e851d2339] loaded plugin [opensearch-asynchronous-search]
[2024-01-12T08:42:45,385][INFO ][o.o.p.PluginsService     ] [2a9e851d2339] loaded plugin [opensearch-cross-cluster-replication]
[2024-01-12T08:42:45,385][INFO ][o.o.p.PluginsService     ] [2a9e851d2339] loaded plugin [opensearch-job-scheduler]
[2024-01-12T08:42:45,385][INFO ][o.o.p.PluginsService     ] [2a9e851d2339] loaded plugin [opensearch-ml]
[2024-01-12T08:42:45,386][INFO ][o.o.p.PluginsService     ] [2a9e851d2339] loaded plugin [opensearch-observability]
[2024-01-12T08:42:45,386][INFO ][o.o.p.PluginsService     ] [2a9e851d2339] loaded plugin [opensearch-security]
[2024-01-12T08:42:45,400][INFO ][o.o.s.OpenSearchSecurityPlugin] [2a9e851d2339] Disabled https compression by default to mitigate BREACH attacks. You can enable it by setting 'http.compression: true' in opensearch.yml
[2024-01-12T08:42:45,403][INFO ][o.o.e.ExtensionsManager  ] [2a9e851d2339] ExtensionsManager initialized
[2024-01-12T08:42:45,422][INFO ][o.o.e.NodeEnvironment    ] [2a9e851d2339] using [1] data paths, mounts [[/var/lib/graylog-datanode (/dev/sda1)]], net usable_space [28.9gb], net total_space [225gb], types [ext4]
[2024-01-12T08:42:45,423][INFO ][o.o.e.NodeEnvironment    ] [2a9e851d2339] heap size [1gb], compressed ordinary object pointers [true]
[2024-01-12T08:42:45,512][INFO ][o.o.n.Node               ] [2a9e851d2339] node name [2a9e851d2339], node ID [kazzT8cHQrCNzvnQTA0OVA], cluster name [datanode-cluster], roles [ingest, remote_cluster_client, data, cluster_manager]
[2024-01-12T08:42:47,138][WARN ][o.o.s.s.SecureSSLSettings] [2a9e851d2339] Setting [plugins.security.ssl.http.truststore_password] has a secure counterpart [plugins.security.ssl.http.truststore_password_secure] which should be used instead - allowing for legacy SSL setups
[2024-01-12T08:42:47,140][WARN ][o.o.s.s.SecureSSLSettings] [2a9e851d2339] Setting [plugins.security.ssl.transport.truststore_password] has a secure counterpart [plugins.security.ssl.transport.truststore_password_secure] which should be used instead - allowing for legacy SSL setups
[2024-01-12T08:42:47,143][WARN ][o.o.s.s.SecureSSLSettings] [2a9e851d2339] Setting [plugins.security.ssl.http.keystore_password] has a secure counterpart [plugins.security.ssl.http.keystore_password_secure] which should be used instead - allowing for legacy SSL setups
[2024-01-12T08:42:47,143][WARN ][o.o.s.s.SecureSSLSettings] [2a9e851d2339] Setting [plugins.security.ssl.transport.keystore_password] has a secure counterpart [plugins.security.ssl.transport.keystore_password_secure] which should be used instead - allowing for legacy SSL setups
[2024-01-12T08:42:47,941][WARN ][o.o.s.c.Salt             ] [2a9e851d2339] If you plan to use field masking pls configure compliance salt e1ukloTsQlOgPquJ to be a random string of 16 chars length identical on all nodes
[2024-01-12T08:42:47,972][ERROR][o.o.s.a.s.SinkProvider   ] [2a9e851d2339] Default endpoint could not be created, auditlog will not work properly.
[2024-01-12T08:42:47,973][WARN ][o.o.s.a.r.AuditMessageRouter] [2a9e851d2339] No default storage available, audit log may not work properly. Please check configuration.
[2024-01-12T08:42:47,973][INFO ][o.o.s.a.i.AuditLogImpl   ] [2a9e851d2339] Message routing enabled: false
[2024-01-12T08:42:48,006][INFO ][o.o.s.f.SecurityFilter   ] [2a9e851d2339] <NONE> indices are made immutable.
[2024-01-12T08:42:48,143][INFO ][o.o.m.b.MLCircuitBreakerService] [2a9e851d2339] Registered ML memory breaker.
[2024-01-12T08:42:48,143][INFO ][o.o.m.b.MLCircuitBreakerService] [2a9e851d2339] Registered ML disk breaker.
[2024-01-12T08:42:48,144][INFO ][o.o.m.b.MLCircuitBreakerService] [2a9e851d2339] Registered ML native memory breaker.
[2024-01-12T08:42:48,246][INFO ][o.r.Reflections          ] [2a9e851d2339] Reflections took 40 ms to scan 1 urls, producing 17 keys and 43 values 
[2024-01-12T08:42:48,445][INFO ][o.o.a.b.ADCircuitBreakerService] [2a9e851d2339] Registered memory breaker.
[2024-01-12T08:42:48,861][INFO ][o.o.t.NettyAllocator     ] [2a9e851d2339] creating NettyAllocator with the following configs: [name=unpooled, suggested_max_allocation_size=256kb, factors={opensearch.unsafe.use_unpooled_allocator=null, g1gc_enabled=true, g1gc_region_size=1mb, heap_size=1gb}]
[2024-01-12T08:42:48,976][INFO ][o.o.d.DiscoveryModule    ] [2a9e851d2339] using discovery type [zen] and seed hosts providers [settings, file]
[2024-01-12T08:42:49,289][WARN ][o.o.g.DanglingIndicesState] [2a9e851d2339] gateway.auto_import_dangling_indices is disabled, dangling indices will not be automatically detected or imported and must be managed manually
[2024-01-12T08:42:49,576][INFO ][o.o.n.Node               ] [2a9e851d2339] initialized
[2024-01-12T08:42:49,577][INFO ][o.o.n.Node               ] [2a9e851d2339] starting ...
[2024-01-12T08:42:49,673][INFO ][o.o.t.TransportService   ] [2a9e851d2339] publish_address {172.19.0.3:9300}, bound_addresses {0.0.0.0:9300}
[2024-01-12T08:42:49,675][INFO ][o.o.t.TransportService   ] [2a9e851d2339] Remote clusters initialized successfully.
[2024-01-12T08:42:49,854][INFO ][o.o.b.BootstrapChecks    ] [2a9e851d2339] bound or publishing to a non-loopback address, enforcing bootstrap checks
[2024-01-12T08:42:49,859][INFO ][o.o.c.c.Coordinator      ] [2a9e851d2339] cluster UUID [-1_6XsUBRdu5oXpg1GlbVw]
[2024-01-12T08:42:49,920][INFO ][o.o.c.s.MasterService    ] [2a9e851d2339] elected-as-cluster-manager ([1] nodes joined)[{2a9e851d2339}{kazzT8cHQrCNzvnQTA0OVA}{dL1qO9sRT-S2obKPFgv2xA}{172.19.0.3}{172.19.0.3:9300}{dimr}{shard_indexing_pressure_enabled=true} elect leader, _BECOME_CLUSTER_MANAGER_TASK_, _FINISH_ELECTION_], term: 63, version: 1603, delta: cluster-manager node changed {previous [], current [{2a9e851d2339}{kazzT8cHQrCNzvnQTA0OVA}{dL1qO9sRT-S2obKPFgv2xA}{172.19.0.3}{172.19.0.3:9300}{dimr}{shard_indexing_pressure_enabled=true}]}
[2024-01-12T08:42:49,969][INFO ][o.o.c.s.ClusterApplierService] [2a9e851d2339] cluster-manager node changed {previous [], current [{2a9e851d2339}{kazzT8cHQrCNzvnQTA0OVA}{dL1qO9sRT-S2obKPFgv2xA}{172.19.0.3}{172.19.0.3:9300}{dimr}{shard_indexing_pressure_enabled=true}]}, term: 63, version: 1603, reason: Publication{term=63, version=1603}
[2024-01-12T08:42:49,978][INFO ][o.o.a.c.ADClusterEventListener] [2a9e851d2339] Cluster is not recovered yet.
[2024-01-12T08:42:49,986][INFO ][o.o.d.PeerFinder         ] [2a9e851d2339] setting findPeersInterval to [1s] as node commission status = [true] for local node [{2a9e851d2339}{kazzT8cHQrCNzvnQTA0OVA}{dL1qO9sRT-S2obKPFgv2xA}{172.19.0.3}{172.19.0.3:9300}{dimr}{shard_indexing_pressure_enabled=true}]
[2024-01-12T08:42:49,989][INFO ][o.o.h.AbstractHttpServerTransport] [2a9e851d2339] publish_address {172.19.0.3:9200}, bound_addresses {0.0.0.0:9200}
[2024-01-12T08:42:49,989][INFO ][o.o.n.Node               ] [2a9e851d2339] started
[2024-01-12T08:42:49,990][INFO ][o.o.s.OpenSearchSecurityPlugin] [2a9e851d2339] Node started
[2024-01-12T08:42:49,990][INFO ][o.o.s.c.ConfigurationRepository] [2a9e851d2339] Will attempt to create index .opendistro_security and default configs if they are absent
[2024-01-12T08:42:49,992][INFO ][o.o.s.OpenSearchSecurityPlugin] [2a9e851d2339] 0 OpenSearch Security modules loaded so far: []
[2024-01-12T08:42:49,992][INFO ][o.o.s.c.ConfigurationRepository] [2a9e851d2339] Background init thread started. Install default config?: true
[2024-01-12T08:42:49,993][INFO ][o.o.s.c.ConfigurationRepository] [2a9e851d2339] Wait for cluster to be available ...
[2024-01-12T08:42:50,053][INFO ][o.o.a.c.HashRing         ] [2a9e851d2339] Node added: [kazzT8cHQrCNzvnQTA0OVA]
[2024-01-12T08:42:50,056][INFO ][o.o.g.GatewayService     ] [2a9e851d2339] recovered [15] indices into cluster_state
[2024-01-12T08:42:50,057][INFO ][o.o.a.c.HashRing         ] [2a9e851d2339] Add data node to AD version hash ring: kazzT8cHQrCNzvnQTA0OVA
[2024-01-12T08:42:50,059][INFO ][o.o.a.c.HashRing         ] [2a9e851d2339] All nodes with known AD version: {kazzT8cHQrCNzvnQTA0OVA=ADNodeInfo{version=2.10.0, isEligibleDataNode=true}}
[2024-01-12T08:42:50,059][INFO ][o.o.a.c.HashRing         ] [2a9e851d2339] Rebuild AD hash ring for realtime AD with cooldown, nodeChangeEvents size 0
[2024-01-12T08:42:50,059][INFO ][o.o.a.c.HashRing         ] [2a9e851d2339] Build AD version hash ring successfully
[2024-01-12T08:42:50,060][INFO ][o.o.a.c.ADDataMigrator   ] [2a9e851d2339] Start migrating AD data
[2024-01-12T08:42:50,061][INFO ][o.o.a.c.ADDataMigrator   ] [2a9e851d2339] AD job index doesn't exist, no need to migrate
[2024-01-12T08:42:50,061][INFO ][o.o.a.c.ADClusterEventListener] [2a9e851d2339] Init AD version hash ring successfully
[2024-01-12T08:42:50,058][ERROR][o.o.b.Bootstrap          ] [2a9e851d2339] Exception
org.opensearch.cluster.block.ClusterBlockException: index [.opensearch-observability] blocked by: [TOO_MANY_REQUESTS/12/disk usage exceeded flood-stage watermark, index has read-only-allow-delete block];
    at org.opensearch.cluster.block.ClusterBlocks.indicesBlockedException(ClusterBlocks.java:243) ~[opensearch-2.10.0.jar:2.10.0]
    at org.opensearch.cluster.block.ClusterBlocks.indexBlockedException(ClusterBlocks.java:221) ~[opensearch-2.10.0.jar:2.10.0]
    at org.opensearch.action.admin.indices.create.TransportCreateIndexAction.checkBlock(TransportCreateIndexAction.java:95) ~[opensearch-2.10.0.jar:2.10.0]
    at org.opensearch.action.admin.indices.create.TransportCreateIndexAction.checkBlock(TransportCreateIndexAction.java:56) ~[opensearch-2.10.0.jar:2.10.0]
    at org.opensearch.action.support.clustermanager.TransportClusterManagerNodeAction$AsyncSingleAction.doStart(TransportClusterManagerNodeAction.java:237) ~[opensearch-2.10.0.jar:2.10.0]
    at org.opensearch.action.support.clustermanager.TransportClusterManagerNodeAction$AsyncSingleAction$2.onNewClusterState(TransportClusterManagerNodeAction.java:336) ~[opensearch-2.10.0.jar:2.10.0]
    at org.opensearch.cluster.ClusterStateObserver$ContextPreservingListener.onNewClusterState(ClusterStateObserver.java:380) ~[opensearch-2.10.0.jar:2.10.0]
    at org.opensearch.cluster.ClusterStateObserver$ObserverClusterStateListener.clusterChanged(ClusterStateObserver.java:230) ~[opensearch-2.10.0.jar:2.10.0]
    at org.opensearch.cluster.service.ClusterApplierService.callClusterStateListener(ClusterApplierService.java:625) ~[opensearch-2.10.0.jar:2.10.0]
    at org.opensearch.cluster.service.ClusterApplierService.callClusterStateListeners(ClusterApplierService.java:613) ~[opensearch-2.10.0.jar:2.10.0]
    at org.opensearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:577) ~[opensearch-2.10.0.jar:2.10.0]
    at org.opensearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:484) ~[opensearch-2.10.0.jar:2.10.0]
    at org.opensearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:186) ~[opensearch-2.10.0.jar:2.10.0]
    at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:849) ~[opensearch-2.10.0.jar:2.10.0]
    at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedOpenSearchThreadPoolExecutor.java:282) ~[opensearch-2.10.0.jar:2.10.0]
    at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedOpenSearchThreadPoolExecutor.java:245) ~[opensearch-2.10.0.jar:2.10.0]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
    at java.lang.Thread.run(Thread.java:833) ~[?:?]
[2024-01-12T08:42:50,066][ERROR][o.o.b.OpenSearchUncaughtExceptionHandler] [2a9e851d2339] uncaught exception in thread [main]
org.opensearch.bootstrap.StartupException: ClusterBlockException[index [.opensearch-observability] blocked by: [TOO_MANY_REQUESTS/12/disk usage exceeded flood-stage watermark, index has read-only-allow-delete block];]
    at org.opensearch.bootstrap.OpenSearch.init(OpenSearch.java:184) ~[opensearch-2.10.0.jar:2.10.0]
    at org.opensearch.bootstrap.OpenSearch.execute(OpenSearch.java:171) ~[opensearch-2.10.0.jar:2.10.0]
    at org.opensearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:104) ~[opensearch-2.10.0.jar:2.10.0]
    at org.opensearch.cli.Command.mainWithoutErrorHandling(Command.java:138) ~[opensearch-cli-2.10.0.jar:2.10.0]
    at org.opensearch.cli.Command.main(Command.java:101) ~[opensearch-cli-2.10.0.jar:2.10.0]
    at org.opensearch.bootstrap.OpenSearch.main(OpenSearch.java:137) ~[opensearch-2.10.0.jar:2.10.0]
    at org.opensearch.bootstrap.OpenSearch.main(OpenSearch.java:103) ~[opensearch-2.10.0.jar:2.10.0]
Caused by: org.opensearch.cluster.block.ClusterBlockException: index [.opensearch-observability] blocked by: [TOO_MANY_REQUESTS/12/disk usage exceeded flood-stage watermark, index has read-only-allow-delete block];
    at org.opensearch.cluster.block.ClusterBlocks.indicesBlockedException(ClusterBlocks.java:243) ~[opensearch-2.10.0.jar:2.10.0]
    at org.opensearch.cluster.block.ClusterBlocks.indexBlockedException(ClusterBlocks.java:221) ~[opensearch-2.10.0.jar:2.10.0]
    at org.opensearch.action.admin.indices.create.TransportCreateIndexAction.checkBlock(TransportCreateIndexAction.java:95) ~[opensearch-2.10.0.jar:2.10.0]
    at org.opensearch.action.admin.indices.create.TransportCreateIndexAction.checkBlock(TransportCreateIndexAction.java:56) ~[opensearch-2.10.0.jar:2.10.0]
    at org.opensearch.action.support.clustermanager.TransportClusterManagerNodeAction$AsyncSingleAction.doStart(TransportClusterManagerNodeAction.java:237) ~[opensearch-2.10.0.jar:2.10.0]
    at org.opensearch.action.support.clustermanager.TransportClusterManagerNodeAction$AsyncSingleAction$2.onNewClusterState(TransportClusterManagerNodeAction.java:336) ~[opensearch-2.10.0.jar:2.10.0]
    at org.opensearch.cluster.ClusterStateObserver$ContextPreservingListener.onNewClusterState(ClusterStateObserver.java:380) ~[opensearch-2.10.0.jar:2.10.0]
    at org.opensearch.cluster.ClusterStateObserver$ObserverClusterStateListener.clusterChanged(ClusterStateObserver.java:230) ~[opensearch-2.10.0.jar:2.10.0]
    at org.opensearch.cluster.service.ClusterApplierService.callClusterStateListener(ClusterApplierService.java:625) ~[opensearch-2.10.0.jar:2.10.0]
    at org.opensearch.cluster.service.ClusterApplierService.callClusterStateListeners(ClusterApplierService.java:613) ~[opensearch-2.10.0.jar:2.10.0]
    at org.opensearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:577) ~[opensearch-2.10.0.jar:2.10.0]
    at org.opensearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:484) ~[opensearch-2.10.0.jar:2.10.0]
    at org.opensearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:186) ~[opensearch-2.10.0.jar:2.10.0]
    at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:849) ~[opensearch-2.10.0.jar:2.10.0]
    at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedOpenSearchThreadPoolExecutor.java:282) ~[opensearch-2.10.0.jar:2.10.0]
    at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedOpenSearchThreadPoolExecutor.java:245) ~[opensearch-2.10.0.jar:2.10.0]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
    at java.lang.Thread.run(Thread.java:833) [?:?]
[2024-01-12T08:42:50,115][INFO ][o.o.p.PluginsService     ] [2a9e851d2339] PluginService:onIndexModule index:[gl-system-events_2/YUGm2c2BTZazvWAqfmGITw]
[2024-01-12T08:42:50,245][INFO ][o.o.p.PluginsService     ] [2a9e851d2339] PluginService:onIndexModule index:[syslog_5/HXl2VkYPTjuzx2spYF9dbg]
[2024-01-12T08:42:50,266][INFO ][o.o.p.PluginsService     ] [2a9e851d2339] PluginService:onIndexModule index:[syslog_6/bFtRPrJKRuiiyf97I6jW9w]
[2024-01-12T08:42:50,281][INFO ][o.o.p.PluginsService     ] [2a9e851d2339] PluginService:onIndexModule index:[.opendistro_security/PfkvKfRKTXSNi_x9MDnz0Q]
[2024-01-12T08:42:50,473][ERROR][o.o.s.a.BackendRegistry  ] [2a9e851d2339] Not yet initialized (you may need to run securityadmin)
[2024-01-12T08:42:50,541][INFO ][o.o.p.PluginsService     ] [2a9e851d2339] PluginService:onIndexModule index:[syslog_3/aOT9Ci9jS4ecEnKrUicnXg]
[2024-01-12T08:42:50,553][INFO ][o.o.p.PluginsService     ] [2a9e851d2339] PluginService:onIndexModule index:[syslog_4/uH01vHGoSQKeiKat15l1SA]
[2024-01-12T08:42:50,563][INFO ][o.o.p.PluginsService     ] [2a9e851d2339] PluginService:onIndexModule index:[syslog_2/k01FanjURzWLtlFzTXY_Yg]
[2024-01-12T08:42:50,645][INFO ][o.o.p.PluginsService     ] [2a9e851d2339] PluginService:onIndexModule index:[gl-system-events_1/T30yfLHNS9qw-R5-ZrIlyQ]
[2024-01-12T08:42:50,654][INFO ][o.o.p.PluginsService     ] [2a9e851d2339] PluginService:onIndexModule index:[syslog_1/zasFC9XiQBmORC4FEShCPg]
[2024-01-12T08:42:50,666][INFO ][o.o.p.PluginsService     ] [2a9e851d2339] PluginService:onIndexModule index:[gl-system-events_0/1XbT6zZwS_uYYb28usmefQ]
[2024-01-12T08:42:50,675][INFO ][o.o.p.PluginsService     ] [2a9e851d2339] PluginService:onIndexModule index:[syslog_0/gqMfwTvgQ4CGE0ts9NM5pw]
[2024-01-12T08:42:50,767][INFO ][o.o.p.PluginsService     ] [2a9e851d2339] PluginService:onIndexModule index:[graylog_0/wME7Xj0OQamKfTK59Yd6yw]
[2024-01-12T08:42:50,805][INFO ][o.o.p.PluginsService     ] [2a9e851d2339] PluginService:onIndexModule index:[gl-events_0/hmFTBLxoS5Suzigwupgf7A]
[2024-01-12T08:42:50,814][INFO ][o.o.p.PluginsService     ] [2a9e851d2339] PluginService:onIndexModule index:[.opensearch-observability/SmSmSLIaTbut8nQW6iXk8Q]
[2024-01-12T08:42:50,820][INFO ][o.o.p.PluginsService     ] [2a9e851d2339] PluginService:onIndexModule index:[.plugins-ml-config/rgqRp-b-RQaUsn7I8vts9w]
[2024-01-12T08:42:50,995][ERROR][o.o.s.c.ConfigurationRepository] [2a9e851d2339] Cannot apply default config (this is maybe not an error!)
org.opensearch.cluster.block.ClusterBlockException: index [.opendistro_security] blocked by: [TOO_MANY_REQUESTS/12/disk usage exceeded flood-stage watermark, index has read-only-allow-delete block];
    at org.opensearch.cluster.block.ClusterBlocks.indicesBlockedException(ClusterBlocks.java:243) ~[opensearch-2.10.0.jar:2.10.0]
    at org.opensearch.cluster.block.ClusterBlocks.indexBlockedException(ClusterBlocks.java:221) ~[opensearch-2.10.0.jar:2.10.0]
    at org.opensearch.action.admin.indices.create.TransportCreateIndexAction.checkBlock(TransportCreateIndexAction.java:95) ~[opensearch-2.10.0.jar:2.10.0]
    at org.opensearch.action.admin.indices.create.TransportCreateIndexAction.checkBlock(TransportCreateIndexAction.java:56) ~[opensearch-2.10.0.jar:2.10.0]
    at org.opensearch.action.support.clustermanager.TransportClusterManagerNodeAction$AsyncSingleAction.doStart(TransportClusterManagerNodeAction.java:237) ~[opensearch-2.10.0.jar:2.10.0]
    at org.opensearch.action.support.clustermanager.TransportClusterManagerNodeAction$AsyncSingleAction.tryAction(TransportClusterManagerNodeAction.java:206) ~[opensearch-2.10.0.jar:2.10.0]
    at org.opensearch.action.support.RetryableAction$1.doRun(RetryableAction.java:137) ~[opensearch-2.10.0.jar:2.10.0]
    at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) ~[opensearch-2.10.0.jar:2.10.0]
    at org.opensearch.common.util.concurrent.OpenSearchExecutors$DirectExecutorService.execute(OpenSearchExecutors.java:343) ~[opensearch-2.10.0.jar:2.10.0]
    at org.opensearch.action.support.RetryableAction.run(RetryableAction.java:115) ~[opensearch-2.10.0.jar:2.10.0]
    at org.opensearch.action.support.clustermanager.TransportClusterManagerNodeAction.doExecute(TransportClusterManagerNodeAction.java:167) ~[opensearch-2.10.0.jar:2.10.0]
    at org.opensearch.action.support.clustermanager.TransportClusterManagerNodeAction.doExecute(TransportClusterManagerNodeAction.java:79) ~[opensearch-2.10.0.jar:2.10.0]
    at org.opensearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:218) ~[opensearch-2.10.0.jar:2.10.0]
    at org.opensearch.security.filter.SecurityFilter.apply0(SecurityFilter.java:290) ~[?:?]
    at org.opensearch.security.filter.SecurityFilter.apply(SecurityFilter.java:165) ~[?:?]
    at org.opensearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:216) ~[opensearch-2.10.0.jar:2.10.0]
    at org.opensearch.action.support.TransportAction.execute(TransportAction.java:188) ~[opensearch-2.10.0.jar:2.10.0]
    at org.opensearch.action.support.TransportAction.execute(TransportAction.java:107) ~[opensearch-2.10.0.jar:2.10.0]
    at org.opensearch.client.node.NodeClient.executeLocally(NodeClient.java:110) ~[opensearch-2.10.0.jar:2.10.0]
    at org.opensearch.client.node.NodeClient.doExecute(NodeClient.java:97) ~[opensearch-2.10.0.jar:2.10.0]
    at org.opensearch.client.support.AbstractClient.execute(AbstractClient.java:476) ~[opensearch-2.10.0.jar:2.10.0]
    at org.opensearch.client.support.AbstractClient.execute(AbstractClient.java:463) ~[opensearch-2.10.0.jar:2.10.0]
    at org.opensearch.client.support.AbstractClient$IndicesAdmin.execute(AbstractClient.java:1528) ~[opensearch-2.10.0.jar:2.10.0]
    at org.opensearch.client.support.AbstractClient$IndicesAdmin.create(AbstractClient.java:1622) ~[opensearch-2.10.0.jar:2.10.0]
    at org.opensearch.security.configuration.ConfigurationRepository.createSecurityIndexIfAbsent(ConfigurationRepository.java:268) ~[opensearch-security-2.10.0.0.jar:2.10.0.0]
    at org.opensearch.security.configuration.ConfigurationRepository.lambda$new$0(ConfigurationRepository.java:143) [opensearch-security-2.10.0.0.jar:2.10.0.0]
    at java.lang.Thread.run(Thread.java:833) [?:?]
[2024-01-12T08:42:51,030][INFO ][o.o.c.r.a.AllocationService] [2a9e851d2339] Cluster health status changed from [RED] to [GREEN] (reason: [shards started [[graylog_0][0]]]).
[2024-01-12T08:42:51,178][INFO ][stdout                   ] [2a9e851d2339] [FINE] No subscribers registered for event class org.opensearch.security.securityconf.DynamicConfigFactory$NodesDnModelImpl
[2024-01-12T08:42:51,179][INFO ][stdout                   ] [2a9e851d2339] [FINE] No subscribers registered for event class org.greenrobot.eventbus.NoSubscriberEvent
[2024-01-12T08:42:51,179][INFO ][o.o.s.a.i.AuditLogImpl   ] [2a9e851d2339] Auditing on REST API is enabled.
[2024-01-12T08:42:51,179][INFO ][o.o.s.a.i.AuditLogImpl   ] [2a9e851d2339] [AUTHENTICATED, GRANTED_PRIVILEGES] are excluded from REST API auditing.
[2024-01-12T08:42:51,180][INFO ][o.o.s.a.i.AuditLogImpl   ] [2a9e851d2339] Auditing on Transport API is enabled.
[2024-01-12T08:42:51,180][INFO ][o.o.s.a.i.AuditLogImpl   ] [2a9e851d2339] [AUTHENTICATED, GRANTED_PRIVILEGES] are excluded from Transport API auditing.
[2024-01-12T08:42:51,180][INFO ][o.o.s.a.i.AuditLogImpl   ] [2a9e851d2339] Auditing of request body is enabled.
[2024-01-12T08:42:51,180][INFO ][o.o.s.a.i.AuditLogImpl   ] [2a9e851d2339] Bulk requests resolution is disabled during request auditing.
[2024-01-12T08:42:51,181][INFO ][o.o.s.a.i.AuditLogImpl   ] [2a9e851d2339] Index resolution is enabled during request auditing.
[2024-01-12T08:42:51,181][INFO ][o.o.s.a.i.AuditLogImpl   ] [2a9e851d2339] Sensitive headers auditing is enabled.
[2024-01-12T08:42:51,181][INFO ][o.o.s.a.i.AuditLogImpl   ] [2a9e851d2339] Auditing requests from kibanaserver users is disabled.
[2024-01-12T08:42:51,181][INFO ][o.o.s.a.i.AuditLogImpl   ] [2a9e851d2339] Auditing of external configuration is disabled.
[2024-01-12T08:42:51,181][INFO ][o.o.s.a.i.AuditLogImpl   ] [2a9e851d2339] Auditing of internal configuration is enabled.
[2024-01-12T08:42:51,182][INFO ][o.o.s.a.i.AuditLogImpl   ] [2a9e851d2339] Auditing only metadata information for read request is enabled.
[2024-01-12T08:42:51,182][INFO ][o.o.s.a.i.AuditLogImpl   ] [2a9e851d2339] Auditing will watch {} for read requests.
[2024-01-12T08:42:51,182][INFO ][o.o.s.a.i.AuditLogImpl   ] [2a9e851d2339] Auditing read operation requests from kibanaserver users is disabled.
[2024-01-12T08:42:51,182][INFO ][o.o.s.a.i.AuditLogImpl   ] [2a9e851d2339] Auditing only metadata information for write request is enabled.
[2024-01-12T08:42:51,182][INFO ][o.o.s.a.i.AuditLogImpl   ] [2a9e851d2339] Auditing diffs for write requests is disabled.
[2024-01-12T08:42:51,182][INFO ][o.o.s.a.i.AuditLogImpl   ] [2a9e851d2339] Auditing write operation requests from kibanaserver users is disabled.
[2024-01-12T08:42:51,183][INFO ][o.o.s.a.i.AuditLogImpl   ] [2a9e851d2339] Auditing will watch <NONE> for write requests.
[2024-01-12T08:42:51,183][INFO ][o.o.s.a.i.AuditLogImpl   ] [2a9e851d2339] .opendistro_security is used as internal security index.
[2024-01-12T08:42:51,183][INFO ][o.o.s.a.i.AuditLogImpl   ] [2a9e851d2339] Internal index used for posting audit logs is null
[2024-01-12T08:42:51,183][INFO ][o.o.s.c.ConfigurationRepository] [2a9e851d2339] Hot-reloading of audit configuration is enabled
[2024-01-12T08:42:51,184][INFO ][o.o.s.c.ConfigurationRepository] [2a9e851d2339] Node '2a9e851d2339' initialized
[2024-01-12T08:42:51,185][INFO ][o.o.n.Node               ] [2a9e851d2339] stopping ...
[2024-01-12T08:42:51,185][INFO ][o.o.s.a.r.AuditMessageRouter] [2a9e851d2339] Closing AuditMessageRouter
[2024-01-12T08:42:51,186][INFO ][o.o.s.a.s.SinkProvider   ] [2a9e851d2339] Closing DebugSink
[2024-01-12T08:42:51,204][INFO ][o.o.n.Node               ] [2a9e851d2339] stopped
[2024-01-12T08:42:51,205][INFO ][o.o.n.Node               ] [2a9e851d2339] closing ...
[2024-01-12T08:42:51,209][INFO ][o.o.s.a.i.AuditLogImpl   ] [2a9e851d2339] Closing AuditLogImpl
[2024-01-12T08:42:51,211][INFO ][o.o.n.Node               ] [2a9e851d2339] closed
janheise commented 5 months ago

@j3k0 Thanks for the additional info. I have to discuss it with my colleagues. I'll get back to you asap.

janheise commented 5 months ago

@j3k0 We're missing the surrounding logs of the DataNode (the excerpt only covers OpenSearch) - is it possible to also attach them?

j3k0 commented 5 months ago

I apologize; I recreated the Docker containers, so I'm afraid the logs are lost.

Reproducing should be easy: reduce OpenSearch's flood-stage limit so the index turns read-only, then try restarting the Docker containers. OpenSearch doesn't start (while I believe it should start in read-only mode).
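
For reference, one way to reproduce this condition (a sketch; the endpoint, credentials, and thresholds below are assumptions that depend on your setup) is to lower the disk watermarks far enough that the flood stage trips on a mostly empty disk:

```shell
# Sketch: lower all three disk watermarks so the flood stage trips on a
# partially full disk. Percentages are illustrative; all three settings
# must use the same unit type. Adjust host/credentials to your deployment.
curl -X PUT "http://localhost:9200/_cluster/settings" \
  -H 'Content-Type: application/json' \
  -d '{
    "transient": {
      "cluster.routing.allocation.disk.watermark.low": "1%",
      "cluster.routing.allocation.disk.watermark.high": "2%",
      "cluster.routing.allocation.disk.watermark.flood_stage": "3%"
    }
  }'
```

Once flood stage is hit, OpenSearch places the `index.blocks.read_only_allow_delete` block on the affected indices; restarting the containers at that point should show whether the DataNode comes back up.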

patrickmann commented 5 months ago

Narrowing it down: I ran the following scenarios on a non-containerized DataNode without being able to reproduce the problem. The DataNode recovered gracefully and the server resumed ingesting data when more disk space became available. So it appears to be specific to Docker or the reporter's environment.

@j3k0 Is there any chance the freed-up disk space wasn't enough? Or that it is not available to Docker? Also, for debugging, can you disable observability, opendistro, and any other third-party plugins?

dascgit commented 4 months ago

Is there any news on this? The logs look like this:

2024-02-06T14:55:26.724Z INFO  [OpensearchProcessImpl]  at org.opensearch.client.support.AbstractClient$IndicesAdmin.create(AbstractClient.java:1622) ~[opensearch-2.10.0.jar:2.10.0]
2024-02-06T14:55:26.724Z INFO  [OpensearchProcessImpl]  at org.opensearch.security.configuration.ConfigurationRepository.createSecurityIndexIfAbsent(ConfigurationRepository.java:268) ~[opensearch-security-2.10.0.0.jar:2.10.0.0]
2024-02-06T14:55:26.724Z INFO  [OpensearchProcessImpl]  at org.opensearch.security.configuration.ConfigurationRepository.lambda$new$0(ConfigurationRepository.java:143) [opensearch-security-2.10.0.0.jar:2.10.0.0]
2024-02-06T14:55:26.724Z INFO  [OpensearchProcessImpl]  at java.lang.Thread.run(Thread.java:833) [?:?]
2024-02-06T14:55:27.264Z INFO  [OpensearchProcessImpl] [2024-02-06T14:55:27,264][INFO ][o.o.p.PluginsService     ] [datanode] PluginService:onIndexModule index:[graylog_45/q6uh2lIVR-WV0zelx3pF8w]
2024-02-06T14:55:27.338Z INFO  [OpensearchProcessImpl] [2024-02-06T14:55:27,337][INFO ][stdout                   ] [datanode] [FINE] No subscribers registered for event class org.opensearch.security.securityconf.DynamicConfigFactory$NodesDnModelImpl
2024-02-06T14:55:27.341Z INFO  [OpensearchProcessImpl] [2024-02-06T14:55:27,341][INFO ][stdout                   ] [datanode] [FINE] No subscribers registered for event class org.greenrobot.eventbus.NoSubscriberEvent
2024-02-06T14:55:27.344Z INFO  [OpensearchProcessImpl] [2024-02-06T14:55:27,343][INFO ][o.o.s.a.i.AuditLogImpl   ] [datanode] Auditing on REST API is enabled.
2024-02-06T14:55:27.346Z INFO  [OpensearchProcessImpl] [2024-02-06T14:55:27,345][INFO ][o.o.s.a.i.AuditLogImpl   ] [datanode] [AUTHENTICATED, GRANTED_PRIVILEGES] are excluded from REST API auditing.
2024-02-06T14:55:27.346Z INFO  [OpensearchProcessImpl] [2024-02-06T14:55:27,346][INFO ][o.o.s.a.i.AuditLogImpl   ] [datanode] Auditing on Transport API is enabled.
2024-02-06T14:55:27.347Z INFO  [OpensearchProcessImpl] [2024-02-06T14:55:27,347][INFO ][o.o.s.a.i.AuditLogImpl   ] [datanode] [AUTHENTICATED, GRANTED_PRIVILEGES] are excluded from Transport API auditing.
2024-02-06T14:55:27.348Z INFO  [OpensearchProcessImpl] [2024-02-06T14:55:27,347][INFO ][o.o.s.a.i.AuditLogImpl   ] [datanode] Auditing of request body is enabled.
2024-02-06T14:55:27.349Z INFO  [OpensearchProcessImpl] [2024-02-06T14:55:27,349][INFO ][o.o.s.a.i.AuditLogImpl   ] [datanode] Bulk requests resolution is disabled during request auditing.
2024-02-06T14:55:27.349Z INFO  [OpensearchProcessImpl] [2024-02-06T14:55:27,349][INFO ][o.o.s.a.i.AuditLogImpl   ] [datanode] Index resolution is enabled during request auditing.
2024-02-06T14:55:27.350Z INFO  [OpensearchProcessImpl] [2024-02-06T14:55:27,350][INFO ][o.o.s.a.i.AuditLogImpl   ] [datanode] Sensitive headers auditing is enabled.
2024-02-06T14:55:27.351Z INFO  [OpensearchProcessImpl] [2024-02-06T14:55:27,350][INFO ][o.o.s.a.i.AuditLogImpl   ] [datanode] Auditing requests from kibanaserver users is disabled.
2024-02-06T14:55:27.351Z INFO  [OpensearchProcessImpl] [2024-02-06T14:55:27,351][INFO ][o.o.s.a.i.AuditLogImpl   ] [datanode] Auditing of external configuration is disabled.
2024-02-06T14:55:27.352Z INFO  [OpensearchProcessImpl] [2024-02-06T14:55:27,352][INFO ][o.o.s.a.i.AuditLogImpl   ] [datanode] Auditing of internal configuration is enabled.
2024-02-06T14:55:27.353Z INFO  [OpensearchProcessImpl] [2024-02-06T14:55:27,352][INFO ][o.o.s.a.i.AuditLogImpl   ] [datanode] Auditing only metadata information for read request is enabled.
2024-02-06T14:55:27.353Z INFO  [OpensearchProcessImpl] [2024-02-06T14:55:27,353][INFO ][o.o.s.a.i.AuditLogImpl   ] [datanode] Auditing will watch {} for read requests.
2024-02-06T14:55:27.354Z INFO  [OpensearchProcessImpl] [2024-02-06T14:55:27,354][INFO ][o.o.s.a.i.AuditLogImpl   ] [datanode] Auditing read operation requests from kibanaserver users is disabled.
2024-02-06T14:55:27.354Z INFO  [OpensearchProcessImpl] [2024-02-06T14:55:27,354][INFO ][o.o.s.a.i.AuditLogImpl   ] [datanode] Auditing only metadata information for write request is enabled.
2024-02-06T14:55:27.355Z INFO  [OpensearchProcessImpl] [2024-02-06T14:55:27,355][INFO ][o.o.s.a.i.AuditLogImpl   ] [datanode] Auditing diffs for write requests is disabled.
2024-02-06T14:55:27.356Z INFO  [OpensearchProcessImpl] [2024-02-06T14:55:27,355][INFO ][o.o.s.a.i.AuditLogImpl   ] [datanode] Auditing write operation requests from kibanaserver users is disabled.
2024-02-06T14:55:27.356Z INFO  [OpensearchProcessImpl] [2024-02-06T14:55:27,356][INFO ][o.o.s.a.i.AuditLogImpl   ] [datanode] Auditing will watch <NONE> for write requests.
2024-02-06T14:55:27.357Z INFO  [OpensearchProcessImpl] [2024-02-06T14:55:27,357][INFO ][o.o.s.a.i.AuditLogImpl   ] [datanode] .opendistro_security is used as internal security index.
2024-02-06T14:55:27.358Z INFO  [OpensearchProcessImpl] [2024-02-06T14:55:27,357][INFO ][o.o.s.a.i.AuditLogImpl   ] [datanode] Internal index used for posting audit logs is null
2024-02-06T14:55:27.359Z INFO  [OpensearchProcessImpl] [2024-02-06T14:55:27,359][INFO ][o.o.s.c.ConfigurationRepository] [datanode] Hot-reloading of audit configuration is enabled
2024-02-06T14:55:27.360Z INFO  [OpensearchProcessImpl] [2024-02-06T14:55:27,359][INFO ][o.o.s.c.ConfigurationRepository] [datanode] Node 'datanode' initialized
2024-02-06T14:55:27.369Z INFO  [OpensearchProcessImpl] [2024-02-06T14:55:27,368][INFO ][o.o.n.Node               ] [datanode] stopping ...
2024-02-06T14:55:27.371Z INFO  [OpensearchProcessImpl] [2024-02-06T14:55:27,371][INFO ][o.o.s.a.r.AuditMessageRouter] [datanode] Closing AuditMessageRouter
2024-02-06T14:55:27.372Z INFO  [OpensearchProcessImpl] [2024-02-06T14:55:27,372][INFO ][o.o.s.a.s.SinkProvider   ] [datanode] Closing DebugSink
2024-02-06T14:55:27.436Z INFO  [OpensearchProcessImpl] [2024-02-06T14:55:27,433][WARN ][o.o.c.a.s.ShardStateAction] [datanode] unexpected failure while sending request [internal:cluster/shard/started] to [{datanode}{Wuy0JcxyRyaHRtRuqytpOQ}{nRPEpldeQaOfTWgPKAlMaw}{172.20.0.2}{172.20.0.2:9300}{dimr}{shard_indexing_pressure_enabled=true}] for shard entry [StartedShardEntry{shardId [[graylog_45][2]], allocationId [UhrX8fa-TeG1tgcocnK8ew], primary term [21], message [after existing store recovery; bootstrap_history_uuid=false]}]
2024-02-06T14:55:27.437Z INFO  [OpensearchProcessImpl] org.opensearch.transport.SendRequestTransportException: [datanode][172.20.0.2:9300][internal:cluster/shard/started]
2024-02-06T14:55:27.437Z INFO  [OpensearchProcessImpl]  at org.opensearch.transport.TransportService.sendRequestInternal(TransportService.java:1033) [opensearch-2.10.0.jar:2.10.0]
2024-02-06T14:55:27.438Z INFO  [OpensearchProcessImpl]  at org.opensearch.security.transport.SecurityInterceptor.sendRequestDecorate(SecurityInterceptor.java:246) [opensearch-security-2.10.0.0.jar:2.10.0.0]
2024-02-06T14:55:27.439Z INFO  [OpensearchProcessImpl]  at org.opensearch.security.OpenSearchSecurityPlugin$7$2.sendRequest(OpenSearchSecurityPlugin.java:782) [opensearch-security-2.10.0.0.jar:2.10.0.0]
2024-02-06T14:55:27.439Z INFO  [OpensearchProcessImpl]  at org.opensearch.transport.TransportService.sendRequest(TransportService.java:907) [opensearch-2.10.0.jar:2.10.0]
2024-02-06T14:55:27.439Z INFO  [OpensearchProcessImpl]  at org.opensearch.transport.TransportService.sendRequest(TransportService.java:832) [opensearch-2.10.0.jar:2.10.0]
2024-02-06T14:55:27.440Z INFO  [OpensearchProcessImpl]  at org.opensearch.cluster.action.shard.ShardStateAction.sendShardAction(ShardStateAction.java:192) [opensearch-2.10.0.jar:2.10.0]
2024-02-06T14:55:27.440Z INFO  [OpensearchProcessImpl]  at org.opensearch.cluster.action.shard.ShardStateAction.shardStarted(ShardStateAction.java:673) [opensearch-2.10.0.jar:2.10.0]
2024-02-06T14:55:27.440Z INFO  [OpensearchProcessImpl]  at org.opensearch.cluster.action.shard.ShardStateAction.shardStarted(ShardStateAction.java:662) [opensearch-2.10.0.jar:2.10.0]
2024-02-06T14:55:27.441Z INFO  [OpensearchProcessImpl]  at org.opensearch.indices.cluster.IndicesClusterStateService.handleRecoveryDone(IndicesClusterStateService.java:798) [opensearch-2.10.0.jar:2.10.0]
2024-02-06T14:55:27.441Z INFO  [OpensearchProcessImpl]  at org.opensearch.indices.recovery.RecoveryListener.onDone(RecoveryListener.java:48) [opensearch-2.10.0.jar:2.10.0]
2024-02-06T14:55:27.441Z INFO  [OpensearchProcessImpl]  at org.opensearch.index.shard.IndexShard.lambda$executeRecovery$32(IndexShard.java:3630) [opensearch-2.10.0.jar:2.10.0]
2024-02-06T14:55:27.442Z INFO  [OpensearchProcessImpl]  at org.opensearch.core.action.ActionListener$1.onResponse(ActionListener.java:82) [opensearch-core-2.10.0.jar:2.10.0]
2024-02-06T14:55:27.442Z INFO  [OpensearchProcessImpl]  at org.opensearch.index.shard.StoreRecovery.lambda$recoveryListener$7(StoreRecovery.java:495) [opensearch-2.10.0.jar:2.10.0]
2024-02-06T14:55:27.442Z INFO  [OpensearchProcessImpl]  at org.opensearch.core.action.ActionListener$1.onResponse(ActionListener.java:82) [opensearch-core-2.10.0.jar:2.10.0]
2024-02-06T14:55:27.443Z INFO  [OpensearchProcessImpl]  at org.opensearch.core.action.ActionListener.completeWith(ActionListener.java:355) [opensearch-core-2.10.0.jar:2.10.0]
2024-02-06T14:55:27.443Z INFO  [OpensearchProcessImpl]  at org.opensearch.index.shard.StoreRecovery.recoverFromStore(StoreRecovery.java:119) [opensearch-2.10.0.jar:2.10.0]
2024-02-06T14:55:27.443Z INFO  [OpensearchProcessImpl]  at org.opensearch.index.shard.IndexShard.recoverFromStore(IndexShard.java:2684) [opensearch-2.10.0.jar:2.10.0]
2024-02-06T14:55:27.444Z INFO  [OpensearchProcessImpl]  at org.opensearch.action.ActionRunnable$2.doRun(ActionRunnable.java:89) [opensearch-2.10.0.jar:2.10.0]
2024-02-06T14:55:27.444Z INFO  [OpensearchProcessImpl]  at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:908) [opensearch-2.10.0.jar:2.10.0]
2024-02-06T14:55:27.444Z INFO  [OpensearchProcessImpl]  at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-2.10.0.jar:2.10.0]
2024-02-06T14:55:27.445Z INFO  [OpensearchProcessImpl]  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
2024-02-06T14:55:27.445Z INFO  [OpensearchProcessImpl]  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
2024-02-06T14:55:27.445Z INFO  [OpensearchProcessImpl]  at java.lang.Thread.run(Thread.java:833) [?:?]
2024-02-06T14:55:27.445Z INFO  [OpensearchProcessImpl] Caused by: org.opensearch.node.NodeClosedException: node closed {datanode}{Wuy0JcxyRyaHRtRuqytpOQ}{nRPEpldeQaOfTWgPKAlMaw}{172.20.0.2}{172.20.0.2:9300}{dimr}{shard_indexing_pressure_enabled=true}
2024-02-06T14:55:27.446Z INFO  [OpensearchProcessImpl]  at org.opensearch.transport.TransportService.sendRequestInternal(TransportService.java:1014) ~[opensearch-2.10.0.jar:2.10.0]
2024-02-06T14:55:27.446Z INFO  [OpensearchProcessImpl]  ... 22 more
2024-02-06T14:55:27.456Z INFO  [OpensearchProcessImpl] [2024-02-06T14:55:27,435][WARN ][o.o.c.a.s.ShardStateAction] [datanode] unexpected failure while sending request [internal:cluster/shard/started] to [{datanode}{Wuy0JcxyRyaHRtRuqytpOQ}{nRPEpldeQaOfTWgPKAlMaw}{172.20.0.2}{172.20.0.2:9300}{dimr}{shard_indexing_pressure_enabled=true}] for shard entry [StartedShardEntry{shardId [[graylog_45][1]], allocationId [-QNoDW1jSrursXtauGPP1g], primary term [21], message [after existing store recovery; bootstrap_history_uuid=false]}]
2024-02-06T14:55:27.457Z INFO  [OpensearchProcessImpl] org.opensearch.transport.SendRequestTransportException: [datanode][172.20.0.2:9300][internal:cluster/shard/started]
2024-02-06T14:55:27.457Z INFO  [OpensearchProcessImpl]  at org.opensearch.transport.TransportService.sendRequestInternal(TransportService.java:1033) [opensearch-2.10.0.jar:2.10.0]
2024-02-06T14:55:27.457Z INFO  [OpensearchProcessImpl]  at org.opensearch.security.transport.SecurityInterceptor.sendRequestDecorate(SecurityInterceptor.java:246) [opensearch-security-2.10.0.0.jar:2.10.0.0]
2024-02-06T14:55:27.458Z INFO  [OpensearchProcessImpl]  at org.opensearch.security.OpenSearchSecurityPlugin$7$2.sendRequest(OpenSearchSecurityPlugin.java:782) [opensearch-security-2.10.0.0.jar:2.10.0.0]
2024-02-06T14:55:27.458Z INFO  [OpensearchProcessImpl]  at org.opensearch.transport.TransportService.sendRequest(TransportService.java:907) [opensearch-2.10.0.jar:2.10.0]
2024-02-06T14:55:27.458Z INFO  [OpensearchProcessImpl]  at org.opensearch.transport.TransportService.sendRequest(TransportService.java:832) [opensearch-2.10.0.jar:2.10.0]
2024-02-06T14:55:27.459Z INFO  [OpensearchProcessImpl]  at org.opensearch.cluster.action.shard.ShardStateAction.sendShardAction(ShardStateAction.java:192) [opensearch-2.10.0.jar:2.10.0]
2024-02-06T14:55:27.459Z INFO  [OpensearchProcessImpl]  at org.opensearch.cluster.action.shard.ShardStateAction.shardStarted(ShardStateAction.java:673) [opensearch-2.10.0.jar:2.10.0]
2024-02-06T14:55:27.459Z INFO  [OpensearchProcessImpl]  at org.opensearch.cluster.action.shard.ShardStateAction.shardStarted(ShardStateAction.java:662) [opensearch-2.10.0.jar:2.10.0]
2024-02-06T14:55:27.460Z INFO  [OpensearchProcessImpl]  at org.opensearch.indices.cluster.IndicesClusterStateService.handleRecoveryDone(IndicesClusterStateService.java:798) [opensearch-2.10.0.jar:2.10.0]
2024-02-06T14:55:27.460Z INFO  [OpensearchProcessImpl]  at org.opensearch.indices.recovery.RecoveryListener.onDone(RecoveryListener.java:48) [opensearch-2.10.0.jar:2.10.0]
2024-02-06T14:55:27.460Z INFO  [OpensearchProcessImpl]  at org.opensearch.index.shard.IndexShard.lambda$executeRecovery$32(IndexShard.java:3630) [opensearch-2.10.0.jar:2.10.0]
2024-02-06T14:55:27.461Z INFO  [OpensearchProcessImpl]  at org.opensearch.core.action.ActionListener$1.onResponse(ActionListener.java:82) [opensearch-core-2.10.0.jar:2.10.0]
2024-02-06T14:55:27.461Z INFO  [OpensearchProcessImpl]  at org.opensearch.index.shard.StoreRecovery.lambda$recoveryListener$7(StoreRecovery.java:495) [opensearch-2.10.0.jar:2.10.0]
2024-02-06T14:55:27.461Z INFO  [OpensearchProcessImpl]  at org.opensearch.core.action.ActionListener$1.onResponse(ActionListener.java:82) [opensearch-core-2.10.0.jar:2.10.0]
2024-02-06T14:55:27.462Z INFO  [OpensearchProcessImpl]  at org.opensearch.core.action.ActionListener.completeWith(ActionListener.java:355) [opensearch-core-2.10.0.jar:2.10.0]
2024-02-06T14:55:27.462Z INFO  [OpensearchProcessImpl]  at org.opensearch.index.shard.StoreRecovery.recoverFromStore(StoreRecovery.java:119) [opensearch-2.10.0.jar:2.10.0]
2024-02-06T14:55:27.462Z INFO  [OpensearchProcessImpl]  at org.opensearch.index.shard.IndexShard.recoverFromStore(IndexShard.java:2684) [opensearch-2.10.0.jar:2.10.0]
2024-02-06T14:55:27.463Z INFO  [OpensearchProcessImpl]  at org.opensearch.action.ActionRunnable$2.doRun(ActionRunnable.java:89) [opensearch-2.10.0.jar:2.10.0]
2024-02-06T14:55:27.463Z INFO  [OpensearchProcessImpl]  at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:908) [opensearch-2.10.0.jar:2.10.0]
2024-02-06T14:55:27.463Z INFO  [OpensearchProcessImpl]  at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-2.10.0.jar:2.10.0]
2024-02-06T14:55:27.464Z INFO  [OpensearchProcessImpl]  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
2024-02-06T14:55:27.464Z INFO  [OpensearchProcessImpl]  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
2024-02-06T14:55:27.464Z INFO  [OpensearchProcessImpl]  at java.lang.Thread.run(Thread.java:833) [?:?]
2024-02-06T14:55:27.464Z INFO  [OpensearchProcessImpl] Caused by: org.opensearch.node.NodeClosedException: node closed {datanode}{Wuy0JcxyRyaHRtRuqytpOQ}{nRPEpldeQaOfTWgPKAlMaw}{172.20.0.2}{172.20.0.2:9300}{dimr}{shard_indexing_pressure_enabled=true}
2024-02-06T14:55:27.465Z INFO  [OpensearchProcessImpl]  at org.opensearch.transport.TransportService.sendRequestInternal(TransportService.java:1014) ~[opensearch-2.10.0.jar:2.10.0]
2024-02-06T14:55:27.465Z INFO  [OpensearchProcessImpl]  ... 22 more
2024-02-06T14:55:27.501Z INFO  [OpensearchProcessImpl] [2024-02-06T14:55:27,499][INFO ][o.o.n.Node               ] [datanode] stopped
2024-02-06T14:55:27.502Z INFO  [OpensearchProcessImpl] [2024-02-06T14:55:27,499][INFO ][o.o.n.Node               ] [datanode] closing ...
2024-02-06T14:55:27.508Z INFO  [OpensearchProcessImpl] [2024-02-06T14:55:27,508][INFO ][o.o.s.a.i.AuditLogImpl   ] [datanode] Closing AuditLogImpl
2024-02-06T14:55:27.517Z INFO  [OpensearchProcessImpl] [2024-02-06T14:55:27,516][INFO ][o.o.n.Node               ] [datanode] closed
2024-02-06T14:55:27.780Z WARN  [OpensearchProcessImpl] Opensearch process failed
org.apache.commons.exec.ExecuteException: Process exited with an error: 1 (Exit value: 1)
    at org.apache.commons.exec.DefaultExecutor.executeInternal(DefaultExecutor.java:404) ~[commons-exec-1.3.jar:1.3]
    at org.apache.commons.exec.DefaultExecutor.access$200(DefaultExecutor.java:48) ~[commons-exec-1.3.jar:1.3]
    at org.apache.commons.exec.DefaultExecutor$1.run(DefaultExecutor.java:200) [commons-exec-1.3.jar:1.3]
    at java.lang.Thread.run(Unknown Source) [?:?]
2024-02-06T14:55:27.781Z WARN  [ProcessWatchdog] Process watchdog terminated after too many restart attempts
janheise commented 4 months ago

@dascgit Hi, what news are you looking for? Your log snippet doesn't indicate the cause of the error (it might be a totally different problem than the first). Do you have more of the logs from before these lines? Also, we could not reproduce the original error. Thinking about the original error again, my current guess is that, because it's inside Docker, the internal Docker volume ran full for the original poster.

dascgit commented 4 months ago

@janheise I think it's the same problem. The Docker volume ran full; the disk on the system was expanded, so the Docker volume now has space. But if the OpenSearch container hit the watermark stage before the disk was expanded, OpenSearch stays in watermark mode and will never run again. How can this "watermark mode" be disabled once the disk has been expanded?

j3k0 commented 4 months ago

How can this "watermark mode" be disabled

It's done with a request to the REST API, which requires OpenSearch to be up, even if it's in read-only mode. See https://stackoverflow.com/questions/50609417/elasticsearch-error-cluster-block-exception-forbidden-12-index-read-only-all
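
For completeness, the reset call looks roughly like this (a sketch; host, port, and credentials are assumptions that depend on your setup):

```shell
# Clear the read-only-allow-delete block on all indices once disk space
# is available again. Setting the block to null removes it.
curl -X PUT "http://localhost:9200/_all/_settings" \
  -H 'Content-Type: application/json' \
  -d '{"index.blocks.read_only_allow_delete": null}'
```

This only works while OpenSearch is up and answering REST requests, which is exactly the window the DataNode's watchdog closes here.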

janheise commented 4 months ago

What I don't understand: in my experience (and also in what my colleague tested), if OpenSearch hits the watermark, it's still running, and it also restarts. It should automatically return to regular mode some time after the volume has been expanded. If that's not working, yes, one should send a request to do so. This is currently not easy to accomplish.

But if OpenSearch is not starting at all inside the DataNode, you cannot send any requests that change configuration. I think that in that case it's not a simple watermark issue. I need more logs if we want to get to the bottom of this.

dascgit commented 4 months ago

@janheise If I get the errors again, I will try to get you more logs.

kmerz commented 3 months ago

Closing for now, since the problem does not seem to have reappeared.