StamusNetworks / SELKS

A Suricata-based IDS/IPS/NSM distro
https://www.stamus-networks.com/open-source/#selks
GNU General Public License v3.0

Recent SELKS upgrades have resulted in inoperable IDS installations #374

Open cwwade opened 2 years ago

cwwade commented 2 years ago

We are currently supporting two simple SELKS installations at two different sites. These are deployed as Debian VMs (4.19.0-18-amd64) built from the standard SELKS ISO. At this time, these SELKS systems are operating as minimal IDS systems with the ELK components providing storage and search. Packet capture is not currently being used. Both systems have been updated regularly as new updates have been provided by the SELKS team, always using the provided SELKS scripts.

On December 15, both SELKS VMs were upgraded using the "selks-upgrade_stamus" script. The only warning reported was related to the Arkime/Moloch upgrade: WARNING elasticsearch health is 'yellow' instead of 'green', things may be broken

However, neither of these SELKS VMs is performing packet capture using Arkime/Moloch (option 3-NONE was selected during first-time setup). Also, Elasticsearch had been running fine on both VMs.
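For context, a "yellow" health status in Elasticsearch generally means that all primary shards are allocated but one or more replica shards are not, which is common on a single-node deployment like this. Assuming the default local Elasticsearch endpoint, the status and any unassigned shards could be checked with something like:

# Overall cluster status, including the unassigned_shards count
curl -s 'localhost:9200/_cluster/health?pretty'
# List any shards that are currently unassigned
curl -s 'localhost:9200/_cat/shards?v' | grep UNASSIGNED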

After the upgrade completed, the "selks-health-check_stamus" script showed everything operating normally with no warnings. The Scirius user interface has continued to be available, and Suricata alert activity was consistent with prior behavior. No unusual events or alarms were reported. However, Elasticsearch was showing one unassigned shard on each system.

Unfortunately, a week later, both systems stopped reporting any alert activity. There had been no changes to any configurations, and tcpdumps showed normal traffic flowing on the interfaces configured for "threat detection." On December 25, these two systems were upgraded again using the "selks-upgrade_stamus" script. This was done partly in the hope of restoring normal operations, and partly to deploy the latest versions of Elasticsearch, Logstash, and Kibana, which further mitigate the Log4Shell vulnerabilities. The upgrades had no effect on these systems, and all alert activity remained silent.

With other activities taking priority, today was the first opportunity to dig into these problems further. Here are some of the observations we've made:

Clearly, something has been corrupted in these two SELKS installations, which is surprising since both systems had been working fine before the software upgrades were applied. Given that these installations are also based on minimal configurations, with no customization of the SELKS configurations or packages and only minor changes to the network configuration from the stock deployment, the upgrades should not have caused problems of this nature. We are currently at a loss as to where to look for clues to resolve this problem. We do have captured transcripts of the upgrades and subsequent tests that we can share, along with any of the log files.

Suggestions for how to proceed with diagnosing these problems would be much appreciated.

pevma commented 2 years ago

Can you share the output of selks-health-check_stamus?

cwwade commented 2 years ago

Health check transcript sent via email response...

pevma commented 2 years ago

I have not received any email yet, or at least have not seen one. Feel free to join us in our Discord channel - https://github.com/StamusNetworks/SELKS/wiki/Getting-Help

cwwade commented 2 years ago

Here is the Health Check transcript:

`chuck@SELKS-02:~$ sudo selks-health-check_stamus [sudo] password for chuck: ● suricata.service - LSB: Next Generation IDS/IPS Loaded: loaded (/etc/init.d/suricata; generated) Active: active (running) since Mon 2022-01-03 16:29:00 EST; 1h 40min ago Docs: man:systemd-sysv-generator(8) Process: 553 ExecStart=/etc/init.d/suricata start (code=exited, status=0/SUCCESS) Tasks: 30 (limit: 4915) Memory: 340.2M CGroup: /system.slice/suricata.service └─648 /usr/bin/suricata -c /etc/suricata/suricata.yaml --pidfile /var/run/suricata.pid --af-packet -D -v --user=logstash

Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable. ● elasticsearch.service - Elasticsearch Loaded: loaded (/lib/systemd/system/elasticsearch.service; enabled; vendor preset: enabled) Active: active (running) since Mon 2022-01-03 16:29:41 EST; 1h 39min ago Docs: https://www.elastic.co Main PID: 545 (java) Tasks: 156 (limit: 4915) Memory: 4.5G CGroup: /system.slice/elasticsearch.service ├─545 /usr/share/elasticsearch/jdk/bin/java -Xshare:auto -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djna.nosys=true -XX:-OmitStackTraceInFastThrow -XX:+ShowCodeDetailsInExceptionMessages -Dio.netty.noUnsafe=true -Dio.… └─850 /usr/share/elasticsearch/modules/x-pack-ml/platform/linux-x86_64/bin/controller

Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable. ● logstash.service - logstash Loaded: loaded (/etc/systemd/system/logstash.service; enabled; vendor preset: enabled) Drop-In: /etc/systemd/system/logstash.service.d └─override.conf Active: active (running) since Mon 2022-01-03 16:29:41 EST; 1h 39min ago Main PID: 896 (java) Tasks: 43 (limit: 4915) Memory: 826.5M CGroup: /system.slice/logstash.service └─896 /usr/share/logstash/jdk/bin/java -Xms1g -Xmx1g -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djruby.compile.invokedynamic=true -Djruby.jit.threshold=0 -Djruby.regexp.interruptible=true -XX:+HeapDumpOnOutOfMemoryError -D…

Jan 03 18:09:21 SELKS-02 logstash[896]: [2022-01-03T18:09:21,801][WARN ][logstash.outputs.elasticsearch][main][e55f734d663b7fb7ca21a05c69227f334d0c6198948f303fac6e50c03be43b13] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"logstash-flow-2022.01.03", :routing=>nil}, {"src_ip"=>"192.168… Jan 03 18:09:21 SELKS-02 logstash[896]: [2022-01-03T18:09:21,802][WARN ][logstash.outputs.elasticsearch][main][e55f734d663b7fb7ca21a05c69227f334d0c6198948f303fac6e50c03be43b13] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"logstash-flow-2022.01.03", :routing=>nil}, {"src_ip"=>"192.168… Jan 03 18:09:21 SELKS-02 logstash[896]: [2022-01-03T18:09:21,803][WARN ][logstash.outputs.elasticsearch][main][e55f734d663b7fb7ca21a05c69227f334d0c6198948f303fac6e50c03be43b13] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"logstash-flow-2022.01.03", :routing=>nil}, {"src_ip"=>"192.168… Jan 03 18:09:21 SELKS-02 logstash[896]: [2022-01-03T18:09:21,804][WARN ][logstash.outputs.elasticsearch][main][e55f734d663b7fb7ca21a05c69227f334d0c6198948f303fac6e50c03be43b13] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"logstash-flow-2022.01.03", :routing=>nil}, {"src_ip"=>"192.168… Jan 03 18:09:21 SELKS-02 logstash[896]: [2022-01-03T18:09:21,805][WARN ][logstash.outputs.elasticsearch][main][e55f734d663b7fb7ca21a05c69227f334d0c6198948f303fac6e50c03be43b13] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"logstash-flow-2022.01.03", :routing=>nil}, {"src_ip"=>"192.168… Jan 03 18:09:21 SELKS-02 logstash[896]: [2022-01-03T18:09:21,806][WARN ][logstash.outputs.elasticsearch][main][e55f734d663b7fb7ca21a05c69227f334d0c6198948f303fac6e50c03be43b13] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"logstash-flow-2022.01.03", :routing=>nil}, {"src_ip"=>"192.168… Jan 03 18:09:21 SELKS-02 logstash[896]: [2022-01-03T18:09:21,808][WARN ][logstash.outputs.elasticsearch][main][e55f734d663b7fb7ca21a05c69227f334d0c6198948f303fac6e50c03be43b13] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"logstash-flow-2022.01.03", :routing=>nil}, {"src_ip"=>"192.168… Jan 03 18:09:21 SELKS-02 logstash[896]: [2022-01-03T18:09:21,810][WARN ][logstash.outputs.elasticsearch][main][e55f734d663b7fb7ca21a05c69227f334d0c6198948f303fac6e50c03be43b13] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"logstash-flow-2022.01.03", :routing=>nil}, {"src_ip"=>"192.168… Jan 03 18:09:21 SELKS-02 logstash[896]: [2022-01-03T18:09:21,811][WARN ][logstash.outputs.elasticsearch][main][e55f734d663b7fb7ca21a05c69227f334d0c6198948f303fac6e50c03be43b13] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"logstash-flow-2022.01.03", :routing=>nil}, {"src_ip"=>"192.168… Jan 03 18:09:21 SELKS-02 logstash[896]: [2022-01-03T18:09:21,811][WARN ][logstash.outputs.elasticsearch][main][e55f734d663b7fb7ca21a05c69227f334d0c6198948f303fac6e50c03be43b13] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"logstash-flow-2022.01.03", :routing=>nil}, {"src_ip"=>"192.168… Hint: Some lines were ellipsized, use -l to show in full. 
● kibana.service - Kibana Loaded: loaded (/etc/systemd/system/kibana.service; enabled; vendor preset: enabled) Active: active (running) since Mon 2022-01-03 18:09:12 EST; 9s ago Docs: https://www.elastic.co Main PID: 4265 (node) Tasks: 11 (limit: 4915) Memory: 187.4M CGroup: /system.slice/kibana.service └─4265 /usr/share/kibana/bin/../node/bin/node /usr/share/kibana/bin/../src/cli/dist --logging.dest=/var/log/kibana/kibana.log --pid.file=/run/kibana/kibana.pid

Jan 03 18:09:12 SELKS-02 systemd[1]: Started Kibana. ● evebox.service - EveBox Server Loaded: loaded (/lib/systemd/system/evebox.service; enabled; vendor preset: enabled) Active: active (running) since Mon 2022-01-03 16:29:00 EST; 1h 40min ago Main PID: 540 (evebox) Tasks: 5 (limit: 4915) Memory: 2.8M CGroup: /system.slice/evebox.service └─540 /usr/bin/evebox server

Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable. Unit molochviewer-selks.service could not be found. Unit molochpcapread-selks.service could not be found. scirius RUNNING pid 759, uptime 1:40:20 ii elasticsearch 7.16.2 amd64 Distributed RESTful search engine built for the cloud ii elasticsearch-curator 5.8.4 amd64 Have indices in Elasticsearch? This is the tool for you!\n\nLike a museum curator manages the exhibits and collections on display, \nElasticsearch Curator helps you curate, or manage your indices. ii evebox 1:0.14.0 amd64 no description given ii kibana 7.16.2 amd64 Explore and visualize your Elasticsearch data ii kibana-dashboards-stamus 2020122001 amd64 Kibana 6 dashboard templates. ii logstash 1:7.16.2-1 amd64 An extensible logging pipeline ii moloch 3.2.1-1 amd64 Moloch Full Packet System ii scirius 3.7.0-6 amd64 Django application to manage Suricata ruleset ii suricata 1:2021111201-0stamus0 amd64 Suricata open source multi-thread IDS/IPS/NSM system. Filesystem Type Size Used Avail Use% Mounted on udev devtmpfs 3.9G 0 3.9G 0% /dev tmpfs tmpfs 798M 79M 720M 10% /run /dev/sda1 ext4 307G 85G 206G 30% / tmpfs tmpfs 3.9G 0 3.9G 0% /dev/shm tmpfs tmpfs 5.0M 0 5.0M 0% /run/lock tmpfs tmpfs 3.9G 0 3.9G 0% /sys/fs/cgroup tmpfs tmpfs 798M 0 798M 0% /run/user/1001 chuck@SELKS-02:~$ suricata -V This is Suricata version 7.0.0-dev (244dd11c3 2021-11-11) `

pevma commented 2 years ago

Can you please share the last 20-50 lines of /var/log/elasticsearch/elasticsearch.log ?

cwwade commented 2 years ago

Here is the elasticsearch.log capture you requested:

sudo tail -n 50 /var/log/elasticsearch/elasticsearch.log at java.lang.Thread.run(Thread.java:833) [?:?] [2022-01-05T17:09:41,907][INFO ][o.e.x.i.IndexLifecycleRunner] [SELKS-02] policy [.deprecation-indexing-ilm-policy] for index [.ds-.logs-deprecation.elasticsearch-default-2021.12.15-000001] on an error step due to a transient error, moving back to the failed step [attempt-rollover] for execution. retry attempt [999] [2022-01-05T17:09:41,908][INFO ][o.e.x.i.IndexLifecycleRunner] [SELKS-02] policy [kibana-event-log-policy] for index [.kibana-event-log-7.15.2-000001] on an error step due to a transient error, moving back to the failed step [attempt-rollover] for execution. retry attempt [1443] [2022-01-05T17:09:42,067][ERROR][o.e.x.i.IndexLifecycleRunner] [SELKS-02] policy [.deprecation-indexing-ilm-policy] for index [.ds-.logs-deprecation.elasticsearch-default-2021.12.15-000001] failed on step [{"phase":"hot","action":"rollover","name":"attempt-rollover"}]. Moving to ERROR step org.elasticsearch.common.ValidationException: Validation Failed: 1: this action would add [2] shards, but this cluster currently has [1000]/[1000] maximum normal shards open; at org.elasticsearch.indices.ShardLimitValidator.validateShardLimit(ShardLimitValidator.java:107) ~[elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.cluster.metadata.MetadataCreateIndexService.aggregateIndexSettings(MetadataCreateIndexService.java:1046) ~[elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.cluster.metadata.MetadataCreateIndexService.applyCreateIndexRequestWithV2Template(MetadataCreateIndexService.java:620) ~[elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.cluster.metadata.MetadataCreateIndexService.applyCreateIndexRequest(MetadataCreateIndexService.java:393) ~[elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.action.admin.indices.rollover.MetadataRolloverService.rolloverDataStream(MetadataRolloverService.java:306) ~[elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.action.admin.indices.rollover.MetadataRolloverService.rolloverClusterState(MetadataRolloverService.java:130) ~[elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.action.admin.indices.rollover.TransportRolloverAction$1.execute(TransportRolloverAction.java:208) ~[elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:51) ~[elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.cluster.service.MasterService.executeTasks(MasterService.java:836) ~[elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.cluster.service.MasterService.calculateTaskOutputs(MasterService.java:403) ~[elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:243) [elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.cluster.service.MasterService.access$100(MasterService.java:63) [elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:170) [elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:146) [elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:202) [elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:718) [elasticsearch-7.16.2.jar:7.16.2] at 
org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:262) [elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:225) [elasticsearch-7.16.2.jar:7.16.2] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?] at java.lang.Thread.run(Thread.java:833) [?:?] [2022-01-05T17:09:42,069][ERROR][o.e.x.i.IndexLifecycleRunner] [SELKS-02] policy [kibana-event-log-policy] for index [.kibana-event-log-7.15.2-000001] failed on step [{"phase":"hot","action":"rollover","name":"attempt-rollover"}]. Moving to ERROR step org.elasticsearch.common.ValidationException: Validation Failed: 1: this action would add [2] shards, but this cluster currently has [1000]/[1000] maximum normal shards open; at org.elasticsearch.indices.ShardLimitValidator.validateShardLimit(ShardLimitValidator.java:107) ~[elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.cluster.metadata.MetadataCreateIndexService.aggregateIndexSettings(MetadataCreateIndexService.java:1046) ~[elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.cluster.metadata.MetadataCreateIndexService.applyCreateIndexRequestWithV1Templates(MetadataCreateIndexService.java:553) ~[elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.cluster.metadata.MetadataCreateIndexService.applyCreateIndexRequest(MetadataCreateIndexService.java:413) ~[elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.cluster.metadata.MetadataCreateIndexService.applyCreateIndexRequest(MetadataCreateIndexService.java:420) ~[elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.action.admin.indices.rollover.MetadataRolloverService.rolloverAlias(MetadataRolloverService.java:239) ~[elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.action.admin.indices.rollover.MetadataRolloverService.rolloverClusterState(MetadataRolloverService.java:119) ~[elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.action.admin.indices.rollover.TransportRolloverAction$1.execute(TransportRolloverAction.java:208) ~[elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:51) ~[elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.cluster.service.MasterService.executeTasks(MasterService.java:836) ~[elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.cluster.service.MasterService.calculateTaskOutputs(MasterService.java:403) ~[elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:243) [elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.cluster.service.MasterService.access$100(MasterService.java:63) [elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:170) [elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:146) [elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:202) [elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:718) [elasticsearch-7.16.2.jar:7.16.2] at 
org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:262) [elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:225) [elasticsearch-7.16.2.jar:7.16.2] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?] at java.lang.Thread.run(Thread.java:833) [?:?]

pevma commented 2 years ago

It looks like you are hitting the max shards limit:

Moving to ERROR step org.elasticsearch.common.ValidationException: Validation Failed: 1: this action would add [2] shards, but this cluster currently has [1000]/[1000] maximum normal shards open;

Do you have older data/logs that need curating? You can also increase the shard limit - but it depends on your setup/retention needs.
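If it helps, a rough way to gauge how close a node is to the limit is to compare the number of open shards against the configured maximum (assuming the default localhost:9200 endpoint):

# Count the shards currently open on the node
curl -s 'localhost:9200/_cat/shards' | wc -l
# Show the effective cluster.max_shards_per_node setting (including defaults)
curl -s 'localhost:9200/_cluster/settings?include_defaults=true&flat_settings=true&pretty' | grep max_shards_per_node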

-- Regards, Peter Manev


cwwade commented 2 years ago

@pevma

Thank you for this observation. I confess that I don't have a sufficient understanding of how Elasticsearch storage and shards are managed in a SELKS deployment. Both of our SELKS VMs were deployed as clean installs in the October timeframe. I had thought that the cron job described on the Data lifecycle page would be performing the necessary housekeeping. Since these systems are lightly loaded, I had presumed that things would get cleaned up automatically. Apparently, I'm missing something here.
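As a sanity check on that assumption, it may be worth confirming that the curator housekeeping job is actually scheduled and that old daily indices are being removed. Two generic (not SELKS-specific) checks might be:

# Look for a curator entry in the usual cron locations
grep -ri curator /etc/cron* 2>/dev/null
# List the logstash-* indices sorted by name to see how far back the daily indices go
curl -s 'localhost:9200/_cat/indices/logstash-*?v&s=index' | head -n 20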

What would be the recommended approach to resolving this problem? Should I start by increasing the limit on shards beyond 1,000? And, if so, what would be the right way to do this within the SELKS context? What other steps do we need to take to restore full operational status?

Also, is there something about the upgrades that might have contributed to this problem emerging? Should we take additional steps to prepare for future upgrades, or to perform housekeeping post upgrade?

For the longer term, what changes are recommended in our operating procedures to proactively prevent this problem from recurring? Can you point us to relevant documentation we can follow to improve our operating procedures?

Thanks...

pevma commented 2 years ago

Sure, no problem. The data lifecycle is about cleaning up older logs or log rotation, so that part should be good, I think. You can try adjusting the maximum number of open shards with something like the following (please test that in QA first, as part of a regular process for introducing changes into production :) )

curl -X PUT localhost:9200/_cluster/settings -H "Content-Type: application/json" -d '{ "persistent": { "cluster.max_shards_per_node": "3000" } }'

You can read a bit more about similar issues here:
https://github.com/wazuh/wazuh-puppet/issues/222
https://discuss.elastic.co/t/how-to-fix-hitting-maximum-shards-open-error/200502
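If older daily indices are no longer needed, they can also be removed directly with the delete index API. This is a sketch only, with an illustrative index pattern and month - verify what matches before deleting anything:

# Preview which indices the pattern would match (pattern and month are placeholders)
curl -s 'localhost:9200/_cat/indices/logstash-*-2021.10.*?v'
# Delete them once the list looks right (destructive - double-check the pattern first)
curl -X DELETE 'localhost:9200/logstash-*-2021.10.*'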

Thanks

cwwade commented 2 years ago

Your suggestions are helpful. I have read the documentation you referenced and bumped the shard limit to 3,000. This has restored normal operation on the SELKS VM we use for testing. However, it has also resulted in a second unassigned shard. The total number of active primary shards has increased from 999 to 1016 in the last hour, which seems reasonable based on my understanding of normal behavior.

My next step would seem to be resolving the issue of the unassigned shards. Can you point me to documentation for how to go about doing this in a manner consistent with a SELKS deployment? What I've found so far seems overly complex for our simple SELKS configurations, and I am looking for a procedure that is consistent with the way SELKS has been built.

Since the original problem with Elasticsearch warning about an unassigned shard occurred on both of our SELKS installations immediately after executing the "selks-upgrade_stamus" script on December 15, I wonder if there is something else we should have done either prior to the upgrade or afterwards. In particular, was there something about the upgrade that caused an increase in the number of active shards? I do not recall seeing the active shard count getting this high prior to the upgrade.
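For what it's worth, two standard Elasticsearch queries show which shards are unassigned and why (again assuming the local endpoint):

# List unassigned shards together with the recorded reason
curl -s 'localhost:9200/_cat/shards?v&h=index,shard,prirep,state,unassigned.reason' | grep -E 'state|UNASSIGNED'
# Ask the cluster to explain why a randomly chosen unassigned shard cannot be allocated
curl -s 'localhost:9200/_cluster/allocation/explain?pretty'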

Again, thank you for your assistance and suggestions...

cwwade commented 2 years ago

In case this is helpful, I have copied below the transcript from the curl -X PUT command to increase the maximum shards to 3000 followed by a query on shard status:

$ curl -X PUT localhost:9200/_cluster/settings -H "Content-Type: application/json" -d '{ "persistent": { "cluster.max_shards_per_node": "3000" } }'
{"acknowledged":true,"persistent":{"cluster":{"max_shards_per_node":"3000"}},"transient":{}}
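For completeness, the persisted value can be confirmed afterwards with a settings query (assuming the same endpoint):

# Show persistent cluster settings; max_shards_per_node should now read 3000
curl -s 'localhost:9200/_cluster/settings?pretty'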

And here is the result of a query about cluster allocation:

$ curl -XGET localhost:9200/_cluster/allocation/explain?pretty
{
  "note" : "No shard was specified in the explain API request, so this response explains a randomly chosen unassigned shard. There may be other unassigned shards in this cluster which cannot be assigned for different reasons. It may not be possible to assign this shard until one of the other shards is assigned correctly. To explain the allocation of other shards (whether assigned or unassigned) you must specify the target shard in the request to this API.",
  "index" : ".ds-.logs-deprecation.elasticsearch-default-2021.12.15-000001",
  "shard" : 0,
  "primary" : false,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "CLUSTER_RECOVERED",
    "at" : "2022-01-03T21:29:41.187Z",
    "last_allocation_status" : "no_attempt"
  },
  "can_allocate" : "no",
  "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisions" : [
    {
      "node_id" : "ZVq0J2PlTTWW9z61OZHltA",
      "node_name" : "SELKS-02",
      "transport_address" : "127.0.0.1:9300",
      "node_attributes" : {
        "ml.machine_memory" : "8365969408",
        "xpack.installed" : "true",
        "transform.node" : "true",
        "ml.max_open_jobs" : "512",
        "ml.max_jvm_size" : "4185915392"
      },
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "same_shard",
          "decision" : "NO",
          "explanation" : "a copy of this shard is already allocated to this node [[.ds-.logs-deprecation.elasticsearch-default-2021.12.15-000001][0], node[ZVq0J2PlTTWW9z61OZHltA], [P], s[STARTED], a[id=PpCQ1d-PTx6wnq2ujB3V1Q]]"
        }
      ]
    }
  ]
}

Finally, here is the Elasticsearch log associated with increasing the max active shards:

[2022-01-06T14:04:34,524][INFO ][o.e.c.s.ClusterSettings ] [SELKS-02] updating [cluster.max_shards_per_node] from [1000] to [3000] [2022-01-06T14:04:34,886][INFO ][o.e.c.m.MetadataCreateIndexService] [SELKS-02] [logstash-flow-2022.01.06] creating index, cause [auto(bulk api)], templates [logstash], shards [1]/[0] [2022-01-06T14:04:35,088][INFO ][o.e.c.m.MetadataCreateIndexService] [SELKS-02] [logstash-ssh-2022.01.06] creating index, cause [auto(bulk api)], templates [logstash], shards [1]/[0] [2022-01-06T14:04:35,568][INFO ][o.e.c.m.MetadataMappingService] [SELKS-02] [logstash-flow-2022.01.06/42JeBkXZTReiZgkLIY0Tdw] update_mapping [_doc] [2022-01-06T14:04:35,583][INFO ][o.e.c.m.MetadataMappingService] [SELKS-02] [logstash-flow-2022.01.06/42JeBkXZTReiZgkLIY0Tdw] update_mapping [_doc] [2022-01-06T14:04:35,767][INFO ][o.e.c.m.MetadataMappingService] [SELKS-02] [logstash-ssh-2022.01.06/S0DcAfP2R1OCfrEU2c_2LQ] update_mapping [_doc] [2022-01-06T14:04:35,911][INFO ][o.e.c.m.MetadataCreateIndexService] [SELKS-02] [logstash-tls-2022.01.06] creating index, cause [auto(bulk api)], templates [logstash], shards [1]/[0] [2022-01-06T14:04:36,070][INFO ][o.e.c.m.MetadataCreateIndexService] [SELKS-02] [logstash-fileinfo-2022.01.06] creating index, cause [auto(bulk api)], templates [logstash], shards [1]/[0] [2022-01-06T14:04:36,175][INFO ][o.e.c.m.MetadataCreateIndexService] [SELKS-02] [logstash-http-2022.01.06] creating index, cause [auto(bulk api)], templates [logstash], shards [1]/[0] [2022-01-06T14:04:36,309][INFO ][o.e.c.m.MetadataCreateIndexService] [SELKS-02] [logstash-dns-2022.01.06] creating index, cause [auto(bulk api)], templates [logstash], shards [1]/[0] [2022-01-06T14:04:36,543][INFO ][o.e.c.m.MetadataMappingService] [SELKS-02] [logstash-fileinfo-2022.01.06/FrcNuthfQ0a4afXdk1wOUQ] update_mapping [_doc] [2022-01-06T14:04:36,548][INFO ][o.e.c.m.MetadataMappingService] [SELKS-02] [logstash-dns-2022.01.06/gqJwfAT6RxWUpLpCcJURwQ] update_mapping [_doc] [2022-01-06T14:04:36,557][INFO ][o.e.c.m.MetadataMappingService] [SELKS-02] [logstash-http-2022.01.06/YpwIfzH1ThqlYAv5Fn0xBw] update_mapping [_doc] [2022-01-06T14:04:36,648][INFO ][o.e.c.m.MetadataMappingService] [SELKS-02] [logstash-flow-2022.01.06/42JeBkXZTReiZgkLIY0Tdw] update_mapping [_doc] [2022-01-06T14:04:36,662][INFO ][o.e.c.m.MetadataMappingService] [SELKS-02] [logstash-tls-2022.01.06/HFMOKyRiRzKRtyM9DZ555Q] update_mapping [_doc] [2022-01-06T14:04:36,764][INFO ][o.e.c.m.MetadataMappingService] [SELKS-02] [logstash-dns-2022.01.06/gqJwfAT6RxWUpLpCcJURwQ] update_mapping [_doc] [2022-01-06T14:04:36,938][INFO ][o.e.c.m.MetadataMappingService] [SELKS-02] [logstash-dns-2022.01.06/gqJwfAT6RxWUpLpCcJURwQ] update_mapping [_doc] [2022-01-06T14:04:37,094][INFO ][o.e.c.m.MetadataMappingService] [SELKS-02] [logstash-tls-2022.01.06/HFMOKyRiRzKRtyM9DZ555Q] update_mapping [_doc] [2022-01-06T14:04:37,891][INFO ][o.e.c.m.MetadataMappingService] [SELKS-02] [logstash-dns-2022.01.06/gqJwfAT6RxWUpLpCcJURwQ] update_mapping [_doc] [2022-01-06T14:04:38,904][INFO ][o.e.c.m.MetadataMappingService] [SELKS-02] [logstash-http-2022.01.06/YpwIfzH1ThqlYAv5Fn0xBw] update_mapping [_doc] [2022-01-06T14:04:39,019][INFO ][o.e.c.m.MetadataMappingService] [SELKS-02] [logstash-fileinfo-2022.01.06/FrcNuthfQ0a4afXdk1wOUQ] update_mapping [_doc] [2022-01-06T14:04:40,932][INFO ][o.e.c.m.MetadataCreateIndexService] [SELKS-02] [logstash-2022.01.06] creating index, cause [auto(bulk api)], templates [logstash], shards [1]/[0] [2022-01-06T14:04:41,451][INFO ][o.e.c.m.MetadataMappingService] 
[SELKS-02] [logstash-2022.01.06/4AfAtLMOTY6dm8xgqK447g] update_mapping [_doc] [2022-01-06T14:04:47,943][INFO ][o.e.c.m.MetadataMappingService] [SELKS-02] [logstash-fileinfo-2022.01.06/FrcNuthfQ0a4afXdk1wOUQ] update_mapping [_doc] [2022-01-06T14:04:48,051][INFO ][o.e.c.m.MetadataMappingService] [SELKS-02] [logstash-http-2022.01.06/YpwIfzH1ThqlYAv5Fn0xBw] update_mapping [_doc] [2022-01-06T14:04:48,951][INFO ][o.e.c.m.MetadataMappingService] [SELKS-02] [logstash-tls-2022.01.06/HFMOKyRiRzKRtyM9DZ555Q] update_mapping [_doc] [2022-01-06T14:04:52,973][INFO ][o.e.c.m.MetadataMappingService] [SELKS-02] [logstash-dns-2022.01.06/gqJwfAT6RxWUpLpCcJURwQ] update_mapping [_doc] [2022-01-06T14:04:56,995][INFO ][o.e.c.m.MetadataMappingService] [SELKS-02] [logstash-flow-2022.01.06/42JeBkXZTReiZgkLIY0Tdw] update_mapping [_doc] [2022-01-06T14:04:57,108][INFO ][o.e.c.m.MetadataMappingService] [SELKS-02] [logstash-dns-2022.01.06/gqJwfAT6RxWUpLpCcJURwQ] update_mapping [_doc] [2022-01-06T14:04:58,082][INFO ][o.e.c.m.MetadataMappingService] [SELKS-02] [logstash-2022.01.06/4AfAtLMOTY6dm8xgqK447g] update_mapping [_doc] [2022-01-06T14:04:59,262][INFO ][o.e.c.m.MetadataCreateIndexService] [SELKS-02] [.kibana_7.16.2_reindex_temp] creating index, cause [api], templates [], shards [1]/[1] [2022-01-06T14:04:59,269][INFO ][o.e.c.r.a.AllocationService] [SELKS-02] updating number_of_replicas to [0] for indices [.kibana_7.16.2_reindex_temp] [2022-01-06T14:04:59,392][INFO ][o.e.c.m.MetadataCreateIndexService] [SELKS-02] [.kibana_task_manager_7.16.2_reindex_temp] creating index, cause [api], templates [], shards [1]/[1] [2022-01-06T14:04:59,398][INFO ][o.e.c.r.a.AllocationService] [SELKS-02] updating number_of_replicas to [0] for indices [.kibana_task_manager_7.16.2_reindex_temp] [2022-01-06T14:05:00,541][INFO ][o.e.c.m.MetadataMappingService] [SELKS-02] [.kibana_task_manager_7.16.2_reindex_temp/WjjkdcVvTZaEW5FcGMMKmQ] update_mapping [_doc] [2022-01-06T14:05:00,713][INFO ][o.e.c.m.MetadataIndexStateService] [SELKS-02] adding block write to indices [[.kibana_task_manager_7.16.2_reindex_temp/WjjkdcVvTZaEW5FcGMMKmQ]] [2022-01-06T14:05:00,858][INFO ][o.e.c.m.MetadataIndexStateService] [SELKS-02] completed adding block write to indices [.kibana_task_manager_7.16.2_reindex_temp] [2022-01-06T14:05:00,929][INFO ][o.e.c.m.MetadataMappingService] [SELKS-02] [.kibana_7.16.2_reindex_temp/JPK0xaHsSGOxYv5nYqAzFA] update_mapping [_doc] [2022-01-06T14:05:00,993][INFO ][o.e.c.m.MetadataCreateIndexService] [SELKS-02] applying create index request using existing index [.kibana_task_manager_7.16.2_reindex_temp] metadata [2022-01-06T14:05:01,089][INFO ][o.e.c.m.MetadataCreateIndexService] [SELKS-02] [.kibana_task_manager_7.16.2_001] creating index, cause [clone_index], templates [], shards [1]/[1] [2022-01-06T14:05:01,097][INFO ][o.e.c.r.a.AllocationService] [SELKS-02] updating number_of_replicas to [0] for indices [.kibana_task_manager_7.16.2_001] [2022-01-06T14:05:01,218][INFO ][o.e.c.m.MetadataMappingService] [SELKS-02] [.kibana_7.16.2_reindex_temp/JPK0xaHsSGOxYv5nYqAzFA] update_mapping [_doc] [2022-01-06T14:05:01,303][INFO ][o.e.c.m.MetadataMappingService] [SELKS-02] [.kibana_task_manager_7.16.2_001/zXueNGplRnC1z9xnRjFc8A] create_mapping [_doc] [2022-01-06T14:05:01,377][INFO ][o.e.c.m.MetadataMappingService] [SELKS-02] [.kibana_7.16.2_reindex_temp/JPK0xaHsSGOxYv5nYqAzFA] update_mapping [_doc] [2022-01-06T14:05:01,459][INFO ][o.e.c.m.MetadataMappingService] [SELKS-02] [.kibana_7.16.2_reindex_temp/JPK0xaHsSGOxYv5nYqAzFA] update_mapping [_doc] 
[2022-01-06T14:05:01,660][INFO ][o.e.c.m.MetadataMappingService] [SELKS-02] [.kibana_task_manager_7.16.2_001/zXueNGplRnC1z9xnRjFc8A] update_mapping [_doc] [2022-01-06T14:05:01,853][INFO ][o.e.c.m.MetadataMappingService] [SELKS-02] [.kibana_7.16.2_reindex_temp/JPK0xaHsSGOxYv5nYqAzFA] update_mapping [_doc] [2022-01-06T14:05:01,903][INFO ][o.e.t.LoggingTaskListener] [SELKS-02] 18834957 finished with response BulkByScrollResponse[took=64.8ms,timed_out=false,sliceId=null,updated=18,created=0,deleted=0,batches=1,versionConflicts=0,noops=0,retries=0,throttledUntil=0s,bulk_failures=[],search_failures=[]] [2022-01-06T14:05:01,980][INFO ][o.e.c.m.MetadataMappingService] [SELKS-02] [.kibana_7.16.2_reindex_temp/JPK0xaHsSGOxYv5nYqAzFA] update_mapping [_doc] [2022-01-06T14:05:02,044][INFO ][o.e.c.m.MetadataDeleteIndexService] [SELKS-02] [.kibana_task_manager_7.16.2_reindex_temp/WjjkdcVvTZaEW5FcGMMKmQ] deleting index [2022-01-06T14:05:02,156][INFO ][o.e.c.m.MetadataMappingService] [SELKS-02] [.kibana_7.16.2_reindex_temp/JPK0xaHsSGOxYv5nYqAzFA] update_mapping [_doc] [2022-01-06T14:05:02,315][INFO ][o.e.c.m.MetadataMappingService] [SELKS-02] [.kibana_7.16.2_reindex_temp/JPK0xaHsSGOxYv5nYqAzFA] update_mapping [_doc] [2022-01-06T14:05:02,411][INFO ][o.e.c.m.MetadataMappingService] [SELKS-02] [.kibana_7.16.2_reindex_temp/JPK0xaHsSGOxYv5nYqAzFA] update_mapping [_doc] [2022-01-06T14:05:02,486][INFO ][o.e.c.m.MetadataMappingService] [SELKS-02] [.kibana_7.16.2_reindex_temp/JPK0xaHsSGOxYv5nYqAzFA] update_mapping [_doc] [2022-01-06T14:05:02,728][INFO ][o.e.c.m.MetadataIndexStateService] [SELKS-02] adding block write to indices [[.kibana_7.16.2_reindex_temp/JPK0xaHsSGOxYv5nYqAzFA]] [2022-01-06T14:05:02,833][INFO ][o.e.c.m.MetadataIndexStateService] [SELKS-02] completed adding block write to indices [.kibana_7.16.2_reindex_temp] [2022-01-06T14:05:02,911][INFO ][o.e.c.m.MetadataCreateIndexService] [SELKS-02] applying create index request using existing index [.kibana_7.16.2_reindex_temp] metadata [2022-01-06T14:05:02,914][INFO ][o.e.c.m.MetadataCreateIndexService] [SELKS-02] [.kibana_7.16.2_001] creating index, cause [clone_index], templates [], shards [1]/[1] [2022-01-06T14:05:02,921][INFO ][o.e.c.r.a.AllocationService] [SELKS-02] updating number_of_replicas to [0] for indices [.kibana_7.16.2_001] [2022-01-06T14:05:03,066][INFO ][o.e.c.m.MetadataMappingService] [SELKS-02] [.kibana_7.16.2_001/_a6y5B1SS3qo9LNCr2WmBw] create_mapping [_doc] [2022-01-06T14:05:03,374][INFO ][o.e.c.m.MetadataMappingService] [SELKS-02] [.kibana_7.16.2_001/_a6y5B1SS3qo9LNCr2WmBw] update_mapping [_doc] [2022-01-06T14:05:04,036][INFO ][o.e.t.LoggingTaskListener] [SELKS-02] 18835181 finished with response BulkByScrollResponse[took=546.1ms,timed_out=false,sliceId=null,updated=496,created=0,deleted=0,batches=1,versionConflicts=0,noops=0,retries=0,throttledUntil=0s,bulk_failures=[],search_failures=[]] [2022-01-06T14:05:04,120][INFO ][o.e.c.m.MetadataDeleteIndexService] [SELKS-02] [.kibana_7.16.2_reindex_temp/JPK0xaHsSGOxYv5nYqAzFA] deleting index [2022-01-06T14:05:06,070][INFO ][o.e.c.m.MetadataMappingService] [SELKS-02] [logstash-flow-2022.01.06/42JeBkXZTReiZgkLIY0Tdw] update_mapping [_doc] [2022-01-06T14:05:09,073][INFO ][o.e.c.m.MetadataCreateIndexService] [SELKS-02] [logstash-dhcp-2022.01.06] creating index, cause [auto(bulk api)], templates [logstash], shards [1]/[0] [2022-01-06T14:05:09,437][INFO ][o.e.c.m.MetadataMappingService] [SELKS-02] [logstash-dhcp-2022.01.06/76tYAeZKQLSzoGbhDypnqw] update_mapping [_doc] [2022-01-06T14:05:09,451][INFO 
][o.e.c.m.MetadataMappingService] [SELKS-02] [logstash-dhcp-2022.01.06/76tYAeZKQLSzoGbhDypnqw] update_mapping [_doc] [2022-01-06T14:05:11,105][INFO ][o.e.c.m.MetadataMappingService] [SELKS-02] [logstash-dhcp-2022.01.06/76tYAeZKQLSzoGbhDypnqw] update_mapping [_doc] [2022-01-06T14:05:11,905][INFO ][o.e.x.i.a.TransportPutLifecycleAction] [SELKS-02] updating index lifecycle policy [.alerts-ilm-policy] [2022-01-06T14:05:12,108][WARN ][r.suppressed ] [SELKS-02] path: /_aliases, params: {} java.lang.IllegalStateException: alias [.kibana-event-log-7.15.1] has is_hidden set to true on indices [.kibana-event-log-7.15.1-000003] but does not have is_hidden set to true on indices [.kibana-event-log-7.15.1-000001,.kibana-event-log-7.15.1-000002]; alias must have the same is_hidden setting on all indices at org.elasticsearch.cluster.metadata.IndexAbstraction$Alias.validateAliasProperties(IndexAbstraction.java:327) ~[elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.cluster.metadata.IndexAbstraction$Alias.<init>(IndexAbstraction.java:251) ~[elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.cluster.metadata.Metadata$Builder.buildIndicesLookup(Metadata.java:1936) ~[elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.cluster.metadata.Metadata$Builder.build(Metadata.java:1816) ~[elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.cluster.metadata.Metadata$Builder.build(Metadata.java:1696) ~[elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.cluster.ClusterState$Builder.metadata(ClusterState.java:643) ~[elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.cluster.metadata.MetadataIndexAliasesService.applyAliasActions(MetadataIndexAliasesService.java:180) ~[elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.cluster.metadata.MetadataIndexAliasesService$1.execute(MetadataIndexAliasesService.java:76) ~[elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:51) ~[elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.cluster.service.MasterService.executeTasks(MasterService.java:836) ~[elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.cluster.service.MasterService.calculateTaskOutputs(MasterService.java:403) ~[elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:243) [elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.cluster.service.MasterService.access$100(MasterService.java:63) [elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:170) [elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:146) [elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:202) [elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:718) [elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:262) [elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:225) [elasticsearch-7.16.2.jar:7.16.2] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?] 
at java.lang.Thread.run(Thread.java:833) [?:?] [2022-01-06T14:05:12,144][WARN ][r.suppressed ] [SELKS-02] path: /_aliases, params: {} java.lang.IllegalStateException: alias [.kibana-event-log-7.15.1] has is_hidden set to true on indices [.kibana-event-log-7.15.1-000002] but does not have is_hidden set to true on indices [.kibana-event-log-7.15.1-000001,.kibana-event-log-7.15.1-000003]; alias must have the same is_hidden setting on all indices at org.elasticsearch.cluster.metadata.IndexAbstraction$Alias.validateAliasProperties(IndexAbstraction.java:327) ~[elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.cluster.metadata.IndexAbstraction$Alias.<init>(IndexAbstraction.java:251) ~[elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.cluster.metadata.Metadata$Builder.buildIndicesLookup(Metadata.java:1936) ~[elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.cluster.metadata.Metadata$Builder.build(Metadata.java:1816) ~[elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.cluster.metadata.Metadata$Builder.build(Metadata.java:1696) ~[elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.cluster.ClusterState$Builder.metadata(ClusterState.java:643) ~[elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.cluster.metadata.MetadataIndexAliasesService.applyAliasActions(MetadataIndexAliasesService.java:180) ~[elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.cluster.metadata.MetadataIndexAliasesService$1.execute(MetadataIndexAliasesService.java:76) ~[elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:51) ~[elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.cluster.service.MasterService.executeTasks(MasterService.java:836) ~[elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.cluster.service.MasterService.calculateTaskOutputs(MasterService.java:403) ~[elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:243) [elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.cluster.service.MasterService.access$100(MasterService.java:63) [elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:170) [elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:146) [elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:202) [elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:718) [elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:262) [elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:225) [elasticsearch-7.16.2.jar:7.16.2] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?] at java.lang.Thread.run(Thread.java:833) [?:?] 
[2022-01-06T14:05:12,151][WARN ][r.suppressed ] [SELKS-02] path: /_aliases, params: {} java.lang.IllegalStateException: alias [.kibana-event-log-7.15.1] has is_hidden set to true on indices [.kibana-event-log-7.15.1-000001] but does not have is_hidden set to true on indices [.kibana-event-log-7.15.1-000002,.kibana-event-log-7.15.1-000003]; alias must have the same is_hidden setting on all indices at org.elasticsearch.cluster.metadata.IndexAbstraction$Alias.validateAliasProperties(IndexAbstraction.java:327) ~[elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.cluster.metadata.IndexAbstraction$Alias.<init>(IndexAbstraction.java:251) ~[elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.cluster.metadata.Metadata$Builder.buildIndicesLookup(Metadata.java:1936) ~[elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.cluster.metadata.Metadata$Builder.build(Metadata.java:1816) ~[elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.cluster.metadata.Metadata$Builder.build(Metadata.java:1696) ~[elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.cluster.ClusterState$Builder.metadata(ClusterState.java:643) ~[elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.cluster.metadata.MetadataIndexAliasesService.applyAliasActions(MetadataIndexAliasesService.java:180) ~[elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.cluster.metadata.MetadataIndexAliasesService$1.execute(MetadataIndexAliasesService.java:76) ~[elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:51) ~[elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.cluster.service.MasterService.executeTasks(MasterService.java:836) ~[elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.cluster.service.MasterService.calculateTaskOutputs(MasterService.java:403) ~[elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:243) [elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.cluster.service.MasterService.access$100(MasterService.java:63) [elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:170) [elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:146) [elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:202) [elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:718) [elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:262) [elasticsearch-7.16.2.jar:7.16.2] at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:225) [elasticsearch-7.16.2.jar:7.16.2] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?] at java.lang.Thread.run(Thread.java:833) [?:?] 
[2022-01-06T14:05:12,913][INFO ][o.e.c.m.MetadataIndexTemplateService] [SELKS-02] adding index template [.kibana-event-log-7.16.2-template] for index patterns [.kibana-event-log-7.16.2-*]
[2022-01-06T14:05:13,181][INFO ][o.e.c.m.MetadataCreateIndexService] [SELKS-02] [.kibana-event-log-7.16.2-000001] creating index, cause [api], templates [.kibana-event-log-7.16.2-template], shards [1]/[1]
[2022-01-06T14:05:13,193][INFO ][o.e.c.r.a.AllocationService] [SELKS-02] updating number_of_replicas to [0] for indices [.kibana-event-log-7.16.2-000001]
[2022-01-06T14:05:13,471][INFO ][o.e.x.i.IndexLifecycleTransition] [SELKS-02] moving index [.kibana-event-log-7.16.2-000001] from [null] to [{"phase":"new","action":"complete","name":"complete"}] in policy [kibana-event-log-policy]
[2022-01-06T14:05:13,563][INFO ][o.e.x.i.IndexLifecycleTransition] [SELKS-02] moving index [.kibana-event-log-7.16.2-000001] from [{"phase":"new","action":"complete","name":"complete"}] to [{"phase":"hot","action":"unfollow","name":"branch-check-unfollow-prerequisites"}] in policy [kibana-event-log-policy]
[2022-01-06T14:05:13,646][INFO ][o.e.x.i.IndexLifecycleTransition] [SELKS-02] moving index [.kibana-event-log-7.16.2-000001] from [{"phase":"hot","action":"unfollow","name":"branch-check-unfollow-prerequisites"}] to [{"phase":"hot","action":"rollover","name":"check-rollover-ready"}] in policy [kibana-event-log-policy]
[2022-01-06T14:05:15,469][INFO ][o.e.c.m.MetadataMappingService] [SELKS-02] [logstash-dns-2022.01.06/gqJwfAT6RxWUpLpCcJURwQ] update_mapping [_doc]
[2022-01-06T14:05:20,231][INFO ][o.e.c.m.MetadataMappingService] [SELKS-02] [logstash-dns-2022.01.06/gqJwfAT6RxWUpLpCcJURwQ] update_mapping [_doc]
[2022-01-06T14:05:28,249][INFO ][o.e.c.m.MetadataMappingService] [SELKS-02] [logstash-flow-2022.01.06/42JeBkXZTReiZgkLIY0Tdw] update_mapping [_doc]
[2022-01-06T14:05:32,255][INFO ][o.e.c.m.MetadataMappingService] [SELKS-02] [logstash-flow-2022.01.06/42JeBkXZTReiZgkLIY0Tdw] update_mapping [_doc]
[2022-01-06T14:05:33,251][INFO ][o.e.c.m.MetadataMappingService] [SELKS-02] [logstash-tls-2022.01.06/HFMOKyRiRzKRtyM9DZ555Q] update_mapping [_doc]
[2022-01-06T14:05:34,265][INFO ][o.e.c.m.MetadataCreateIndexService] [SELKS-02] [logstash-smb-2022.01.06] creating index, cause [auto(bulk api)], templates [logstash], shards [1]/[0]
[2022-01-06T14:05:34,557][INFO ][o.e.c.m.MetadataMappingService] [SELKS-02] [logstash-smb-2022.01.06/4WnRu6IBSuWtSTo9kkjXCg] update_mapping [_doc]
[2022-01-06T14:05:42,297][INFO ][o.e.c.m.MetadataMappingService] [SELKS-02] [logstash-http-2022.01.06/YpwIfzH1ThqlYAv5Fn0xBw] update_mapping [_doc]
[2022-01-06T14:06:30,431][INFO ][o.e.c.m.MetadataCreateIndexService] [SELKS-02] [logstash-alert-2022.01.06] creating index, cause [auto(bulk api)], templates [logstash], shards [1]/[0]
[2022-01-06T14:06:30,754][INFO ][o.e.c.m.MetadataMappingService] [SELKS-02] [logstash-alert-2022.01.06/sIpcETAhTOS2m-tN1Pi00w] update_mapping [_doc]
[2022-01-06T14:06:32,460][INFO ][o.e.m.j.JvmGcMonitorService] [SELKS-02] [gc][250426] overhead, spent [585ms] collecting in the last [1.5s]
[2022-01-06T14:07:45,646][INFO ][o.e.c.m.MetadataMappingService] [SELKS-02] [logstash-fileinfo-2022.01.06/FrcNuthfQ0a4afXdk1wOUQ] update_mapping [_doc]
[2022-01-06T14:07:45,748][INFO ][o.e.c.m.MetadataMappingService] [SELKS-02] [logstash-http-2022.01.06/YpwIfzH1ThqlYAv5Fn0xBw] update_mapping [_doc]
[2022-01-06T14:08:39,776][INFO ][o.e.c.m.MetadataMappingService] [SELKS-02] [logstash-flow-2022.01.06/42JeBkXZTReiZgkLIY0Tdw] update_mapping [_doc]
[2022-01-06T14:09:31,928][INFO ][o.e.c.m.MetadataMappingService] [SELKS-02] [logstash-dhcp-2022.01.06/76tYAeZKQLSzoGbhDypnqw] update_mapping [_doc]
[2022-01-06T14:09:41,911][INFO ][o.e.x.i.IndexLifecycleRunner] [SELKS-02] policy [.deprecation-indexing-ilm-policy] for index [.ds-.logs-deprecation.elasticsearch-default-2021.12.15-000001] on an error step due to a transient error, moving back to the failed step [attempt-rollover] for execution. retry attempt [1125]
[2022-01-06T14:09:41,912][INFO ][o.e.x.i.IndexLifecycleRunner] [SELKS-02] policy [kibana-event-log-policy] for index [.kibana-event-log-7.15.2-000001] on an error step due to a transient error, moving back to the failed step [attempt-rollover] for execution. retry attempt [1569]
[2022-01-06T14:09:42,218][INFO ][o.e.c.m.MetadataCreateIndexService] [SELKS-02] [.ds-.logs-deprecation.elasticsearch-default-2022.01.06-000002] creating index, cause [rollover_data_stream], templates [.deprecation-indexing-template], shards [1]/[1]
[2022-01-06T14:09:42,414][INFO ][o.e.c.m.MetadataCreateIndexService] [SELKS-02] [.kibana-event-log-7.15.2-000002] creating index, cause [rollover_index], templates [.kibana-event-log-7.15.2-template], shards [1]/[1]
[2022-01-06T14:09:42,421][INFO ][o.e.c.r.a.AllocationService] [SELKS-02] updating number_of_replicas to [0] for indices [.kibana-event-log-7.15.2-000002]
[2022-01-06T14:09:42,809][INFO ][o.e.x.i.IndexLifecycleTransition] [SELKS-02] moving index [.ds-.logs-deprecation.elasticsearch-default-2022.01.06-000002] from [null] to [{"phase":"new","action":"complete","name":"complete"}] in policy [.deprecation-indexing-ilm-policy]
[2022-01-06T14:09:42,810][INFO ][o.e.x.i.IndexLifecycleTransition] [SELKS-02] moving index [.ds-.logs-deprecation.elasticsearch-default-2021.12.15-000001] from [{"phase":"hot","action":"rollover","name":"attempt-rollover"}] to [{"phase":"hot","action":"rollover","name":"wait-for-active-shards"}] in policy [.deprecation-indexing-ilm-policy]
[2022-01-06T14:09:42,811][INFO ][o.e.x.i.IndexLifecycleTransition] [SELKS-02] moving index [.kibana-event-log-7.15.2-000002] from [null] to [{"phase":"new","action":"complete","name":"complete"}] in policy [kibana-event-log-policy]
[2022-01-06T14:09:42,814][INFO ][o.e.x.i.IndexLifecycleTransition] [SELKS-02] moving index [.kibana-event-log-7.15.2-000001] from [{"phase":"hot","action":"rollover","name":"attempt-rollover"}] to [{"phase":"hot","action":"rollover","name":"wait-for-active-shards"}] in policy [kibana-event-log-policy]
[2022-01-06T14:09:42,923][INFO ][o.e.x.i.IndexLifecycleTransition] [SELKS-02] moving index [.ds-.logs-deprecation.elasticsearch-default-2021.12.15-000001] from [{"phase":"hot","action":"rollover","name":"wait-for-active-shards"}] to [{"phase":"hot","action":"rollover","name":"update-rollover-lifecycle-date"}] in policy [.deprecation-indexing-ilm-policy]
[2022-01-06T14:09:42,925][INFO ][o.e.x.i.IndexLifecycleTransition] [SELKS-02] moving index [.ds-.logs-deprecation.elasticsearch-default-2021.12.15-000001] from [{"phase":"hot","action":"rollover","name":"update-rollover-lifecycle-date"}] to [{"phase":"hot","action":"rollover","name":"set-indexing-complete"}] in policy [.deprecation-indexing-ilm-policy]
[2022-01-06T14:09:42,926][INFO ][o.e.x.i.IndexLifecycleTransition] [SELKS-02] moving index [.ds-.logs-deprecation.elasticsearch-default-2022.01.06-000002] from [{"phase":"new","action":"complete","name":"complete"}] to [{"phase":"hot","action":"unfollow","name":"branch-check-unfollow-prerequisites"}] in policy [.deprecation-indexing-ilm-policy]
[2022-01-06T14:09:42,928][INFO ][o.e.x.i.IndexLifecycleTransition] [SELKS-02] moving index [.kibana-event-log-7.15.2-000001] from [{"phase":"hot","action":"rollover","name":"wait-for-active-shards"}] to [{"phase":"hot","action":"rollover","name":"update-rollover-lifecycle-date"}] in policy [kibana-event-log-policy]
[2022-01-06T14:09:42,929][INFO ][o.e.x.i.IndexLifecycleTransition] [SELKS-02] moving index [.kibana-event-log-7.15.2-000001] from [{"phase":"hot","action":"rollover","name":"update-rollover-lifecycle-date"}] to [{"phase":"hot","action":"rollover","name":"set-indexing-complete"}] in policy [kibana-event-log-policy]
[2022-01-06T14:09:42,930][INFO ][o.e.x.i.IndexLifecycleTransition] [SELKS-02] moving index [.kibana-event-log-7.15.2-000002] from [{"phase":"new","action":"complete","name":"complete"}] to [{"phase":"hot","action":"unfollow","name":"branch-check-unfollow-prerequisites"}] in policy [kibana-event-log-policy]
[2022-01-06T14:09:43,257][INFO ][o.e.x.i.IndexLifecycleTransition] [SELKS-02] moving index [.ds-.logs-deprecation.elasticsearch-default-2022.01.06-000002] from [{"phase":"hot","action":"unfollow","name":"branch-check-unfollow-prerequisites"}] to [{"phase":"hot","action":"rollover","name":"check-rollover-ready"}] in policy [.deprecation-indexing-ilm-policy]
[2022-01-06T14:09:43,259][INFO ][o.e.x.i.IndexLifecycleTransition] [SELKS-02] moving index [.kibana-event-log-7.15.2-000002] from [{"phase":"hot","action":"unfollow","name":"branch-check-unfollow-prerequisites"}] to [{"phase":"hot","action":"rollover","name":"check-rollover-ready"}] in policy [kibana-event-log-policy]
[2022-01-06T14:09:43,260][INFO ][o.e.x.i.IndexLifecycleTransition] [SELKS-02] moving index [.ds-.logs-deprecation.elasticsearch-default-2021.12.15-000001] from [{"phase":"hot","action":"rollover","name":"set-indexing-complete"}] to [{"phase":"hot","action":"complete","name":"complete"}] in policy [.deprecation-indexing-ilm-policy]
[2022-01-06T14:09:43,261][INFO ][o.e.x.i.IndexLifecycleTransition] [SELKS-02] moving index [.kibana-event-log-7.15.2-000001] from [{"phase":"hot","action":"rollover","name":"set-indexing-complete"}] to [{"phase":"hot","action":"complete","name":"complete"}] in policy [kibana-event-log-policy]
[2022-01-06T14:10:24,221][INFO ][o.e.c.m.MetadataMappingService] [SELKS-02] [logstash-alert-2022.01.06/sIpcETAhTOS2m-tN1Pi00w] update_mapping [_doc]
[2022-01-06T14:10:46,259][INFO ][o.e.c.m.MetadataMappingService] [SELKS-02] [logstash-alert-2022.01.06/sIpcETAhTOS2m-tN1Pi00w] update_mapping [_doc]
[2022-01-06T14:11:20,332][INFO ][o.e.c.m.MetadataMappingService] [SELKS-02] [logstash-flow-2022.01.06/42JeBkXZTReiZgkLIY0Tdw] update_mapping [_doc]
[2022-01-06T14:12:30,556][INFO ][o.e.c.m.MetadataMappingService] [SELKS-02] [logstash-smb-2022.01.06/4WnRu6IBSuWtSTo9kkjXCg] update_mapping [_doc]
[2022-01-06T14:12:30,669][INFO ][o.e.c.m.MetadataMappingService] [SELKS-02] [logstash-smb-2022.01.06/4WnRu6IBSuWtSTo9kkjXCg] update_mapping [_doc]
[2022-01-06T14:12:30,759][INFO ][o.e.c.m.MetadataMappingService] [SELKS-02] [logstash-smb-2022.01.06/4WnRu6IBSuWtSTo9kkjXCg] update_mapping [_doc]
[2022-01-06T14:12:30,851][INFO ][o.e.c.m.MetadataMappingService] [SELKS-02] [logstash-smb-2022.01.06/4WnRu6IBSuWtSTo9kkjXCg] update_mapping [_doc]
[2022-01-06T14:12:30,937][INFO ][o.e.c.m.MetadataMappingService] [SELKS-02] [logstash-smb-2022.01.06/4WnRu6IBSuWtSTo9kkjXCg] update_mapping [_doc]
[2022-01-06T14:14:58,865][INFO ][o.e.c.m.MetadataMappingService] [SELKS-02] [logstash-tls-2022.01.06/HFMOKyRiRzKRtyM9DZ555Q] update_mapping [_doc]
[2022-01-06T14:16:39,102][INFO ][o.e.c.m.MetadataMappingService] [SELKS-02] [logstash-http-2022.01.06/YpwIfzH1ThqlYAv5Fn0xBw] update_mapping [_doc]
[2022-01-06T14:16:39,200][INFO ][o.e.c.m.MetadataMappingService] [SELKS-02] [logstash-http-2022.01.06/YpwIfzH1ThqlYAv5Fn0xBw] update_mapping [_doc]
[2022-01-06T14:17:40,224][INFO ][o.e.c.m.MetadataMappingService] [SELKS-02] [logstash-http-2022.01.06/YpwIfzH1ThqlYAv5Fn0xBw] update_mapping [_doc]
[2022-01-06T14:17:42,230][INFO ][o.e.c.m.MetadataCreateIndexService] [SELKS-02] [logstash-anomaly-2022.01.06] creating index, cause [auto(bulk api)], templates [logstash], shards [1]/[0]
[2022-01-06T14:17:42,477][INFO ][o.e.c.m.MetadataMappingService] [SELKS-02] [logstash-http-2022.01.06/YpwIfzH1ThqlYAv5Fn0xBw] update_mapping [_doc]
[2022-01-06T14:17:42,486][INFO ][o.e.c.m.MetadataMappingService] [SELKS-02] [logstash-anomaly-2022.01.06/KKLwo3LRRyWUmxPl6uZB9g] update_mapping [_doc]
[2022-01-06T14:20:12,555][INFO ][o.e.c.m.MetadataMappingService] [SELKS-02] [logstash-flow-2022.01.06/42JeBkXZTReiZgkLIY0Tdw] update_mapping [_doc]
[2022-01-06T14:21:03,650][INFO ][o.e.c.m.MetadataMappingService] [SELKS-02] [logstash-smb-2022.01.06/4WnRu6IBSuWtSTo9kkjXCg] update_mapping [_doc]
[2022-01-06T14:23:55,042][INFO ][o.e.c.m.MetadataMappingService] [SELKS-02] [logstash-http-2022.01.06/YpwIfzH1ThqlYAv5Fn0xBw] update_mapping [_doc]

pevma commented 2 years ago

I think in this case (a one-node cluster) the unassigned shard is due to the fact that it tried to restore into an already closed index (from last week, let's say), which is normal given the outage and downtime.

cwwade commented 2 years ago

Your explanation seems plausible, but what procedure is recommended to resolve unassigned shards in a SELKS deployment? In other words, how do we go about fixing this?

I'm still left with the concern that the upgrade performed mid-December left both of our SELKS VMs (at two different sites) with the same Elasticsearch problem immediately afterward: an unassigned shard, with the shard limit reached on both VMs. That seems like an unlikely coincidence, and it also raises concerns about future upgrades.
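
For anyone checking the same thing, a quick way we could have verified how close a node was to the shard limit (a minimal sketch, assuming the default localhost:9200 endpoint used elsewhere in this thread) is to compare the active shard count from cluster health against the configured maximum:

$ curl -s 'localhost:9200/_cluster/health?pretty&filter_path=status,active_shards,unassigned_shards'
$ curl -s 'localhost:9200/_cluster/settings?pretty&include_defaults=true&filter_path=*.cluster.max_shards_per_node'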

Thanks...

pevma commented 2 years ago

As a general rule of thumb, upon any OS upgrade on a system that has SELKS and/or ES deployed, it is always good practice to run the upgrade through a QA/test environment first.

There can be a number of different reasons for ending up with unassigned shards, and the specific reason you are experiencing should be visible in QA during the test upgrades, giving you early warning of what to expect in prod or what needs fixing.

The ES documentation provides good guidance on how to find out the reason for unassigned shards:
https://www.elastic.co/guide/en/elasticsearch/reference/current/cat-shards.html
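
For example, listing only the problem shards together with their unassigned reason could be done with something along these lines (a minimal sketch using the cat shards API linked above):

$ curl -s 'localhost:9200/_cat/shards?v&h=index,shard,prirep,state,unassigned.reason' | grep -v STARTED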

cwwade commented 2 years ago

I have continued to dig into the unassigned-shard issues in our SELKS deployments as I've had time available. I am sharing some of my observations and discoveries in case they are helpful. We continue to have unassigned shards on these systems.

To reiterate, on the system we use for QA and testing we increased the maximum shard limit from 1,000 to 3,000. Prior to bumping this limit up, we had the following unassigned replica shard and its primary:

.ds-.logs-deprecation.elasticsearch-default-2021.12.15-000001 0 p STARTED    127.0.0.1
.ds-.logs-deprecation.elasticsearch-default-2021.12.15-000001 0 r UNASSIGNED CLUSTER_RECOVERED
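
(For reference, the limit in question is presumably the cluster-level cluster.max_shards_per_node setting, whose default is 1,000. Raising it to 3,000 can be done roughly as shown below; the exact command we used may have differed slightly.)

$ curl -XPUT 'localhost:9200/_cluster/settings' -H 'Content-Type: application/json' -d '{"persistent": {"cluster.max_shards_per_node": 3000}}'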

After increasing the shard limit, Suricata again started reporting events and the Kibana UI was again accessible. However, the number of unassigned shards increased to two (2). The relevant shard status was:

$ curl -XGET localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason | grep logs-deprecation
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 82459  100 82459    0     0   361k      0 --:--:-- --:--:-- --:--:--  361k
.ds-.logs-deprecation.elasticsearch-default-2021.12.15-000001 0 p STARTED
.ds-.logs-deprecation.elasticsearch-default-2021.12.15-000001 0 r UNASSIGNED CLUSTER_RECOVERED
.ds-.logs-deprecation.elasticsearch-default-2022.01.06-000002 0 p STARTED
.ds-.logs-deprecation.elasticsearch-default-2022.01.06-000002 0 r UNASSIGNED INDEX_CREATED

In the Elasticsearch log capture that I provided above, the log sequence that starts at 2022-01-06T14:09:41,911 and runs to 2022-01-06T14:09:43,261 shows what happened once shards were no longer limited to 1,000. In particular, the log record at 2022-01-06T14:09:43,260 indicates that the ".ds-.logs-deprecation.elasticsearch-default-2021.12.15-000001" index was moved from action:rollover to action:complete. A new ".ds-.logs-deprecation.elasticsearch-default-2022.01.06-000002" index was then created, but for some reason its replica became unassigned, and the index appears to have been left in the action:rollover state with name:check-rollover-ready.
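
One way to ask Elasticsearch directly why that replica is unassigned (something we have not run yet, but the allocation explain API should report it) would be:

$ curl -XGET 'localhost:9200/_cluster/allocation/explain?pretty' -H 'Content-Type: application/json' -d '{"index": ".ds-.logs-deprecation.elasticsearch-default-2022.01.06-000002", "shard": 0, "primary": false}'

On a single-node cluster the expected answer is that a replica can never be allocated to the same node as its primary, which would explain a permanently UNASSIGNED replica for any index created with shards [1]/[1], as the creation log entry for this deprecation index shows (the logstash-* indices are created with [1]/[0] and are unaffected).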

I was able to delete the index from 15-Dec, along with its unassigned replica shard, as confirmed by this transcript:

$ curl -XDELETE 'localhost:9200/.ds-.logs-deprecation.elasticsearch-default-2021.12.15-000001/'
{"acknowledged":true}

However, the new 2022.01.06-000002 index with its unassigned replica shard could not be deleted:

$ curl -XDELETE 'localhost:9200/.ds-.logs-deprecation.elasticsearch-default-2022.01.06-000002/'
{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"index [.ds-.logs-deprecation.elasticsearch-default-2022.01.06-000002] is the write index for data stream [.logs-deprecation.elasticsearch-default] and cannot be deleted"}],"type":"illegal_argument_exception","reason":"index [.ds-.logs-deprecation.elasticsearch-default-2022.01.06-000002] is the write index for data stream [.logs-deprecation.elasticsearch-default] and cannot be deleted"},"status":400}

One possibility for this error is that I did not clarify that it was the replica I was trying to delete. However, I have not yet found a way to successfully remove or reset this currently unassigned shard.
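
Based on the error message, one option that might clear this replica without deleting the data stream's write index (untested on our systems so far, and assuming dynamic settings updates are allowed on data stream backing indices) is to tell Elasticsearch that the index needs no replicas at all, which seems reasonable on a single-node cluster:

$ curl -XPUT 'localhost:9200/.ds-.logs-deprecation.elasticsearch-default-2022.01.06-000002/_settings' -H 'Content-Type: application/json' -d '{"index": {"number_of_replicas": 0}}'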

Since these problematic shards are all associated with Elasticsearch deprecation logging (presumably triggered by Kibana), I looked at the relevant elasticsearch_deprecation.log file and noticed that these log entries occurred on 15-Dec, within about 2 seconds of Elasticsearch reaching the active:running state after the post-upgrade restart:

[2021-12-15T18:35:00,074][CRITICAL][o.e.d.x.s.s.SecurityStatusChangeListener] [SELKS-02] [] The default behavior of disabling security on basic licenses is deprecated. In a later version of Elasticsearch, the value of [xpack.security.enabled] will default to "true" , regardless of the license level. See https://www.elastic.co/guide/en/elasticsearch/reference/7.16/security-minimal-setup.html to enable security, or explicitly disable security by setting [xpack.security.enabled] to false in elasticsearch.yml
[2021-12-15T18:35:12,610][WARN ][o.e.d.c.m.IndexNameExpressionResolver] [SELKS-02] [] this request accesses system indices: [.apm-agent-configuration, .apm-custom-link, .kibana_1, .kibana_2, .kibana_3, .kibana_7.15.1_001, .kibana_7.15.2_001, .kibana_task_manager_1, .kibana_task_manager_7.15.1_001, .kibana_task_manager_7.15.2_001, .tasks], but in a future major version, direct access to system indices will be prevented by default
[2021-12-15T18:36:55,541][WARN ][o.e.d.r.RestController ] [SELKS-02] [kibana] Legacy index templates are deprecated in favor of composable templates.

And then at 03:00 the next day, this log entry appeared in the deprecation log:

[2021-12-16T03:00:02,536][CRITICAL][o.e.d.i.f.SyncedFlushService] [SELKS-02] [] Synced flush is deprecated and will be removed in 8.0. Use flush at /_flush or /{index}/_flush instead.

Could this indicate some problem with normal maintenance and housekeeping? Looking further back in the deprecation log, similar log entries were made following the prior upgrade of this particular SELKS VM on 26-Nov. However, these 26-Nov log entries were categorized as "DEPRECATION" instead of "CRITICAL" and "WARN". I'm not able to determine if these deprecation log records have any significance at this time.

I also noticed that the number of log files in /var/log/elasticsearch had doubled, with logs now being written in both plain-text and JSON formats. The Elasticsearch release notes for logging provide further information on this change. My question is: could this have been one reason why the shard count seems to have increased more rapidly after the upgrade on 15-Dec?

In summary, we were able to restore normal operational behavior to our SELKS VMs, but Elasticsearch health continues to be degraded by unassigned shards. While I will continue to dig into this (as time permits), I would appreciate suggestions for how to restore Elasticsearch health and eliminate the unassigned shards. For example, would it make sense to disable deprecation logging for the time being? Also, given the recent changes in Elasticsearch 7.16, should system maintenance procedures be revised or performed on an accelerated schedule? Since Stamus has provided a packaged implementation of Suricata and Elasticsearch, we want to maintain consistency with the SELKS deployment model.
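
If disabling deprecation logging does turn out to be a sensible interim measure, the documented way to silence the deprecation logger is a dynamic cluster setting, roughly:

$ curl -XPUT 'localhost:9200/_cluster/settings' -H 'Content-Type: application/json' -d '{"persistent": {"logger.org.elasticsearch.deprecation": "OFF"}}'

If I am reading the 7.16 documentation correctly, there is also a cluster.deprecation_indexing.enabled setting that controls whether deprecation messages are written into the .logs-deprecation data stream at all; that would need to be verified against the 7.16 docs before relying on it.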

Thanks...

pevma commented 2 years ago

Did you manage to fix this ES problem?