Security-Onion-Solutions / security-onion

Security Onion 16.04 - Linux distro for threat hunting, enterprise security monitoring, and log management
https://securityonion.net

so-elastic-start times out waiting for elasticsearch #1695

Closed by petiepooo 4 years ago

petiepooo commented 4 years ago

so-elastic-start has a hardcoded timeout of 240 seconds waiting for elasticsearch to start (not the wait for the .kibana shard addressed in #1655, but the wait prior to that for elasticsearch itself to respond). On resource-constrained systems, generally during boot, it takes longer than that for elasticsearch to start, as it has to share CPU resources with multiple snort and barnyard2 processes being initialized. In such cases, when elasticsearch has not come online within 240 seconds, the remaining elastic services are not started: most noticeably kibana, but also elastalert, logstash, and curator. Would it be acceptable to modify the so-elastic-start script to try reading the timeout from /etc/nsm/securityonion.conf and, only if not found there, default to 240?

petiepooo commented 4 years ago

In the interim, I've doubled the time simply by changing sleep 1 to sleep 2 within that loop... it was easy for salt to do an in-place edit since there is only one call to sleep in the script. :)
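For illustration, the in-place edit described above could be done from a plain shell along these lines. This is only a sketch: the original workaround used salt rather than sed, `double_sleep` is a hypothetical helper name, and the pattern assumes the script's single `sleep 1` call sits on a line of its own.

```shell
#!/bin/bash
# Hypothetical helper (not part of Security Onion): rewrite a lone
# "sleep 1" line to "sleep 2" in the given script, keeping indentation.
double_sleep() {
    local script="$1"
    # -i.bak edits in place and leaves a backup copy next to the file.
    sed -i.bak 's/^\([[:space:]]*\)sleep 1[[:space:]]*$/\1sleep 2/' "$script"
}

# Example (the path is an assumption about where the script is installed):
# double_sleep /usr/sbin/so-elastic-start
```

Because the loop still runs the same number of iterations, doubling the sleep roughly doubles the total wait without touching the hardcoded limit.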

dougburks commented 4 years ago

Hi @petiepooo,

Sounds like a good idea. I've made the new default 480 seconds and you can now change that default value by setting ELASTICSEARCH_TIMEOUT in /etc/nsm/securityonion.conf. I've implemented this for so-kibana-start and so-elasticsearch-pipelines as well since they have similar timeouts. Please take a look at https://github.com/Security-Onion-Solutions/securityonion-elastic/commit/e9d3421c01d224e7d13ccae718a79bc16048a75b and let me know what you think.

Thanks!
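As a rough illustration of the behavior described above (and not the code from the linked commit), a configurable wait might look like the sketch below. The function names `read_timeout` and `wait_for`, the assumed `ELASTICSEARCH_TIMEOUT=<seconds>` line format, and the probe URL in the usage comment are all assumptions made for the example.

```shell
#!/bin/bash
# Hypothetical helpers; names and config parsing are assumptions,
# not taken from the actual Security Onion scripts.

# Print ELASTICSEARCH_TIMEOUT from a config file, or the fallback when the
# file is unreadable or has no numeric setting. Optional double quotes
# around the value are tolerated; the last matching line wins.
read_timeout() {
    local conf="$1" fallback="$2" value=""
    if [ -r "$conf" ]; then
        value=$(sed -n 's/^ELASTICSEARCH_TIMEOUT="\{0,1\}\([0-9][0-9]*\)"\{0,1\}[[:space:]]*$/\1/p' "$conf" | tail -n 1)
    fi
    echo "${value:-$fallback}"
}

# Run a probe command every <interval> seconds until it succeeds (return 0)
# or at least <timeout> seconds of sleeping have accumulated (return 1).
wait_for() {
    local timeout="$1" interval="$2" waited=0
    shift 2
    until "$@" >/dev/null 2>&1; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
    return 0
}

# Example use (the URL assumes Elasticsearch listening locally on 9200):
# TIMEOUT=$(read_timeout /etc/nsm/securityonion.conf 480)
# wait_for "$TIMEOUT" 1 curl -s -o /dev/null http://localhost:9200 \
#     || echo "Elasticsearch did not respond within ${TIMEOUT} seconds"
```

Keeping the fallback as an argument lets each script choose its own default while sharing the same setting name.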

petiepooo commented 4 years ago

I believe that should work very well. Thank you for your quick response!

petiepooo commented 4 years ago

So.. another reboot, another timeout. Now I see that the so-boot process is being killed by systemd due to the timeout setting in /etc/systemd/system/securityonion.service. Since that also includes the time needed to launch the sguild server and sensor components, perhaps that could be increased from 300 to 600 as well?
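Until the unit file itself is changed, one local way to try a longer limit is a systemd drop-in rather than an edit to securityonion.service. The sketch below is an alternative technique, not what the project did; `write_timeout_dropin` and the `timeout.conf` file name are hypothetical, and it assumes the start timeout is the one killing so-boot.

```shell
#!/bin/bash
# Hypothetical helper: write a drop-in that raises the start timeout for
# securityonion.service. <unit_dir> is normally /etc/systemd/system.
write_timeout_dropin() {
    local unit_dir="$1" seconds="$2"
    mkdir -p "$unit_dir/securityonion.service.d"
    printf '[Service]\nTimeoutStartSec=%s\n' "$seconds" \
        > "$unit_dir/securityonion.service.d/timeout.conf"
}

# Example (run as root):
# write_timeout_dropin /etc/systemd/system 600
# systemctl daemon-reload
# systemctl show securityonion.service -p TimeoutStartUSec
```

The final `systemctl show` call is there to check which start timeout systemd actually applies after the reload.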

dougburks commented 4 years ago

Created issue 1708 to increase the timeout in /etc/systemd/system/securityonion.service: https://github.com/Security-Onion-Solutions/security-onion/issues/1708

weslambert commented 4 years ago

Looks good so far from my testing 👍

dougburks commented 4 years ago

Published: https://blog.securityonion.net/2020/02/zeek-301-elastic-686-and-cyberchef-9120.html