Open eglyn opened 3 years ago
Do you use Moloch? Can you paste the full output of selks-health-check_stamus?
Full log:
suricata.service - LSB: Next Generation IDS/IPS
Loaded: loaded (/etc/init.d/suricata; generated)
Active: active (running) since Thu 2021-08-19 13:38:43 CEST; 50min ago
Docs: man:systemd-sysv-generator(8)
Process: 5356 ExecStart=/etc/init.d/suricata start (code=exited, status=0/SUCCESS)
Tasks: 14 (limit: 4915)
Memory: 2.5G
CGroup: /system.slice/suricata.service
└─5363 /usr/bin/suricata -c /etc/suricata/suricata.yaml --pidfile /var/run/suricata.pid --af-packet -D -v --user=logstash
Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.
● elasticsearch.service - Elasticsearch
Loaded: loaded (/lib/systemd/system/elasticsearch.service; enabled; vendor preset: enabled)
Active: active (running) since Thu 2021-08-19 13:34:37 CEST; 55min ago
Docs: https://www.elastic.co
Main PID: 4714 (java)
Tasks: 125 (limit: 4915)
Memory: 37.1G
CGroup: /system.slice/elasticsearch.service
├─4714 /usr/share/elasticsearch/jdk/bin/java -Xshare:auto -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless=true -Dfile…
└─4915 /usr/share/elasticsearch/modules/x-pack-ml/platform/linux-x86_64/bin/controller
Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.
● logstash.service - logstash
Loaded: loaded (/etc/systemd/system/logstash.service; enabled; vendor preset: enabled)
Active: active (running) since Mon 2021-08-16 09:14:22 CEST; 3 days ago
Main PID: 512 (java)
Tasks: 56 (limit: 4915)
Memory: 1.8G
CGroup: /system.slice/logstash.service
└─512 /usr/share/logstash/jdk/bin/java -Xms1g -Xmx1g -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -Djava.awt.headless=true -Dfile.encoding=…
Aug 19 14:29:41 TSFE-SV-SELKS logstash[512]: [2021-08-19T14:29:41,601][WARN ][logstash.outputs.elasticsearch][main][e55f734d663b7fb7ca21a05c69227f334d0c6198948f303fac6e50c03be43b13] Could not index ev…
Aug 19 14:29:41 TSFE-SV-SELKS logstash[512]: [2021-08-19T14:29:41,601][WARN ][logstash.outputs.elasticsearch][main][e55f734d663b7fb7ca21a05c69227f334d0c6198948f303fac6e50c03be43b13] Could not index ev…
Aug 19 14:29:41 TSFE-SV-SELKS logstash[512]: [2021-08-19T14:29:41,601][WARN ][logstash.outputs.elasticsearch][main][e55f734d663b7fb7ca21a05c69227f334d0c6198948f303fac6e50c03be43b13] Could not index ev…
Aug 19 14:29:41 TSFE-SV-SELKS logstash[512]: [2021-08-19T14:29:41,601][WARN ][logstash.outputs.elasticsearch][main][e55f734d663b7fb7ca21a05c69227f334d0c6198948f303fac6e50c03be43b13] Could not index ev…
Aug 19 14:29:41 TSFE-SV-SELKS logstash[512]: [2021-08-19T14:29:41,602][WARN ][logstash.outputs.elasticsearch][main][e55f734d663b7fb7ca21a05c69227f334d0c6198948f303fac6e50c03be43b13] Could not index ev…
Aug 19 14:29:41 TSFE-SV-SELKS logstash[512]: [2021-08-19T14:29:41,602][WARN ][logstash.outputs.elasticsearch][main][e55f734d663b7fb7ca21a05c69227f334d0c6198948f303fac6e50c03be43b13] Could not index ev…
Aug 19 14:29:41 TSFE-SV-SELKS logstash[512]: [2021-08-19T14:29:41,602][WARN ][logstash.outputs.elasticsearch][main][e55f734d663b7fb7ca21a05c69227f334d0c6198948f303fac6e50c03be43b13] Could not index ev…
Aug 19 14:29:41 TSFE-SV-SELKS logstash[512]: [2021-08-19T14:29:41,602][WARN ][logstash.outputs.elasticsearch][main][e55f734d663b7fb7ca21a05c69227f334d0c6198948f303fac6e50c03be43b13] Could not index ev…
Aug 19 14:29:41 TSFE-SV-SELKS logstash[512]: [2021-08-19T14:29:41,602][WARN ][logstash.outputs.elasticsearch][main][e55f734d663b7fb7ca21a05c69227f334d0c6198948f303fac6e50c03be43b13] Could not index ev…
Aug 19 14:29:41 TSFE-SV-SELKS logstash[512]: [2021-08-19T14:29:41,602][WARN ][logstash.outputs.elasticsearch][main][e55f734d663b7fb7ca21a05c69227f334d0c6198948f303fac6e50c03be43b13] Could not index ev…
Hint: Some lines were ellipsized, use -l to show in full.
● kibana.service - Kibana
Loaded: loaded (/etc/systemd/system/kibana.service; enabled; vendor preset: enabled)
Active: active (running) since Thu 2021-08-19 13:34:37 CEST; 55min ago
Docs: https://www.elastic.co
Main PID: 5039 (node)
Tasks: 18 (limit: 4915)
Memory: 439.3M
CGroup: /system.slice/kibana.service
├─5039 /usr/share/kibana/bin/../node/bin/node /usr/share/kibana/bin/../src/cli/dist --logging.dest=/var/log/kibana/kibana.log --pid.file=/run/kibana/kibana.pid
└─5075 /usr/share/kibana/node/bin/node --preserve-symlinks-main --preserve-symlinks /usr/share/kibana/src/cli/dist --logging.dest=/var/log/kibana/kibana.log --pid.file=/run/kibana/kibana.pid
Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.
● evebox.service - EveBox Server
Loaded: loaded (/lib/systemd/system/evebox.service; enabled; vendor preset: enabled)
Active: active (running) since Mon 2021-08-16 09:14:22 CEST; 3 days ago
Main PID: 511 (evebox)
Tasks: 9 (limit: 4915)
Memory: 5.8M
CGroup: /system.slice/evebox.service
└─511 /usr/bin/evebox server
Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.
● molochviewer-selks.service - Moloch Viewer
Loaded: loaded (/etc/systemd/system/molochviewer-selks.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Thu 2021-08-19 13:43:39 CEST; 46min ago
Process: 5540 ExecStart=/bin/sh -c /data/moloch/bin/node viewer.js -c /data/moloch/etc/config.ini >> /data/moloch/logs/viewer.log 2>&1 (code=exited, status=1/FAILURE)
Main PID: 5540 (code=exited, status=1/FAILURE)
Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.
● molochpcapread-selks.service - Moloch Pcap Read
Loaded: loaded (/etc/systemd/system/molochpcapread-selks.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Thu 2021-08-19 13:43:38 CEST; 46min ago
Process: 5537 ExecStart=/bin/sh -c /data/moloch/bin/moloch-capture -c /data/moloch/etc/config.ini -m --copy --delete -R /data/nsm/ >> /data/moloch/logs/capture.log 2>&1 (code=exited, status=1/FAILURE)
Main PID: 5537 (code=exited, status=1/FAILURE)
Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.
scirius RUNNING pid 5082, uptime 0:55:02
ii elasticsearch 7.13.4 amd64 Distributed RESTful search engine built for the cloud
ii elasticsearch-curator 5.8.4 amd64 Have indices in Elasticsearch? This is the tool for you!\n\nLike a museum curator manages the exhibits and collections on display, \nElasticsearch Curator helps you curate, or manage your indices.
ii evebox 1:0.14.0 amd64 no description given
ii kibana 7.13.4 amd64 Explore and visualize your Elasticsearch data
ii kibana-dashboards-stamus 2020122001 amd64 Kibana 6 dashboard templates.
ii logstash 1:7.13.4-1 amd64 An extensible logging pipeline
ii moloch 3.0.0-1 amd64 Moloch Full Packet System
ii scirius 3.5.0-3 amd64 Django application to manage Suricata ruleset
ii suricata 1:2021052601-0stamus0 amd64 Suricata open source multi-thread IDS/IPS/NSM system.
Filesystem Type Size Used Avail Use% Mounted on
udev devtmpfs 32G 0 32G 0% /dev
tmpfs tmpfs 6,3G 591M 5,7G 10% /run
/dev/md1 ext4 1,8T 829G 911G 48% /
tmpfs tmpfs 32G 0 32G 0% /dev/shm
tmpfs tmpfs 5,0M 0 5,0M 0% /run/lock
tmpfs tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/md0 ext4 463M 81M 354M 19% /boot
tmpfs tmpfs 6,3G 0 6,3G 0% /run/user/1000
On SELKS, after a few days, the dashboards are empty.
And on the Moloch URL I get: MaxRetryError at /moloch/ HTTPConnectionPool(host='localhost', port=8005): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f3c78854c50>: Failed to establish a new connection: [Errno 111] Connection refused',))
Just to double check: did the first-time setup finish without a problem? (https://github.com/StamusNetworks/SELKS/wiki/First-time-setup)
I also noticed you could upgrade (after QA testing :) ) (https://github.com/StamusNetworks/SELKS/wiki/SELKS-upgrades)
Could it also be related to the disk filling up?
Yes, the first-time setup finished fine, and SELKS works well for a few days before crashing. I ran all the updates, which upgraded some applications and packages, but the issue is the same.
The disk is not full, but it reached the limit set in the Moloch config (config.ini). I suppose there is a logrotate for that, though ^^
I get this from the Elasticsearch info:
Maybe it is an issue with Suricata; it is stuck on "Fetching data":
If it does this once every 2 days or so, it can help to run a health check when it actually happens; that could make it easier to troubleshoot. Did you do an upgrade?
Yes, I upgraded it; no change. It is actually happening right now ^^ but the health check just shows the 2 Moloch services down.
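As a rough aid for reading a long health-check report like the one above, the failed units can be picked out automatically. The following is a minimal sketch, assuming the report is plain `systemctl status` text in the layout pasted in this thread; the function name `failed_units` is hypothetical and not part of SELKS.

```python
import re
from typing import List


def failed_units(status_text: str) -> List[str]:
    """Return the names of units whose 'Active:' line reports 'failed'."""
    failed = []
    current = None
    for raw in status_text.splitlines():
        line = raw.strip()
        # Unit header lines look like "● name.service - Description".
        # The bullet may be missing, as for the first unit in the report.
        header = re.match(r"^(?:●\s*)?(\S+\.service)\s+-\s+", line)
        if header:
            current = header.group(1)
            continue
        if current and line.startswith("Active:"):
            parts = line.split()
            if len(parts) > 1 and parts[1] == "failed":
                failed.append(current)
            current = None
    return failed
```

Fed the report from this thread, it should list only the two Moloch units, which matches what the health check shows.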
From the report it seems you have Scirius 3.5.0-3 running; the current stable is 3.7.0-6, hence my note about upgrading.
I also just noticed that you are running the latest Moloch (3.0), so there might be some errors in the logs that are related to that upgrade path.
That's weird, I already launched the update with sudo selks-upgrade_stamus.
And it stays at 3.5.0-3 :/
What is the output of:
cat /etc/apt/sources.list.d/selks5.list
I don't have a selks5.list, but I do have a selks6.list:
deb http://packages.stamus-networks.com/selks6/debian/ buster main
deb http://packages.stamus-networks.com/selks6/debian-kernel/ buster main
deb http://packages.stamus-networks.com/selks6/debian-test/ buster main
I have these errors in viewer.log:
"rest_total_hits_as_int": true
} err: ResponseError: index_not_found_exception
at onBody (/data/moloch/node_modules/@elastic/elasticsearch/lib/Transport.js:311:23)
at IncomingMessage.onEnd (/data/moloch/node_modules/@elastic/elasticsearch/lib/Transport.js:240:11)
at IncomingMessage.emit (events.js:412:35)
at endReadableNT (internal/streams/readable.js:1317:12)
at processTicksAndRejections (internal/process/task_queues.js:82:21) {
meta: {
body: { error: [Object], status: 404 },
statusCode: 404,
And in the capture.log:
Aug 20 09:19:41 http.c:306 moloch_http_send_sync(): 1/1 SYNC 404 http://localhost:9200/_template/arkime_sessions3_template?filter_path=**._meta 0/2 0ms 2ms
Aug 20 09:19:41 db.c:2054 moloch_db_check(): ERROR - Couldn't load version information, database might be down or out of date. Run "db/db.pl host:port upgrade"
Aug 20 09:21:11 main.c:202 parse_args(): WARNING: gethostname doesn't return a fully qualified name and getdomainname failed, this may cause issues when viewing pcaps, use the --host option - SERVERNAME
If I run selks-upgrade_stamus I get:
NOTE:
Depending on the size and how busy the system is the upgrade may take a while.
Starting the upgrade sequence...
Hit:1 http://security.debian.org/debian-security buster/updates InRelease
Hit:2 https://artifacts.elastic.co/packages/7.x/apt stable InRelease
Hit:3 http://packages.stamus-networks.com/selks6/debian buster InRelease
Hit:5 https://packages.elastic.co/curator/5/debian9 stable InRelease
Hit:6 http://packages.stamus-networks.com/selks6/debian-kernel buster InRelease
Hit:7 http://packages.stamus-networks.com/selks6/debian-test buster InRelease
Hit:4 https://files.evebox.org/evebox/debian stable InRelease
Reading package lists... Done
Reading package lists... Done
Building dependency tree
Reading state information... Done
selks-scripts-stamus is already the newest version (2020121401).
0 upgraded, 0 newly installed, 0 to remove and 1 not upgraded.
NOTE:
Starting second stage upgrade sequence...
outputs.7.pcap-log.enabled = yes
Hit:1 http://security.debian.org/debian-security buster/updates InRelease
Hit:2 https://artifacts.elastic.co/packages/7.x/apt stable InRelease
Hit:3 http://packages.stamus-networks.com/selks6/debian buster InRelease
Hit:5 https://packages.elastic.co/curator/5/debian9 stable InRelease
Hit:6 http://packages.stamus-networks.com/selks6/debian-kernel buster InRelease
Hit:7 http://packages.stamus-networks.com/selks6/debian-test buster InRelease
Hit:4 https://files.evebox.org/evebox/debian stable InRelease
Reading package lists... Done
Reading package lists... Done
Building dependency tree
Reading state information... Done
Calculating upgrade... Done
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
scirius: stopped
scirius: started
And it stays at 3.5.0-3.
If I check with apt list --upgradable I get:
scirius/unknown,unknown 3.7.0-6 amd64 [upgradable from: 3.5.0-3]
But if I try to upgrade with apt (without confirming), I get:
Reading package lists... Done
Building dependency tree
Reading state information... Done
Calculating upgrade... Done
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
Hello, did you do an apt upgrade or an apt dist-upgrade?
No, I only use selks-upgrade_stamus
If I try to go to the /kibana URL I get: HTTPConnectionPool(host='localhost', port=5601): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f3c205fc2b0>: Failed to establish a new connection: [Errno 111] Connection refused'))
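Both "Connection refused" errors in this thread mean that nothing was accepting connections on the local port the web front end tried to reach. A quick way to confirm which back ends are actually listening is a plain TCP connect test. The sketch below uses only the ports that appear in this thread (9200 in the Moloch logs, 5601 and 8005 in the error pages); the service labels and function names are hypothetical.

```python
import socket

# Ports taken from the errors and logs quoted in this thread.
# The labels are only for display.
SERVICES = {"elasticsearch": 9200, "kibana": 5601, "moloch-viewer": 8005}


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unreachable hosts.
        return False


def check_services(host: str = "localhost", services: dict = SERVICES) -> dict:
    """Map each service label to whether its port accepts connections."""
    return {name: port_is_open(host, port) for name, port in services.items()}
```

Calling `check_services()` on the SELKS host returns a dictionary of booleans. An open port only shows that something is listening, not that the service behind it is healthy.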
Can you try apt-get upgrade only?
I managed to upgrade Scirius to 3.7.0-6: I had to change my sources.list config, and then it worked with selks-upgrade_stamus.
But it changed nothing: molochpcapread-selks.service does not start, Kibana still shows the error above, and on the Suricata management web page everything is empty :/
When I launch this command:
/data/moloch/bin/moloch-capture -c /data/moloch/etc/config.ini -m --copy --delete -R /data/nsm/
I have this error:
ERROR - Couldn't load version information, database might be down or out of date. Run "db/db.pl host:port upgrade"
I tried: db/db.pl host:port upgrade
And it says:
Couldn't PUT http://SERVER:9200/arkime_sequence_v30/_mapping?master_timeout=240s the http status code is 404 are you sure elasticsearch is running/reachable?
Elasticsearch is running:
systemctl status elasticsearch
● elasticsearch.service - Elasticsearch
Loaded: loaded (/lib/systemd/system/elasticsearch.service; enabled; vendor preset: enabled)
Active: active (running) since Fri 2021-08-20 10:43:21 CEST; 20min ago
Docs: https://www.elastic.co
Main PID: 756 (java)
Tasks: 136 (limit: 4915)
Memory: 38.4G
CGroup: /system.slice/elasticsearch.service
├─ 756 /usr/share/elasticsearch/jdk/bin/java -Xshare:auto -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless=true -Dfile.
└─1116 /usr/share/elasticsearch/modules/x-pack-ml/platform/linux-x86_64/bin/controller
Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.
And db.pl 127.0.0.1:9200 info:
./db.pl 127.0.0.1:9200 info
Cluster Name: elasticsearch
ES Version: 7.14.0
DB Version: 66
ES Data Nodes: 1/1
Sessions2 Indices: 0
Sessions: 0 (0 bytes)
History Indices: 0
Histories: 0 (0 bytes)
stats_v4: 1 (37,157 bytes)
fields_v3: 327 (71,845 bytes)
files_v6: 200 (75,228 bytes)
users_v7: 2 (8,326 bytes)
hunts_v2: 0 (301 bytes)
dstats_v4: 4,320 (2,559,621 bytes)
sequence_v3: 1 (4,304 bytes)
Looks like you have HTML coming back instead of JSON in the last test. Do you need to specify the port?
Are you talking about db.pl 127.0.0.1:9200 upgrade?
I think I have to give the port, since it is the Elasticsearch port; if I leave it out, I get an error right away.
Moloch is looking for http://SERVER:9200/arkime_sequence_v30/. Why is it looking for an index named arkime_sequence_v30?
OK, Moloch works now; I had to run db.pl 127.0.0.1 init...
I also found another issue with Kibana and Elasticsearch: I was stuck at 1000 shards:
Please check the health of your Elasticsearch cluster and try again. Error: [validation_exception]: Validation Failed: 1: this action would add [2] shards, but this cluster currently has [1000]/[1000] maximum normal shards open
I increased the max shards to 5000 and everything works, but is there a way to avoid hitting the issue again (getting stuck at 5000...)?
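For context on the shard error: Elasticsearch derives the cluster-wide limit from the `cluster.max_shards_per_node` setting (default 1000) multiplied by the number of data nodes, so a single-node cluster tops out at 1000 open shards unless the setting is raised. Raising it only postpones the problem; the number of open shards has to stop growing, for example by deleting old indices. The sketch below estimates the remaining headroom from a `_cluster/health` response and builds the settings body that raises the limit. The function names are hypothetical, and counting unassigned shards toward the limit is my assumption about how the check works.

```python
def shard_headroom(health: dict, max_shards_per_node: int = 1000) -> int:
    """Shards that can still be opened before the cluster-wide limit is hit.

    `health` is the parsed JSON returned by GET _cluster/health.
    """
    limit = max_shards_per_node * health["number_of_data_nodes"]
    # Assumption: unassigned shards of open indices count toward the limit.
    in_use = health["active_shards"] + health.get("unassigned_shards", 0)
    return limit - in_use


def max_shards_settings_body(new_limit: int) -> dict:
    """Body for PUT _cluster/settings that raises the per-node shard limit."""
    return {"persistent": {"cluster.max_shards_per_node": new_limit}}


# The situation from the error message: 1000 of 1000 shards on one node.
health = {"number_of_data_nodes": 1, "active_shards": 1000, "unassigned_shards": 0}
print(shard_headroom(health))        # no room left with the default limit
print(shard_headroom(health, 5000))  # room after raising the limit to 5000
```

Tracking the headroom over a few days gives a rough idea of how quickly new indices consume it and how much retention the node can afford.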
What size of data/volume do you have? Is it still a one-node cluster?
The disk is a 2 TB RAID 1 SSD, 90% full.
I have set up the Moloch config.ini with 10% free space, a 10 GB max file size, and 30 min.
And yes, I have only one node.
In that case I suspect ES is hitting the disk watermark (full disk?). See /var/log/elasticsearch/elasticsearch.log.
https://stackoverflow.com/questions/50609417/elasticsearch-error-cluster-block-exception-forbidden-12-index-read-only-all
If that is the case, it means you are generating data quickly and might need to lower the retention or use a bigger disk.
There is something I don't understand about disk retention...
I set:
maxFileSizeG = 1
maxFileTimeM = 30
freeSpaceG = 50%
But the disk still fills up (82% now)... Do the limits not work? Is there another parameter to limit disk usage?
The /data folder is about 1.2 TB on a 1.8 TB disk, and there is 172 GB in the /var folder.
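To make the `freeSpaceG` setting concrete: as far as I understand the Moloch/Arkime documentation, the value is either a number of gigabytes or a percentage of the disk, and pcap files are deleted once free space drops below it. It applies to pcap files only, not to Elasticsearch data, and the cleanup is done by the viewer process, so it would not run while the viewer service is down. Both points are my reading of the docs, not something confirmed in this thread. The sketch below only illustrates the threshold arithmetic, and the function names are hypothetical.

```python
def free_space_threshold_bytes(free_space_g: str, total_bytes: int) -> int:
    """Translate a freeSpaceG value ('50%' or '10') into a byte threshold."""
    value = free_space_g.strip()
    if value.endswith("%"):
        # Percentage of the whole disk.
        return int(total_bytes * float(value[:-1]) / 100)
    # Plain number: gigabytes.
    return int(float(value) * 1024**3)


def should_delete_pcaps(free_space_g: str, total_bytes: int, free_bytes: int) -> bool:
    """True when free space has dropped below the configured threshold."""
    return free_bytes < free_space_threshold_bytes(free_space_g, total_bytes)


# Rough numbers from this thread: a 1.8 TB disk that is 82% used.
total = int(1.8 * 1024**4)
free = int(total * 0.18)
print(should_delete_pcaps("50%", total, free))
```

With these numbers the threshold is clearly exceeded, so if the disk keeps filling up, either the cleanup is not running or the space is being used by something other than pcap files.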
Where do you set those settings? If it is during setup, for the pcap retention, then they do not apply to ES.
In the file from the wiki, /data/moloch/etc/config.ini.
I don't think it is Elasticsearch that uses all the disk space, but the Moloch directory (1.2 TB).
Yes, this is Elasticsearch reaching a disk watermark (by default 85% low, 90% high, and 95% flood stage) and thus switching to read-only. So maybe ES is writing to a different volume/disk?
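To see how close a volume is to those thresholds, the used percentage can be compared against the default Elasticsearch disk watermarks (85% low, 90% high, 95% flood stage). This is a minimal sketch, assuming the defaults have not been changed in the cluster settings. The path passed to `used_percent` should be the volume that holds the Elasticsearch data, and the Debian package default of `/var/lib/elasticsearch` is an assumption about this install.

```python
import shutil

# Default Elasticsearch disk watermarks, highest first.
DEFAULT_WATERMARKS = (("flood_stage", 95.0), ("high", 90.0), ("low", 85.0))


def used_percent(path: str = "/") -> float:
    """Percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def watermark_level(used_pct: float, watermarks=DEFAULT_WATERMARKS) -> str:
    """Name of the highest watermark reached, or 'ok' if none is reached."""
    for name, threshold in watermarks:
        if used_pct >= threshold:
            return name
    return "ok"
```

For example, `watermark_level(used_percent("/var/lib/elasticsearch"))` reports the level for the data volume. The figure can differ slightly from the percentage shown by `df`, which accounts for blocks reserved for root.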
Hi all,
I have a dedicated server running SELKS, and everything works great, except that after a few days there is no data on any of the dashboards :/ When I check the health status I have 2 services down:
Here is the complete log:
If I reboot the server, everything goes back to normal for a few days.
Any ideas?