Closed: ulysse31 closed this issue 3 months ago.
UPDATE:
I thought it might be related to bonding, but the other server, which captures directly on a single interface (no bond), also segfaults:
[Fri Jul 26 06:24:41 2024] W#09-eno2np1[3532764]: segfault at 0 ip 00000000009349a9 sp 00007f1f5fffc270 error 4 in suricata[4d4000+637000] likely on CPU 4 (core 1, socket 0)
[Fri Jul 26 06:24:41 2024] Code: 74 24 50 48 85 f6 74 0b ba 01 00 00 00 ff 15 76 8c 44 00 48 89 df e8 06 06 ba ff 0f 0b 0f 1f 40 00 48 83 ec 18 48 85 d2 74 38 <0f> b6 06 89 c1 83 e1 1f 41 b8 01 00 00 00 83 f9 1f 75 5b 48 83 fa
This one is in the New York time zone (the other one is in the Paris time zone), so it segfaults on both. The big difference may be bandwidth: this one has a single 10 Gbps interface, while the other bonds two 10 Gbps interfaces because of its traffic volume. To restate: the latest version of the Suricata Docker image seems to segfault under high traffic (about 20 MB/s on average on bond1). The New York server is currently at around 2-3 MB/s (low activity, early morning).
UPDATE2:
Confirmed once traffic picked up in New York:
[Fri Jul 26 08:10:34 2024] W#31-eno2np1[3671915]: segfault at 0 ip 00000000009349a9 sp 00007f31ad4f1270 error 4 in suricata[4d4000+637000] likely on CPU 6 (core 6, socket 0)
[Fri Jul 26 08:10:34 2024] Code: 74 24 50 48 85 f6 74 0b ba 01 00 00 00 ff 15 76 8c 44 00 48 89 df e8 06 06 ba ff 0f 0b 0f 1f 40 00 48 83 ec 18 48 85 d2 74 38 <0f> b6 06 89 c1 83 e1 1f 41 b8 01 00 00 00 83 f9 1f 75 5b 48 83 fa
It seems the Suricata Docker image no longer handles high traffic and crashes under load...
How often does this happen?
What is the output of:
docker exec suricata suricata --build-info
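One way to answer "how often" from the host side is to decode and count the kernel segfault lines posted in this thread. A minimal Python sketch, assuming the dmesg format shown there (`<thread>[<pid>]: segfault at <addr> ip <ip> sp <sp> error <code> in <binary>[<base>+<size>]`); the helper names are hypothetical. The error-code bits decoded here are the standard x86 page-fault ones (bit 0: protection violation vs. page not present, bit 1: write vs. read, bit 2: user vs. kernel mode).

```python
import re
from collections import Counter

# Matches kernel segfault reports such as:
# "W#09-eno2np1[3532764]: segfault at 0 ip 00000000009349a9 sp ... error 4 in suricata[4d4000+637000]"
SEGFAULT_RE = re.compile(
    r"(?P<thread>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>\d+) "
    r"in (?P<binary>[^\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_error(code: int) -> str:
    """Decode the x86 page-fault error code bits printed by the kernel."""
    cause = "protection violation" if code & 1 else "page not present"
    access = "write" if code & 2 else "read"
    mode = "user" if code & 4 else "kernel"
    return f"{mode}-mode {access}, {cause}"


def parse_segfault(line: str):
    """Return a dict describing one segfault line, or None if the line does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    thread = m.group("thread")
    return {
        "thread": thread,
        # Worker threads in these logs look like "W#09-eno2np1"; the part after '-'
        # appears to be the capture interface (assumption based on the logs above).
        "interface": thread.split("-", 1)[1] if "-" in thread else None,
        "fault_addr": int(m.group("addr"), 16),
        "ip": int(m.group("ip"), 16),
        "binary": m.group("binary"),
        "error": decode_error(int(m.group("err"))),
    }


def count_by_interface(lines):
    """Count segfault reports per capture interface (non-matching lines are ignored)."""
    return Counter(p["interface"] for p in map(parse_segfault, lines) if p)
```

Applied to the reports in this thread, every line decodes to a user-mode read of address 0 at the same ip (0x9349a9), which is consistent with a NULL pointer being read at a single code location. Since the build info posted in this thread lists -g in CFLAGS and "Position Independent Executable enabled: no", that ip is a fixed address in the suricata binary, so addr2line run against the same image's /usr/bin/suricata may resolve it to a source line, provided binutils is available and the debug info was not stripped.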
UPDATE3:
I updated the title, since I can now confirm that the segmentation fault/crash starts above a certain traffic level on both of my test systems. I've tried master-amd64, master-profiling and master: they all show the same segfault crash loop under high traffic...
Hello! Thanks for your reply, here is the output:
This is Suricata version 8.0.0-dev (7f6c963ac 2024-07-20)
Features: NFQ PCAP_SET_BUFF AF_PACKET HAVE_PACKET_FANOUT LIBCAP_NG LIBNET1.1 HAVE_HTP_URI_NORMALIZE_HOOK PCRE_JIT HAVE_NSS HTTP2_DECOMPRESSION HAVE_LUA HAVE_JA3 HAVE_JA4 HAVE_LIBJANSSON TLS TLS_C11 MAGIC RUST POPCNT64
SIMD support: SSE_4_2 SSE_4_1 SSE_3 SSE_2
Atomic intrinsics: 1 2 4 8 16 byte(s)
64-bits, Little-endian architecture
GCC version 11.4.1 20231218 (Red Hat 11.4.1-3), C version 201112
compiled with _FORTIFY_SOURCE=0
L1 cache line size (CLS)=64
thread local storage method: _Thread_local
compiled with LibHTP v0.5.48, linked against LibHTP v0.5.48

Suricata Configuration:
  AF_PACKET support: yes
  AF_XDP support: no
  DPDK support: yes
  eBPF support: yes
  XDP support: yes
  PF_RING support: no
  NFQueue support: yes
  NFLOG support: no
  IPFW support: no
  Netmap support: no
  DAG enabled: no
  Napatech enabled: no
  WinDivert enabled: no

  Unix socket enabled: yes
  Detection enabled: yes

  Libmagic support: yes
  libjansson support: yes
  hiredis support: yes
  hiredis async with libevent: yes
  PCRE jit: yes
  GeoIP2 support: yes
  JA3 support: yes
  JA4 support: yes
  Non-bundled htp: no
  Hyperscan support: yes
  Libnet support: yes
  liblz4 support: yes
  Landlock support: yes
  Systemd support: yes

  Rust support: yes
  Rust strict mode: no
  Rust compiler path: /usr/bin/rustc
  Rust compiler version: rustc 1.75.0 (82e1608df 2023-12-21) (Red Hat 1.75.0-1.el9)
  Cargo path: /usr/bin/cargo
  Cargo version: cargo 1.75.0

  Python support: yes
  Python path: /usr/bin/python3
  Install suricatactl: yes
  Install suricatasc: yes
  Install suricata-update: yes

  Profiling enabled: no
  Profiling locks enabled: no
  Profiling rules enabled: no

  Plugin support (experimental): yes
  DPDK Bond PMD: no

Development settings:
  Coccinelle / spatch: no
  Unit tests enabled: no
  Debug output enabled: no
  Debug validation enabled: no
  Fuzz targets enabled: no

Generic build parameters:
  Installation prefix: /usr
  Configuration directory: /etc/suricata/
  Log directory: /var/log/suricata/

  --prefix /usr --sysconfdir /etc --localstatedir /var --datarootdir /usr/share

  Host: x86_64-pc-linux-gnu
  Compiler: gcc (exec name) / g++ (real)
  GCC Protect enabled: no
  GCC march native enabled: no
  GCC Profile enabled: no
  Position Independent Executable enabled: no
  CFLAGS -g -O2 -fPIC -std=c11 -I/usr/include/dpdk -include rte_config.h -march=corei7 -mrtm -I${srcdir}/../rust/gen -I${srcdir}/../rust/dist -I../rust/gen
  PCAP_CFLAGS
  SECCFLAGS
As I said earlier, according to their GitHub, the "master-amd64" image in use is built from the latest code on their development branch, so I suspect the code itself is currently broken...
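Whether a container is running such a development snapshot can be read from the first line of the --build-info output. A minimal Python sketch, assuming that first line keeps the format shown above ("This is Suricata version X.Y.Z" with an optional "-suffix"); the function names are hypothetical, and the sketch only inspects text, so the output still has to be captured with the docker exec command asked for earlier.

```python
import re

# Matches e.g. "This is Suricata version 8.0.0-dev (7f6c963ac 2024-07-20)"
VERSION_RE = re.compile(r"This is Suricata version (\d+)\.(\d+)\.(\d+)(?:-(\S+))?")


def parse_build_version(build_info: str):
    """Return (major, minor, patch, suffix); suffix is None when there is no '-suffix'."""
    m = VERSION_RE.search(build_info)
    if m is None:
        raise ValueError("no Suricata version line found")
    major, minor, patch = (int(g) for g in m.group(1, 2, 3))
    return major, minor, patch, m.group(4)


def is_dev_build(build_info: str) -> bool:
    """True when the version carries a pre-release suffix such as '-dev'."""
    return parse_build_version(build_info)[3] is not None
```

For the output posted above, parse_build_version returns (8, 0, 0, 'dev'), i.e. a development build rather than a 7.0.x release.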
Oh, and as for the frequency: it depends on the amount of traffic.
Hope this helps.
You can switch to the latest Suricata build like so: the only change you need to make is master -> latest on this line:
https://github.com/StamusNetworks/SELKS/blob/master/docker/compose.yml#L107
Then update the containers as described here:
https://github.com/StamusNetworks/SELKS/wiki/Docker#upgrade-all-containers
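That one-line change can also be scripted. A minimal Python sketch, assuming the Suricata service in compose.yml is pinned with an image: line whose tag starts with master (for example master-amd64, the tag discussed in this thread); the actual image name on line 107 is not reproduced here, so the "suricata" match, the example image name and the function names are hypothetical. Only the tag prefix is rewritten; every other line is left as-is.

```python
import re
from pathlib import Path

# Hypothetical pattern: an "image:" line for a Suricata image whose tag begins with "master",
# e.g. "image: example/suricata:master-amd64". Only the "master" tag prefix is replaced.
IMAGE_TAG_RE = re.compile(r"^([ \t]*image:[ \t]*\S*suricata\S*:)master", re.MULTILINE)


def switch_to_latest(compose_text: str) -> str:
    """Replace a 'master...' Suricata image tag with the matching 'latest...' tag."""
    return IMAGE_TAG_RE.sub(r"\1latest", compose_text)


def patch_compose_file(path: str) -> bool:
    """Rewrite the file in place; return True if anything changed."""
    p = Path(path)
    old = p.read_text()
    new = switch_to_latest(old)
    if new != old:
        p.write_text(new)
    return new != old
```

After patching, the containers still need to be updated as described in the wiki link above.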
Hello,
Yes, it works again now, but please also document two important details:
1. You must edit suricata.yaml and replace all occurrences of MiB with mB (or mb), because the size syntax differs between Suricata 8.0-dev (master) and 7.0 (latest); otherwise Suricata exits with an error at startup.
2. This means the current install documentation and install procedure are broken: we have to edit compose.yml and suricata.yaml and patch the setup manually ourselves.
Other than that ... we are good ^^'
Side note: if Suricata 8.0-dev is NOT required to run SELKS, why does the current install use a "dev" version that is prone to this kind of bug/crash? Why not always use latest? Shouldn't the documentation, the setup script and the config be corrected to use it? Isn't that a more stable solution? Thanks for your answer.
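Finding and rewriting those units can be automated. A minimal Python sketch, assuming size values in suricata.yaml are written as an integer followed by MiB (optionally with a space), e.g. memcap: 64MiB; per this report, Suricata 7.0 rejects MiB while mb is accepted. It lists each affected line with its line number and returns the rewritten text. Only MiB is handled, since that is the unit reported here, and the function names are hypothetical.

```python
import re

# An integer followed by "MiB", e.g. "64MiB" or "64 MiB" (assumed value format).
MIB_RE = re.compile(r"(\d+)[ \t]*MiB\b")


def find_mib_lines(yaml_text: str):
    """Return (line_number, line) pairs that contain a MiB size value."""
    return [
        (n, line)
        for n, line in enumerate(yaml_text.splitlines(), start=1)
        if MIB_RE.search(line)
    ]


def mib_to_mb(yaml_text: str) -> str:
    """Rewrite '<n>MiB' as '<n>mb', the spelling reported to work with Suricata 7.0."""
    return MIB_RE.sub(r"\1mb", yaml_text)
```

Keeping a copy of the original suricata.yaml before writing the rewritten text back makes it easy to switch to master again later.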
For reference, so we can chase it down: can you give some examples of where exactly in the config you needed to edit the MiB occurrences?
Also, you should probably have a core file inside the Suricata container. Is that the case?
You can use find inside the container to see if one exists, so we can try to trace the reason for the segfault.
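A minimal Python sketch of such a search, assuming python3 is available in the container (the build info above lists /usr/bin/python3) and that cores are written as files named core or core.<pid>. Where cores actually land depends on the host's /proc/sys/kernel/core_pattern, which may pipe them to a handler such as systemd-coredump instead, and on the container's core size limit, so an empty result does not prove that no dump exists. The function name is hypothetical.

```python
import os
import re

# Default-style core file names: "core" or "core.<pid>" (assumed naming).
CORE_NAME_RE = re.compile(r"^core(\.\d+)?$")


def find_core_files(root: str = "/", skip=("/proc", "/sys", "/dev")):
    """Walk the filesystem and return sorted paths of regular files that look like core dumps."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into pseudo-filesystems.
        dirnames[:] = [d for d in dirnames if os.path.join(dirpath, d) not in skip]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if CORE_NAME_RE.match(name) and os.path.isfile(path) and not os.path.islink(path):
                found.append(path)
    return sorted(found)
```

One way to run it is to copy the script in with docker cp and execute it with docker exec suricata python3 followed by the script's path.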
Hello,
Suricata latest (7.0) does not understand the MiB unit, so every place in suricata.yaml that uses it is invalid for that version. You can check this yourself by editing compose.yml to use latest instead of master-amd64. With Suricata latest, the config error does not cause a segmentation fault but a fatal error that makes Suricata exit; since the container is configured to restart whenever the process exits, it boot-loops.

The segmentation fault happens under high traffic with the "master-amd64" Docker image, which, again, is the Suricata 8.0-dev version (code compiled from the current GitHub sources). I also reported this error on the Suricata Docker GitHub, where an issue is open as well. Again, since master-amd64 is the development code, I am not surprised that the compiled code is "sometimes" buggy.

The segmentation fault seems clearly linked to the traffic volume: at first only one of the two servers was crashing, so I thought it was related to that one using a bond interface, but once activity increased on the server that had not crashed, it crashed under the load as well.

I'll look through the folders to see whether I still have a core file somewhere (I cleaned up after getting it working again).

Anyway, I would really like to know why SELKS uses the dev/unstable version of Suricata instead of the latest/stable version. Thanks
(Sent by email in reply to Peter Manev's comment of Sat, 27 Jul 2024, 19:40.)
Aha, understood: you mean the original suricata.yaml had MiB in it, and Suricata latest complains about it.
OK, but I would expect the original suricata.yaml to be replaced by the new one for latest during the docker pull/update. Did that not happen?
SELKS has always (for about 10 years now) used the latest Suricata by default, to showcase the newest features and latest additions. Since it is Docker-based, though, it is very easy to switch to any desired Suricata version. This is the first segfault I remember being reported, which is why I asked whether you could find the core file.
Thanks for reporting it!
As you said, "using the latest"... but in fact it is not the latest stable ^^' it is the dev/unstable version. And no, it did not replace it: keep in mind that suricata.yaml is NOT inside the container ^^ So following the upgrade instructions (down, pull, then up -d) won't change it ^^' UPDATE: hmm... it did replace the suricata.yaml on the second node I updated today... strange.
So my question still stands ^^' why is SELKS using the dev/unstable branch of Suricata and not the latest stable (7.0)? EDIT: let me rephrase: which feature is so crucial, and only available in 8.0-dev, that it is worth the risk of using the dev/unstable version instead of the latest stable?
Just FYI: before we discussed this, I contacted Jason Ish from Suricata to let him know that the master-amd64 Docker image was segfaulting under "high" traffic (20 MB/s and up). He didn't seem surprised at all and told me that master is the Suricata development version: the image is built from the latest code, which may contain unstable code. He also told me that fixing an issue there could take a few days... SELKS is really a great project, but I'm worried that by default it uses unstable code in a project aimed at production use...
UPDATE: it seems the Suricata Docker master image (8.0-dev) was updated 11 hours ago... maybe the issue is now fixed? ^^'
Yes, SELKS is running the latest master/dev, as mentioned here:
https://github.com/StamusNetworks/SELKS/issues/475#issuecomment-2253757117
However, we will change it as you noted, since it was left over from previous SELKS deployments/versions.
Thanks a lot! I'm doing my best to deploy SELKS here, and I'm thinking of looking at a pro version later on if people are convinced ^^' I suppose we can consider everything OK now. Have a great day and week ^^
Thanks! Glad it worked out !
Is there an existing issue for this?
Current Behavior
After updating two SELKS nodes, the latest Suricata Docker container seems to crash-loop with a segmentation fault, depending on the amount of traffic to analyze. Wiping the Suricata container and its related data did not help (the Suricata container still crash-loops). I then wiped all SELKS containers and their data; that did not help either (the Suricata container still crash-loops).
In the host's dmesg I get this:
[Fri Jul 26 10:44:31 2024] W#06-bond1[78735]: segfault at 0 ip 00000000009349a9 sp 00007f853fffc270 error 4 in suricata[4d4000+637000] likely on CPU 22 (core 14, socket 0)
[Fri Jul 26 10:44:31 2024] Code: 74 24 50 48 85 f6 74 0b ba 01 00 00 00 ff 15 76 8c 44 00 48 89 df e8 06 06 ba ff 0f 0b 0f 1f 40 00 48 83 ec 18 48 85 d2 74 38 <0f> b6 06 89 c1 83 e1 1f 41 b8 01 00 00 00 83 f9 1f 75 5b 48 83 fa
This repeats in a loop...
and the last lines of the container log before it restarts are:
Perf: af-packet: bond1: rx ring: block_size=32768 block_nr=2 frame_size=1600 frame_nr=40 [AFPComputeRingParams:source-af-packet.c:1598]
Perf: af-packet: bond1: rx ring: block_size=32768 block_nr=2 frame_size=1600 frame_nr=40 [AFPComputeRingParams:source-af-packet.c:1598]
Perf: af-packet: bond1: rx ring: block_size=32768 block_nr=2 frame_size=1600 frame_nr=40 [AFPComputeRingParams:source-af-packet.c:1598]
Perf: af-packet: bond1: rx ring: block_size=32768 block_nr=2 frame_size=1600 frame_nr=40 [AFPComputeRingParams:source-af-packet.c:1598]
Notice: threads: Threads created -> W: 64 FM: 1 FR: 1 Engine started. [TmThreadWaitOnThreadRunning:tm-threads.c:1905]
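The ring parameters in those log lines are internally consistent: with fixed-size AF_PACKET frames, each block holds floor(block_size / frame_size) frames, and frame_nr is that count times block_nr. A minimal Python sketch of that arithmetic; the class and function names are hypothetical, and it reproduces the logged numbers rather than Suricata's own AFPComputeRingParams code. The total assumes one ring per capture thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RingParams:
    block_size: int  # bytes per ring block
    block_nr: int    # number of blocks in the ring
    frame_size: int  # bytes per frame slot

    @property
    def frames_per_block(self) -> int:
        # Frames never span blocks, so any remainder at the end of a block is unused.
        return self.block_size // self.frame_size

    @property
    def frame_nr(self) -> int:
        # For fixed-size frames the kernel expects frame_nr == frames_per_block * block_nr.
        return self.frames_per_block * self.block_nr

    @property
    def ring_bytes(self) -> int:
        return self.block_size * self.block_nr


def total_ring_bytes(params: RingParams, threads: int) -> int:
    """Ring memory across all capture threads, assuming one ring per thread."""
    return params.ring_bytes * threads
```

For block_size=32768, block_nr=2 and frame_size=1600 this gives 20 frames per block and frame_nr=40, matching the log, with 64 KiB of ring per thread; across the 64 worker threads reported (W: 64) that is 4 MiB in total.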
UPDATE: after digging, it seems that both of my SELKS instances run the dockerized Suricata version that crashes; one crashes almost every minute because of the amount of traffic to analyze... Also, it seems the SELKS project uses the "master-amd64" image, which, according to the Suricata Docker GitHub, is the "latest dev code version available"... which does not seem a particularly "wise" choice for stability? If you have any idea of debug commands that would give me more hints, I would really appreciate it ^^' Thanks a lot
Expected Behavior
After the upgrade, the Suricata container should work.
Steps To Reproduce
Docker version
Docker version 27.0.3, build 7d4bcd8
Docker Compose version
Docker Compose version v2.28.1
OS Version
Debian GNU/Linux 12 (bookworm)
Content of the environment file
COMPOSE_PROJECT_NAME=selks
INTERFACES= -i bond1
ELASTIC_MEMORY=64G
SCIRIUS_SECRET_KEY=
Version of SELKS
commit 4af455cd15f69f2ba471fa6cd0b96d6aae6e93b9 (HEAD -> master, origin/master, origin/HEAD)
Author: Peter Manev pmanev@stamus-networks.com
Date: Thu Jun 13 13:18:18 2024 +0200
Anything else?
As always, thanks for your help ^^'