flatcar / Flatcar

Flatcar project repository for issue tracking, project documentation, etc.
https://www.flatcar.org/
Apache License 2.0

journald core dump on 3227.2.1 and 3227.2.2 #848

Open · whites11 opened this issue 2 years ago

whites11 commented 2 years ago

Description

On a newly deployed VM on Azure or AWS, journald-related units are failing:

# systemctl list-units --failed
  UNIT                            LOAD   ACTIVE SUB    DESCRIPTION              
● systemd-journald.service        loaded failed failed Journal Service          
● systemd-journald-audit.socket   loaded failed failed Journal Audit Socket
● systemd-journald-dev-log.socket loaded failed failed Journal Socket (/dev/log)
● systemd-journald.socket         loaded failed failed Journal Socket

The logs show a core dump:

# journalctl -u systemd-journald.service
...
Aug 29 13:34:27 worker-00003K systemd[1]: systemd-journald.service: Main process exited, code=dumped, status=6/ABRT
Aug 29 13:34:27 worker-00003K systemd[1]: systemd-journald.service: Failed with result 'core-dump'.
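
Since journald itself is the crashing process, the assertion message never makes it into the journal; it can still be fished out of the kernel ring buffer, and the recorded dump shows up in coredumpctl (a small sketch, run on the affected node):

dmesg | grep systemd-journald
coredumpctl info systemd-journald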

This started happening when we upgraded to the aforementioned Flatcar versions. We made some changes to the Ignition config together with the version bump, but nothing related to journald settings.

Impact

Nothing major as far as we can tell.

Environment and steps to reproduce

We can't really reproduce this in a lab setting. It does not seem to happen on a vanilla Flatcar VM, but it happens systematically when we use our Ignition config to run k8s on top of it.

Expected behavior

journald should not core dump.

Additional information

Dmesg:

[   44.689183] systemd[1]: Starting Journal Service...
[   44.715709] systemd-journald[1588]: File /var/log/journal/e64de388fdf943e58b260fc9c4fd4b6e/system.journal corrupted or uncleanly shut down, renaming and replacing.
[   44.761049] systemd-journald[1588]: Assertion 'n + N_IOVEC_META_FIELDS + (pid_is_valid(object_pid) ? N_IOVEC_OBJECT_FIELDS : 0) + client_context_extra_fields_n_iovec(c) <= m' failed at src/journal/journald-server.c:923, function dispatch_message_real(). Aborting.
[   44.830550] systemd-coredump[1649]: elfutils disabled, parsing ELF objects not supported
[   44.841361] systemd-coredump[1649]: Process 1588 (systemd-journal) of user 0 dumped core.
[   44.852691] systemd-coredump[1649]: Coredump diverted to /var/lib/systemd/coredump/core.systemd-journal.0.9281eeb07ce84c18894fe3e81ab04fce.1588.1662564835000000.zst
[   44.873681] systemd[1]: systemd-journald.service: Main process exited, code=dumped, status=6/ABRT
[   44.884875] systemd[1]: systemd-journald.service: Failed with result 'core-dump'.
[   44.896113] systemd[1]: Failed to start Journal Service.

Core dump:

core.systemd-journal.gz
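
For anyone who wants to pull the same dump off an affected node, coredumpctl can export it (a sketch; the match can also be the crashing PID):

coredumpctl list systemd-journald
coredumpctl dump systemd-journald --output=core.systemd-journal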

jepio commented 2 years ago

Would you share your ignition config with sanitized secrets so that we can reproduce this?

whites11 commented 2 years ago

I found out that if I disabled kubelet on the node, journald didn't crash. So I tried disabling the pods one by one until I found the offending one. I think this is caused by Falco.

Fun fact: we didn't change anything in Falco compared to the previous release, which means this happened either because the kernel was upgraded or because Kubernetes was upgraded.

whites11 commented 2 years ago

We verified that this happens with the upstream Falco chart as well, without any modification:

helm repo add falcosecurity https://falcosecurity.github.io/charts
helm repo update
helm install falco falcosecurity/falco --namespace falco --create-namespace

As soon as the falco pod starts, journald crashes. As soon as the falco pod gets deleted, journald recovers.
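
For the record, the state flip can be toggled at will (a rough sketch; the helm command runs from wherever kubectl/helm point at the cluster, the systemctl checks run on the affected node):

# on the node, while the falco pod is running
systemctl is-failed systemd-journald.service
# from the workstation, remove the release again
helm uninstall falco --namespace falco
# back on the node, journald comes back once the pod is gone
systemctl is-active systemd-journald.service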

leogr commented 2 years ago

PTAL :point_right: https://github.com/falcosecurity/falco/issues/2196#issuecomment-1241258603

jepio commented 2 years ago

So this is the backtrace:

(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:49
#1  0x00007f5bf391a538 in __GI_abort () at abort.c:79
#2  0x00007f5bf3b7b7e8 in log_assert_failed (text=text@entry=0x5642198119c8 "n + N_IOVEC_META_FIELDS + (pid_is_valid(object_pid) ? N_IOVEC_OBJECT_FIELDS : 0) + client_context_extra_fields_n_iovec(c) <= m", file=file@entry=0x5642198107d8 "src/journal/journald-server.c", line=line@entry=923, func=func@entry=0x5642198133e0 <__PRETTY_FUNCTION__.84> "dispatch_message_real") at ../systemd-stable-250.3/src/basic/log.c:866
#3  0x00005642197fe967 in dispatch_message_real (s=s@entry=0x7ffe9b3ce700, iovec=iovec@entry=0x7ffe9b3cdcc0, n=<optimized out>, m=m@entry=95, c=c@entry=0x0, tv=tv@entry=0x0, priority=5, object_pid=0) at ../systemd-stable-250.3/src/journal/journald-server.c:923
#4  0x0000564219801279 in server_dispatch_message (s=s@entry=0x7ffe9b3ce700, iovec=iovec@entry=0x7ffe9b3cdcc0, n=<optimized out>, m=m@entry=95, c=c@entry=0x0, tv=tv@entry=0x0, priority=<optimized out>, object_pid=<optimized out>) at ../systemd-stable-250.3/src/journal/journald-server.c:1122
#5  0x0000564219809b9d in process_audit_string (s=0x7ffe9b3ce700, type=<optimized out>, data=<optimized out>, size=<optimized out>) at ../systemd-stable-250.3/src/journal/journald-audit.c:404
#6  0x0000564219809d3f in server_process_audit_message (s=s@entry=0x7ffe9b3ce700, buffer=<optimized out>, buffer_size=buffer_size@entry=6260, ucred=<optimized out>, sa=sa@entry=0x7ffe9b3ce3f0, salen=<optimized out>) at ../systemd-stable-250.3/src/journal/journald-audit.c:457
#7  0x00005642198017a6 in server_process_datagram (es=<optimized out>, fd=5, revents=<optimized out>, userdata=0x7ffe9b3ce700) at ../systemd-stable-250.3/src/journal/journald-server.c:1379
#8  0x00007f5bf3d2a050 in source_dispatch (s=s@entry=0x56421a555af0) at ../systemd-stable-250.3/src/libsystemd/sd-event/sd-event.c:3598
#9  0x00007f5bf3d2a34d in dispatch_exit (e=<optimized out>) at ../systemd-stable-250.3/src/libsystemd/sd-event/sd-event.c:4183
#10 sd_event_dispatch (e=e@entry=0x56421a5535e0) at ../systemd-stable-250.3/src/libsystemd/sd-event/sd-event.c:4176
#11 0x00007f5bf3d2aa90 in sd_event_run (e=0x56421a5535e0, timeout=18446744073709551615) at ../systemd-stable-250.3/src/libsystemd/sd-event/sd-event.c:4244
#12 0x00005642197f84ad in main (argc=<optimized out>, argv=<optimized out>) at ../systemd-stable-250.3/src/journal/journald.c:114
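
The assertion fires in dispatch_message_real() because the number of queued iovec entries exceeds the preallocated count m while an audit record is being converted into journal fields (frames 5 and 6). With the coredump loaded in gdb (see the Dockerfile below), the surviving locals in frame 3 can still be inspected (a sketch; m is 95 per the backtrace, and dumping the start of the iovec array shows which fields filled it up):

(gdb) frame 3
(gdb) print m
(gdb) print *iovec@16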

This Dockerfile creates an environment for analyzing the coredump:

FROM mediadepot/flatcar-developer:3227.2.2

RUN echo 'FEATURES="-ipc-sandbox -network-sandbox -pid-sandbox"' >>/etc/portage/make.conf
RUN emerge --info 2>&1 | grep ^INSTALL_MASK | sed 's|/usr/lib/debug||g' >> /etc/portage/make.conf
RUN emerge-gitclone
RUN emerge -gK gdb systemd glibc
RUN emerge --fetchonly --nodeps systemd
RUN tar xf /var/lib/portage/distfiles/systemd-stable*.tar.gz
RUN wget https://github.com/flatcar-linux/Flatcar/files/9507659/core.systemd-journal.gz && gunzip core.systemd-journal.gz

CMD ["/usr/bin/gdb", "/usr/lib/systemd/systemd-journald", "core.systemd-journal"]
# inside gdb, run: set sysroot /
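
Building and running it is straightforward (a sketch; the image tag is arbitrary):

docker build -t journald-core-debug .
docker run --rm -it journald-core-debug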

Might be worth reporting to systemd? Although looking at changes between the flatcar releases, this might end up being attributed to a kernel regression.

whites11 commented 2 years ago

I started from a vanilla Flatcar, installed k8s on top of it using this, then ran Falco through the Helm chart. Unfortunately I still can't replicate the issue.

@jepio If you think this is the right thing to do, would you be available to open the systemd ticket yourself? I'm not quite sure I'm the right person to do so.

whites11 commented 2 years ago

I might have something, finally. It seems the problem goes away if we disable our custom auditd default rule that replaces Flatcar's default. These are the rules in place:

$ cat /etc/audit/rules.d/99-default.rules 
# Overridden by Giant Swarm.
-a exit,always -F arch=b64 -S execve -k auditing
-a exit,always -F arch=b32 -S execve -k auditing
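
For anyone wanting to confirm the same correlation, removing the override and flushing the loaded rules should be enough (a sketch; assumes auditctl is available on the node, and note that -D flushes every loaded audit rule, not just this override):

# drop the custom rules, flush whatever was already loaded, and restart journald
rm /etc/audit/rules.d/99-default.rules
auditctl -D
systemctl restart systemd-journald.service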

QuentinBisson commented 2 years ago

Was there a regression of systemd in the latest flatcar releases?

There was an issue between auditd and systemd in the past, so I'm trying to see if it could be related.