bottlerocket-os / bottlerocket

An operating system designed for hosting containers
https://bottlerocket.dev

CPU Soft Lockup when doing heavy IO via Kernel NFS server on local ephemeral storage #4307

Open snowzach opened 3 hours ago

snowzach commented 3 hours ago

Platform I'm building on:

Running a very simple NFS server container on bottlerocket-aws-k8s-1.25-x86_64-v1.26.1-943d9a41

Dockerfile:

# syntax=docker/dockerfile:1.9
FROM public.ecr.aws/debian/debian:bookworm

ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y --no-install-recommends \
    nfs-common nfs-kernel-server curl ca-certificates zip unzip \
    make time jq yq netbase iproute2 net-tools bind9-dnsutils \
    procps xz-utils nano \
    && rm -rf /var/lib/apt/lists/*

# Copy the entrypoint script
COPY --chmod=755 ./tools/docker/cloud-nfs/kernel/cloud-nfs-entrypoint.sh /entrypoint.sh
RUN mkdir /exports

# Install the AWS CLI
RUN curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip" \
    && unzip awscliv2.zip \
    && ./aws/install \
    && rm -rf awscliv2.zip

# Expose ports for NFSD/StatD/MountD/QuotaD
EXPOSE 2049 2050 2051 2052
VOLUME /exports

# Entrypoint
ENTRYPOINT ["/entrypoint.sh"]
CMD [ "/exports" ]
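For reference, a hypothetical build/run invocation for the image above might look like the following. This sketch only prints the commands (the image tag, host path, and the --privileged flag are my assumptions, not from the report; a kernel NFS server in a container generally needs elevated privileges to mount the nfsd filesystem):

```shell
# Print a hypothetical build/run sequence; nothing here is executed.
# "cloud-nfs" and /mnt/ephemeral are placeholder names.
cat <<'EOF'
docker build -t cloud-nfs .
docker run --privileged -p 2049-2052:2049-2052 -v /mnt/ephemeral:/exports cloud-nfs
EOF
```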

Entrypoint script:

#!/bin/bash
set -e

NFS_THREADS=${NFS_THREADS:-64}

function start() {

    # prepare /etc/exports
    fsid=0
    for i in "$@"; do
        echo "$i *(rw,fsid=$fsid,no_subtree_check,no_root_squash)" >> /etc/exports
        if [ -v gid ] ; then
            chmod 070 $i
            chgrp $gid $i
        fi
        echo "Serving $i"
        fsid=$((fsid + 1))
    done

    # start rpcbind if it is not started yet
    set +e
    /usr/sbin/rpcinfo 127.0.0.1 > /dev/null; s=$?
    set -e
    if [ $s -ne 0 ]; then
       echo "Starting rpcbind"
       /sbin/rpcbind -w
    fi

    # the device name ("nfsd") is ignored by the kernel for this mount
    mount -t nfsd nfsd /proc/fs/nfsd

    /usr/sbin/rpc.mountd -p 2050

    /usr/sbin/exportfs -r
    # -G 10 to reduce grace time to 10 seconds (the lowest allowed)
    /usr/sbin/rpc.nfsd -G 10 -p 2049 $NFS_THREADS
    /sbin/rpc.statd --no-notify -p 2051 -o 2052 -T 2053
    echo "NFS started with $NFS_THREADS threads"
}

function stop() {
    echo "Stopping NFS"

    /usr/sbin/rpc.nfsd 0
    /usr/sbin/exportfs -au
    /usr/sbin/exportfs -f

    kill $( pidof rpc.mountd )
    umount /proc/fs/nfsd
    echo > /etc/exports
    exit 0
}

trap stop TERM

start "$@"

# Keep the container running; backgrounding sleep and waiting on it
# lets the TERM trap fire promptly instead of after sleep exits
sleep infinity &
wait $!
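As an aside, the /etc/exports generation loop in start() can be sketched standalone like this (the directory names are placeholders, and it writes to a temp file instead of /etc/exports); each exported path gets an incrementing fsid:

```shell
# Standalone sketch of the exports loop: one line per directory,
# with fsid counting up from 0.
exports_file=$(mktemp)
fsid=0
for i in /exports /scratch; do
    echo "$i *(rw,fsid=$fsid,no_subtree_check,no_root_squash)" >> "$exports_file"
    fsid=$((fsid + 1))
done
cat "$exports_file"
```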

Essentially I run this on an AWS i3en instance with the local flash provisioned as ephemeral storage and shared via this NFS server. It's a high-performance cache drive. I am testing with an i3en.2xlarge.

What I expected to happen:

It would be a super fast NFS server sharing this ephemeral storage.

What actually happened:

I can mount this storage from another i3en.2xlarge instance, and it mostly works unless we really push it. If I run the disk-testing tool bonnie++ -d /the/nfs/share -u nobody and wait, within a minute or two the machine starts logging errors like "watchdog: BUG: soft lockup - CPU#7 stuck for 22s!" as well as "ena 0000:00:06.0 eth2: TX hasn't completed, qid 5, index 801. 26560 msecs since last interrupt, 41910 msecs since last napi execution, napi scheduled: 1".

How to reproduce the problem:

Run the container, then run bonnie++ against the NFS share.

It reproduces very reliably.
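A hypothetical client-side sequence, assuming an NFSv3 mount against the ports the entrypoint uses (2049 for nfsd, 2050 for mountd); the server address is a placeholder of mine, and this sketch only prints the commands rather than running them:

```shell
# NFS_SERVER is a made-up placeholder for the server instance's address.
NFS_SERVER=${NFS_SERVER:-192.0.2.10}
# Print the reproduction steps; mountport matches rpc.mountd -p 2050.
cat <<EOF
mkdir -p /mnt/cache
mount -t nfs -o vers=3,port=2049,mountport=2050 ${NFS_SERVER}:/exports /mnt/cache
bonnie++ -d /mnt/cache -u nobody
EOF
```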

Attached is a log: bottlerocket-log.txt

snowzach commented 2 hours ago

No idea if this is related, but searching for "CPU soft lockup" and a few other keywords led me to this thread: https://bbs.archlinux.org/viewtopic.php?id=264127&p=3 which suggests the problem affects kernels up to 5.15 (which appears to be what we are running).
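A quick way to check whether a node falls in that reportedly affected range is a sketch like this (assuming the usual "major.minor.patch[-suffix]" output from uname -r):

```shell
# Compare the running kernel against the 5.15 cutoff mentioned
# in the linked thread.
kver=$(uname -r | cut -d- -f1)
major=${kver%%.*}
rest=${kver#*.}
minor=${rest%%.*}
if [ "$major" -lt 5 ] || { [ "$major" -eq 5 ] && [ "$minor" -le 15 ]; }; then
    echo "kernel $kver is in the reportedly affected range (<= 5.15)"
else
    echo "kernel $kver is newer than 5.15"
fi
```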