ValveSoftware / Proton

Compatibility tool for Steam Play based on Wine and additional components

Containerized Proton breaks on Latest Fedora 40 #7995

Open ckupe opened 2 months ago

ckupe commented 2 months ago

Host Distribution: Fedora 40 Workstation and Server
fetch
Linux: 6.10.4-200.fc40.x86_64
Podman: 5.2.0
podman-info.txt

Proton Version:

======================
Proton: 1723129720 experimental-9.0-20240808
SteamGameId: not-steam
Command: ['enshrouded_server.exe']
Options: {'forcelgadd'}
Kernel: Linux 6.10.4-200.fc40.x86_64 #1 SMP PREEMPT_DYNAMIC Sun Aug 11 15:32:50 UTC 2024 x86_64
Language: LC_ALL None, LC_MESSAGES None, LC_CTYPE None
Effective WINEDEBUG: +timestamp,+pid,+tid,+seh,+unwind,+threadname,+debugstr,+loaddll,+mscoree

Game used: Enshrouded Dedicated Server (version fd563a0389c99a6ba9ec59b8a233fe9df17e892d (master))

Container Dockerfile: https://github.com/steamutils/runner

Environment Config: https://github.com/steamutils/apps/tree/main/enshrouded

Issue:

Running Proton natively on the host works fine for this game's dedicated server, but inside a container it hangs completely, pegs the CPU at 100%, and spawns a large number of child processes. Troubleshooting so far has narrowed the problem to Fedora Workstation/Server 40 and to Proton specifically, not Wine.

On a fresh install of Fedora 40 from the ISO (which ships with Linux 6.8.5), the container works perfectly fine. Certain later updates to the distro and kernel break containerized Proton. The same Linux kernel versions on other distros do not exhibit this behavior.

How to recreate:

podman logs -f enshrouded shows steamcmd updating and downloading the dedicated server files, but once the dedicated server is started through Proton inside the container, the process hangs and spawns a large number of threads/child processes. CPU utilization stays at 100%.
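To put a number on the runaway thread/child-process growth described above, something like the following could be run on the host against the PIDs that belong to the container. This is only a rough sketch: thread_count is a hypothetical helper name, and it assumes a Linux host with /proc mounted.

```shell
# Hypothetical helper (not part of Proton or Podman): count the threads of
# one process by listing its task directory. Linux only.
thread_count() {
    # Every thread of a process shows up as an entry under /proc/<pid>/task.
    # A PID that does not exist yields 0.
    ls "/proc/$1/task" 2>/dev/null | wc -l
}

# Usage (the container name "enshrouded" is taken from the report above):
#   podman top enshrouded      # list the processes running in the container
#   thread_count <host pid>    # thread count for one of those processes
```

Running it a few seconds apart on the same PID should show whether the count keeps climbing or levels off.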

[image: logs-side-by-side]

This shows Proton running natively on this host (left) and in a container (right), comparing the Proton logs of the same binaries side by side at the point where they break.

Troubleshooting attempted:

Notes: Fedora, as the upstream for Enterprise Linux, ships opinionated configurations and hardening that eventually reach downstream derivatives such as CentOS Stream and Red Hat Enterprise Linux. I suspect that a distro-specific hardening setting or subpackage changed which capabilities are exposed to containers, and that this somehow breaks what Proton is doing.
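One way to test that suspicion might be to compare the capability sets and seccomp state of a process on the host with those of a process inside the container. The sketch below reads them from /proc; show_confinement is a hypothetical helper name, and the container-side command assumes the image ships grep.

```shell
# Hypothetical helper: print the capability sets, no_new_privs flag, and
# seccomp mode of a process. Defaults to the calling process. Linux only.
show_confinement() {
    grep -E '^(Cap(Inh|Prm|Eff|Bnd|Amb)|NoNewPrivs|Seccomp):' "/proc/${1:-self}/status"
}

# Usage:
#   show_confinement                 # on the host
#   podman exec enshrouded grep -E '^(Cap|NoNewPrivs|Seccomp)' /proc/1/status
#                                    # same fields for PID 1 in the container
```

The hex masks can be decoded with capsh --decode=<mask> where libcap's capsh is installed. A difference between the working and the broken host would not prove the cause, but it would narrow down where to look.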

Attached is the output of sysctl -a > log, listing all sysctl variables configured on this host system: sysctl.variables.txt
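For anyone who wants to compare that dump against one from a host where the container still works (for example the fresh Fedora 40 install), a small diff helper along these lines may be useful. It is a sketch only: sysctl_diff and the file names are placeholders, and it assumes both files are plain sysctl -a output.

```shell
# Hypothetical helper: print the sysctl lines that differ between two dumps.
# Each dump is assumed to hold `sysctl -a` output ("key = value" per line).
sysctl_diff() {
    a_sorted=$(mktemp) || return 1
    b_sorted=$(mktemp) || return 1
    sort "$1" > "$a_sorted"
    sort "$2" > "$b_sorted"
    # comm -3 suppresses lines common to both files. Lines found only in the
    # second file are printed with a leading tab.
    comm -3 "$a_sorted" "$b_sorted"
    rm -f "$a_sorted" "$b_sorted"
}

# Usage (file names are placeholders):
#   sysctl -a > working.txt    # on the host where the container works
#   sysctl -a > broken.txt     # on the fully updated host
#   sysctl_diff working.txt broken.txt
```

Many keys (counters, random UUIDs, per-interface entries) will differ for unrelated reasons, so the output still needs to be read by hand.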

faandg commented 2 months ago

Gonna start by saying my issue is similar but maybe not the same: also with Enshrouded, but on an Ubuntu-based container (host is Manjaro, which is Arch-based).

My setup was working fine a couple of days ago, and updating to the newest available version got me here. I'm using mornedhels/enshrouded-server(:stable-proton) and I pretty much spent the past 2 days debugging everything else (server config, firewall, etc). I use podlet to generate systemd services. I did switch to the new Enshrouded role system, but reverting it doesn't seem to fix the problem either.

podlet --description enshrouded --file enshrouded.container --install \
  --wanted-by multi-user.target --wanted-by default.target \
  podman run --name=enshrouded \
  --secret enshrouded-boot-sh,type=mount,target=/scripts/boot.sh,uid=1051,mode=0700 \
  --secret enshrouded-post-update-sh,type=mount,target=/scripts/post-update.sh,uid=1051,mode=0700 \
  --secret enshrouded-server-password,type=env,target=SERVER_PASSWORD \
  --secret enshrouded-server-role0-password,type=env,target=SERVER_ROLE_0_PASSWORD \
  --secret enshrouded-server-role1-password,type=env,target=SERVER_ROLE_1_PASSWORD \
  --secret enshrouded-server-role2-password,type=env,target=SERVER_ROLE_2_PASSWORD \
  -e PUID=1051 \
  -e PGID=65537 \
  -e SERVER_NAME="/redacted/" \
  -e SERVER_SLOT_COUNT=5 \
  -e SERVER_QUERYPORT=15637 \
  -e SERVER_IP="0.0.0.0" \
  -e UPDATE_CRON="31 */2 * * *" \
  -e UPDATE_CHECK_PLAYERS=true \
  -e BACKUP_CRON="0 */2 * * *"  \
  -e BACKUP_MAX_COUNT=24 \
  -e GAME_BRANCH="public" \
  -e STEAMCMD_ARGS="validate" \
  -e SERVER_SAVE_DIR="/workdir/savegame" \
  -e SERVER_LOG_DIR="/workdir/logs" \
  -e BACKUP_DIR="/workdir/backups" \
  -e BOOTSTRAP_HOOK=/scripts/boot.sh \
  -e UPDATE_POST_HOOK=/scripts/post-update.sh \
  -e SERVER_ROLE_0_NAME=Admin \
  -e SERVER_ROLE_0_CAN_KICK_BAN=true \
  -e SERVER_ROLE_0_CAN_ACCESS_INVENTORIES=true \
  -e SERVER_ROLE_0_CAN_EDIT_BASE=true \
  -e SERVER_ROLE_0_CAN_EXTEND_BASE=true \
  -e SERVER_ROLE_0_RESERVED_SLOTS=1 \
  -e SERVER_ROLE_1_NAME=Friend \
  -e SERVER_ROLE_1_CAN_KICK_BAN=false \
  -e SERVER_ROLE_1_CAN_ACCESS_INVENTORIES=true \
  -e SERVER_ROLE_1_CAN_EDIT_BASE=true \
  -e SERVER_ROLE_1_CAN_EXTEND_BASE=true \
  -e SERVER_ROLE_1_RESERVED_SLOTS=3 \
  -e SERVER_ROLE_2_NAME=Guest \
  -e SERVER_ROLE_2_CAN_KICK_BAN=false \
  -e SERVER_ROLE_2_CAN_ACCESS_INVENTORIES=false \
  -e SERVER_ROLE_2_CAN_EDIT_BASE=false \
  -e SERVER_ROLE_2_CAN_EXTEND_BASE=false \
  -e SERVER_ROLE_2_RESERVED_SLOTS=0 \
  -p 15637:15637/udp \
  -v /poddata/enshrouded/workdir:/workdir \
  -v /poddata/enshrouded/game:/opt/enshrouded \
  --label "io.containers.autoupdate=registry" \
  --restart=always \
  docker.io/mornedhels/enshrouded-server:stable-proton

Server says UP at some point but is unreachable and stuff just hangs. This is htop:

[image: htop screenshot]

AMD Ryzen 9 7950X and everything just flies to max.

ckupe commented 1 month ago

> Server says UP at some point but is unreachable and stuff just hangs. This is htop:
>
> [image: htop screenshot]
>
> AMD Ryzen 9 7950X and everything just flies to max.

My htop looks identical. My CPUs are Intel-based (12th gen and 13th gen).

I did not test Manjaro/Arch-based distros; I wonder what Arch/Manjaro and Fedora have in common that would cause this bug to occur.

The underlying distro should not be breaking or changing how the container functions.