sang-shelton closed this issue 7 months ago.
I have the same issue; it first appeared at the beginning of February. I cannot kill the process, and restarting the PC is the only solution. I don't even have any flatpak applications.
It seems I only have this package installed for gdm and gnome-desktop, which I don't use anyway.
Trying to remove bubblewrap shows which packages depend on it:
Packages (17) epiphany-45.2-1 flatpak-1:1.15.6-2 flatpak-kcm-5.93.0-1 gdm-45.0.1-1 gnome-desktop-1:44.0-1
gnome-desktop-4-1:44.0-1 gnome-desktop-common-1:44.0-1 gnome-session-45.0-1
gnome-settings-daemon-45.1-1 gnome-shell-1:45.4-1 mutter-45.4-1 nautilus-45.2.1-1
remmina-git-1.4.34.r12.gb44b6d622-1 remmina-plugin-teamviewer-1.3.0.0-4.1
webkit2gtk-4.1-2.42.5-1 webkitgtk-6.0-2.42.5-1 bubblewrap-0.8.0-1
Time to switch from gdm to sddm...
Additionally, the sync command never completes; I need to hard-reset the computer.
6.8.0-rc4-1-mainline
bwrap is a tool that is used by higher-level components (sandboxed apps in Flatpak, sandboxed web components in WebKit2GTK and sandboxed thumbnailers in libgnome-desktop, among others). If some component is leaking bwrap subprocesses, that's more likely to be a bug in the higher-level component than in bwrap itself.
Please look at the command-line arguments that were passed to each bwrap process, for example by running ps 29002 for the first one listed in the original report. That will probably provide clues about what its caller was.
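For anyone unsure what ps reads here: on Linux it takes the arguments from /proc/<pid>/cmdline, where they are stored NUL-separated. A minimal Python sketch of the same lookup, assuming a Linux /proc filesystem (parse_cmdline, read_cmdline and bwrap_pids are hypothetical helper names, not part of any tool):

```python
import os


def parse_cmdline(raw: bytes) -> list[str]:
    """Split the contents of /proc/<pid>/cmdline into individual arguments."""
    if not raw:
        # Empty for kernel threads and for processes whose memory has
        # already been released (zombies, or tasks partway through exiting).
        return []
    parts = raw.split(b"\0")
    # Each argument is NUL-terminated, so the split leaves one empty tail.
    if parts[-1] == b"":
        parts.pop()
    return [p.decode(errors="replace") for p in parts]


def read_cmdline(pid: int) -> list[str]:
    """Return the argument vector of process `pid`."""
    with open(f"/proc/{pid}/cmdline", "rb") as f:
        return parse_cmdline(f.read())


def bwrap_pids() -> list[int]:
    """Find PIDs whose short command name (/proc/<pid>/comm) is 'bwrap'."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                if f.read().strip() == "bwrap":
                    pids.append(int(entry))
        except OSError:
            # The process may have exited between listdir() and open().
            continue
    return pids


# Usage sketch: for pid in bwrap_pids(): print(pid, read_cmdline(pid))
```

An empty list from read_cmdline means the kernel had no command line to report for that PID.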
Also please look at the output of systemd-cgls, which will probably provide more clues.
> The bwrap processes seem to have no parent
All processes except init (process 1) have a parent. Do you mean that bwrap's parent process has exited and it has been reparented to process 1?
If the bwrap process is no longer useful after its parent has exited, then this might indicate that whatever is running bwrap should be using bwrap --die-with-parent to get it automatically terminated when its parent exits.
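As a sketch of what using --die-with-parent looks like from a caller, here is a hypothetical Python wrapper. It assumes bwrap is on PATH; the mount layout (read-only host root, fresh /dev and /proc, new PID namespace) is illustrative only and is not what Flatpak, WebKit or any other caller in this thread actually passes. According to the bwrap manual, --die-with-parent relies on the kernel's parent-death signal (PR_SET_PDEATHSIG).

```python
import shutil
import subprocess


def sandboxed_argv(command: list[str]) -> list[str]:
    """Build a bwrap argument vector that ties the sandbox to its caller.

    The mount layout is an illustrative assumption, not any real caller's.
    """
    return [
        "bwrap",
        "--die-with-parent",  # SIGKILL the sandbox if bwrap's parent exits
        "--unshare-pid",
        "--ro-bind", "/", "/",
        "--dev", "/dev",
        "--proc", "/proc",
        *command,
    ]


def run_sandboxed(command: list[str]) -> int:
    """Run `command` inside bwrap if it is installed; return its exit status."""
    if shutil.which("bwrap") is None:
        raise FileNotFoundError("bwrap is not installed")
    return subprocess.run(sandboxed_argv(command), check=False).returncode
```

This only helps with sandboxes that outlive their caller; it cannot terminate a process that is already ignoring SIGKILL.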
> flatpak-1:1.15.6-2
This is a development version of Flatpak, not a stable release. If your distribution has chosen to package it anyway, please talk to your distribution's support channels.
> I don't even have any flatpak applications
If that's the case, then the reason bwrap is running is probably not Flatpak.
> Time to switch from gdm to sddm...
I think we need to do our best to find bugs and report them, and also to fix the root cause of many of these problems by moving away from bug-prone languages like C and C++.
> All processes except init (process 1) have a parent. Do you mean that bwrap's parent process has exited and it has been reparented to process 1?
I think I made a mistake in saying bwrap has no parent; btop was showing that systemd is the parent.
The same is true for Telegram, for instance: systemd is shown as its parent.
> Please look at the command-line arguments that were passed to each bwrap process, for example ps 29002 for the first one listed in the original report. That will probably provide clues about what its caller was.
What should I do with the number 29002 to find the caller? Is looking at btop's output enough? As far as I know, it only shows the parent, not the caller.
Like I already said, you can find the command-line options that were passed to process 29002 by running ps 29002.
Because you're using systemd, looking at systemd-cgls to see whether it's in a cgroup would also give you useful clues. For instance, if there's a bwrap process that is in app-flatpak-org.gnome.Weather-1054468.scope, then it was run by Flatpak as part of launching org.gnome.Weather.
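A rough Python sketch of the same check for a single PID, assuming cgroup v2 (where /proc/<pid>/cgroup has a single 0::<path> line) and systemd's naming convention for desktop-launched apps, app-<launcher>-<ApplicationID>-<RANDOM>.scope. The function names are hypothetical:

```python
import re
from typing import Optional

# Assumed systemd convention: app-<launcher>-<ApplicationID>-<RANDOM>.scope,
# where any "-" inside a component is escaped as "\x2d".
SCOPE_RE = re.compile(
    r"^app-(?P<launcher>[^-]+)-(?P<app_id>.+)-(?P<random>[^-]+)\.scope$"
)


def unified_cgroup_path(cgroup_file: str) -> Optional[str]:
    """Return the cgroup v2 path from the text of /proc/<pid>/cgroup."""
    for line in cgroup_file.splitlines():
        # Each line is "hierarchy-ID:controllers:path"; the unified (v2)
        # hierarchy has ID 0 and an empty controller list.
        hierarchy, _, rest = line.partition(":")
        controllers, _, path = rest.partition(":")
        if hierarchy == "0" and controllers == "":
            return path
    return None


def describe_unit(cgroup_path: str) -> dict:
    """Split the innermost systemd unit of a cgroup path into its parts."""
    unit = cgroup_path.rsplit("/", 1)[-1]
    info = {"unit": unit, "launcher": None, "app_id": None}
    match = SCOPE_RE.match(unit)
    if match:
        info["launcher"] = match["launcher"]
        info["app_id"] = match["app_id"]
    return info
```

For the scope mentioned above, describe_unit reports launcher flatpak and app ID org.gnome.Weather; a bwrap that lives in a plain .service unit, such as xdg-desktop-portal.service, only reports the unit name.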
> Additionally, the sync command never completes; I need to hard-reset the computer.
> 6.8.0-rc4-1-mainline
I observed exactly the same issue with kernel 6.8.0-rc4. sync just hung while bwrap was shown using 100% of a single CPU core in the process list. It could not be killed, and only a reset (via the SysRq magic keys or a hard reset) seemed to work.
ps just printed [bwrap] for the offending PID, so there were no arguments. I cannot recall the cgroup, but will provide additional information if it happens again.
Edit: The last time this happened, the content of the corresponding /proc/pid/cgroup was 0::/user.slice/user-1000.slice/user@1000.service/app.slice/app-gnome-com.usebottles.bottles-23061.scope.
> with kernel 6.8.0-rc4
At the risk of stating the obvious, this is a release candidate, not a production-ready kernel release. If the issue is specific to this kernel and does not appear with earlier kernels, please report it as a kernel regression.
> ps just printed [bwrap] for the offending PID, so there were no arguments
This usually means the process has been swapped out or is otherwise inaccessible.
I'm seeing the same behavior with 6.8.0-11-generic, which is not a release candidate.
> The last time this happened, the content of the corresponding /proc/pid/cgroup was
> 0::/user.slice/user-1000.slice/user@1000.service/app.slice/app-gnome-com.usebottles.bottles-23061.scope
In that case, please investigate whatever Bottles is doing with bwrap (what arguments it passes, etc.).
I'm seeing this too, on Fedora 40. It has happened to me twice now and, as for other reporters, only started happening in recent weeks. I use Element, Discord and Slack as Flatpaks; Evolution also seems to use bwrap to isolate something. Even if I kill all the apps that use bwrap, though, the zombie process at 99% CPU sticks around and cannot be killed by anything. systemd-cgls shows it in a group with xdg-desktop-portal.service and nothing else (but this is after I killed all the Flatpak apps):
│ │ │ ├─xdg-desktop-portal.service
│ │ │ │ ├─ 3875 /usr/libexec/xdg-desktop-portal
│ │ │ │ └─558391 [bwrap]
It's definitely part of xdg-desktop-portal.service, because trying to stop that service fails. systemctl tries to kill bwrap with SIGTERM, then SIGABRT, then SIGKILL, and none of them work:
Feb 21 22:16:07 xps13a.happyassassin.net systemd[2785]: xdg-desktop-portal.service: State 'final-sigterm' timed out. Aborting.
Feb 21 22:16:07 xps13a.happyassassin.net systemd[2785]: xdg-desktop-portal.service: Killing process 558391 (bwrap) with signal SIGABRT.
Feb 21 22:16:52 xps13a.happyassassin.net systemd[2785]: xdg-desktop-portal.service: State 'final-watchdog' timed out. Killing.
Feb 21 22:16:52 xps13a.happyassassin.net systemd[2785]: xdg-desktop-portal.service: Killing process 558391 (bwrap) with signal SIGKILL.
Feb 21 22:17:38 xps13a.happyassassin.net systemd[2785]: xdg-desktop-portal.service: Processes still around after final SIGKILL. Entering failed mode.
Feb 21 22:17:38 xps13a.happyassassin.net systemd[2785]: xdg-desktop-portal.service: Failed with result 'timeout'.
Feb 21 22:17:38 xps13a.happyassassin.net systemd[2785]: xdg-desktop-portal.service: Unit process 558391 (bwrap) remains running after unit stopped.
Feb 21 22:17:38 xps13a.happyassassin.net systemd[2785]: Stopped xdg-desktop-portal.service - Portal service.
Feb 21 22:17:38 xps13a.happyassassin.net systemd[2785]: xdg-desktop-portal.service: Consumed 3h 18min 38.972s CPU time, 27.4M memory peak, 3.2M memory swap peak.
xdg-desktop-portal itself hasn't really changed significantly in Fedora lately; the mass rebuild a month ago would have caused it to be built with GCC 14 and newer glibc, though.
So, hmm. After a reboot, with my Flatpak apps all behaving, there's no "odd" bwrap process listed as just [bwrap] like the misbehaving one was, and systemd-cgls shows xdg-desktop-portal all on its own:
│ │ │ ├─xdg-desktop-portal.service
│ │ │ │ └─4477 /usr/libexec/xdg-desktop-portal
│ │ │ ├─org.freedesktop.IBus.session.GNOME.service
So I got kinda suspicious that this really is just about xdg-desktop-portal using bwrap directly. And, lo and behold, it does do that, in this script that's meant to "validate" icons somehow. The relevant code is everything inside the #ifdef HELPER blocks, because HELPER here is bwrap (per this bit of src/meson.build).
This looks to be a feature xdg-desktop-portal calls "Sandboxed image validation" - see this blurb it prints if it can't find bwrap. I suspect that may be what's causing this problem. The most recent change to the icon validation script itself was this - anyone want to guess if that might be causing this issue?
tagging @GeorgesStavracas and @hfiguiere in case they have thoughts.
xdg-desktop-portal uses bubblewrap to decode and validate images (image decoding is a well-known source of exploitable vulnerabilities) in a secure environment. Either xdg-desktop-portal is not tearing down the bubblewrap subprocess properly, or the image validator is entering some sort of infinite loop.
Still, looking at the other reporters' cases, they don't all seem to be the same. @sang-shelton has multiple bwrap processes. I don't see xdg-desktop-portal in the list of packages @olekolek1000 posted, so maybe they don't have it installed? @zenofile's stuck process looks like mine, but their cgroup does not. So, maybe this isn't as straightforward as "xdg-desktop-portal did it" after all?
> @zenofile's stuck process looks like mine, but their cgroup does not.
The last time this happened, the cgroup was a different one (0::/user.slice/user-1000.slice/user@1000.service/app.slice/app-gnome\x2dsession\x2dmanager.slice/gnome-session-manager@gnome.service), which matches my observation that this occurs rather randomly, without a common cgroup or parent.
I managed to run bpftrace -e 'profile:hz:99 { @[kstack] = count(); }' for 30 seconds to actually see what the kernel is doing:
Here is a normal one without the stuck bwrap process:
There is a lot of cleanup_mnt stuff going on. If this is a kernel bug (why wouldn't the process be able to be terminated if it was not?), I would like to come up with a way to reproduce this in order to bisect it more easily, but I did not manage to trigger the problem on demand.
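To compare the stuck and normal profiles more easily, a small Python sketch can total up, for each kernel function, how many samples had it somewhere on the stack. It assumes bpftrace's default text output for a map keyed by kstack (an @[ line, one frame per line, then ]: <count>); inclusive_counts is a hypothetical helper:

```python
import re
from collections import Counter

_END = re.compile(r"^\]:\s*(\d+)$")


def inclusive_counts(bpftrace_output: str) -> Counter:
    """Count, for each kernel function, the samples whose stack contains it."""
    totals: Counter = Counter()
    frames: list[str] = []
    in_stack = False
    for raw in bpftrace_output.splitlines():
        line = raw.strip()
        if line.startswith("@["):
            in_stack, frames = True, []
        elif in_stack and (m := _END.match(line)):
            samples = int(m.group(1))
            # Count each function once per stack, even if it recurses.
            for func in set(frames):
                totals[func] += samples
            in_stack = False
        elif in_stack and line:
            # Frames look like "cleanup_mnt+18"; drop the offset.
            frames.append(line.split("+", 1)[0])
    return totals


# Usage sketch (file name hypothetical): save the bpftrace output, then
# inclusive_counts(open("profile.txt").read()).most_common(10)
```

Comparing the top entries from the stuck and the normal run would show whether cleanup_mnt really dominates the former.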
> There is a lot of cleanup_mnt stuff going on. If this is a kernel bug (why wouldn't the process be able to be terminated if it was not?),
bwrap certainly does set up mounts in a new mount namespace, which the kernel needs to clean up when bwrap exits. It seems plausible that the CPU time taken to do that cleanup would be accounted to the bwrap process, even though bwrap no longer really exists at that point, and has no further control over what the kernel is doing on its behalf.
If the bwrap process has exited, and what is actually running is kernel code cleaning up after it, then that would also explain why it appears as [bwrap] in ps: during that cleanup, the bwrap process doesn't really exist any more, so there is no longer a command line or other information for ps to report.
> do_exit+851 do_group_exit+45 __x64_sys_exit_group+24 do_syscall_64+134
SYS_exit_group is the implementation of _exit(2), so this looks very much like the result of the cleanup that is done in the kernel when bwrap exits, rather than bwrap code running in user space.
If this is a kernel regression (taking a long time to clean up mount namespaces) then that would also explain why it happens for bwrap processes that are run for several different reasons (xdg-desktop-portal, Flatpak, Bottles, gnome-session-manager, others).
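One way to test this explanation on a live system, sketched in Python under stated assumptions: the kernel exposes per-task flags as field 9 of /proc/<pid>/stat, and PF_EXITING (0x00000004 in include/linux/sched.h, a kernel-internal value that could change) is set once a task has entered do_exit. The helpers are hypothetical, and ps_style_name only approximates how procps falls back to the bracketed name:

```python
from typing import NamedTuple

PF_EXITING = 0x00000004  # include/linux/sched.h; kernel-internal, may change


class ProcStat(NamedTuple):
    comm: str
    state: str
    flags: int


def parse_stat(stat_line: str) -> ProcStat:
    """Parse comm, state and flags from the text of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last closing parenthesis.
    head, _, tail = stat_line.rpartition(")")
    comm = head.split("(", 1)[1]
    fields = tail.split()
    # fields[0] is state (field 3); flags is field 9, i.e. fields[6].
    return ProcStat(comm=comm, state=fields[0], flags=int(fields[6]))


def is_exiting(stat: ProcStat) -> bool:
    """True if the task has started exiting (PF_EXITING is set)."""
    return bool(stat.flags & PF_EXITING)


def ps_style_name(comm: str, cmdline: bytes) -> str:
    """Roughly mimic ps: show [comm] when there is no command line."""
    if not cmdline:
        return f"[{comm}]"
    args = cmdline.rstrip(b"\0").split(b"\0")
    return " ".join(a.decode(errors="replace") for a in args)
```

If the stuck [bwrap] PID is reported as exiting while still consuming CPU, that is consistent with the time being spent in kernel exit cleanup rather than in bwrap's own code.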
FWIW, thanks to keeping an eternal shell history, I can see that the other time I ran into this was 2024-02-11, and I was running kernel 6.8.0-0.rc3.20240209git1f719a2f3fa6.31.fc40.x86_64 that day. I've been running 6.8 snapshots since 2024-01-11, starting with 6.8.0-0.rc0.20240109git9f8413c4a66f.1.fc40.x86_64; I ran that kernel until 2024-02-03, when I moved to 6.8.0-0.rc2.20240201git6764c317b6bb.22.fc40.x86_64. So I was on the 20240109 snapshot for three weeks and the 20240201 snapshot for eight days without running into this, but since I got the 20240209 snapshot I've run into it twice (the most recent time I was on kernel 6.8.0-0.rc4.20240212git716f4aaa7b48.35.fc40.x86_64).
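These snapshot dates also bound a bisection range. A hypothetical Python helper that pulls the date and upstream commit out of Fedora rawhide kernel versions (assuming the <YYYYMMDD>git<commit> part identifies the mainline snapshot) and builds the corresponding git bisect start <bad> <good> command:

```python
import re
from datetime import date

SNAPSHOT_RE = re.compile(r"\.(?P<date>\d{8})git(?P<commit>[0-9a-f]+)\.")


def snapshot(version: str) -> tuple[date, str]:
    """Extract (snapshot date, abbreviated commit) from a Fedora kernel version."""
    m = SNAPSHOT_RE.search(version)
    if m is None:
        raise ValueError(f"no git snapshot in {version!r}")
    d = m["date"]
    return date(int(d[:4]), int(d[4:6]), int(d[6:])), m["commit"]


def bisect_command(good_versions: list[str], bad_versions: list[str]) -> str:
    """Bisect between the newest 'good' and the oldest 'bad' snapshot."""
    newest_good = max(map(snapshot, good_versions))
    oldest_bad = min(map(snapshot, bad_versions))
    if newest_good[0] >= oldest_bad[0]:
        raise ValueError("a 'good' snapshot is not older than every 'bad' one")
    return f"git bisect start {oldest_bad[1]} {newest_good[1]}"
```

With the versions above as inputs, it suggests bisecting between 6764c317b6bb (good) and 1f719a2f3fa6 (bad). Since the hang is intermittent, a "good" snapshot only means the problem was not seen, so each bisection step would need a long soak test.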
In the Fedora kernel Matrix channel, someone suggested this may be the same as https://lore.kernel.org/all/6a150ddd-3267-4f89-81bd-6807700c57c1@redhat.com/, for which a fix is pending: https://lore.kernel.org/all/f15ee051-2cfe-461f-991d-d09fd53bad4f@leemhuis.info/
That seems plausible. If this is a kernel bug, then probably bwrap has a higher than usual chance to trigger it because it makes non-trivial use of mount namespaces, and other container tools like podman can probably also trigger it - so repeatedly running and exiting podman containers might be another reproducer.
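A hypothetical stress loop along those lines, in Python: it repeatedly runs a short-lived command and records runs whose exit took unusually long. The bwrap arguments in the usage comment are an assumption, not a confirmed reproducer.

```python
import subprocess
import time


def time_runs(argv: list[str], iterations: int, slow_after: float) -> list[float]:
    """Run argv `iterations` times; return the durations (s) above slow_after."""
    slow: list[float] = []
    for _ in range(iterations):
        start = time.monotonic()
        # subprocess.run() only returns once the child has been reaped.
        subprocess.run(
            argv,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        elapsed = time.monotonic() - start
        if elapsed > slow_after:
            slow.append(elapsed)
    return slow


# Hypothetical usage (needs bwrap; arguments are an assumption):
# time_runs(["bwrap", "--unshare-all", "--ro-bind", "/", "/",
#            "--dev", "/dev", "--proc", "/proc", "true"], 10_000, 1.0)
```

If the kernel bug is hit, the loop would probably hang rather than report, because the child never finishes exiting and subprocess.run waits for it to be reaped.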
> I'm seeing the same behavior with 6.8.0-11-generic, which is not a release candidate.
Are you sure about that? I don't see any sign on https://kernel.org/ of 6.8.0 having been released.
If this is an Ubuntu kernel, the development branch of Ubuntu seems to have a naming scheme where a packaged kernel version based on an upstream release candidate can be labelled as 6.8.0 before 6.8.0 actually exists (which seems unwise to me, but I didn't design their workflow). If I'm reading their git repository correctly then it's really 6.8rc4.
I am now using the latest stable kernel, 6.7.6, and everything seems to be working fine.
I applied the regression fix/revert (thanks @AdamWill) from the kernel mailing list to the current Fedora rawhide kernel and so far did not see any stuck processes.
> so repeatedly running and exiting podman containers might be another reproducer.
I actually did try something like that, also with mock and systemd-nspawn, but did not have any luck. It always seemed random, but I hope it's fixed now with that proposed patch.
Edit: Works fine, no issues in 24 hours.
> In the Fedora kernel Matrix channel, someone suggested this may be the same as https://lore.kernel.org/all/6a150ddd-3267-4f89-81bd-6807700c57c1@redhat.com/, for which a fix is pending: https://lore.kernel.org/all/f15ee051-2cfe-461f-991d-d09fd53bad4f@leemhuis.info/
If that's the case, then 6.8rc6 should fix this.
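A hypothetical Python helper that applies this rule to a uname -r string, based only on what this thread reports (the problem was seen on 6.8 release candidates, the fix is expected in 6.8-rc6, and 6.7.x was reported as unaffected). It deliberately returns None for strings like Ubuntu's 6.8.0-11-generic, which do not say which release candidate they were built from:

```python
import re
from typing import Optional

FIX_RC = 6  # per this thread, the fix is expected in 6.8-rc6


def regression_status(release: str) -> Optional[str]:
    """Classify a kernel release string against the 6.8-rc regression."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if m is None:
        return None
    series = (int(m.group(1)), int(m.group(2)))
    if series < (6, 8):
        return "predates 6.8"
    if series > (6, 8):
        return "fix included"
    rc = re.search(r"rc(\d+)", release)
    if rc is None:
        # e.g. "6.8.0-11-generic": built from an -rc without saying which.
        return None
    return "possibly affected" if int(rc.group(1)) < FIX_RC else "fix included"
```

Returning None rather than guessing reflects the point above about Ubuntu's version naming: the string alone cannot distinguish a release candidate from the final release.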
The issue can probably be closed now. Our kernel maintainer says the same as smcv (the fix was in 6.8rc6). I don't think bubblewrap can do anything about it.
Thanks for the confirmation, closing as "not our bug".
Nautilus starts multiple processes named bwrap that keep running indefinitely and add many tens of watts of CPU power draw, usually over 60 W on my two-processor, 88-thread PC.
So you must understand that I'm so pissed off about this: it even raises my electricity bill, because my computers are online 24/7.
> The issue can probably be closed now. Our kernel maintainer says the same as smcv (the fix was in 6.8rc6). I don't think bubblewrap can do anything about it.
Linux xeon-e5-2696-v4 6.8.0-11-generic #11-Ubuntu SMP PREEMPT_DYNAMIC Wed Feb 14 00:29:05 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
and the problem is still there!
@ame867, if you are experiencing a bug with similar symptoms, and you are confident that your kernel does not have the kernel bug that was seen in 6.8 release candidates, please open a separate issue with full details (instead of replying to an issue that was already closed).
> Nautilus starts multiple processes named bwrap that keep running indefinitely and add many tens of watts of CPU power draw, usually over 60 W on my two-processor, 88-thread PC.
> So you must understand that I'm so pissed off about this: it even raises my electricity bill, because my computers are online 24/7.
I am now using kernel 6.9.12 with Manjaro KDE and this problem is not happening.
System info (inxi -Faz)
```
System:
Kernel: 6.8.0-060800rc4-generic arch: x86_64 bits: 64 compiler: N/A clocksource: tsc available: acpi_pm parameters: BOOT_IMAGE=/vmlinuz-6.8.0-060800rc4-generic root=/dev/mapper/ubuntu--vg-ubuntu--lv ro quiet splash vt.handoff=7
Desktop: GNOME v: 45.2 tk: GTK v: 3.24.38 wm: gnome-shell dm: GDM3 v: 45.beta
Distro: Ubuntu 23.10 (Mantic Minotaur)
Machine:
Type: Laptop System: ASUSTeK product: ROG Strix G614JU_G614JU v: 1.0 serial:
Mobo: ASUSTeK model: G614JU v: 1.0 serial:
UEFI: American Megatrends LLC. v: G614JU.321 date: 10/24/2023
Battery:
ID-1: BAT0 charge: 84.5 Wh (100.0%) condition: 84.5/90.0 Wh (93.9%)
volts: 17.3 min: 16.0 model: AS3GYFG3KC R220358 type: Unknown
serial: status: full
CPU:
Info: model: 13th Gen Intel Core i9-13980HX bits: 64 type: MST AMCP
arch: Raptor Lake gen: core 13 level: v3 note: check built: 2022+
process: Intel 7 (10nm) family: 6 model-id: 0xB7 (183) stepping: 1
microcode: 0x11D
Topology: cpus: 1x cores: 24 mt: 8 tpc: 2 st: 16 threads: 32 smt: enabled
cache: L1: 2.1 MiB desc: d-16x32 KiB, 8x48 KiB; i-8x32 KiB, 16x64 KiB
L2: 32 MiB desc: 8x2 MiB, 4x4 MiB L3: 36 MiB desc: 1x36 MiB
Speed (MHz): avg: 1127 high: 4852 min/max: 800/5400:5600:4000 scaling:
driver: intel_pstate governor: powersave cores: 1: 1089 2: 1063 3: 796
4: 800 5: 1104 6: 802 7: 819 8: 1240 9: 818 10: 4852 11: 800 12: 800
13: 1777 14: 800 15: 800 16: 800 17: 800 18: 800 19: 801 20: 800 21: 800
22: 800 23: 800 24: 800 25: 800 26: 1322 27: 800 28: 800 29: 3139 30: 871
31: 800 32: 1990 bogomips: 154828
Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx
Vulnerabilities:
Type: gather_data_sampling status: Not affected
Type: itlb_multihit status: Not affected
Type: l1tf status: Not affected
Type: mds status: Not affected
Type: meltdown status: Not affected
Type: mmio_stale_data status: Not affected
Type: retbleed status: Not affected
Type: spec_rstack_overflow status: Not affected
Type: spec_store_bypass mitigation: Speculative Store Bypass disabled via
prctl
Type: spectre_v1 mitigation: usercopy/swapgs barriers and __user pointer
sanitization
Type: spectre_v2 mitigation: Enhanced / Automatic IBRS, IBPB:
conditional, RSB filling, PBRSB-eIBRS: SW sequence
Type: srbds status: Not affected
Type: tsx_async_abort status: Not affected
Graphics:
Device-1: Intel Raptor Lake-S UHD Graphics vendor: ASUSTeK driver: i915
v: kernel alternate: xe arch: Gen-13 process: Intel 7 (10nm) built: 2022+
ports: active: eDP-1 empty: DP-1, DP-2, HDMI-A-1, HDMI-A-2
bus-ID: 0000:00:02.0 chip-ID: 8086:a788 class-ID: 0300
Device-2: NVIDIA AD107M [GeForce RTX 4050 Max-Q / Mobile]
vendor: ASUSTeK GN21-X2 driver: nvidia v: 535.154.05
alternate: nvidiafb,nouveau,nvidia_drm non-free: 535.xx+
status: current (as of 2023-08) arch: Lovelace code: AD1xx
process: TSMC n4 (5nm) built: 2022-23+ ports: active: none
empty: DP-3,HDMI-A-3,eDP-2 bus-ID: 0000:01:00.0 chip-ID: 10de:28e1
class-ID: 0300
Device-3: Sonix USB2.0 HD UVC WebCam driver: uvcvideo type: USB rev: 2.0
speed: 480 Mb/s lanes: 1 mode: 2.0 bus-ID: 1-8:3 chip-ID: 322e:2122
class-ID: 0e02
Display: wayland server: X.org v: 1.21.1.7 with: Xwayland v: 23.2.0
compositor: gnome-shell driver: X: loaded: modesetting,nvidia
unloaded: fbdev,nouveau,vesa dri: iris gpu: i915 display-ID: 0
Monitor-1: eDP-1 model: TL160ADMP03-0 built: 2022 res: 2560x1600 dpi: 188
gamma: 1.2 size: 345x215mm (13.58x8.46") diag: 407mm (16") ratio: 16:10
modes: 2560x1600
API: OpenGL v: 4.6 Mesa 23.2.1-1ubuntu3.1 renderer: Mesa Intel Graphics
(RPL-S) direct-render: Yes
Audio:
Device-1: Intel vendor: ASUSTeK driver: snd_hda_intel v: kernel
alternate: snd_sof_pci_intel_tgl bus-ID: 0000:00:1f.3 chip-ID: 8086:7a50
class-ID: 0403
Device-2: NVIDIA vendor: ASUSTeK driver: snd_hda_intel v: kernel
bus-ID: 0000:01:00.1 chip-ID: 10de:22be class-ID: 0403
API: ALSA v: k6.8.0-060800rc4-generic status: kernel-api
tools: alsactl,alsamixer,amixer
Server-1: PipeWire v: 0.3.79 status: active with: 1: pipewire-pulse
status: active 2: wireplumber status: active 3: pipewire-alsa type: plugin
tools: pw-cat,pw-cli,wpctl
Network:
Device-1: Intel driver: iwlwifi v: kernel port: N/A bus-ID: 0000:00:14.3
chip-ID: 8086:7a70 class-ID: 0280
IF: wlo1 state: up mac:
Device-2: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet
vendor: ASUSTeK driver: r8169 v: kernel port: 4000 bus-ID: 0000:6c:00.0
chip-ID: 10ec:8168 class-ID: 0200
IF: enp108s0 state: down mac:
Bluetooth:
Device-1: Intel driver: btusb v: 0.8 type: USB rev: 2.0 speed: 12 Mb/s
lanes: 1 mode: 1.1 bus-ID: 1-14:4 chip-ID: 8087:0033 class-ID: e001
Report: hciconfig ID: hci0 rfk-id: 0 state: down
bt-service: enabled,running rfk-block: hardware: no software: yes
address:
Info: acl-mtu: 1021:4 sco-mtu: 96:6 link-policy: rswitch sniff
link-mode: peripheral accept
RAID:
Hardware-1: Intel Volume Management Device NVMe RAID Controller Intel
driver: vmd v: 0.6 port: N/A bus-ID: 0000:00:0e.0 chip-ID: 8086:a77f rev:
class-ID: 0104
Drives:
Local Storage: total: 953.87 GiB used: 79.48 GiB (8.3%)
SMART Message: Required tool smartctl not installed. Check --recommends
ID-1: /dev/nvme0n1 maj-min: 259:0 vendor: Micron model: 2400 MTFDKBA1T0QFM
size: 953.87 GiB block-size: physical: 512 B logical: 512 B speed: 63.2 Gb/s
lanes: 4 tech: SSD serial: fw-rev: V3MA003 temp: 46.9 C
scheme: GPT
Partition:
ID-1: / raw-size: 950.8 GiB size: 934.8 GiB (98.32%) used: 79.19 GiB (8.5%)
fs: ext4 dev: /dev/dm-1 maj-min: 252:1 mapped: ubuntu--vg-ubuntu--lv
ID-2: /boot raw-size: 2 GiB size: 1.9 GiB (95.01%) used: 284.5 MiB (14.6%)
fs: ext4 dev: /dev/nvme0n1p2 maj-min: 259:2
ID-3: /boot/efi raw-size: 1.05 GiB size: 1.05 GiB (99.80%)
used: 6.1 MiB (0.6%) fs: vfat dev: /dev/nvme0n1p1 maj-min: 259:1
Swap:
Kernel: swappiness: 60 (default) cache-pressure: 100 (default) zswap: no
ID-1: swap-1 type: file size: 8 GiB used: 0 KiB (0.0%) priority: -2
file: /swap.img
Sensors:
System Temperatures: cpu: 50.0 C mobo: N/A
Fan Speeds (rpm): N/A
Info:
Processes: 581 Uptime: 4m wakeups: 2 Memory: total: 32 GiB note: est.
available: 30.97 GiB used: 3.46 GiB (11.2%) Init: systemd v: 253
target: graphical (5) default: graphical tool: systemctl Compilers:
gcc: 13.2.0 alt: 12/13 Packages: 1741 pm: dpkg pkgs: 1732 libs: 914
tools: apt,apt-get pm: snap pkgs: 9 Shell: Bash v: 5.2.15
running-in: gnome-terminal inxi: 3.3.29
```
bwrap processes keep getting created and using too much CPU, and I have to reboot my device. I am also unable to kill the bwrap processes. The bwrap processes seem to have no parent.