bit-team / backintime

Back In Time - An easy-to-use backup tool for GNU Linux using rsync in the back
https://backintime.readthedocs.io
GNU General Public License v2.0

`qt5_probing.py` makes `xorg.bin` run with high CPU usage, eating RAM #1592

Open noyannus opened 8 months ago

noyannus commented 8 months ago

When backintime runs its qt5_probing.py, xorg.bin consumes a full CPU core (~97 to ~102%). This happens only while the /usr/bin/python3 /usr/share/backintime/common/qt5_probing.py processes are running. Shortly after they are killed, xorg.bin is back to normal. RAM and swap also fill up. If I don't kill the qt5_probing.py processes in time, the machine becomes unresponsive (and hot, duh). Maybe related: their own CPU loads are high too.

Before killing:

$ top -c
...
PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
2286 root      20   0 1569896 129604  88988 R 100,0 0,397  16:37.40 /usr/bin/Xorg.bin -nolisten tcp -background none -seat seat0 vt2 -auth /run/sddm/xauth_AwPWFy -noreset -displayfd 16
4336 root      39  19  190912  45644  30924 S 13,00 0,140   3:43.98 /usr/bin/python3 /usr/share/backintime/common/qt5_probing.py
4337 root      39  19  190912  45136  30672 S 13,00 0,138   3:44.16 /usr/bin/python3 /usr/share/backintime/common/qt5_probing.py
5468 root      39  19  190912  45260  30668 S 12,67 0,139   0:12.71 /usr/bin/python3 /usr/share/backintime/common/qt5_probing.py
5467 root      39  19  190912  45564  30844 S 12,00 0,140   0:12.43 /usr/bin/python3 /usr/share/backintime/common/qt5_probing.py
2857 me        20   0 4057896 477992 202676 S 6,667 1,464   1:20.04 /usr/bin/plasmashell --no-respawn
4334 root      39  19 1866060 1,764g   4096 S 5,333 5,667   1:20.48 python3 -Es /usr/share/backintime/common/backintime.py backup-job
4335 root      39  19 1866092 1,764g   4096 S 5,333 5,664   1:21.01 python3 -Es /usr/share/backintime/common/backintime.py --profile-id 2 backup-job
5464 root      39  19  129716 114832   4096 S 5,000 0,352   0:05.32 python3 -Es /usr/share/backintime/common/backintime.py backup-job
5466 root      39  19  129748 114576   4096 S 5,000 0,351   0:05.26 python3 -Es /usr/share/backintime/common/backintime.py --profile-id 2 backup-job
3095 me        20   0 1464276 126100  97488 S 0,667 0,386   0:03.97 /usr/bin/easyeffects --gapplication-service
  36 root      20   0       0      0      0 S 0,333 0,000   0:01.30 [ksoftirqd/3]
  48 root      20   0       0      0      0 S 0,333 0,000   0:01.15 [ksoftirqd/5]

The processes were killed with:

for pid in $(ps -ef | awk '/\/backintime\/common\/qt5_probing\.py/ {print $2}'); do kill -9 $pid; done
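
As a hedged alternative to the ps/awk pipeline above, the following Python sketch selects the same processes by scanning /proc. The helper names (`is_probe_cmdline`, `find_probe_pids`, `kill_probes`) are illustrative and not part of Back In Time, and the sketch assumes a Linux /proc filesystem.

```python
import os
import signal
from pathlib import Path

# Path fragment taken from the report above.
PROBE_SUFFIX = "/backintime/common/qt5_probing.py"


def is_probe_cmdline(argv):
    """Return True if any argument ends with the probing script's path."""
    return any(arg.endswith(PROBE_SUFFIX) for arg in argv)


def find_probe_pids(proc_root="/proc"):
    """Yield PIDs whose command line references qt5_probing.py."""
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue
        try:
            raw = (entry / "cmdline").read_bytes()
        except OSError:
            continue  # process vanished or access was denied
        argv = [p.decode(errors="replace") for p in raw.split(b"\0") if p]
        if is_probe_cmdline(argv):
            yield int(entry.name)


def kill_probes():
    """Send SIGKILL (the `kill -9` above) to every matching process."""
    for pid in find_probe_pids():
        try:
            os.kill(pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # already gone
```

Matching on the script path in the argument list avoids also selecting unrelated python3 processes, such as the backintime.py backup jobs visible in the top output.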

About a minute after killing them, xorg.bin is at <1% CPU load.

$ top -c
...
PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
5759 root      39  19  158868 147672   4480 D 47,00 0,452   0:26.96 rsync
 2857 me        20   0 4066220 477992 202676 S 6,333 1,464   1:31.78 plasmashell
5813 root      20   0       0      0      0 I 3,667 0,000   0:00.74 kworker/u16:28-btrfs-endio-meta
5801 root      20   0       0      0      0 I 2,333 0,000   0:00.30 kworker/u16:15-btrfs-endio-meta
5835 root      20   0       0      0      0 I 2,000 0,000   0:00.42 kworker/u16:50-btrfs-endio-meta
5863 root      20   0       0      0      0 I 2,000 0,000   0:00.77 kworker/u16:78-btrfs-endio-meta
5822 root      20   0       0      0      0 I 1,333 0,000   0:00.42 kworker/u16:37-btrfs-endio-meta
2286 root      20   0 1572172 129732  88988 S 0,667 0,397  17:37.14 Xorg.bin
3064 me         9 -11  121496  21132   8832 S 0,667 0,065   0:01.50 pipewire
3095 me        20   0 1464276 129300  97488 S 0,667 0,396   0:05.39 easyeffects
  89 root       0 -20       0      0      0 I 0,333 0,000   0:00.11 kworker/0:1H-kblockd
 200 root       0 -20       0      0      0 I 0,333 0,000   0:00.15 kworker/1:1H-kblockd
 201 root       0 -20       0      0      0 I 0,333 0,000   0:00.07 kworker/4:1H-kblockd
 202 root       0 -20       0      0      0 I 0,333 0,000   0:00.18 kworker/7:1H-kblockd

The backup jobs have finished (one rsync is still active), but earlier tests have shown that they are not the culprits.

This happened with BiT version 1.4.1, both from YaST (the SUSE package manager) and directly from GitHub. Python version is 3.11.6.

Operating System: openSUSE Tumbleweed 20231215
KDE Plasma Version: 5.27.10
KDE Frameworks Version: 5.112.0
Qt Version: 5.15.11
Kernel Version: 6.6.3-1-default (64-bit)
Graphics Platform: X11
Processors: 8 × 11th Gen Intel® Core™ i7-1165G7 @ 2.80GHz

To help us diagnose the problem quickly, please provide the output of the console command backintime --diagnostics.

Wellllll..... Could that have a common cause?

$ backintime --diagnostics
Traceback (most recent call last):
  File "/usr/share/backintime/common/backintime.py", line 1190, in <module>
    startApp()
  File "/usr/share/backintime/common/backintime.py", line 507, in startApp
    args = argParse(None)
           ^^^^^^^^^^^^^^
  File "/usr/share/backintime/common/backintime.py", line 568, in argParse
    args, unknownArgs = mainParser.parse_known_args(args)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/argparse.py", line 1902, in parse_known_args
    namespace, args = self._parse_known_args(args, namespace)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/argparse.py", line 2114, in _parse_known_args
    start_index = consume_optional(start_index)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/argparse.py", line 2054, in consume_optional
    take_action(action, args, option_string)
  File "/usr/lib64/python3.11/argparse.py", line 1978, in take_action
    action(self, namespace, argument_values, option_string)
  File "/usr/share/backintime/common/backintime.py", line 742, in __call__
    diagnostics = collect_diagnostics()
                  ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/share/backintime/common/diagnostics.py", line 74, in collect_diagnostics
    'OS': _get_os_release()
          ^^^^^^^^^^^^^^^^^
  File "/usr/share/backintime/common/diagnostics.py", line 398, in _get_os_release
    return osrelease['os-release']
           ~~~~~~~~~^^^^^^^^^^^^^^
KeyError: 'os-release'
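
The traceback shows only that the lookup `osrelease['os-release']` failed; how `osrelease` is built in diagnostics.py is not visible here. The following is a minimal sketch of a more defensive OS lookup, not the project's actual fix. The function names are hypothetical; the two file locations are the ones named in the os-release(5) specification.

```python
from pathlib import Path

# Locations defined by the os-release(5) specification, in lookup order.
OS_RELEASE_PATHS = ("/etc/os-release", "/usr/lib/os-release")


def parse_os_release(text):
    """Parse KEY=value lines into a dict, stripping optional quotes.

    Shell-style escapes inside values are not handled in this sketch.
    """
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        result[key.strip()] = value
    return result


def get_os_release(paths=OS_RELEASE_PATHS):
    """Return PRETTY_NAME (or NAME), or a placeholder if nothing is readable."""
    for path in paths:
        try:
            info = parse_os_release(Path(path).read_text(encoding="utf-8"))
        except OSError:
            continue  # try the next candidate instead of raising
        return info.get("PRETTY_NAME") or info.get("NAME") or "(unknown)"
    return "(os-release not found)"
```

Returning a placeholder instead of raising would let the remaining diagnostics be printed even when the OS entry cannot be determined. On Python 3.10 and later, `platform.freedesktop_os_release()` from the standard library offers similar parsing.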

ptilopteri commented 7 months ago

backintime 1.4.4-dev still hangs on /usr/bin/python3 /usr/share/backintime/common/qt5_probing.py unless a root instance of backintime-qt is open.

-- (paka)Patrick Shanahan Plainfield, Indiana, USA @ptilopteri http://en.opensuse.org openSUSE Community Member facebook/ptilopteri Photos: http://wahoo.no-ip.org/piwigo paka @ IRCnet oftc

ptilopteri commented 7 months ago

and dbus-launch.x11 is hanging again

aryoda commented 7 months ago

@ptilopteri THX for checking the new version! Yes, I could not fix the problem for this release: I have not found a solution, and so far you are the only user reporting this since the 30-second timeout was added, which for some reason does not work in your case.
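
For readers unfamiliar with the idea, a timeout around a probing subprocess can be sketched as below. This is only an illustration under assumptions: `probe_with_timeout` is a hypothetical helper, and the way Back In Time actually launches and limits qt5_probing.py is not shown in this thread.

```python
import subprocess
import sys


def probe_with_timeout(cmd, timeout_s=30.0):
    """Run cmd; return its exit code, or None if it exceeded timeout_s."""
    try:
        completed = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child and waits for it before re-raising.
        return None
    return completed.returncode


if __name__ == "__main__":
    # Stand-in command; a real caller would pass the probing script instead.
    print(probe_with_timeout([sys.executable, "-c", "pass"]))
```

Note that `subprocess.run()` only kills the direct child on timeout. Helper processes started by that child, such as the dbus-launch.x11 mentioned above, would not be terminated by this alone.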

The current workaround is described in the known issues (which totally disables the systray icon).

I am considering two options:

  1. Never start the systray icon as root (except in the GUI - but neither from CLI nor cron job)
  2. Add an option to suppress the systray icon and add this option to root cron jobs (feels like too much work though)

I would like to wait for more reports from other users on this topic. Perhaps I can then collect enough evidence to find the root cause and a solution.
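
The two options listed above could be combined into a single decision, sketched here under assumptions. `should_start_systray` and its parameters are hypothetical names; neither option had been implemented at the time of this comment.

```python
import os


def should_start_systray(euid, launched_from_gui, suppress_requested=False):
    """Decide whether to start the systray icon.

    Option 1: root only gets an icon when the backup was started from the GUI.
    Option 2: an explicit suppress flag always wins.
    """
    if suppress_requested:
        return False
    if euid == 0 and not launched_from_gui:
        return False
    return True


if __name__ == "__main__":
    # launched_from_gui=False models a start from the CLI or a cron job.
    print(should_start_systray(os.geteuid(), launched_from_gui=False))
```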

ptilopteri commented 7 months ago

well, doesn't quite work the way you describe, at least for me.

stopped current root instance of backintime-qt, removed /usr/share/backintime/common/qt5_probing.py completely, and started a root cron job. systray icon appeared and cron job completed. systray icon disappeared.

repeated with same result and /usr/bin/dbus-launch.x11 appears but closes several seconds after rsync finishes.

ptilopteri commented 7 months ago

rebooted and performed same sequence with same results

aryoda commented 7 months ago

@ptilopteri My bad, I forgot to remove other code changes before testing the proposed workaround. Thanks a lot for reporting it here.

I will prepare a patch file to disable the qt5 probing as root (even though the GUI will also be affected) but for now this will be the easiest workaround for you until I find a better solution.

aryoda commented 7 months ago

@ptilopteri I have prepared a workaround as a patch for the hanging qt5_probing.py, which works in my VM:

1592_workaround.txt

To apply it use

sudo patch -p1 /usr/share/backintime/common/qt5_probing.py < 1592_workaround.txt

with the correct installation path of backintime (which backintime).

This effectively disables qt5_probing completely if run as root so you should never see a hanging qt5_probing.py (can be checked with ps which shows the path).

ptilopteri commented 7 months ago

that works for me but now there is no systemtrayicon. before applying the patch, and with /usr/share/backintime/plugins/qt4plugin.py removed, it was working and the systemtrayicon was functional. ??

but there was a hanging dbus-launch.x11 which caused X11 to consume CPU cycles.

tks

aryoda commented 7 months ago

> that works for me but now there is no systemtrayicon.

THX a lot for testing and reporting the test results here!

That is intentional since it is a workaround (better no systray icon instead of a hanging system) until I find a better solution.

> and with removing /usr/share/backintime/plugins/qt4plugin.py

This file should not be installed anymore since BiT v1.4.0. If it still existed, it would explain why the system hung no matter what changes were made to qt5_probing.py.

Since BiT v1.4.3 (released yesterday), make install removes qt4plugin.py to avoid such left-over files.

ptilopteri commented 7 months ago

latest BiT build, c65fc91, is again hanging on /usr/share/backintime/common/qt5_probing.py and will not progress until that process is killed (root cron job).

I have added:

sleep 45 ; kill -9 $(pidof /usr/bin/python3 /usr/share/backintime/common/qt5_probing.py) $(pidof dbus-launch.x11)

in order to successfully utilize BiT for root scheduled jobs.

aryoda commented 7 months ago

> latest BiT build, c65fc91, is again hanging on /usr/share/backintime/common/qt5_probing.py and will not progress until that process is killed (root cron job).

Did you apply my patch after installing this build?

Without that patch your system will hang for whatever reason.

As I mentioned here I am trying to find a solution that will show at least the systray icon for BiT (root) backups started via the GUI (but not by cron as root - this is simply not reliable anymore).

Please give me some time for this...

ptilopteri commented 7 months ago

ok, applied yr patch to build c65fc91

ptilopteri commented 6 months ago

rebuilt BiT from git last night to 1.4.4-dev.6eb26beb and /usr/share/backintime/common/qt5_probing.py is still present. Isn't that supposed to be removed?

aryoda commented 6 months ago

> /usr/share/backintime/common/qt5_probing.py is still present. Isn't that supposed to be removed?

No, it "just" requires a decent fix (yet to be found). All I can do now is ask for patience...

ptilopteri commented 6 months ago

built 1.4.4-dev.5cbffdf5 today and your patch no longer takes. Is it no longer required?

tks

aryoda commented 6 months ago

> built 1.4.4-dev.5cbffdf5 today and your patch no longer takes. Is it not anymore required?

What do you mean by "no longer takes"?

Does patching fail or is the patch no longer required to avoid the high CPU usage?

Some background: A few commits ago we migrated the dev version of BiT from Qt5 to Qt6 (using the Python package PyQt6). This also required renaming qt5_probing.py to qt_probing.py, so my patch above may no longer find the file. Qt6 may also contain a fix for the problem.
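
Because of the rename, anything that targets the probing script by path (such as the workaround patch) has to cope with either file name. A small sketch of that lookup follows; `find_probing_script` is a hypothetical helper, and preferring the new name when both files exist is an assumption.

```python
from pathlib import Path

# File names mentioned in this thread: the renamed one first, then the Qt5-era one.
CANDIDATE_NAMES = ("qt_probing.py", "qt5_probing.py")


def find_probing_script(common_dir):
    """Return the first existing probing script in common_dir, else None."""
    for name in CANDIDATE_NAMES:
        candidate = Path(common_dir) / name
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    # Installation path as it appears in the reports above.
    print(find_probing_script("/usr/share/backintime/common"))
```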

ptilopteri commented 6 months ago

built 1.4.4-dev.5cbffdf5 and tried to apply the patch but the patch said it was already applied ??? It was ?? applied during the build automagically or ?? as I was unable to apply it:

sudo patch -p1 /usr/share/backintime/common/qt5_probing.py < ./1592_workaround.txt
patching file /usr/share/backintime/common/qt5_probing.py
Reversed (or previously applied) patch detected! Assume -R? [n]

so the patch has been committed ?? or ???? and is no longer necessary?

ptilopteri commented 6 months ago

fwiw: I edited your provided patch, s/qt5/q5/ and it applied correctly and am testing the build. And it completed successfully.

note: instructions to "rm /usr/share/backintime/common/qt5_probing.py" on github should be revised.

tks

aryoda commented 4 months ago

TODO: Check if this kernel change may be the reason for the blocking xorg.bin:

Disallows open of FIFOs or regular files not owned by the user in world writable sticky directories, unless the owner is the same as that of the directory or the file is opened without the O_CREAT flag.

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=30aba6656f61ed44cba445a3c0d38b296fa9e8f5

We are "hijacking" the X11 files of the user 1000 when running as root "just" to show the systray icon:

~ > echo $XDG_RUNTIME_DIR
/run/user/1000
~ > echo $XAUTHORITY
/run/user/1000/gdm/Xauthority

It looks like the kernel patch may strike here...
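
Whether that restriction can apply to a given file could be checked roughly as sketched below. The sketch assumes that the linked commit is the one behind the `fs.protected_fifos` and `fs.protected_regular` sysctls, and it models only the directory and ownership conditions from the quoted description, not the O_CREAT condition. Both function names are hypothetical.

```python
import stat
from pathlib import Path


def read_sysctl(name, proc_sys_fs="/proc/sys/fs"):
    """Return the integer value of fs.<name>, or None if unavailable."""
    try:
        return int((Path(proc_sys_fs) / name).read_text().strip())
    except (OSError, ValueError):
        return None


def ownership_rule_applies(level, dir_mode, dir_uid, file_uid, opener_uid):
    """Check the directory/ownership part of the restriction.

    level is the value of fs.protected_fifos or fs.protected_regular.
    The O_CREAT condition is not modelled here.
    """
    if not level:
        return False  # 0 or unknown: no restriction
    if not dir_mode & stat.S_ISVTX:
        return False  # only sticky directories are affected
    writable = dir_mode & stat.S_IWOTH
    if level >= 2:
        # Level 2 also covers group-writable sticky directories.
        writable = writable or dir_mode & stat.S_IWGRP
    if not writable:
        return False
    # Exempt: the opener owns the file, or file and directory share an owner.
    return file_uid != opener_uid and file_uid != dir_uid
```

To test the hypothesis, the inputs would come from `os.stat()` on the directory holding the Xauthority file and on the file itself, with `opener_uid` set to 0 for the root cron job. If that directory is not sticky and world-writable, this rule would not match and the cause would have to be sought elsewhere.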

emtiu commented 1 month ago

> We are "hijacking" the X11 files of the user 1000 when running as root "just" to show the systray icon:
>
> ~ > echo $XDG_RUNTIME_DIR
> /run/user/1000
> ~ > echo $XAUTHORITY
> /run/user/1000/gdm/Xauthority
>
> It looks like the kernel patch may strike here...

It looks like the same problem is at work in #1716.