amanusk / s-tui

Terminal-based CPU stress and monitoring utility
https://amanusk.github.io/s-tui/
GNU General Public License v2.0

s-tui crashes if some cores are offline #201

Open rkraneis opened 1 year ago

rkraneis commented 1 year ago

Step 1: Describe your environment

Step 2: Describe the problem:

Observed Results:

When taking some of the cores offline, s-tui just crashes:

s-tui -d
Traceback (most recent call last):
  File "/usr/bin/s-tui", line 33, in <module>
    sys.exit(load_entry_point('s-tui==1.1.4', 'console_scripts', 's-tui')())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/s_tui/s_tui.py", line 912, in main
    graph_controller.main()
  File "/usr/lib/python3.11/site-packages/s_tui/s_tui.py", line 757, in main
    loop.run()
  File "/usr/lib64/python3.11/site-packages/urwid/main_loop.py", line 286, in run
    self._run()
  File "/usr/lib64/python3.11/site-packages/urwid/main_loop.py", line 384, in _run
    self.event_loop.run()
  File "/usr/lib64/python3.11/site-packages/urwid/main_loop.py", line 789, in run
    self._loop()
  File "/usr/lib64/python3.11/site-packages/urwid/main_loop.py", line 822, in _loop
    alarm_callback()
  File "/usr/lib64/python3.11/site-packages/urwid/main_loop.py", line 172, in cb
    callback(self, user_data)
  File "/usr/lib/python3.11/site-packages/s_tui/s_tui.py", line 853, in animate_graph
    self.view.update_displayed_information()
  File "/usr/lib/python3.11/site-packages/s_tui/s_tui.py", line 249, in update_displayed_information
    source.update()
  File "/usr/lib/python3.11/site-packages/s_tui/sources/freq_source.py", line 62, in update
    self.last_measurement = [psutil.cpu_freq(False).current]
                             ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/site-packages/psutil/__init__.py", line 1864, in cpu_freq
    ret = _psplatform.cpu_freq()
          ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/site-packages/psutil/_pslinux.py", line 742, in cpu_freq
    raise NotImplementedError(

It also doesn't start until all cores are online again:

s-tui 
Traceback (most recent call last):
  File "/usr/bin/s-tui", line 33, in <module>
    sys.exit(load_entry_point('s-tui==1.1.4', 'console_scripts', 's-tui')())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/s_tui/s_tui.py", line 911, in main
    graph_controller = GraphController(args)
                       ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/s_tui/s_tui.py", line 715, in __init__
    possible_sources = self._load_config(args.t_thresh)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/s_tui/s_tui.py", line 648, in _load_config
    FreqSource(),
    ^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/s_tui/sources/freq_source.py", line 45, in __init__
    self.last_measurement = [0] * len(psutil.cpu_freq(True))
                                      ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/site-packages/psutil/__init__.py", line 1864, in cpu_freq
    ret = _psplatform.cpu_freq()
          ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/site-packages/psutil/_pslinux.py", line 742, in cpu_freq
    raise NotImplementedError(

Debug Results, output of s-tui -d created in a file _s-tui.log:

cat _s-tui.log 
2022-12-05 08:54:18,740 [_load_config()] [DEBUG]  User refresh rate: 2.0
2022-12-05 08:54:18,740 [_load_config()] [DEBUG]  No user config for temp threshold
2022-12-05 08:54:18,784 [__init__()] [DEBUG]  Temp sensor name Acpitz,0
2022-12-05 08:54:18,784 [__init__()] [DEBUG]  Temp sensor name Acpitz,1
2022-12-05 08:54:18,784 [__init__()] [DEBUG]  Temp sensor name PackageId0,0
2022-12-05 08:54:18,784 [__init__()] [DEBUG]  Temp sensor name Core26,0
2022-12-05 08:54:18,784 [__init__()] [DEBUG]  Temp sensor name Core27,0
2022-12-05 08:54:18,784 [__init__()] [DEBUG]  Temp sensor name Core28,0
2022-12-05 08:54:18,784 [__init__()] [DEBUG]  Temp sensor name Core29,0
2022-12-05 08:54:18,784 [__init__()] [DEBUG]  Temp sensor name Core30,0
2022-12-05 08:54:18,784 [__init__()] [DEBUG]  Temp sensor name Core31,0
2022-12-05 08:54:18,784 [__init__()] [DEBUG]  Temp sensor name Core0,0
2022-12-05 08:54:18,784 [__init__()] [DEBUG]  Temp sensor name Core4,0
2022-12-05 08:54:18,784 [__init__()] [DEBUG]  Temp sensor name Core8,0
2022-12-05 08:54:18,784 [__init__()] [DEBUG]  Temp sensor name Core12,0
2022-12-05 08:54:18,784 [__init__()] [DEBUG]  Temp sensor name Core16,0
2022-12-05 08:54:18,784 [__init__()] [DEBUG]  Temp sensor name Core20,0
2022-12-05 08:54:18,784 [__init__()] [DEBUG]  Temp sensor name Core24,0
2022-12-05 08:54:18,784 [__init__()] [DEBUG]  Temp sensor name Core25,0
2022-12-05 08:54:18,784 [__init__()] [DEBUG]  Temp sensor name Composite,0
2022-12-05 08:54:18,784 [__init__()] [DEBUG]  Temp sensor name Sensor1,0
2022-12-05 08:54:18,784 [__init__()] [DEBUG]  Temp sensor name Sensor2,0
2022-12-05 08:54:18,790 [read_power()] [WARNI]  ignoring (PermissionError(13, 'Permission denied'), '/sys/class/powercap/intel-rapl:0/') for file <class 'RuntimeWarning'>
2022-12-05 08:54:18,790 [read_power()] [WARNI]  ignoring (PermissionError(13, 'Permission denied'), '/sys/class/powercap/intel-rapl:0:0/') for file <class 'RuntimeWarning'>
2022-12-05 08:54:18,790 [read_power()] [WARNI]  ignoring (PermissionError(13, 'Permission denied'), '/sys/class/powercap/intel-rapl:0:1/') for file <class 'RuntimeWarning'>
2022-12-05 08:54:18,791 [read_power()] [WARNI]  ignoring (PermissionError(13, 'Permission denied'), '/sys/class/powercap/intel-rapl:1/') for file <class 'RuntimeWarning'>
2022-12-05 08:54:18,793 [__init__()] [INFO ]  num cpus 20
2022-12-05 08:54:18,802 [on_unicode_checkbox()] [DEBUG]  unicode State is True
2022-12-05 08:54:18,806 [main_window()] [DEBUG]  Pile index: 18
2022-12-05 08:54:18,827 [eval_hooks()] [DEBUG]  Evaluating hooks
2022-12-05 08:54:18,830 [update()] [INFO ]  Core id util 100.0
2022-12-05 08:54:18,831 [update()] [INFO ]  Core id util 23.1
2022-12-05 08:54:18,831 [update()] [INFO ]  Core id util 100.0
2022-12-05 08:54:18,831 [update()] [INFO ]  Core id util 35.7
2022-12-05 08:54:18,831 [update()] [INFO ]  Core id util 100.0
2022-12-05 08:54:18,831 [update()] [INFO ]  Core id util 0.0
2022-12-05 08:54:18,831 [update()] [INFO ]  Core id util 100.0
2022-12-05 08:54:18,831 [update()] [INFO ]  Core id util 0.0
2022-12-05 08:54:18,831 [update()] [INFO ]  Core id util 100.0
2022-12-05 08:54:18,831 [update()] [INFO ]  Core id util 0.0
2022-12-05 08:54:18,831 [update()] [INFO ]  Core id util 100.0
2022-12-05 08:54:18,831 [update()] [INFO ]  Core id util 30.8
2022-12-05 08:54:18,831 [update()] [INFO ]  Core id util 100.0
2022-12-05 08:54:18,831 [update()] [INFO ]  Core id util 100.0
2022-12-05 08:54:18,831 [update()] [INFO ]  Core id util 100.0
2022-12-05 08:54:18,831 [update()] [INFO ]  Core id util 100.0
2022-12-05 08:54:18,831 [update()] [INFO ]  Core id util 100.0
2022-12-05 08:54:18,831 [update()] [INFO ]  Core id util 100.0
2022-12-05 08:54:18,831 [update()] [INFO ]  Core id util 100.0
2022-12-05 08:54:18,831 [update()] [INFO ]  Core id util 100.0
2022-12-05 08:54:18,831 [update()] [INFO ]  Utilization recorded [75.2, 100.0, 23.1, 100.0, 35.7, 100.0, 0.0, 100.0, 0.0, 100.0, 0.0, 100.0, 30.8, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0]
2022-12-05 08:54:18,831 [update()] [INFO ]  Reading [79.0, 79.0, 78.0, 77.0, 77.0, 77.0, 77.0, 77.0, 77.0, 74.0, 78.0, 75.0, 78.0, 75.0, 79.0, 78.0, 78.0, 30.85, 30.85, 36.85]
2022-12-05 08:54:18,854 [update()] [INFO ]  Reading [2968.3685500000006, 3200.032, 3199.957, 3200.0, 3151.461, 3200.0, 3153.366, 3200.0, 2700.0, 3199.967, 3162.505, 3200.064, 3200.032, 2700.017, 2700.045, 2699.99, 2699.966, 2700.004, 2700.009, 2699.971, 2700.047]
2022-12-05 08:54:18,854 [get_top()] [DEBUG]  Returning top 4160.0
2022-12-05 08:54:18,855 [update()] [INFO ]  Reading [75.2, 100.0, 23.1, 100.0, 35.7, 100.0, 0.0, 100.0, 0.0, 100.0, 0.0, 100.0, 30.8, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0]
...
2022-12-05 08:54:41,145 [update()] [INFO ]  Utilization recorded [73.7, 99.5, 18.4, 99.5, 4.9, 100.0, 14.8, 100.0, 6.4, 99.5, 13.8, 100.0, 14.9, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0]
2022-12-05 08:54:41,145 [update()] [INFO ]  Reading [78.0, 78.0, 78.0, 78.0, 78.0, 77.0, 77.0, 77.0, 77.0, 73.0, 76.0, 74.0, 77.0, 73.0, 77.0, 78.0, 78.0, 30.85, 30.85, 35.85]
2022-12-05 08:54:41,145 [update()] [INFO ]  Reading [2990.3637499999995, 3200.0, 3200.0, 3200.0, 3107.788, 3200.0, 3200.032, 3200.032, 3099.757, 3200.032, 3200.098, 3200.032, 3200.0, 2699.994, 2699.916, 2699.966, 2700.012, 2700.094, 2700.008, 2700.026, 2699.959]
2022-12-05 08:54:41,146 [update()] [INFO ]  Reading [73.7, 99.5, 18.4, 99.5, 4.9, 100.0, 14.8, 100.0, 6.4, 99.5, 13.8, 100.0, 14.9, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0]
2022-12-05 08:54:43,171 [eval_hooks()] [DEBUG]  Evaluating hooks
2022-12-05 08:54:43,174 [update()] [INFO ]  Core id util 100.0
2022-12-05 08:54:43,174 [update()] [INFO ]  Core id util 24.4
2022-12-05 08:54:43,174 [update()] [INFO ]  Core id util 100.0
2022-12-05 08:54:43,174 [update()] [INFO ]  Core id util 7.0
2022-12-05 08:54:43,174 [update()] [INFO ]  Core id util 100.0
2022-12-05 08:54:43,174 [update()] [INFO ]  Core id util 14.4
2022-12-05 08:54:43,174 [update()] [INFO ]  Core id util 100.0
2022-12-05 08:54:43,174 [update()] [INFO ]  Core id util 14.9
2022-12-05 08:54:43,174 [update()] [INFO ]  Core id util 100.0
2022-12-05 08:54:43,174 [update()] [INFO ]  Core id util 18.3
2022-12-05 08:54:43,174 [update()] [INFO ]  Core id util 99.5
2022-12-05 08:54:43,174 [update()] [INFO ]  Core id util 25.6
2022-12-05 08:54:43,174 [update()] [INFO ]  Core id util 100.0
2022-12-05 08:54:43,174 [update()] [INFO ]  Core id util 100.0
2022-12-05 08:54:43,174 [update()] [INFO ]  Core id util 100.0
2022-12-05 08:54:43,174 [update()] [INFO ]  Core id util 100.0
2022-12-05 08:54:43,174 [update()] [INFO ]  Core id util 100.0
2022-12-05 08:54:43,175 [update()] [INFO ]  Core id util 100.0
2022-12-05 08:54:43,175 [update()] [INFO ]  Core id util 100.0
2022-12-05 08:54:43,175 [update()] [INFO ]  Core id util 100.0
2022-12-05 08:54:43,175 [update()] [INFO ]  Utilization recorded [75.4, 100.0, 24.4, 100.0, 7.0, 100.0, 14.4, 100.0, 14.9, 100.0, 18.3, 99.5, 25.6, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0]
2022-12-05 08:54:43,175 [update()] [INFO ]  Reading [78.0, 78.0, 78.0, 78.0, 78.0, 77.0, 77.0, 77.0, 77.0, 74.0, 75.0, 74.0, 77.0, 74.0, 77.0, 78.0, 78.0, 30.85, 30.85, 35.85]
2022-12-05 08:54:43,176 [update()] [INFO ]  Reading [2951.3931999999995, 3100.031, 3100.0, 3100.0, 3182.317, 3100.031, 3100.0, 3100.0, 3100.0, 3100.0, 3100.0, 3100.0, 3100.0, 2700.052, 2700.007, 2700.09, 2700.016, 2700.046, 2699.987, 2699.916, 2700.066]
2022-12-05 08:54:43,176 [update()] [INFO ]  Reading [75.4, 100.0, 24.4, 100.0, 7.0, 100.0, 14.4, 100.0, 14.9, 100.0, 18.3, 99.5, 25.6, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0]
2022-12-05 08:54:45,199 [eval_hooks()] [DEBUG]  Evaluating hooks

Step 3: Reproduce the problem:

Steps to reproduce:

  1. Run s-tui
  2. Disable some cores, for example:
     echo 0 | sudo tee /sys/devices/system/cpu/cpu0/online
  3. Observe the crash
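
To confirm which cores are actually offline before or after reproducing, the kernel's CPU list files can be read directly. The following is a minimal sketch, not part of s-tui or psutil; the helper names `parse_cpu_list` and `offline_cpus` are hypothetical, and it assumes a Linux system that exposes `/sys/devices/system/cpu/offline` in the usual range-list format (for example `0-3,5`).

```python
from pathlib import Path
from typing import Set


def parse_cpu_list(text: str) -> Set[int]:
    """Parse a kernel CPU list such as "0-3,5,8-11" into a set of CPU indices."""
    cpus: Set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue  # an empty file means no CPUs are in this state
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def offline_cpus(sysfs_root: str = "/sys/devices/system/cpu") -> Set[int]:
    """Return the CPUs the kernel currently reports as offline (hypothetical helper)."""
    return parse_cpu_list((Path(sysfs_root) / "offline").read_text())
```

Calling `offline_cpus()` on the affected machine should list the cores that were disabled in step 2.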

FallingSnow commented 1 year ago

I'm experiencing the same issue. Disable some cores then try to run s-tui.

archlinux% sudo s-tui
Traceback (most recent call last):
  File "/usr/bin/s-tui", line 33, in <module>
    sys.exit(load_entry_point('s-tui==1.1.4', 'console_scripts', 's-tui')())
  File "/usr/lib/python3.10/site-packages/s_tui/s_tui.py", line 912, in main
    graph_controller = GraphController(args)
  File "/usr/lib/python3.10/site-packages/s_tui/s_tui.py", line 716, in __init__
    possible_sources = self._load_config(args.t_thresh)
  File "/usr/lib/python3.10/site-packages/s_tui/s_tui.py", line 649, in _load_config
    FreqSource(),
  File "/usr/lib/python3.10/site-packages/s_tui/sources/freq_source.py", line 46, in __init__
    self.last_measurement = [0] * len(psutil.cpu_freq(True))
  File "/usr/lib/python3.10/site-packages/psutil/__init__.py", line 1864, in cpu_freq
    ret = _psplatform.cpu_freq()
  File "/usr/lib/python3.10/site-packages/psutil/_pslinux.py", line 742, in cpu_freq
    raise NotImplementedError(
NotImplementedError: can't find current frequency file

ohsix commented 1 year ago

Same here. I don't know whether this can be fixed in s-tui, though; it probably needs to be fixed in psutil.
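
For what it's worth, a caller-side guard is at least conceivable. This is only a sketch under assumptions, not s-tui's actual code: `safe_freqs`, `read_freqs` and `last_good` are hypothetical names, and the only behaviour taken from the tracebacks above is that `psutil.cpu_freq()` raises `NotImplementedError` when a frequency file is missing.

```python
from typing import Callable, List, Sequence


def safe_freqs(read_freqs: Callable[[], Sequence], last_good: List[float]) -> List[float]:
    """Return current per-CPU frequencies, or a copy of last_good if reading fails.

    read_freqs is any callable returning objects with a `.current` attribute,
    e.g. `lambda: psutil.cpu_freq(percpu=True)` (psutil is not imported here).
    """
    try:
        return [f.current for f in read_freqs()]
    except NotImplementedError:
        # Raised by psutil on Linux in the tracebacks above
        # ("can't find current frequency file") when cores are offline.
        return list(last_good)
```

This would keep the UI alive by reusing the previous reading, but the returned list may then have a different length than the number of online cores, so it hides the problem rather than fixing it.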

https://github.com/giampaolo/psutil/blob/7eadee31db2f038763a3a6f978db1ea76bbc4674/psutil/_pslinux.py#L749 needs an extra check on curr to skip CPUs without frequency files
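
To illustrate the kind of check meant here, the sketch below reads per-CPU frequencies straight from sysfs and skips any CPU whose frequency file cannot be read. It is not psutil's implementation; `read_cpu_freqs_mhz` is a hypothetical helper, and it assumes that offline cores lack a readable `cpufreq/scaling_cur_freq` file (values in kHz) under `/sys/devices/system/cpu/cpuN/`.

```python
from pathlib import Path
from typing import Dict


def read_cpu_freqs_mhz(sysfs_root: str = "/sys/devices/system/cpu") -> Dict[int, float]:
    """Map CPU index to current frequency in MHz, skipping CPUs without a frequency file."""
    freqs: Dict[int, float] = {}
    for cpu_dir in Path(sysfs_root).glob("cpu[0-9]*"):
        freq_file = cpu_dir / "cpufreq" / "scaling_cur_freq"
        try:
            index = int(cpu_dir.name[3:])  # "cpu12" -> 12
            khz = int(freq_file.read_text().strip())
        except (OSError, ValueError):
            # Missing or unreadable file (assumed: the core is offline): skip it
            # instead of raising, which is the extra check suggested above.
            continue
        freqs[index] = khz / 1000.0
    return freqs
```

Returning a mapping keyed by CPU index, rather than a plain list, keeps each reading attached to the right core even when some cores are missing.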

Jamesits commented 5 months ago

Linking the relevant psutil issue here: https://github.com/giampaolo/psutil/issues/2254