amanusk / s-tui

Terminal-based CPU stress and monitoring utility
https://amanusk.github.io/s-tui/
GNU General Public License v2.0

IndexError: list index out of range #144

Closed: XeonBloomfield closed this issue 4 years ago

XeonBloomfield commented 4 years ago

Step 1: Describe your environment

Step 2: Describe the problem:

Observed Results:

The problem occurs on a ThinkPad with Intel and Radeon GPUs. If you start s-tui before running any application on the dedicated GPU (while its temperature is not being reported; sensors output is attached in the reproduction section) and then start using that card, s-tui crashes. The same problem occurs the other way around: if I start s-tui with the Radeon GPU active, I can see its temperature graph, and s-tui crashes after I close the application using the dedicated GPU.

/home/xeonbloomfield/.local/lib/python3.6/site-packages/psutil/_pslinux.py:1227: RuntimeWarning: ignoring OSError(22, 'Invalid argument') for file '/sys/class/hwmon/hwmon1/temp1_input'
  RuntimeWarning)
Traceback (most recent call last):
  File "/home/xeonbloomfield/.local/bin/s-tui", line 11, in <module>
    sys.exit(main())
  File "/home/xeonbloomfield/.local/lib/python3.6/site-packages/s_tui/s_tui.py", line 882, in main
    graph_controller.main()
  File "/home/xeonbloomfield/.local/lib/python3.6/site-packages/s_tui/s_tui.py", line 738, in main
    loop.run()
  File "/home/xeonbloomfield/.local/lib/python3.6/site-packages/urwid/main_loop.py", line 287, in run
    self._run()
  File "/home/xeonbloomfield/.local/lib/python3.6/site-packages/urwid/main_loop.py", line 385, in _run
    self.event_loop.run()
  File "/home/xeonbloomfield/.local/lib/python3.6/site-packages/urwid/main_loop.py", line 790, in run
    self._loop()
  File "/home/xeonbloomfield/.local/lib/python3.6/site-packages/urwid/main_loop.py", line 823, in _loop
    alarm_callback()
  File "/home/xeonbloomfield/.local/lib/python3.6/site-packages/urwid/main_loop.py", line 173, in cb
    callback(self, user_data)
  File "/home/xeonbloomfield/.local/lib/python3.6/site-packages/s_tui/s_tui.py", line 829, in animate_graph
    self.view.update_displayed_information()
  File "/home/xeonbloomfield/.local/lib/python3.6/site-packages/s_tui/s_tui.py", line 256, in update_displayed_information
    summary.update()
  File "/home/xeonbloomfield/.local/lib/python3.6/site-packages/s_tui/sturwid/summary_text_list.py", line 66, in update
    for key, val in self.source.get_summary().items():
  File "/home/xeonbloomfield/.local/lib/python3.6/site-packages/s_tui/sources/source.py", line 73, in get_summary
    graph_vector_summary.update(self.get_sensors_summary())
  File "/home/xeonbloomfield/.local/lib/python3.6/site-packages/s_tui/sources/source.py", line 64, in get_sensors_summary
    graph_vector_summary[sub_title_list[graph_idx]] = val_str
IndexError: list index out of range

Debug results (output of s-tui -d, written to the file _s-tui.log):

2020-01-03 13:22:11,425 [_load_config()] [DEBUG]  User refresh rate: 2.0
2020-01-03 13:22:11,425 [_load_config()] [DEBUG]  No user config for temp threshold
2020-01-03 13:22:11,430 [__init__()] [DEBUG]  Temp sensor name Acpitz,0
2020-01-03 13:22:11,430 [__init__()] [DEBUG]  Temp sensor name PackageId0
2020-01-03 13:22:11,430 [__init__()] [DEBUG]  Temp sensor name Core0,Pkg0
2020-01-03 13:22:11,430 [__init__()] [DEBUG]  Temp sensor name Core1,Pkg0
2020-01-03 13:22:11,430 [__init__()] [DEBUG]  Temp sensor name Core2,Pkg0
2020-01-03 13:22:11,430 [__init__()] [DEBUG]  Temp sensor name Core3,Pkg0
2020-01-03 13:22:11,430 [__init__()] [DEBUG]  Temp sensor name Iwlwifi,0
2020-01-03 13:22:11,430 [__init__()] [DEBUG]  Temp sensor name Pch_Cannonlake,0
2020-01-03 13:22:11,582 [__init__()] [DEBUG]  Fan sensor name thinkpad,0
2020-01-03 13:22:11,584 [__init__()] [INFO ]  num cpus 8
2020-01-03 13:22:11,616 [on_unicode_checkbox()] [DEBUG]  unicode State is True
2020-01-03 13:22:11,620 [main_window()] [DEBUG]  Pile index: 18
2020-01-03 13:22:11,627 [eval_hooks()] [DEBUG]  Evaluating hooks
2020-01-03 13:22:11,630 [update()] [INFO ]  Core id util 8.3
2020-01-03 13:22:11,630 [update()] [INFO ]  Core id util 30.4
2020-01-03 13:22:11,630 [update()] [INFO ]  Core id util 8.7
2020-01-03 13:22:11,630 [update()] [INFO ]  Core id util 4.3
2020-01-03 13:22:11,630 [update()] [INFO ]  Core id util 4.3
2020-01-03 13:22:11,630 [update()] [INFO ]  Core id util 0.0
2020-01-03 13:22:11,630 [update()] [INFO ]  Core id util 4.2
2020-01-03 13:22:11,630 [update()] [INFO ]  Core id util 4.3
2020-01-03 13:22:11,630 [update()] [INFO ]  Utilization recorded [7.6, 8.3, 30.4, 8.7, 4.3, 4.3, 0.0, 4.2, 4.3]
2020-01-03 13:22:11,632 [update()] [DEBUG]  seconds passed 0.057058095932006836
2020-01-03 13:22:11,632 [update()] [DEBUG]  watts used 3.6337700481115163
2020-01-03 13:22:11,632 [update()] [INFO ]  Joule_Used 0, seconds passed, 0
2020-01-03 13:22:11,632 [update()] [INFO ]  Power reading elapsed
2020-01-03 13:22:11,632 [update()] [DEBUG]  seconds passed 0.057058095932006836
2020-01-03 13:22:11,632 [update()] [DEBUG]  watts used 1.7628699022810557
2020-01-03 13:22:11,632 [update()] [INFO ]  Joule_Used 0, seconds passed, 0
2020-01-03 13:22:11,632 [update()] [INFO ]  Power reading elapsed
2020-01-03 13:22:11,632 [update()] [DEBUG]  seconds passed 0.057058095932006836
2020-01-03 13:22:11,632 [update()] [DEBUG]  watts used 0.33374755482013546
2020-01-03 13:22:11,632 [update()] [INFO ]  Joule_Used 0, seconds passed, 0
2020-01-03 13:22:11,632 [update()] [INFO ]  Power reading elapsed
2020-01-03 13:22:11,632 [update()] [DEBUG]  seconds passed 0.057058095932006836
2020-01-03 13:22:11,632 [update()] [DEBUG]  watts used 0.8054772815196454
2020-01-03 13:22:11,632 [update()] [INFO ]  Joule_Used 0, seconds passed, 0
2020-01-03 13:22:11,633 [update()] [INFO ]  Power reading elapsed
2020-01-03 13:22:11,633 [update()] [DEBUG]  seconds passed 0.057058095932006836
2020-01-03 13:22:11,633 [update()] [DEBUG]  watts used 10.809298661752724
2020-01-03 13:22:11,633 [update()] [INFO ]  Joule_Used 0, seconds passed, 0
2020-01-03 13:22:11,633 [update()] [INFO ]  Power reading elapsed
2020-01-03 13:22:11,635 [update()] [INFO ]  Reading [48.0, 62.0, 48.0, 62.0, 47.0, 46.0, 56.0, 47.0]
2020-01-03 13:22:11,643 [update()] [INFO ]  Reading [1964.056375, 1982.328, 1971.169, 1994.392, 1956.487, 1953.392, 1905.686, 1968.38, 1980.617]
2020-01-03 13:22:11,643 [get_top()] [DEBUG]  Returning top 4600.0
2020-01-03 13:22:11,644 [update()] [INFO ]  Reading [7.6, 8.3, 30.4, 8.7, 4.3, 4.3, 0.0, 4.2, 4.3]
2020-01-03 13:22:11,645 [update()] [INFO ]  Reading [3.6337700481115163, 1.7628699022810557, 0.33374755482013546, 0.8054772815196454, 10.809298661752724]
2020-01-03 13:22:11,646 [update()] [INFO ]  Reading [0]
2020-01-03 13:22:13,661 [eval_hooks()] [DEBUG]  Evaluating hooks
2020-01-03 13:22:13,802 [update()] [INFO ]  Core id util 16.2
2020-01-03 13:22:13,802 [update()] [INFO ]  Core id util 9.2
2020-01-03 13:22:13,802 [update()] [INFO ]  Core id util 17.7
2020-01-03 13:22:13,802 [update()] [INFO ]  Core id util 8.8
2020-01-03 13:22:13,802 [update()] [INFO ]  Core id util 4.7
2020-01-03 13:22:13,802 [update()] [INFO ]  Core id util 4.7
2020-01-03 13:22:13,802 [update()] [INFO ]  Core id util 7.5
2020-01-03 13:22:13,802 [update()] [INFO ]  Core id util 7.3
2020-01-03 13:22:13,802 [update()] [INFO ]  Utilization recorded [9.6, 16.2, 9.2, 17.7, 8.8, 4.7, 4.7, 7.5, 7.3]
2020-01-03 13:22:13,803 [update()] [DEBUG]  seconds passed 2.171461820602417
2020-01-03 13:22:13,803 [update()] [DEBUG]  watts used 4.161611737429937
2020-01-03 13:22:13,803 [update()] [INFO ]  Joule_Used 9, seconds passed, 2
2020-01-03 13:22:13,803 [update()] [INFO ]  Power reading elapsed
2020-01-03 13:22:13,803 [update()] [DEBUG]  seconds passed 2.171461820602417
2020-01-03 13:22:13,803 [update()] [DEBUG]  watts used 1.8460075889742946
2020-01-03 13:22:13,803 [update()] [INFO ]  Joule_Used 4, seconds passed, 2
2020-01-03 13:22:13,803 [update()] [INFO ]  Power reading elapsed
2020-01-03 13:22:13,803 [update()] [DEBUG]  seconds passed 2.171461820602417
2020-01-03 13:22:13,803 [update()] [DEBUG]  watts used 0.43356276912979036
2020-01-03 13:22:13,803 [update()] [INFO ]  Joule_Used 0, seconds passed, 2
2020-01-03 13:22:13,803 [update()] [INFO ]  Power reading elapsed
2020-01-03 13:22:13,803 [update()] [DEBUG]  seconds passed 2.171461820602417
2020-01-03 13:22:13,804 [update()] [DEBUG]  watts used 1.09583874670191
2020-01-03 13:22:13,804 [update()] [INFO ]  Joule_Used 2, seconds passed, 2
2020-01-03 13:22:13,804 [update()] [INFO ]  Power reading elapsed
2020-01-03 13:22:13,804 [update()] [DEBUG]  seconds passed 2.171461820602417
2020-01-03 13:22:13,804 [update()] [DEBUG]  watts used 13.790723703211247
2020-01-03 13:22:13,804 [update()] [INFO ]  Joule_Used 29, seconds passed, 2
2020-01-03 13:22:13,804 [update()] [INFO ]  Power reading elapsed
2020-01-03 13:22:13,805 [update()] [INFO ]  Reading [47.0, 46.0, 44.0, 45.0, 43.0, 44.0, 56.0, 50.0]
2020-01-03 13:22:13,806 [update()] [INFO ]  Reading [2999.128125, 2521.286, 3851.381, 4023.053, 4031.564, 4051.352, 4080.09, 4099.761, 4098.1]
2020-01-03 13:22:13,806 [update()] [INFO ]  Reading [9.6, 16.2, 9.2, 17.7, 8.8, 4.7, 4.7, 7.5, 7.3]
2020-01-03 13:22:13,806 [update()] [INFO ]  Reading [4.161611737429937, 1.8460075889742946, 0.43356276912979036, 1.09583874670191, 13.790723703211247]
2020-01-03 13:22:13,806 [update()] [INFO ]  Reading [0]
2020-01-03 13:22:15,823 [eval_hooks()] [DEBUG]  Evaluating hooks
2020-01-03 13:22:15,977 [update()] [INFO ]  Core id util 8.5
2020-01-03 13:22:15,977 [update()] [INFO ]  Core id util 8.5
2020-01-03 13:22:15,977 [update()] [INFO ]  Core id util 15.0
2020-01-03 13:22:15,977 [update()] [INFO ]  Core id util 13.5
2020-01-03 13:22:15,977 [update()] [INFO ]  Core id util 9.7
2020-01-03 13:22:15,978 [update()] [INFO ]  Core id util 11.9
2020-01-03 13:22:15,978 [update()] [INFO ]  Core id util 7.9
2020-01-03 13:22:15,978 [update()] [INFO ]  Core id util 8.0
2020-01-03 13:22:15,978 [update()] [INFO ]  Utilization recorded [10.5, 8.5, 8.5, 15.0, 13.5, 9.7, 11.9, 7.9, 8.0]
2020-01-03 13:22:15,980 [update()] [DEBUG]  seconds passed 2.177027702331543
2020-01-03 13:22:15,980 [update()] [DEBUG]  watts used 5.276474427816272
2020-01-03 13:22:15,981 [update()] [INFO ]  Joule_Used 11, seconds passed, 2
2020-01-03 13:22:15,981 [update()] [INFO ]  Power reading elapsed
2020-01-03 13:22:15,981 [update()] [DEBUG]  seconds passed 2.177027702331543
2020-01-03 13:22:15,981 [update()] [DEBUG]  watts used 1.891668165540335
2020-01-03 13:22:15,981 [update()] [INFO ]  Joule_Used 4, seconds passed, 2
2020-01-03 13:22:15,981 [update()] [INFO ]  Power reading elapsed
2020-01-03 13:22:15,981 [update()] [DEBUG]  seconds passed 2.177027702331543
2020-01-03 13:22:15,982 [update()] [DEBUG]  watts used 0.4381175301437415
2020-01-03 13:22:15,982 [update()] [INFO ]  Joule_Used 0, seconds passed, 2
2020-01-03 13:22:15,982 [update()] [INFO ]  Power reading elapsed
2020-01-03 13:22:15,982 [update()] [DEBUG]  seconds passed 2.177027702331543
2020-01-03 13:22:15,982 [update()] [DEBUG]  watts used 1.2344500702135424
2020-01-03 13:22:15,982 [update()] [INFO ]  Joule_Used 2, seconds passed, 2
2020-01-03 13:22:15,982 [update()] [INFO ]  Power reading elapsed
2020-01-03 13:22:15,983 [update()] [DEBUG]  seconds passed 2.177027702331543
2020-01-03 13:22:15,983 [update()] [DEBUG]  watts used 23.49018569916755
2020-01-03 13:22:15,983 [update()] [INFO ]  Joule_Used 51, seconds passed, 2
2020-01-03 13:22:15,983 [update()] [INFO ]  Power reading elapsed
2020-01-03 13:22:15,988 [update()] [INFO ]  Reading [52.0, 44.0, 46.0, 45.0, 47.0, 44.0, 44.0, 56.0, 53.0]
2020-01-03 13:22:15,990 [update()] [INFO ]  Reading [804.7864999999999, 795.421, 799.432, 796.72, 797.998, 798.202, 799.427, 799.673, 765.867]
2020-01-03 13:22:15,991 [update()] [INFO ]  Reading [10.5, 8.5, 8.5, 15.0, 13.5, 9.7, 11.9, 7.9, 8.0]
2020-01-03 13:22:15,992 [update()] [INFO ]  Reading [5.276474427816272, 1.891668165540335, 0.4381175301437415, 1.2344500702135424, 23.49018569916755]
2020-01-03 13:22:15,994 [update()] [INFO ]  Reading [0]

Step 3: Reproduce the problem:

Steps to reproduce:

  1. Make sure AMD Radeon GPU temperature is not reported (N/A in sensors)
  2. Open s-tui 1.0.0b2
  3. Run application using dedicated GPU (temperature is now being reported)

Sensors output with Radeon GPU disabled:

coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +49.0°C  (high = +100.0°C, crit = +100.0°C)
Core 0:        +48.0°C  (high = +100.0°C, crit = +100.0°C)
Core 1:        +49.0°C  (high = +100.0°C, crit = +100.0°C)
Core 2:        +47.0°C  (high = +100.0°C, crit = +100.0°C)
Core 3:        +47.0°C  (high = +100.0°C, crit = +100.0°C)

thinkpad-isa-0000
Adapter: ISA adapter
fan1:           0 RPM

amdgpu-pci-0300
Adapter: PCI adapter
vddgfx:           N/A
temp1:            N/A  (crit = +94.0°C, hyst = -273.1°C)
power1:           N/A  (cap =  25.00 W)

Sensors output with Radeon GPU enabled:

coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +49.0°C  (high = +100.0°C, crit = +100.0°C)
Core 0:        +48.0°C  (high = +100.0°C, crit = +100.0°C)
Core 1:        +49.0°C  (high = +100.0°C, crit = +100.0°C)
Core 2:        +48.0°C  (high = +100.0°C, crit = +100.0°C)
Core 3:        +47.0°C  (high = +100.0°C, crit = +100.0°C)

thinkpad-isa-0000
Adapter: ISA adapter
fan1:           0 RPM

amdgpu-pci-0300
Adapter: PCI adapter
vddgfx:       +1.07 V  
temp1:        +48.0°C  (crit = +94.0°C, hyst = -273.1°C)
power1:        7.05 W  (cap =  25.00 W)
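The psutil warning at the top of the traceback shows that reading /sys/class/hwmon/hwmon1/temp1_input failed with OSError(22, 'Invalid argument') while the dedicated GPU was inactive, which appears to be what sensors reports as N/A. Below is a minimal sketch of reading such a file defensively. It assumes the standard Linux hwmon sysfs convention that temp*_input holds an integer in millidegrees Celsius; the helper name read_hwmon_temp is hypothetical and not part of s-tui or psutil.

```python
def read_hwmon_temp(path):
    """Hypothetical helper: return the temperature in degrees Celsius from a
    hwmon temp*_input file, or None if no valid reading is available."""
    try:
        with open(path) as f:
            # hwmon exposes temperatures as integer millidegrees Celsius
            return int(f.read().strip()) / 1000.0
    except (OSError, ValueError):
        # OSError: e.g. EINVAL from a powered-down GPU, or a missing file
        # ValueError: unexpected non-integer contents
        return None
```

Called on /sys/class/hwmon/hwmon1/temp1_input, this would return None while the GPU is off and a value such as 48.0 once it reports again. A monitor built on it still has to cope with the sensor flipping between "no value" and "value" at runtime.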

If you need any help resolving this issue, let me know.

amanusk commented 4 years ago

Interesting bug. Thanks. The problem is that s-tui currently has poor support for the number of sensors changing during operation. Basically, we assume that the number of sensors found on init stays the same, but here the GPU is toggled on and off depending on the running application. Properly supporting sensors that change between measurements might take some time.
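A minimal sketch of the failure mode described above, assuming a simplified version of the summary code: the list of titles is captured once at startup, while each update returns however many readings the hardware currently reports. The function build_summary is hypothetical; it only mirrors the shape of the failing line graph_vector_summary[sub_title_list[graph_idx]] = val_str from the traceback. The debug log in the report shows this shape: the temperature readings at 13:22:11 and 13:22:13 have eight values, and the one at 13:22:15 has nine.

```python
def build_summary(sub_title_list, readings):
    """Hypothetical simplified summary builder: pairs each reading with the
    title at the same index, as the list-based code in the traceback does."""
    summary = {}
    for graph_idx, val in enumerate(readings):
        # Raises IndexError when more readings arrive than titles were captured at init
        summary[sub_title_list[graph_idx]] = "%.1f" % val
    return summary

# The eight temperature sensor names logged at startup
titles = ["Acpitz,0", "PackageId0", "Core0,Pkg0", "Core1,Pkg0",
          "Core2,Pkg0", "Core3,Pkg0", "Iwlwifi,0", "Pch_Cannonlake,0"]

# Eight readings (GPU temperature not reported yet): works
print(build_summary(titles, [48.0, 62.0, 48.0, 62.0, 47.0, 46.0, 56.0, 47.0]))

# Nine readings (GPU temperature now reported): index 8 is past the end of titles
try:
    build_summary(titles, [52.0, 44.0, 46.0, 45.0, 47.0, 44.0, 44.0, 56.0, 53.0])
except IndexError as exc:
    print("IndexError:", exc)  # → IndexError: list index out of range
```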

XeonBloomfield commented 4 years ago

Right now, version 1.0.0b (as well as the latest code from the master branch) is unusable on this hardware when this situation occurs. Version 0.8.3 works great but lacks the new features.

What comes to my mind as a quick fix for this problem:

This prevents s-tui from crashing and buys time to implement proper handling of sensors that change between measurements later. If you like this solution, I can open a pull request with an implementation.

amanusk commented 4 years ago

0.8.3 should be working, as it only measures a single sensor for each category, so this makes sense. I think your solution is a good one for a quick fix. In the longer term, we should move away from using a list for the subgraphs and use dictionaries instead. That way, only entries with existing values are populated, with an optional button to refresh the entire view. Please feel free to open a PR, and ask if there is anything I can assist you with.
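A rough sketch of the dictionary-based direction suggested above, assuming each measurement is delivered as a mapping from sensor name to value (s-tui's actual source classes may look quite different). SensorSummary, update, reset, and the sensor name "amdgpu,0" are hypothetical. A sensor that appears mid-run simply adds a key instead of shifting list indices, a sensor that disappears is shown as N/A, and reset stands in for the optional refresh button.

```python
class SensorSummary:
    """Hypothetical summary keyed by sensor name instead of list position."""

    def __init__(self):
        self.known = []  # sensor names, in the order they were first seen

    def update(self, readings):
        """readings: dict mapping sensor name -> value for this measurement only."""
        for name in readings:
            if name not in self.known:
                # A sensor that appeared mid-run (e.g. a GPU that woke up) is appended
                self.known.append(name)
        # Sensors missing from this measurement show N/A instead of crashing
        return {name: ("%.1f" % readings[name] if name in readings else "N/A")
                for name in self.known}

    def reset(self):
        """Hypothetical 'refresh view': forget sensors and rebuild from the next update."""
        self.known = []


summary = SensorSummary()
print(summary.update({"Core0,Pkg0": 48.0}))                    # {'Core0,Pkg0': '48.0'}
print(summary.update({"Core0,Pkg0": 47.0, "amdgpu,0": 48.0}))  # {'Core0,Pkg0': '47.0', 'amdgpu,0': '48.0'}
print(summary.update({"Core0,Pkg0": 46.0}))                    # {'Core0,Pkg0': '46.0', 'amdgpu,0': 'N/A'}
```

Keeping a vanished sensor as N/A (rather than dropping it) avoids graphs jumping around every time the GPU toggles; calling reset is then the explicit way to rebuild the view from the sensors that currently exist.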

HughWarrington commented 4 years ago

Related? https://bugs.launchpad.net/ubuntu/+source/s-tui/+bug/1882846

amanusk commented 4 years ago

Thanks @HughWarrington. Yes, this is related. s-tui does not gracefully handle a change in the number of sensors during operation. I'll do my best to look into it promptly, as it appears to be more widespread than I initially thought.

amanusk commented 4 years ago

@HughWarrington, I have pushed an update that should fix this bug. There is more work to do, but it should at least prevent the crashes. Can you please try running from source or installing version 1.0.1 from pip and see if it fixes the issue for you?

sar commented 4 years ago

@amanusk Thanks for the amazing work squashing the crashes in this thread. I just tested 1.0.1 from pip, and it has been very stable in handling sensor hot-swap operations.

HughWarrington commented 4 years ago

@amanusk many thanks for the quick response. I've tried running 1.0.1 from pip, and no problems so far after running for a day in the same environment as before.