Ricks-Lab / gpu-utils

A set of utilities for monitoring and customizing GPU performance
GNU General Public License v3.0
139 stars, 23 forks

ValueError: invalid literal for int() with base 10 at GPUmodules.py line 1236 in read_gpu_pstates #139

Closed PorcelainMouse closed 10 months ago

PorcelainMouse commented 1 year ago

Sorry, but I'm not quite sure what I'm seeing here. Previously, gpu-ls worked, but after enabling writing to the card (with the kernel parameter amdgpu.ppfeaturemask=... as directed), I get this error when running gpu-ls and gpu-pac.

$ gpu-ls
Detected GPUs: AMD: 1
AMD: Wattman features enabled: 0xfffd7fff
Total of 1 GPU: 1 is rw, 0 are r-only, and 0 are w-only

Traceback (most recent call last):
  File "/home/pdestefa/.local/bin/gpu-ls", line 174, in <module>
    main()
  File "/home/pdestefa/.local/bin/gpu-ls", line 149, in main
    gpu_list.read_gpu_pstates()
  File "/home/pdestefa/.local/lib/python3.11/site-packages/GPUmodules/GPUmodule.py", line 2503, in read_gpu_pstates
    gpu.read_gpu_pstates()
  File "/home/pdestefa/.local/lib/python3.11/site-packages/GPUmodules/GPUmodule.py", line 1236, in read_gpu_pstates
    lineitems[0] = int(re.sub(':', '', lineitems[0]))
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: invalid literal for int() with base 10: '\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0

PorcelainMouse commented 1 year ago

Oops. 'b', not 'd'.

Ricks-Lab commented 1 year ago

> Oops. 'b', not 'd'.

I would like to better understand the error so that I can improve error checking. Are you saying you incorrectly entered the feature mask with a b instead of a d?
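When comparing mask values like these, note that the Wattman status lines in this thread are consistent with a check of the overdrive bit, `PP_OVERDRIVE_MASK` (0x4000) in the Linux amdgpu driver's `PP_FEATURE_MASK` enum: 0xfffd7fff has that bit set, while 0xfff7bfff (reported later in this thread as "not enabled") does not. Here is a minimal sketch, assuming that bit value. `parse_mask`, `overdrive_enabled`, and `differing_bits` are hypothetical helpers, not gpu-utils' own code.

```python
from typing import List

# Assumed value: bit 14 of the amdgpu PP_FEATURE_MASK enum (overdrive).
PP_OVERDRIVE_MASK = 0x4000


def parse_mask(text: str) -> int:
    """Parse a feature mask such as '0xfffd7fff'; base 0 also accepts decimal."""
    return int(text.strip(), 0)


def overdrive_enabled(mask: int) -> bool:
    """True if the overdrive bit is set in the mask."""
    return bool(mask & PP_OVERDRIVE_MASK)


def differing_bits(a: int, b: int) -> List[int]:
    """Return each single-bit value that differs between two masks, lowest first."""
    diff = a ^ b
    bits = []
    bit = 1
    while bit <= diff:
        if diff & bit:
            bits.append(bit)
        bit <<= 1
    return bits


if __name__ == "__main__":
    for value in ("0xfffd7fff", "0xfff7bfff"):
        print(value, overdrive_enabled(parse_mask(value)))
    print([hex(b) for b in differing_bits(0xfffd7fff, 0xfff7bfff)])
```

Comparing the two masks from this thread shows that they differ in four bits (0x4000, 0x8000, 0x20000, 0x80000), one of which is the overdrive bit.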

Ricks-Lab commented 1 year ago

I was able to reproduce this issue and am working on better handling for it. Thanks for reporting your observation!
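The ValueError above shows `int()` receiving a run of NUL bytes (`\x00`) instead of a pstate index. One defensive option is to strip NUL padding when a sysfs file is read, before any parsing happens. This is a minimal sketch under that assumption. `sanitize_sysfs_bytes` and `read_sysfs_text` are hypothetical helpers, not gpu-utils functions, and this is not necessarily how the project handled it.

```python
def sanitize_sysfs_bytes(raw: bytes) -> str:
    """Drop NUL padding and decode; undecodable bytes become U+FFFD."""
    return raw.replace(b"\x00", b"").decode("utf-8", errors="replace")


def read_sysfs_text(path: str) -> str:
    """Read a sysfs file as cleaned text, or return '' if it cannot be read."""
    try:
        with open(path, "rb") as handle:
            return sanitize_sysfs_bytes(handle.read())
    except OSError:
        return ""
```

For example, `read_sysfs_text("/sys/class/drm/card0/device/pp_od_clk_voltage")` (the card index is an assumption and varies by system) turns an all-NUL read into `''`. The caller can then treat that as "no pstates" rather than passing it to `int()`.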

qwertychouskie commented 10 months ago

I have this issue (or a very similar one) as well, and I'm not sure what to do.

qwerty@qwerty-asus-g14:~$ gpu-ls --debug
Ubuntu: Validated
Detected GPUs: NVIDIA: 1, AMD: 1
AMD: amdgpu/rocm version: UNKNOWN
AMD: Wattman features not enabled: 0xfff7bfff, See README file.
### read_time_val: 13-Nov-2023 06:01:19
model_display: True:  GA106M GeForce RTX
loading: True: None
mem_loading: True: None
mem_vram_usage: True: None
mem_gtt_usage: True: None
power: True: None
power_cap: True: None
energy: True: 0.0
temp_val: True: None
vddgfx_val: True: nan
fan_pwm: True: None
sclk_f_val: True: None
sclk_ps_val: True: 
mclk_f_val: True: None
mclk_ps_val: True: 
ppm: True: 

### read_time_val: 13-Nov-2023 06:01:19
model_display: True:  Cezanne Vega Series
loading: True: None
mem_loading: True: None
mem_vram_usage: True: None
mem_gtt_usage: True: None
power: True: 21.0
power_cap: True: None
energy: True: 1e-06
temp_val: True: 53.0
vddgfx_val: True: 1400
fan_pwm: True: None
sclk_f_val: True: 400Mhz
sclk_ps_val: True: 1
mclk_f_val: True: 1600Mhz
mclk_ps_val: True: 3
ppm: True: 

Total of 2 GPUs: 0 are rw, 1 is r-only, and 0 are w-only

Traceback (most recent call last):
  File "/usr/bin/gpu-ls", line 174, in <module>
    main()
  File "/usr/bin/gpu-ls", line 149, in main
    gpu_list.read_gpu_pstates()
  File "/usr/lib/python3/dist-packages/GPUmodules/GPUmodule.py", line 2503, in read_gpu_pstates
    gpu.read_gpu_pstates()
  File "/usr/lib/python3/dist-packages/GPUmodules/GPUmodule.py", line 1236, in read_gpu_pstates
    lineitems[0] = int(re.sub(':', '', lineitems[0]))
                                       ~~~~~~~~~^^^
IndexError: list index out of range

Ricks-Lab commented 10 months ago

@qwertychouskie

It is slightly different. Are you running the latest release from PyPI? I fixed a similar problem in that release. Can you uninstall the version you are using and install the latest from PyPI? A copy of the debug logfile would also be useful. Thanks!
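The IndexError in the traceback above happens because `split()` on a blank line returns an empty list, so `lineitems[0]` has nothing to index. Below is a minimal sketch of a guarded parse, assuming pstate lines of the form `1: 400Mhz *`. `parse_pstate_lines` and its regular expression are illustrative assumptions, not the project's parser.

```python
import re
from typing import Dict, List

# Assumed line shape: "<index>: <value> [<value> ...]", e.g. "1: 400Mhz *".
PSTATE_LINE = re.compile(r"^\s*(\d+):\s*(.*)$")


def parse_pstate_lines(text: str) -> Dict[int, List[str]]:
    """Map pstate index -> remaining fields, skipping lines that don't match."""
    states: Dict[int, List[str]] = {}
    for line in text.splitlines():
        match = PSTATE_LINE.match(line)
        if match is None:
            # Blank lines, headers such as "OD_SCLK:", and NUL debris land here
            # instead of reaching int() or an empty-list index.
            continue
        states[int(match.group(1))] = match.group(2).split()
    return states
```

Matching the whole line before converting anything means the `int()` call only ever sees a run of digits.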

qwertychouskie commented 10 months ago

OK, I saw that the update should fix it, so I updated all the relevant .py files, but now I get a different, weirder error:

qwerty@qwerty-asus-g14:~$ gpu-ls --debug
Ubuntu: Validated
Detected GPUs: NVIDIA: 1, AMD: 1
AMD: amdgpu/rocm version: UNKNOWN
AMD: Wattman features not enabled: 0xfff7bfff, See README file.
### read_time_val: 13-Nov-2023 06:20:02
model_display: True:  GA106M GeForce RTX
loading: True: None
mem_loading: True: None
mem_vram_usage: True: None
mem_gtt_usage: True: None
power: True: None
power_cap: True: None
energy: True: 0.0
temp_val: True: None
vddgfx_val: True: nan
fan_pwm: True: None
sclk_f_val: True: None
sclk_ps_val: True: 
mclk_f_val: True: None
mclk_ps_val: True: 
ppm: True: 

### read_time_val: 13-Nov-2023 06:20:03
model_display: True:  Cezanne Vega Series
loading: True: None
mem_loading: True: None
mem_vram_usage: True: None
mem_gtt_usage: True: None
power: True: 22.0
power_cap: True: None
energy: True: 1e-06
temp_val: True: 61.0
vddgfx_val: True: 1387
fan_pwm: True: None
sclk_f_val: True: 400Mhz
sclk_ps_val: True: 1
mclk_f_val: True: 1600Mhz
mclk_ps_val: True: 3
ppm: True: 

Total of 2 GPUs: 0 are rw, 1 is r-only, and 0 are w-only

Traceback (most recent call last):
  File "/usr/bin/gpu-ls", line 179, in <module>
    main()
  File "/usr/bin/gpu-ls", line 173, in main
    gpu_list.print()
  File "/usr/lib/python3/dist-packages/GPUmodules/GPUmodule.py", line 2590, in print
    else: gpu.print()
          ^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/GPUmodules/GPUmodule.py", line 1887, in print
    if param_name in GpuItem.amd_type_skip_lists[self.prm.gpu_type]:
                     ~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^
KeyError: Supported
qwerty@qwerty-asus-g14:~$

Commenting out lines 1887 and 1888 seems to let the program run properly.
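Commenting out those two lines removes the skip-list check for every GPU. A narrower guard is to look up the type with an empty default. This assumes `amd_type_skip_lists` is a dict keyed by GPU type and that the KeyError means the type simply has no entry. `GpuType`, its members, the skip-list contents, and `is_skipped` below are hypothetical stand-ins, not the project's code or its eventual fix.

```python
from enum import Enum
from typing import Dict, FrozenSet


class GpuType(Enum):
    # Hypothetical stand-in for the project's GPU type enum.
    Legacy = 1
    Supported = 2
    Unsupported = 3


# Hypothetical skip lists: only some GPU types have an entry.
AMD_TYPE_SKIP_LISTS: Dict[GpuType, FrozenSet[str]] = {
    GpuType.Legacy: frozenset({"fan_pwm", "ppm"}),
}


def is_skipped(param_name: str, gpu_type: GpuType) -> bool:
    # dict.get with an empty default treats a missing type as "skip nothing"
    # instead of raising KeyError the way indexing with [gpu_type] does.
    return param_name in AMD_TYPE_SKIP_LISTS.get(gpu_type, frozenset())
```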

Ricks-Lab commented 10 months ago

@qwertychouskie Thanks for your detailed feedback. I have made a change to the code in the repository. Let me know if it works. If there are no issues, I will push it to PyPI.

qwertychouskie commented 10 months ago

Seems good on my end.

Might want to tag a new release on GitHub as well, so the fixes get picked up by distros.

Ricks-Lab commented 10 months ago

> Seems good on my end.
>
> Might want to tag a new release on GitHub as well, so the fixes get picked up by distros.

I want to double-check my implementation of Enum objects as dict keys before I tag a release for distro updates. When the work is complete, I will push it to PyPI for additional testing.
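A plain `Enum` member does not compare equal to its name or its value, which matters when members are used as dict keys. A lookup with the string `'Supported'` or the integer value misses even when the member is present. Converting to the member first, with `GpuType[name]` or `GpuType(value)`, avoids this. Here is a minimal sketch with a hypothetical `GpuType` enum and `to_member` helper; the real enum's members and values are not shown in this thread.

```python
from enum import Enum
from typing import Dict, Set, Union


class GpuType(Enum):
    # Hypothetical members and values, for illustration only.
    Legacy = 1
    Supported = 2


SKIP_LISTS: Dict[GpuType, Set[str]] = {
    GpuType.Legacy: {"fan_pwm"},
    GpuType.Supported: set(),
}


def to_member(key: Union[GpuType, str, int]) -> GpuType:
    """Normalize a member, member name, or member value to a GpuType member."""
    if isinstance(key, GpuType):
        return key
    if isinstance(key, str):
        return GpuType[key]  # lookup by name; KeyError if unknown
    return GpuType(key)  # lookup by value; ValueError if unknown


if __name__ == "__main__":
    print("Supported" in SKIP_LISTS)             # a plain str is not a member
    print(to_member("Supported") in SKIP_LISTS)  # normalized to the member first
```

The first check is False and the second is True. Mixing in `str` (`class GpuType(str, Enum)`) would change the first result, so normalizing explicitly keeps the lookup behavior independent of how the enum is declared.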

Ricks-Lab commented 10 months ago

Confirmed fixed in v3.8.4.