Ricks-Lab / gpu-utils

A set of utilities for monitoring and customizing GPU performance
GNU General Public License v3.0

Q: has it been tested w/ latest Debian bpo Kernel? #157

Open RogerWeihrauch opened 5 months ago

RogerWeihrauch commented 5 months ago

Sorry, hit Enter too early.

So, I am actually running the latest Debian 12.5 release with the bpo kernel:

    uname -a
    Linux lildeb 6.6.13+bpo-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.6.13-1~bpo12+1 (2024-02-15) x86_64 GNU/Linux

And I installed it along with some other packages:

    root@lildeb:~# apt install ricks dkms linux-source perl curl git-all jq -t bookworm-backports

Executing 'gpu-ls' shows the following error:

    root@lildeb:~# gpu-ls
    Traceback (most recent call last):
      File "/usr/bin/gpu-ls", line 174, in <module>
        main()
      File "/usr/bin/gpu-ls", line 121, in main
        if env.GUT_CONST.check_env() < 0:
           ^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/lib/python3/dist-packages/GPUmodules/env.py", line 312, in check_env
        current_kversion = tuple([int(x) for x in re.sub('-.*', '', current_kversion_str).split('.')])
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/lib/python3/dist-packages/GPUmodules/env.py", line 312, in <listcomp>
        current_kversion = tuple([int(x) for x in re.sub('-.*', '', current_kversion_str).split('.')])
                                  ^^^^^^
    ValueError: invalid literal for int() with base 10: '13+bpo'
    root@lildeb:~#

So, has anyone else experienced this? Or is the error only on my side? If so, how can I debug further and find the source of this?

Thanks for your effort on this. Kind regards, Roger

RogerWeihrauch commented 5 months ago

.. don't know why everything is struck through. Roger

Ricks-Lab commented 5 months ago

This looks like a problem that I have already fixed. I think the Debian package has still not been updated. Can you try a PyPI or rickslab.com install to verify? See the README for details.

RogerWeihrauch commented 5 months ago

Hi Rick,

Thanks for the response. OK, I tried:

1st) the rickslab.com Debian install, as in: https://github.com/Ricks-Lab/gpu-utils/blob/master/docs/USER_GUIDE.md#rickslabcom-debian-installation

The install succeeded AFTER a complete 'apt remove --purge ricks-' AND an 'apt autoremove' + reboot. BUT execution here shows:

    root@lildeb:~# gpu-ls
    Traceback (most recent call last):
      File "/usr/bin/gpu-ls", line 179, in <module>
        main()
      File "/usr/bin/gpu-ls", line 126, in main
        if GUT_CONST.check_env() < 0:
           ^^^^^^^^^^^^^^^^^^^^^
      File "/usr/lib/python3/dist-packages/GPUmodules/env.py", line 313, in check_env
        current_kversion = tuple([int(x) for x in re.sub('-.*', '', current_kversion_str).split('.')])
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/lib/python3/dist-packages/GPUmodules/env.py", line 313, in <listcomp>
        current_kversion = tuple([int(x) for x in re.sub('-.*', '', current_kversion_str).split('.')])
                                  ^^^^^^
    ValueError: invalid literal for int() with base 10: '13+bpo'
    root@lildeb:~#

(So still the same.) I will also try the PyPI way and report back. (It takes some time.)

Regards, Roger

Ricks-Lab commented 5 months ago

Looks like bpo in the kernel version is causing the problem. Should be an easy fix. Will make a change tomorrow.
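The failure can be reproduced outside gpu-utils. The following is a minimal sketch, assuming the expression quoted in the traceback uses the pattern '-.*' and that current_kversion_str holds the kernel release string reported by uname; the function name parse_kversion_naive is hypothetical and only wraps that expression for illustration.

```python
import re


def parse_kversion_naive(release: str) -> tuple:
    """Hypothetical wrapper around the expression quoted in the traceback."""
    # Drop everything from the first '-' onward, then convert each
    # dot-separated piece to int.
    return tuple([int(x) for x in re.sub('-.*', '', release).split('.')])


# A stock Debian 12 release string parses fine: the suffix starts with '-'.
print(parse_kversion_naive("6.1.0-18-amd64"))

# The backports release keeps '+bpo' attached to the third component,
# because '+bpo' comes before the first '-'.
print(re.sub('-.*', '', "6.6.13+bpo-amd64"))

try:
    parse_kversion_naive("6.6.13+bpo-amd64")
except ValueError as err:
    print(err)
```

The stripped string still contains '+bpo', so int() receives '13+bpo' and raises the ValueError reported above.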

RogerWeihrauch commented 5 months ago

.. so this is the result:

    debian@lildeb:~$ . ./vEnv4GPUutils/bin/activate
    (vEnv4GPUutils) debian@lildeb:~$ gpu-ls
    Traceback (most recent call last):
      File "/home/debian/vEnv4GPUutils/bin/gpu-ls", line 179, in <module>
        main()
      File "/home/debian/vEnv4GPUutils/bin/gpu-ls", line 126, in main
        if GUT_CONST.check_env() < 0:
           ^^^^^^^^^^^^^^^^^^^^^
      File "/home/debian/vEnv4GPUutils/lib/python3.11/site-packages/GPUmodules/env.py", line 313, in check_env
        current_kversion = tuple([int(x) for x in re.sub('-.*', '', current_kversion_str).split('.')])
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/home/debian/vEnv4GPUutils/lib/python3.11/site-packages/GPUmodules/env.py", line 313, in <listcomp>
        current_kversion = tuple([int(x) for x in re.sub('-.*', '', current_kversion_str).split('.')])
                                  ^^^^^^
    ValueError: invalid literal for int() with base 10: '13+bpo'
    (vEnv4GPUutils) debian@lildeb:~$

PS: exchanging the 'tuple([int(x) ...' w/ 'tuple([str(x) ...' is no option here?

Regards, Roger
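On the PS above: swapping int(x) for str(x) would avoid the exception, but it would likely break any later comparison of kernel versions, since tuples of strings compare character by character. The sketch below illustrates this with made-up version numbers; it assumes the parsed tuple is compared against a minimum version, which the thread does not show.

```python
# str() accepts the odd component, so no ValueError is raised...
as_strings = tuple(str(x) for x in "6.6.13+bpo".split("."))
print(as_strings)

# ...but string components do not order numerically: "10" sorts before "9".
str_newer = ("6", "10", "0") > ("6", "9", "0")
int_newer = (6, 10, 0) > (6, 9, 0)
print(str_newer)  # False
print(int_newer)  # True
```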

Ricks-Lab commented 5 months ago

I have a version that should work, but can you run your current version with the --debug option and post the contents of the log file here?

The fixed version is in the branch keys_30mar24

Ricks-Lab commented 5 months ago

> PS: exchanging the 'tuple([int(x) ...' w/ 'tuple([str(x) ...' is no option here?

I was handling the case of strings at the end of the kernel version, but in your case the string is not at the end. The log file output should help in creating a more robust approach.
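One possible shape for a more robust approach is to read only the leading numeric components and ignore whatever follows them. This is a sketch under that assumption, not the fix that was later released; parse_kversion and _KVERSION_RE are hypothetical names.

```python
import re
from typing import Tuple

# Leading "major.minor" with an optional ".patch"; anything after is ignored.
_KVERSION_RE = re.compile(r'^(\d+)\.(\d+)(?:\.(\d+))?')


def parse_kversion(release: str) -> Tuple[int, int, int]:
    """Hypothetical helper: extract (major, minor, patch) from a release string."""
    match = _KVERSION_RE.match(release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    # A missing patch component defaults to 0.
    major, minor, patch = match.groups(default="0")
    return int(major), int(minor), int(patch)


print(parse_kversion("6.6.13+bpo-amd64"))  # (6, 6, 13)
print(parse_kversion("6.1.0-18-amd64"))    # (6, 1, 0)
```

Because the pattern is anchored at the start and stops at the first non-numeric character after the version digits, suffixes such as '+bpo' or '-amd64' no longer reach int(). On a live system the input could come from platform.release().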

merkys commented 5 months ago

This issue seems to have also been reported on the Debian package as #1070783. The Debian bug report also suggests a fix. Can you give it a look?

Ricks-Lab commented 4 months ago

I have a fix completed for this and a few other improvements. But I have not made a release yet. I probably need a week or two to finalize a release.

On Wed, May 8, 2024 at 10:47 PM Andrius Merkys wrote:

> This issue seems to have also been reported on the Debian package as #1070783 (https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1070783). The Debian bug report also suggests a fix. Can you give it a look?


merkys commented 4 months ago

Great, thanks!

Ricks-Lab commented 4 months ago

@RogerWeihrauch I have released a version that should address this issue on PyPI: 3.9.0rc1

Please remove all other versions of rickslab-gpu-utils and pip install RC1:

    pip install rickslab-gpu-utils==3.9.0rc1

Ricks-Lab commented 4 months ago

I have released the latest release candidate on PyPI: 3.9.0rc2

Please remove all other versions of rickslab-gpu-utils and pip install RC2:

    pip install rickslab-gpu-utils==3.9.0rc2
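Since several installs (Debian package, rickslab.com package, PyPI in a virtual environment) were tried in this thread, it may help to confirm which version the active Python environment actually sees. A minimal sketch using the standard library's importlib.metadata; the helper name installed_version is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Run inside the same environment (e.g. the activated venv) as gpu-ls.
print(installed_version("rickslab-gpu-utils"))
```

This only reports what is installed for the interpreter running the snippet; a system-wide Debian package and a venv install can report different versions.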

Ricks-Lab commented 4 months ago

Great, thanks! @merkys

I have published v3.9.0 release: https://github.com/Ricks-Lab/gpu-utils/releases/tag/v3.9.0