Arc-Compute / LibVF.IO

A vendor neutral GPU multiplexing tool driven by VFIO & YAML.
GNU Affero General Public License v3.0

Can't get LibVF to pick up after reboot #63

Closed jon-bit closed 1 year ago

jon-bit commented 1 year ago

I'm using Fedora 37. After I install LibVF and reboot, the script does NOT pick up where I left off; it starts the process all over again. I have no clue what I'm doing wrong, and I've been trying to fix it for days. Can anyone help? I'm putting a pastebin of the output below.

https://pastebin.com/BaeC1jUv

Any help is nice so thank you in advance.

arthurrasmusson commented 1 year ago

Hey @jon-bit. Thanks for pointing this out - I'll try my best to help you with whatever is going on here.

Can you start by running ./scripts/generate-debug-information.sh then dump the resulting ./logs/debug.log for me?

Thanks.

jon-bit commented 1 year ago

I ran it again because I rebooted, and saw this:

Failed to disable unit: Unit file nvidia-vgpud.service does not exist.
Failed to stop nvidia-vgpud.service: Unit nvidia-vgpud.service not loaded.

But if that log is needed, here it is:

https://pastebin.com/PChqHd3u

arthurrasmusson commented 1 year ago

Delete your debug.log file then run sudo systemctl restart gvm-post.service after reboot. If you can re-run the debug log script and post it after doing that I'll take a look.

I suspect this is your issue: https://github.com/Open-IOV/GVM-user/issues/3

You can also reach me in the Open-IOV Community Group Chat - I'll probably be able to help you more quickly in there: https://discord.gg/Rb9K9DYxKK
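
The steps above (delete the old log, restart the unit, regenerate the log) could be scripted roughly as follows. This is only a sketch: it assumes it is run from the root of the LibVF.IO checkout, where `./scripts/generate-debug-information.sh` and `./logs/debug.log` live. The `run` and `recollect_debug_log` helpers and the `DRY_RUN` switch are illustrative names, not part of LibVF.IO.

```shell
#!/bin/sh
# Sketch: refresh logs/debug.log after a reboot.
# Assumption: the current directory is the LibVF.IO repository root.
# DRY_RUN=1 (hypothetical switch) prints each command instead of running it.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

recollect_debug_log() {
    # Remove the stale log so old results are not mixed with new ones.
    run rm -f ./logs/debug.log
    # Restart the post-boot unit named in the comment above.
    run sudo systemctl restart gvm-post.service
    # Regenerate the debug log for posting.
    run ./scripts/generate-debug-information.sh
}
```

Calling `DRY_RUN=1 recollect_debug_log` only prints the three commands, which is a cheap way to check them for typos before running `recollect_debug_log` for real.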

jon-bit commented 1 year ago

I ran the restart on gvm and got

Failed to restart gvm-post.servic.service: Unit gvm-post.servic.service not found.

Here is the output of debug.log

https://pastebin.com/tw3fsWa8

mbuchel commented 1 year ago

you typed in the restart command wrong, try just:

sudo systemctl restart gvm-post

it should autocomplete for you
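
The doubled suffix in the error (`gvm-post.servic.service`) comes from how systemctl completes unit names: a name that does not end in a recognized unit-type suffix gets `.service` appended. A simplified model of that rule is sketched below. The `normalize_unit` function is hypothetical and ignores systemctl's handling of paths, globs, and escaping.

```shell
#!/bin/sh
# Simplified model of systemctl's unit-name completion (illustration only).
normalize_unit() {
    case "$1" in
        # Names that already carry a known unit-type suffix are kept as-is.
        *.service|*.socket|*.device|*.mount|*.automount|*.swap|*.target|*.path|*.timer|*.slice|*.scope)
            printf '%s\n' "$1" ;;
        # Anything else is treated as a service name.
        *)
            printf '%s\n' "$1.service" ;;
    esac
}
```

With this model, `normalize_unit gvm-post.servic` prints `gvm-post.servic.service` (the misspelled suffix is not recognized), while `normalize_unit gvm-post` prints `gvm-post.service`, which is why the short form works.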

mbuchel commented 1 year ago

also you have a 3070, and support for Ampere is still super experimental

jon-bit commented 1 year ago

OK I just saw this in the logs

Device Check Test succeeded
Device Major Number Identifier Null succeeded
Device Major Number Identifier succeeded
DeviceFileMode Nvidia Check succeeded
Open /dev/nvidiactl File succeeded
Open /dev/nvidia0 File succeeded
All tests succeeded
# GVM-user TEST-DEVICE END
# GVM-user TEST-NVIDIA-API START
RM Version Check Ok: Ensure your version check is correctly: 510.47.03
RM Version Check Incorrect: Invalid version is reported correctly, please check your driver version.
RM Version Check Ignore succeeded
RM Alloc Root Ok succeeded
RM Alloc Root Fail (Invalid FD) succeeded
RM Alloc Root Fail (Bad FD) succeeded
RM Alloc Root Fail (Bad Argument) succeeded
RM Free Root Ok succeeded
RM Free Root Fail (Invalid FD) succeeded
RM Free Root Fail (Double Deallocate) succeeded
Get Probed Ids succeeded
2/11 tests failed
# GVM-user TEST-NVIDIA-API END
# GVM-user TEST-NVIDIA-MANAGER START
Created gpu: 0x00000100 (0x10DE, 0x2482, 0x10DE, 0x5052)
Failed RM Control Mechanism:
    client: 0xC1D00046
    object: 0xD0014603
    cmd: 0xA0810101
    flags: 0x00000000
    params: 0x7fff9a52c380
    size: 0x000011C0
    status: 0x0000003A
Destroyed gpu: 0x00000100 (0x10DE, 0x2482, 0x10DE, 0x5052)
Create MDevs succeeded
All tests succeeded
# GVM-user TEST-NVIDIA-MANAGER END
# UNAME START
Linux fedora 6.0.18-300.fc37.x86_64 #1 SMP PREEMPT_DYNAMIC Sat Jan 7 17:10:00 UTC 2023 x86_64 GNU/Linux
# UNAME END
# CPUINFO START
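
When a debug log is long, the failure summaries can be pulled out with a small filter. The sketch below assumes the log format shown in the excerpt above (summary lines such as `2/11 tests failed` and entries that start with `Failed `); the `summarize_failures` name is made up for illustration and is not a LibVF.IO or GVM-user tool.

```shell
#!/bin/sh
# Sketch: list failure summaries from a debug log.
# Reads the files given as arguments, or standard input if none are given.
summarize_failures() {
    # -n prefixes each match with its line number in the input.
    # Lines such as "RM Alloc Root Fail (Bad FD) succeeded" are not matched,
    # because only lines that start with "Failed " or "N/M tests failed" count.
    cat "$@" | grep -nE '^([0-9]+/[0-9]+ tests failed|Failed )'
}
```

For example, `summarize_failures ./logs/debug.log` would surface the `2/11 tests failed` summary and the `Failed RM Control Mechanism:` entry from a log like the one above. It does not say which individual tests failed; that still requires reading the surrounding section.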

mbuchel commented 1 year ago

you are using 525.60, and the libvf.io scripts may not have the update to support this yet. but it is kind of moot, as vgpu unlock does not support the 3070. the gvm suite will, but it is not there yet. feel free to join the open-iov discord.

jon-bit commented 1 year ago

Sorry but I don't have discord. Regardless how do we fix this?

mbuchel commented 1 year ago

you can install the gvm-user utils and select the nvidia/525.60 branch to compile it

https://github.com/Open-IOV/GVM-user/tree/nvidia/525

the issue will be handling the vgpu unlock for 3070 series, which is much more complex

arthurrasmusson commented 1 year ago

@jon-bit See GPU Support on Open-IOV: https://open-iov.org/index.php/GPU_Support

arthurrasmusson commented 1 year ago

@jon-bit as this issue appears to originate from an unsupported device, I'm going to close this thread for now.

jon-bit commented 1 year ago

I'll respect the fact that this is not a supported card, but do you know of any other software that could help with GPU virtualization?