Arc-Compute / LibVF.IO

A vendor neutral GPU multiplexing tool driven by VFIO & YAML.
GNU Affero General Public License v3.0

3070 Ti available on developer branch? #66

Closed jon-bit closed 1 year ago

jon-bit commented 1 year ago

I have a 3070 Ti (Nvidia) in my PC and I want to set up LibVF.IO for a college project, but I know it's not supported, YET. I see that 30-series support is in development, but I don't know if that means I can download the devel branch and it would "work". Is that technically what I can do, or do I have to wait? (I need to get it working for the school project, and I only have so much money left in my capstone funding.)

EDIT: I would like to add that I'm using Fedora 37.

arthurrasmusson commented 1 year ago

For your college project I'd recommend using a supported GPU, unless the project is itself on the topic of virtualization. Both Nvidia's proprietary GRID VGX driver and their open-source refactor of VGX place some elements of the virtualization software in the GPU's VBIOS (embedded firmware), which differs between graphics cards built on the GA104 chip upon which your 3070 Ti is based. That's possible to overcome; however, work to support that device has involved going "back to the drawing board".

As for Fedora 37 (which I believe ships kernel 6.0; correct me if I'm wrong), that should be supported by most devices. Intel's virtualization driver is upstream for ADL-P+ in kernel 6.1, so if you have an Alder Lake or newer iGPU you can make use of SR-IOV without installing drivers. Support on some of their dGPUs requires a DKMS module.
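If you want to check whether your kernel and iGPU actually expose SR-IOV, a quick sysfs check along these lines works. This is a sketch under assumptions: `0000:00:02.0` is the usual PCI slot for an Intel iGPU, but verify yours with `lspci`.

```shell
# Check the running kernel version (Fedora 37 shipped with 6.0)
uname -r

# 0000:00:02.0 is the usual PCI address for an Intel iGPU -- an assumption;
# confirm with `lspci` on your own machine
GPU=/sys/bus/pci/devices/0000:00:02.0

# A driver with SR-IOV support reports how many virtual functions it can create
cat "$GPU/sriov_totalvfs" 2>/dev/null || echo "no SR-IOV support exposed"

# To actually create 2 VFs (needs root; writing 0 tears them down again):
#   echo 2 | sudo tee "$GPU/sriov_numvfs"
```

If `sriov_totalvfs` is present and non-zero, the driver side is ready and the remaining work is wiring the virtual functions into a VM.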

jon-bit commented 1 year ago

I need virtual GPUs in order for it to work. The project is about computer security combined with compatibility and convenience: LibVF.IO is supposed to provide the gaming/graphics side, while Fedora is the host, which is more secure while staying compatible. It's supposed to be an "all-use computer" with the highest security and the most convenience. Regardless, I'm just asking if the devel branch would work with it.

GrandtheUK commented 1 year ago

LibVF.IO for Nvidia cards uses the Nvidia merged driver, which is not in this repository. That driver does not currently work for 30-series cards, and until it does, LibVF.IO will not work at all, since the vGPU functionality will not be accessible on consumer cards. The devel branch of this repository tests new features for the arcd program, NOT the merged driver.

If you want to track progress on merged-driver support for 30-series cards, you can join the vgpu unlock Discord server, which is linked from the vgpu unlock tool's GitHub in this repository's README file. HOWEVER, it may take some time, and asking them whether 30-series is supported won't make it go any faster, so please keep that in mind.

As Arthur said, if you want to make use of these tools and the Nvidia merged driver now for your project, see if you can get hold of a cheap supported GPU. There are some cards to stay away from due to weirdness in their VRAM configuration, but most lower-end cards can be bought fairly cheaply to demo your project. For instance, I managed to make my Nvidia GTX 1060 6GB work for at least one GPU-accelerated VM and ran games on it.

Something else to consider is that the LibVF.IO project simply makes use of the existing vGPU stack within the kernel and supported drivers, as well as the Looking Glass project and QEMU. So it might be better for your project to read through the documentation for those, and refer to this codebase to see how things work in practice, rather than building on top of it.
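For reference, the vGPU stack mentioned above is exposed through the kernel's mediated-device (mdev) interface: a vGPU-capable driver registers its profiles under sysfs, and you can list them with something like the sketch below. On a machine without such a driver loaded, the loop simply finds nothing.

```shell
# Each vGPU profile registered by a mediated-device driver appears as a
# directory under mdev_supported_types; print its name and free instance count.
count=0
for d in /sys/class/mdev_bus/*/mdev_supported_types/*; do
    [ -d "$d" ] || continue   # skip when no mdev-capable driver is loaded
    echo "$d: $(cat "$d/name" 2>/dev/null) ($(cat "$d/available_instances" 2>/dev/null) available)"
    count=$((count + 1))
done
echo "found $count mdev type(s)"
```

This is the same interface QEMU ultimately attaches to, which is why reading the kernel VFIO/mdev documentation first is a reasonable path.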

arthurrasmusson commented 1 year ago

@jon-bit If you're interested in knowing more about the Nvidia merged driver, I'd recommend you check out the following: https://github.com/VGPU-Community-Drivers/vGPU-Unlock-patcher

What I can say is that supporting this device can be done using that tool with continued work (unassisted virtual), or with another approach that doesn't involve unmodified guest drivers (such as GPU Paravirtualization in Hyper-V [GPU-P], or VirGL). As @GrandtheUK mentioned, the arcd program simply runs virtual machines, while gvm-cli sends IOCTLs to configure the nvidia-vgpu-mgr service.
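To make the division of labor concrete: once a driver has registered vGPU types, creating an instance is a plain sysfs write against the standard VFIO-mdev interface, with the management service handling the rest. A minimal sketch, where both the PCI address and the profile name are placeholders and not values from this thread:

```shell
# Both values below are hypothetical, for illustration only
PARENT=/sys/bus/pci/devices/0000:01:00.0   # assumed vGPU-capable parent GPU
TYPE=nvidia-63                             # assumed profile name; varies per card/driver

# Each mdev instance is identified by a UUID (fall back to a fixed one if
# uuidgen is unavailable)
UUID=$(uuidgen 2>/dev/null || echo 00000000-0000-0000-0000-000000000000)
echo "would create mdev instance $UUID under $PARENT/mdev_supported_types/$TYPE"

# The real operation (needs root and a driver that registered the type):
#   echo "$UUID" | sudo tee "$PARENT/mdev_supported_types/$TYPE/create"
```

After creation, the instance shows up under `/sys/bus/mdev/devices/$UUID` and can be handed to QEMU as a vfio-pci device.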

In terms of the folks working to get that running on unsupported Ampere platforms, I can say the following from my knowledge of the existing work:

• Many folks have contributed hundreds of hours of development time across the community.

• The Nvidia driver branches its vGPU logic in several directions depending on the device being virtualized; the most significant branching appears to be between pre- and post-Ampere logic.

• In general, post-Ampere support brings changes in the device firmware, VMIOP, and vgpu-mgr around how interrupts work, as well as how memory allocation works (SRIOVHeavyMode).

• While Ampere consumer guests may not initialize correctly without reflashing (which also causes video output to fail, even on officially supported devices), it may be possible to initialize them using existing software functions contained within libnvidia-vgpu.so, without the requirement to reflash the device (this would remove the use of posted interrupts and require more software MMIO remapping compared to normal Ampere guests).

• At the moment, guest drivers on Ampere consumer devices do not serialize data to the host's RM engines entirely correctly. Some RM engines appear to fail in part due to incorrect RPC serialization (pRPCs for non-SR-IOV rather than vRPCs).

• The driver does now unload with a clean exit (as opposed to the crash we had been seeing until recently).

Since this repository itself isn't where this work is being done, I'm going to lock this issue for now. But if you want to get in touch with me on Discord for my thoughts on how this approach, or other approaches to related subjects, might work, please feel free to do so.