Jip-Hop / jailmaker

Persistent Linux 'jails' on TrueNAS SCALE to install software (k3s, docker, portainer, podman, etc.) with full access to all files via bind mounts thanks to systemd-nspawn!
GNU Lesser General Public License v3.0

AMD GPU passthrough support #109

Closed: gardenali closed this issue 2 months ago

gardenali commented 6 months ago

It's great to have intel and nvidia support, but I'm missing the AMD option.

Thank you!

lks-hrsch commented 6 months ago

I have also reached the point where I need AMD GPU support to move my last service to jailmaker. Can someone explain what is needed to enable GPU support, or what was done for Intel and NVIDIA? I am open to implementing and evaluating the feature on my system.

easyfab commented 6 months ago

Does it work if you manually add --bind=/dev/dri?

e.g.: jlmkr create myjail --bind=/dev/dri

jeefberkey commented 6 months ago

The intel gpu setting does just that

Jip-Hop commented 6 months ago

Jailmaker has intel and nvidia GPU support because these drivers are provided by the TrueNAS SCALE host OS. I think adding support for a dedicated AMD GPU in jailmaker is not trivial, if possible at all without modifying the host OS. Since I have no dedicated GPU in my TrueNAS server I can't investigate this. Feel free to investigate though. @lks-hrsch you could have a look at the python code of jlmkr.py to see what the intel and nvidia GPU passthrough options do.

easyfab commented 6 months ago

Isn't AMD support already included in the TrueNAS host OS?

lspci -k | grep amdgpu
Kernel driver in use: amdgpu
Kernel modules: amdgpu

For info, I tried with a 5700G APU: adding --bind=/dev/dri seems to give me access to the iGPU in jailmaker. I don't know if it works with a dGPU.

Edit, to complete:

root@myjail:~# vainfo
error: can't connect to X server!
libva info: VA-API version 1.17.0
libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/radeonsi_drv_video.so
libva info: Found init function __vaDriverInit_1_17
libva info: va_openDriver() returns 0
vainfo: VA-API version: 1.17 (libva 2.12.0)
vainfo: Driver version: Mesa Gallium driver 22.3.6 for AMD Radeon Graphics (renoir, LLVM 15.0.6, DRM 3.54, 6.6.16-production+truenas)
vainfo: Supported profile and entrypoints
      VAProfileMPEG2Simple            : VAEntrypointVLD
      VAProfileMPEG2Main              : VAEntrypointVLD
      VAProfileVC1Simple              : VAEntrypointVLD
      VAProfileVC1Main                : VAEntrypointVLD
      VAProfileVC1Advanced            : VAEntrypointVLD
      VAProfileH264ConstrainedBaseline: VAEntrypointVLD
      VAProfileH264ConstrainedBaseline: VAEntrypointEncSlice
      VAProfileH264Main               : VAEntrypointVLD
      VAProfileH264Main               : VAEntrypointEncSlice
      VAProfileH264High               : VAEntrypointVLD
      VAProfileH264High               : VAEntrypointEncSlice
      VAProfileHEVCMain               : VAEntrypointVLD
      VAProfileHEVCMain               : VAEntrypointEncSlice
      VAProfileHEVCMain10             : VAEntrypointVLD
      VAProfileHEVCMain10             : VAEntrypointEncSlice
      VAProfileJPEGBaseline           : VAEntrypointVLD
      VAProfileVP9Profile0            : VAEntrypointVLD
      VAProfileVP9Profile2            : VAEntrypointVLD
      VAProfileNone                   : VAEntrypointVideoProc
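For anyone who wants to check a vainfo listing like the one above without reading it line by line, here is a minimal Python sketch that groups the entrypoints by profile and picks out the profiles that offer hardware encoding. The function names `parse_vainfo_profiles` and `encode_capable` are made up for this illustration, and the sketch assumes the output keeps the "VAProfile... : VAEntrypoint..." line shape seen in this comment.

```python
from collections import defaultdict

def parse_vainfo_profiles(output):
    """Map each VAProfile name to the list of entrypoints reported for it."""
    profiles = defaultdict(list)
    for line in output.splitlines():
        name, sep, entrypoint = line.partition(":")
        name, entrypoint = name.strip(), entrypoint.strip()
        # Only keep lines shaped like "VAProfileX : VAEntrypointY";
        # "libva info: ..." and "vainfo: ..." lines are skipped.
        if sep and name.startswith("VAProfile") and entrypoint.startswith("VAEntrypoint"):
            profiles[name].append(entrypoint)
    return dict(profiles)

def encode_capable(profiles):
    """Return the profiles that list VAEntrypointEncSlice (slice-level encoding)."""
    return sorted(p for p, eps in profiles.items() if "VAEntrypointEncSlice" in eps)

# A few lines copied from the vainfo output in this thread.
SAMPLE = """\
vainfo: Supported profile and entrypoints
      VAProfileH264Main               : VAEntrypointVLD
      VAProfileH264Main               : VAEntrypointEncSlice
      VAProfileVP9Profile0            : VAEntrypointVLD
"""

print(encode_capable(parse_vainfo_profiles(SAMPLE)))
```

VAEntrypointVLD is the decode entrypoint, so a profile that only lists VLD can be decoded in hardware but not encoded.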


Jip-Hop commented 6 months ago

Yes I think there's a difference between AMD iGPU and dGPU but I'd be happy to be proven wrong.

Jip-Hop commented 6 months ago

I've labeled this issue as invalid and help wanted. I think iGPU is already supported (Intel or AMD). You'd have to set gpu_passthrough_intel=1 in your config file for that. I realize now this naming is confusing in this case...
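As a rough way to confirm that setting, the sketch below checks whether `gpu_passthrough_intel=1` is present in the text of a jail config. It assumes the config is made of plain `key=value` lines with `#` comments, which this thread does not confirm. The helper `read_flag` and the key `example_other_key` are hypothetical names used only for illustration.

```python
def read_flag(config_text, key):
    """Return True if `key=1` appears in simple key=value config text.

    Assumption: one key=value pair per line, '#' starts a comment line.
    """
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep and name.strip() == key:
            return value.strip() == "1"
    return False

# Hypothetical config contents; only gpu_passthrough_intel comes from the thread.
SAMPLE_CONFIG = """\
example_other_key=0
gpu_passthrough_intel=1
"""

print(read_flag(SAMPLE_CONFIG, "gpu_passthrough_intel"))
```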

Regarding AMD dedicated GPUs, as far as I know those aren't supported on the SCALE host system and therefore jailmaker can't support them either.

Since I don't have an AMD GPU, I could use help to confirm this issue is indeed invalid. Either way, I won't be working on a solution for this issue, and I recommend either switching to an NVIDIA GPU or implementing a solution (if possible) and providing a pull request.

lks-hrsch commented 5 months ago

I apologize for not getting back to you sooner, but I can confirm that it already works for an AMD iGPU. I also don't currently have a dGPU for testing.

maeehart commented 5 months ago

Hey! I just want to add that passing the AMD GPU does work, but one also needs to bind /dev/kfd. That is, a command like ./jlmkr.py create --distro=ubuntu --release=jammy ubuntu --bind=/dev/dri --bind=/dev/kfd works. After this, I just had a few problems with permissions on these device files. However, now I can run llama3 in a jail using an AMD GPU.
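The command above can be assembled programmatically so that a bind flag is only added for device nodes that actually exist on the host. This is a minimal sketch, not part of jlmkr.py: the helper names `bind_flags` and `create_command` are hypothetical, and the only jlmkr arguments used are the ones quoted in this comment.

```python
import os
import shlex

# The two device paths named in this thread for AMD GPU passthrough.
GPU_DEVICE_NODES = ("/dev/dri", "/dev/kfd")

def bind_flags(paths, exists=os.path.exists):
    """Return a --bind=PATH flag for every path that exists on the host."""
    return [f"--bind={p}" for p in paths if exists(p)]

def create_command(jail_name, distro, release, paths=GPU_DEVICE_NODES,
                   exists=os.path.exists):
    """Build the argument list for a create call like the one quoted above."""
    args = ["./jlmkr.py", "create", f"--distro={distro}",
            f"--release={release}", jail_name]
    return args + bind_flags(paths, exists)

# Print the command for review instead of running it.
print(shlex.join(create_command("ubuntu", "ubuntu", "jammy")))
```

On a host without /dev/kfd the printed command simply omits that bind, which makes it easy to spot that the amdgpu compute device is missing before creating the jail.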

Jip-Hop commented 5 months ago

@maeehart is that a dedicated AMD GPU you're using (which model)? If so then that's good news and we can close this ticket as completed.

maeehart commented 5 months ago

Yes, it is a 6900 XT, i.e., a dedicated AMD GPU. We could still make a PR for AMD support so that one could just use the GPU by adding a gpu_passthrough_amd flag.

Jip-Hop commented 5 months ago

Ah yes that's a good idea. Could you provide the PR?

maeehart commented 5 months ago

I can do it during the weekend. I will need to see if I can do something about the permissions.

dalgibbard commented 4 months ago

You didn't specify what your permission issues are, but is it fixed if you add --property=DeviceAllow="/dev/kfd rw"?

I ask, since I do something similar for CoralTPU passthrough, which looks like:

--bind='/dev/ttyUSB0'
--property=DeviceAllow="/dev/ttyUSB0 rwm"
--property=DeviceAllow="char-drm rwm"
--property=DeviceAllow=/dev/bus/usb

Though I haven't spent enough time in the land of nspawn to really work out if all of these are necessary/correct lol

Edit: This made me want to go look up what "rwm" is vs just "rw", and the "m" means:

"m" (Mknod): Allows the creation of device nodes using mknod. Device nodes are special files in Unix-like operating systems that represent device interfaces. With this permission, the container can create new device nodes within its filesystem, enabling access to devices that were not initially available. This is useful for dynamically creating device nodes as needed by containerized applications.
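To make the "rw" versus "rwm" difference concrete, here is a small Python sketch that splits a DeviceAllow value into its device specifier and the access letters r, w and m described above. The helper `parse_device_allow` is a made-up name for this illustration. It only decodes the strings quoted in this comment and says nothing about whether adding them to a jail is advisable.

```python
# Access letters accepted after the device specifier in a DeviceAllow value.
ACCESS_MEANING = {"r": "read", "w": "write", "m": "mknod"}

def parse_device_allow(value):
    """Split 'SPECIFIER [ACCESS]' into (specifier, list of access names).

    An entry without access letters yields an empty list; what is applied
    by default in that case is not covered by this sketch.
    """
    specifier, _, access = value.strip().partition(" ")
    access = access.strip()
    unknown = set(access) - set(ACCESS_MEANING)
    if unknown:
        raise ValueError(f"unknown access flags: {sorted(unknown)}")
    return specifier, [ACCESS_MEANING[c] for c in access]

for entry in ("/dev/kfd rw", "/dev/ttyUSB0 rwm", "char-drm rwm", "/dev/bus/usb"):
    print(parse_device_allow(entry))
```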

Jip-Hop commented 4 months ago

Please have a look in the jlmkr.py code and search for DeviceAllow. I think adding this explicitly will actually cause issues instead of solving them.

jere-co commented 3 months ago

Any updates on the AMD dGPU support?

Jip-Hop commented 3 months ago

Quoting @maeehart: "Hey! I just want to add that passing the AMD gpu does work, but one needs to also bind /dev/kfd. That is, the command like ..."

@maeehart are you sure it was the AMD GPU being used (and not the one in the CPU because you also added --bind=/dev/dri)? I assume the AMD GPU should be usable without --bind=/dev/dri, at least this is the case for an NVIDIA GPU. Which commands did you run in the jail to test the AMD GPU?

I have an AMD RX 580 GPU in a test TrueNAS server but couldn't yet get it working in an ubuntu jail. I tried debugging with mpv --hwdec=auto video_filename from this arch resource.

maeehart commented 3 months ago

Hey! I am sure that it is the AMD GPU. I have been running ollama in the jail and confirmed that the GPU is in use by watching the rocm-smi command for some time. However, I have not had the time to redo the setup so that I could add a proper script to this repo, and I am sorry about that. I remember that I had to bind both /dev/dri and /dev/kfd and then modify their permissions to allow writing to these files (chmod ...).

Jip-Hop commented 3 months ago

Instead of messing with the permissions of /dev/kfd, can't you run the process in your jail under the same user/group that already owns /dev/kfd?
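One way to follow this suggestion is to look up which group owns the device node inside the jail and check whether the current process already belongs to it. The sketch below uses only the Python standard library. The helper names `device_access_info` and `process_in_group` are hypothetical, and the sketch assumes the node is readable and writable by its owning group, which is worth verifying on your own system.

```python
import grp
import os
import stat

def device_access_info(path):
    """Report the owning group and permission bits of a device node."""
    st = os.stat(path)
    try:
        group = grp.getgrgid(st.st_gid).gr_name
    except KeyError:
        group = None  # the gid has no named group inside this jail
    return {
        "gid": st.st_gid,
        "group": group,
        "mode": stat.filemode(st.st_mode),
        # True only if the owning group may both read and write the node.
        "group_rw": bool(st.st_mode & stat.S_IRGRP) and bool(st.st_mode & stat.S_IWGRP),
    }

def process_in_group(gid):
    """Check whether the current process has `gid` as effective or supplementary group."""
    return gid == os.getegid() or gid in os.getgroups()

# Only inspect /dev/kfd when it is actually present in this environment.
if os.path.exists("/dev/kfd"):
    info = device_access_info("/dev/kfd")
    print(info, "process in owning group:", process_in_group(info["gid"]))
```

If the process is not in the owning group, adding the service user to that group inside the jail avoids changing the mode of /dev/kfd itself.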

maeehart commented 3 months ago

I think that that is a much better idea.

Jip-Hop commented 2 months ago

Reportedly AMD GPU passthrough works:

 ./jlmkr.py create --distro=ubuntu --release=jammy ubuntu --bind=/dev/dri --bind=/dev/kfd

Adding a dedicated AMD GPU passthrough config option, with corresponding flag for the create command, seems like overkill when a single additional bind mount is enough (especially since AMD GPU passthrough reportedly relies on /dev/dri being mounted which gpu_passthrough_intel=1 takes care of).

tyvsmith commented 2 months ago

I suggest documenting this more widely, for example in the primary readme, or revisiting the decision not to include a flag (even if it's simple). I started investigating jailmaker to handle my k3s -> Docker conversion for TrueNAS SCALE, and it wasn't clear whether AMD GPU passthrough was supported at all until I found this ticket and read the comments.

Jip-Hop commented 2 months ago

Updated the readme!