ValveSoftware / SteamOS

SteamOS community tracker

Steam deck needs ROCm libraries #971

Closed. DuckersMcQuack closed this issue 1 year ago.

DuckersMcQuack commented 1 year ago

Your system information

Please describe your issue in as much detail as possible:

Describe what you expected should happen and what did happen.

I want to request ROCm libraries so we can have effective GPU hardware acceleration with HIP for Blender, as well as hardware acceleration on the AMD GPU for Stable Diffusion, since both tasks will most likely be much faster than on the CPU.

kisak-valve commented 1 year ago

In the dependency tree, rocm-clang-ocl 5.4.1-2 (https://archlinux.org/packages/community-testing/x86_64/rocm-clang-ocl/) depends on rocm-llvm 5.4.2-2 (https://archlinux.org/packages/community-testing/x86_64/rocm-llvm/). Also, rocm-opencl-runtime 5.4.2-1 (https://archlinux.org/packages/community-testing/x86_64/rocm-opencl-runtime/) depends on rocm-cmake 5.4.2-1 (https://archlinux.org/packages/community-testing/x86_64/rocm-cmake/), which in turn depends on rocm-llvm 5.4.2-2.

The rocm-llvm package is noted as having an installed size of 2.9 GB. This is gratuitously oversized for something that would benefit a small subset of desktop users and no games.

Sorry, this render path would need to lose a lot of disk usage for it to be a viable consideration for the Steam Deck's OS base install. The alternative would be to get ROCm packaged in a Flatpak, to be installed as needed by the user and used with Blender in a Flatpak, but a quick search took me to https://github.com/RadeonOpenCompute/ROCm/issues/1685#issuecomment-1058957458 and https://gitlab.com/freedesktop-sdk/freedesktop-sdk/-/issues/1181, which indicate that it isn't really ready to be used that way at this time.

bertogg commented 1 year ago

Hi @DuckersMcQuack, I'm not familiar with ROCm, but if you need to add a large package to the system and Flatpak is not an option, then maybe you can use systemd-sysext to add ROCm as a system extension. This is already available on the Steam Deck and does not require touching the root filesystem.
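To make the systemd-sysext suggestion more concrete: an extension is a directory or image (for example under /var/lib/extensions/) whose usr/lib/extension-release.d/extension-release.NAME file declares an ID= matching the host's /etc/os-release, plus a matching SYSEXT_LEVEL= or VERSION_ID=. `systemd-sysext merge` then overlays the extension's /usr onto the host. The Python sketch below only generates that metadata skeleton. The extension name "rocm" and the example os-release values are assumptions for illustration, the matching rule is paraphrased from the systemd-sysext documentation, and copying the actual ROCm files into the tree is left out.

```python
from pathlib import Path


def parse_os_release(text: str) -> dict[str, str]:
    """Parse os-release style KEY=value lines, stripping optional quotes."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip().strip('"').strip("'")
    return fields


def extension_release(host_os_release: str) -> str:
    """Build extension-release content that mirrors the host's ID and version."""
    host = parse_os_release(host_os_release)
    lines = [f"ID={host['ID']}"]
    # Copy the version fields systemd-sysext may compare
    # (SYSEXT_LEVEL= if the host defines it, otherwise VERSION_ID=).
    for key in ("SYSEXT_LEVEL", "VERSION_ID"):
        if key in host:
            lines.append(f"{key}={host[key]}")
    return "\n".join(lines) + "\n"


def write_sysext_skeleton(root: Path, name: str, host_os_release: str) -> Path:
    """Create <root>/<name>/usr/lib/extension-release.d/extension-release.<name>."""
    release_dir = root / name / "usr" / "lib" / "extension-release.d"
    release_dir.mkdir(parents=True, exist_ok=True)
    release_file = release_dir / f"extension-release.{name}"
    release_file.write_text(extension_release(host_os_release))
    return release_file


if __name__ == "__main__":
    import tempfile

    # Hypothetical host values; on a real system read /etc/os-release instead.
    example = 'ID=steamos\nVERSION_ID="3.5"\n'
    with tempfile.TemporaryDirectory() as tmp:
        path = write_sysext_skeleton(Path(tmp), "rocm", example)
        print(path.read_text())
```

Copying the host's values ties the extension to the exact OS version it was built for, so it would likely need regenerating after each SteamOS update.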

DuckersMcQuack commented 1 year ago

Replying to https://github.com/ValveSoftware/SteamOS/issues/971#issuecomment-1403771756

Gotcha. I currently have a 1 TB 2230 PM991 NVMe drive, so I have the space and want to install it: I wanna tinker with the Deck and do Blender renders and Stable Diffusion generations, since my main rig is out of commission at the moment and the Deck is the only "competent computer" I have.

And sadly, until Valve adds a dual-boot setup for installing Windows on, say, a 120 GB partition, adding ROCm separately is the only way for now.

A few users on the PCMR Discord tried to help me install Blender with ROCm, and ROCm itself, manually, but sadly to no avail: apparently the root partition "/" is separate and only had 905 MB free, so I can't install it manually either.

But it would be appreciated if a "large system storage" SteamOS image with ROCm included could be considered, for us tinkerers/power users who want to experiment and push the Deck to its limits, or simply run workloads that aren't just games. After all, the Deck has the hardware for such workloads, since its GPU is RDNA 2. Its memory is indeed slow compared to GDDR5/6, but it is still better than using the CPU, which takes way more time.

5310 commented 9 months ago

Now that SteamOS 3.5+ has Podman and Distrobox pre-installed, and Arch has the core ROCm kernel module in the repository, could we just install the kernel module, which is very small, on the base system?

Users could then choose to create containers with the rest of the huge ROCm runtimes and libraries if they want, handling the storage concerns themselves, without having to worry about updates uninstalling the module or about fragile overlays on top of the immutable root.

Recently I've set up a fresh desktop Arch install with a few of AMD's ROCm Docker images, and they run very well, requiring only the core kernel module on the host. I have yet to try it with Podman, but it seems to be feasible without rootful containers.

As such, I would humbly request that we reconsider pre-installing just the core module into SteamOS for the Deck.
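For the container route, AMD's ROCm images are typically run with the host's /dev/kfd and /dev/dri device nodes passed through (`--device=/dev/kfd --device=/dev/dri` with Docker or Podman), which is why only the kernel side has to live on the host. A minimal Python sketch, assuming those nodes (with at least one renderD* node under /dev/dri) are what a container needs, to check a host before creating a Podman or Distrobox container:

```python
import os
from pathlib import Path


def missing_rocm_nodes(dev_root: str = "/dev") -> list[str]:
    """Return device paths a ROCm container would need but the host lacks."""
    root = Path(dev_root)
    missing = []
    # /dev/kfd is the compute interface exposed by amdgpu's KFD component.
    if not (root / "kfd").exists():
        missing.append(str(root / "kfd"))
    # /dev/dri should contain at least one renderD* node for the GPU.
    dri = root / "dri"
    if not dri.is_dir() or not any(p.name.startswith("renderD") for p in dri.iterdir()):
        missing.append(str(dri))
    return missing


def kfd_is_accessible(dev_root: str = "/dev") -> bool:
    """True if the current user may open /dev/kfd for reading and writing."""
    return os.access(Path(dev_root) / "kfd", os.R_OK | os.W_OK)


if __name__ == "__main__":
    missing = missing_rocm_nodes()
    if missing:
        print("Missing on host:", ", ".join(missing))
    else:
        print("Device nodes present; /dev/kfd read-write:", kfd_is_accessible())
```

If /dev/kfd exists but is not read-write for your user, group membership is the usual suspect; which group owns the node varies by distribution, so treat that as a hint rather than a rule.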

DuckersMcQuack commented 5 months ago

Aye, I got everything working in Distrobox today, but the downside is that (IIRC) because it runs in Distrobox, it can't allocate more RAM as video memory when needed the way games can. So I've set the UMA frame buffer to 4 GB, but Stable Diffusion needs more than 4 GB, and I'm unable to go past that 😭

5310 commented 5 months ago

Even Blender Cycles ends up needing more than 4GB VRAM for me...

But maybe, just maybe, the AMDKFD update enabled in kernel v6.1 will someday come to the Steam Deck as well? 🥺

Though we can change BIOS settings to enlarge the carveout size, that is inflexible and may bring complaints. On the other hand, the memory can't be used effectively between host and device.

The solution is the MI300A approach, i.e., letting VRAM allocations go to GTT.
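To see the split being discussed on a given machine, the amdgpu driver exposes the carveout (VRAM) and GTT sizes in sysfs as mem_info_vram_total and mem_info_gtt_total under /sys/class/drm/cardN/device/. A minimal Python sketch, assuming the Deck's GPU is the first cardN entry that has those files, that reads both totals and checks whether a hypothetical workload size fits in either pool. It only reports sizes; whether the compute stack may actually place "VRAM" allocations in GTT is exactly what the kernel change above is about.

```python
from pathlib import Path
from typing import Optional

GIB = 1024 ** 3


def find_amdgpu_device(drm_root: str = "/sys/class/drm") -> Optional[Path]:
    """Return the first cardN/device directory exposing amdgpu memory info."""
    for card in sorted(Path(drm_root).glob("card*")):
        # Skip connector entries such as card0-eDP-1; keep only cardN.
        if not card.name[4:].isdigit():
            continue
        device = card / "device"
        if (device / "mem_info_vram_total").is_file():
            return device
    return None


def read_memory_pools(device: Path) -> dict[str, int]:
    """Read the VRAM carveout and GTT totals (in bytes) from sysfs."""
    return {
        "vram": int((device / "mem_info_vram_total").read_text()),
        "gtt": int((device / "mem_info_gtt_total").read_text()),
    }


def pools_that_fit(pools: dict[str, int], required_bytes: int) -> list[str]:
    """Names of the pools whose total size is at least required_bytes."""
    return [name for name, size in pools.items() if size >= required_bytes]


if __name__ == "__main__":
    device = find_amdgpu_device()
    if device is None:
        print("No amdgpu device with mem_info_* files found")
    else:
        pools = read_memory_pools(device)
        for name, size in pools.items():
            print(f"{name}: {size / GIB:.1f} GiB")
        # Hypothetical 6 GiB working set, e.g. a larger Stable Diffusion run.
        print("6 GiB fits in:", pools_that_fit(pools, 6 * GIB) or "neither pool")
```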

DuckersMcQuack commented 3 weeks ago

Thankfully, SteamOS can allocate as much RAM as VRAM as a workload needs. Sadly, Windows can't do the same; it is limited to 4 GB.

But Blender needing more than 4 GB is just a memory limitation. ROCm is only there to translate CUDA workloads so the GPU can accelerate tasks you'd otherwise run on the CPU, which is miles slower.