ValveSoftware / steam-runtime

A runtime environment for Steam applications

Blender with AMD GPU and HIP #710

Open playday3008 opened 2 days ago

playday3008 commented 2 days ago

Your system information

Please describe your issue in as much detail as possible:

When Blender is launched through Steam, it runs in SteamLinuxRuntime 1.0 (scout), which prevents Blender from loading libamdhip64.so, a dependency of the Cycles rendering engine when an AMD GPU is used as the render device (a.k.a. HIP). Running Blender directly, or starting Steam with -compat-force-slr off, works fine. The problem is that disabling SteamLinuxRuntime globally is probably not a good idea.
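To check the symptom outside of Blender, a minimal sketch like the following can be run both on the host and inside the container. It only asks the dynamic linker whether the library can be loaded; it is not Blender's actual HIP loading code, and the helper name `can_load` is made up for illustration.

```python
import ctypes


def can_load(library_name: str) -> bool:
    """Return True if the dynamic linker can find and load the library."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        # Raised when the library (or one of its dependencies) cannot be loaded.
        return False
    return True


if __name__ == "__main__":
    # Library name taken from the report above. Some systems may only ship a
    # versioned name (assumption), so adjust as needed.
    name = "libamdhip64.so"
    status = "loadable" if can_load(name) else "not loadable"
    print(f"{name}: {status}")
```

If the report above is accurate, this would be expected to print "not loadable" inside the scout container and "loadable" on a host with the HIP runtime installed.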

Examples
Running through Steam ![image](https://github.com/user-attachments/assets/68ff73b7-6b8f-444c-80f7-20529a3b08f9)
Running through Steam with -compat-force-slr off ![image](https://github.com/user-attachments/assets/181f7a3b-0dd0-4728-940a-9225df6362b4)
Running directly ![image](https://github.com/user-attachments/assets/97ee6632-bf63-422d-a717-07e9ba0eedcc)

Steps for reproducing this issue:

  1. Start Steam
  2. Open Blender
  3. Go to Edit -> Preferences... -> System -> Cycles Render Devices -> HIP
  4. Says: No compatible GPUs found for Cycles...

Steps to avoid this issue (kinda hack I guess):

smcv commented 2 days ago

It is expected that apps/games running in a Steam Linux Runtime container cannot load arbitrary libraries from the host system, other than accelerated graphics drivers (OpenGL, Vulkan, VA-API, ...) which have a series of special cases to make them work.

Using the GPU for general-purpose computation (HIP, CUDA, OpenCL, ...) is not exactly graphics, but it's working in approximately the same space, so perhaps ideally the container infrastructure would pick up this library from the host too. I think we currently have similar special cases for some GPU computation APIs (in particular, I think we do have CUDA), but not all of them.

I see that Blender's code to interact with HIP involves running a hipcc executable, under at least some circumstances. That executable is not going to be available in the container either, unless we take very specific steps to make it available - and that would also need to include any files that it opens at runtime, because nothing from the host system is visible in the container unless we arrange for it to be.

To know whether this is feasible, we'll need some information from AMD, or from you if you happen to know the answers:

playday3008 commented 2 days ago

I do have answers for some questions, but be aware that some answers might be applicable to Fedora only:

  1. No idea, I don't have much experience with HIP.
  2. On my machine, it's located at /usr/lib64/libamdhip64.so, because that's where the rocm-hip-devel package installs it.
    • A package manager on a supported distro will put the library into /usr/lib64 or the distro's equivalent, where ldconfig can find it. This is true for Fedora and an assumption for other distros: e.g. Debian has a libamdhip64-dev package, and I assume the library goes into /usr/lib/x86_64-linux-gnu/libamdhip64.so.
    • A manual installation (a.k.a. installation from upstream) will put the library into /opt/rocm/lib.
  3. Dependency tree for rocm-hip-devel on Fedora (including hipcc) ![tree](https://github.com/user-attachments/assets/c379f8c1-11c8-475a-9a26-b4f4d689ecf9) As we can see, it's impossible to have `libamdhip64.so` without `hipcc` installed
  4. There is no i686 version.
  5. hipcc is a compiler driver (something like clang-cl for MSVC). It depends on rocm-device-libs, rocminfo and clang; the dependency tree is below.

    hipcc dependency tree ![image](https://github.com/user-attachments/assets/64af0de0-2fdf-440d-a81a-65aca436463e)
  6. Running `file` on the library reports a shared object, not a PIE executable. Running strace on Blender shows the following files being accessed after libamdhip64.so is loaded (in access order):
    /lib64/libamdhip64.so
    /lib64/libamd_comgr.so.2
    /lib64/libhsa-runtime64.so.1
    /lib64/libnuma.so.1
    /usr/lib64/llvm18/lib/liblldELF.so.18.1
    /usr/lib64/llvm18/lib/liblldCommon.so.18.1
    /usr/lib64/llvm18/lib/libclang-cpp.so.18.1
    /usr/lib64/llvm18/lib/libLLVM.so.18.1
    /lib64/libhsakmt.so.1
    (/sys/..., /proc/..., /dev/...)
    /usr/share/libdrm/amdgpu.ids
    (/sys/..., /proc/..., /dev/...)
    ~/.local/share/Steam/steamapps/common/Blender/lib/libOpenImageDenoise_device_hip.so.2.3.0
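A list like the one in item 6 can be extracted from strace output mechanically. The sketch below assumes strace's default line format for `open`/`openat` calls and keeps only the calls that succeeded; the sample lines are hypothetical and only mimic that format, they are not the actual trace from this report.

```python
import re

# Matches successful open()/openat() calls in default strace output, e.g.
#   openat(AT_FDCWD, "/lib64/libnuma.so.1", O_RDONLY|O_CLOEXEC) = 3
# An optional leading PID (as written by "strace -f -o file") is tolerated.
OPEN_RE = re.compile(
    r'^(?:\d+\s+)?open(?:at)?\((?:AT_FDCWD, )?"([^"]+)".*\)\s+=\s+(\d+)'
)


def opened_files(strace_lines):
    """Return successfully opened paths, in order of first access."""
    seen = []
    for line in strace_lines:
        match = OPEN_RE.match(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


# Hypothetical sample lines in strace's format (not the real trace).
SAMPLE = [
    'openat(AT_FDCWD, "/lib64/libamdhip64.so", O_RDONLY|O_CLOEXEC) = 3',
    'openat(AT_FDCWD, "/opt/rocm/lib/libamdhip64.so", O_RDONLY|O_CLOEXEC)'
    ' = -1 ENOENT (No such file or directory)',
    'openat(AT_FDCWD, "/lib64/libnuma.so.1", O_RDONLY|O_CLOEXEC) = 3',
    'openat(AT_FDCWD, "/lib64/libamdhip64.so", O_RDONLY|O_CLOEXEC) = 4',
]

if __name__ == "__main__":
    for path in opened_files(SAMPLE):
        print(path)
```

Failed lookups (such as the `ENOENT` line) are dropped on purpose, since only files that were actually opened would need to be visible inside a container.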

Hope these answers help somehow. A nice feature would be an option for disabling SteamLinuxRuntime per app, something like this:

![image](https://github.com/user-attachments/assets/104767a8-26ae-4d06-b8d4-ee62c22b4f00)

playday3008 commented 2 days ago

Also, seems like related issue from Blender repo: https://projects.blender.org/blender/blender/issues/129895

smcv commented 2 days ago

If libamdhip64.so.5 is following normal Linux library naming conventions, then libamdhip64.so.5 should be long-term ABI-stable, but libamdhip64.so most likely isn't: if AMD released an incompatible version with a different ABI, it would probably be named like libamdhip64.so.6, and libamdhip64.so would probably change to point to that.
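The naming convention described here can be illustrated with a small filename parser. This is only a sketch: it looks at file names, whereas the authoritative ABI name is the SONAME recorded inside the ELF file, and the helper names `abi_major` and `same_abi` are made up for this example.

```python
import re

# <stem>.so[.<major>[.<minor and patch>]]
SO_NAME_RE = re.compile(
    r"^(?P<stem>.+\.so)(?:\.(?P<major>\d+))?(?:\.(?P<rest>[\d.]+))?$"
)


def abi_major(filename):
    """Return the ABI major version encoded in a library file name, or None.

    None means the name is unversioned (like libamdhip64.so), so it makes
    no promise about which ABI it points to.
    """
    match = SO_NAME_RE.match(filename)
    if match is None or match.group("major") is None:
        return None
    return int(match.group("major"))


def same_abi(a, b):
    """True only if both names carry the same explicit ABI major version."""
    major_a, major_b = abi_major(a), abi_major(b)
    return major_a is not None and major_a == major_b


if __name__ == "__main__":
    for name in ("libamdhip64.so", "libamdhip64.so.5", "libamdhip64.so.6"):
        print(name, abi_major(name))
```

Under this convention, a program built against `libamdhip64.so.5` should keep working across compatible updates, while one that opens the unversioned name gets whatever ABI that name currently points to.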

smcv commented 2 days ago

> 5. hipcc is a compiler driver (something like clang-cl for MSVC). It depends on rocm-device-libs, rocminfo and clang; the dependency tree is below.

Sorry, that's probably too "big" to be reasonable to pull into the container as part of the graphics stack: we can pick up a few essential libraries like the ones needed by Mesa, but every new library we add involves another roll of the dice on whether it will set up some incompatibility that causes a crash, and sooner or later our luck will run out. So I think whatever happens here will have to involve hipcc running outside the container in some way.

> A nice feature would be an option for disabling SteamLinuxRuntime per app

I can see the appeal of that, but that's outside the Steam Runtime team's control, and would have to come from a Steam client developer.

playday3008 commented 2 days ago

> If libamdhip64.so.5 is following normal Linux library naming conventions, then libamdhip64.so.5 should be long-term ABI-stable, but libamdhip64.so most likely isn't: if AMD released an incompatible version with a different ABI, it would probably be named like libamdhip64.so.6, and libamdhip64.so would probably change to point to that.

On my system, it looks like this: /lib64/libamdhip64.so -> /lib64/libamdhip64.so.6 -> /lib64/libamdhip64.so.6.2.41134 (arrows denote symlinks), and ldconfig lists both ...so.6 and ...so
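A chain like this can be printed with a few lines of Python. The sketch below follows symlinks one hop at a time; `symlink_chain` and `demo` are illustrative names, and the demo builds a throwaway chain in a temporary directory with made-up `libdemo` names rather than touching /lib64.

```python
import os
import tempfile


def symlink_chain(path, max_hops=40):
    """Return [path, target, target-of-target, ...] until a non-symlink."""
    chain = [path]
    while os.path.islink(path) and len(chain) <= max_hops:
        target = os.readlink(path)
        # Relative targets are resolved against the link's own directory.
        path = os.path.normpath(os.path.join(os.path.dirname(path), target))
        chain.append(path)
    return chain


def demo():
    """Build lib.so -> lib.so.6 -> lib.so.6.2.41134 and return the base names."""
    with tempfile.TemporaryDirectory() as directory:
        real = os.path.join(directory, "libdemo.so.6.2.41134")
        open(real, "w").close()
        os.symlink("libdemo.so.6.2.41134", os.path.join(directory, "libdemo.so.6"))
        os.symlink("libdemo.so.6", os.path.join(directory, "libdemo.so"))
        chain = symlink_chain(os.path.join(directory, "libdemo.so"))
        return [os.path.basename(p) for p in chain]


if __name__ == "__main__":
    print(" -> ".join(demo()))
```

On a real system, `symlink_chain("/lib64/libamdhip64.so")` would be the equivalent call for the path quoted above.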

playday3008 commented 2 days ago

> > A nice feature would be an option for disabling SteamLinuxRuntime per app
>
> I can see the appeal of that, but that's outside the Steam Runtime team's control, and would have to come from a Steam client developer.

Honestly, that's the only optimal solution I see. A suboptimal but still good one would be putting some voodoo magic into launch options, so it would escape/go through/disable that container and run natively, something like STEAM_LINUX_RUNTIME=0 %command%, etc.

Or maybe there's already some environment variable or something else that does that, and I'm not aware of it (STEAM_RUNTIME=0 steam does not count, but will it work though?)

smcv commented 2 days ago

In the short term, the best/least-bad workaround is steam -compat-force-slr off (which returns to the pre-November behaviour), together with optionally marking native Linux apps/games that run correctly in SLR (i.e. not Blender, in your case) as Properties → Compatibility → Force the use of... → Steam Linux Runtime 1.0.

> STEAM_RUNTIME=0 steam does not count, but will it work though?

No, that won't work. Disabling the use of the older LD_LIBRARY_PATH-based runtime to run the Steam client itself (which is what STEAM_RUNTIME=0 does) is specifically not supported or supportable, and it doesn't have any effect on whether things get run in the Steam Linux Runtime container or not.

For native Linux apps/games that target the scout ABI (which means most native Linux apps/games right now, including Blender), steam -compat-force-slr off is currently the only way to disable the use of the SLR container.

For completeness: For native Linux apps/games that target the newer sniper ABI, the SLR container is required and cannot be disabled. For Windows apps/games running under Proton 5.13 or newer, the SLR container is required and cannot be disabled (except by choosing to use a very old version of Proton instead). For Windows apps/games running under Proton 5.0 or older, the SLR container is not used and cannot be enabled (except by choosing to use a newer version of Proton instead).
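The cases above can be summarised as a small lookup. This is only a restatement of the rules in this comment as code, not anything Steam exposes: the function, the `kind` labels and the `SlrPolicy` type are hypothetical, and Proton versions between 5.0 and 5.13 are rejected because they are not covered here.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SlrPolicy:
    container_used: bool  # is the SLR container used by default?
    can_be_changed: bool  # can the user switch that default for this target?
    how: str              # mechanism mentioned in this thread, if any


def slr_policy(kind: str, proton: Optional[Tuple[int, int]] = None) -> SlrPolicy:
    """Map an app type (hypothetical labels) to the behaviour described above."""
    if kind == "native-scout":
        return SlrPolicy(True, True, "steam -compat-force-slr off (global, not per app)")
    if kind == "native-sniper":
        return SlrPolicy(True, False, "container is required")
    if kind == "proton":
        if proton is None:
            raise ValueError("a Proton (major, minor) version is required")
        if proton >= (5, 13):
            return SlrPolicy(True, False, "required; only an older Proton avoids it")
        if proton <= (5, 0):
            return SlrPolicy(False, False, "not used; only a newer Proton enables it")
        raise ValueError("Proton versions between 5.0 and 5.13 are not covered here")
    raise ValueError(f"unknown kind: {kind!r}")


if __name__ == "__main__":
    print(slr_policy("native-scout"))   # Blender's case in this issue
    print(slr_policy("proton", (5, 13)))
```

Blender falls into the `native-scout` row, which is the only one where the container can currently be turned off, and only for all scout apps at once.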

> putting some voodoo magic into launch options, so it would escape/go through/disable that container and run natively

This is not something that is supported or supportable. (A developer-oriented mechanism does exist, but should not be recommended to end users as a workaround.)

playday3008 commented 2 days ago

> In the short term, the best/least-bad workaround is steam -compat-force-slr off (which returns to the pre-November behaviour), together with optionally marking native Linux apps/games that run correctly in SLR (i.e. not Blender, in your case) as Properties → Compatibility → Force the use of... → Steam Linux Runtime 1.0.

So instead of explicitly disabling SLR per app, we can disable SLR completely and then explicitly enable it per app (if needed, but I guess it would be better to enable it for all apps and then disable it for the ones that don't work well with it). Why didn't I think of that? It's an option, a janky one, but definitely an option. Big thanks for the deep-dive explanations and for the single (I hope only for now) solution.

smcv commented 1 day ago

> I guess it would be better to enable it for all apps and then disable it for the ones that don't work well with it

If there was a supported/supportable way to achieve this, then yes I think that would be better, but currently there is not.

A future Steam Client release might add a way to do this (but this is not under my control, so no guarantees). If it does, I'll try to update this issue to mention it.

DarkDefender commented 1 day ago

I just wanted to chime in here as well. I'm the current Linux maintainer for Blender.

As you guys already figured out, it is not really viable for Steam to ship all the needed HIP libraries. Currently HIP is changing quite a bit between versions, and the same is true for Intel's "oneAPI" GPU backend.

We (the Blender developers) do not really see a good way for Blender to work inside the new Steam sandbox. Besides general hardware interface issues, it also makes it harder for us to provide access to the host machine's mounted volumes etc.: https://projects.blender.org/blender/blender/issues/129902

We hope that we can reach out and discuss the alternatives with Valve directly and come up with a solution.