Closed gordonjcp closed 1 year ago
Interesting! Thanks for this report. Not sure what to do exactly, as this was specifically supposed to fix hybrid situations... see issue #19. When you say the Intel onboard graphics are "not used", do you mean you set the BIOS to discrete mode (as opposed to integrated or optimus)? It would be great if the run script could auto-detect this, but I'm not sure what the test would be...
Is it any one of these lines being set, or the combination of all three, that kills the app?

```shell
__NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia __VK_LAYER_NV_optimus=NVIDIA_only
```
Thanks for reporting this. And any thoughts on what to do to still solve #19 but not break anything for you are appreciated. Also, @Hanziness do you have any thoughts?
Interesting, indeed. I've not used a hybrid GPU setup this way (dedicated GPU only) but supposed that it behaves the same way as a discrete GPU-only one. I'd also suggest starting with removing one or more of the command line arguments (starting with `__NV_PRIME_RENDER_OFFLOAD=1`) to see if it helps. @gordonjcp can you check these? Also, can you post the logs here from `mounts/logs/rollinglog.txt`? (the `mounts` folder is located in the folder of the Dockerfile)
@fat-tire I'm not sure these situations can be properly (or easily) auto-detected, as we've now encountered a third GPU setup we'd not thought of. So maybe an easier option could be to let the user enable offloading (e.g. via an environment variable) if they are using a hybrid GPU setup?
I think that may be the best solution -- maybe default to not using those lines, but let you opt in with something like RESOLVE_GPU_CONFIG. Maybe instead of just a "Y"/"N" setting we should allow for multiple scenarios, just in case another situation crops up? Maybe options like DEFAULT, DISCRETE, HYBRID, INTEGRATED, INTEL, OPTIMUS, AMD, AUTODETECT, etc.? Some of these examples won't currently work or would be equivalent to each other, but building in some flexibility here might be forward-compatible, especially if #8 gets addressed... ideally "DEFAULT" would auto-detect the card/configuration...
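To make the idea concrete, here's a minimal sketch of how a hypothetical `RESOLVE_GPU_CONFIG` setting could map to the offload variables inside the run script -- the function name, option names, and which options actually get the variables are all illustrative, not an existing implementation:

```shell
# Hypothetical sketch: map RESOLVE_GPU_CONFIG to the PRIME offload
# environment variables. Only HYBRID/OPTIMUS get the variables here;
# everything else (DEFAULT, DISCRETE, ...) leaves the driver alone.
gpu_env_for_config() {
    case "${1:-DEFAULT}" in
        HYBRID|OPTIMUS)
            echo "__NV_PRIME_RENDER_OFFLOAD=1 __VK_LAYER_NV_optimus=NVIDIA_only __GLX_VENDOR_LIBRARY_NAME=nvidia"
            ;;
        *)
            # No offload variables: let the driver pick the GPU.
            echo ""
            ;;
    esac
}
```

The run script could then prepend whatever this prints to the Resolve launch command.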
I'd also be interested from @gordonjcp if eliminating any of those variables fixed the problem (and if so, do you definitely need all 3, @Hanziness?)
Out of curiosity, did either of you try the new x264 plugin? I think I like it more than the built-in H264 one, but I'm curious to compare on other machines. To try it, set `RESOLVE_BUILD_X264_ENCODER_PLUGIN=Y` when running `./build.sh`, then look for it in the QuickTime or MP4 output options on the Render page.
ft
Ah, sorry for the late response! I'm not sure whether we need all three, but as far as I know, the `prime-run` command that is used to launch applications on the dedicated GPU is actually just a shell script that sets these variables. I'll check its contents when I get back to my laptop (`prime-run` does not exist on single-GPU systems and is provided by a package on hybrid GPU setups) :)
It would be great to see either the logs (preferably) from @gordonjcp, or whether a subset of the environment variables fixes the issue, so we can move forward.
Also, thanks for the heads up on the x264 plugin, I'll check it out! Is it for the Studio version only?
I think it should work fine w/ either version (though I've only tested it on Studio). I prefer it to H264, fwiw.
I'd love to fix up the RUN command, so yeah, let me know which configuration seems to be the best. I wonder if there's a way to auto-detect and then set accordingly.. :man_shrugging:
So, this is the `prime-run` script:

```shell
$ cat /bin/prime-run
#!/bin/bash
__NV_PRIME_RENDER_OFFLOAD=1 __VK_LAYER_NV_optimus=NVIDIA_only __GLX_VENDOR_LIBRARY_NAME=nvidia "$@"
```
I think a good way to start the work on detecting the GPU setup would be to check whether `prime-run` exists on the host. If it does, chances are we are running on a hybrid GPU setup, as it's often auto-installed by distributions. On Manjaro (Arch) this is provided by the `nvidia-prime` package; Manjaro pre-installed it on my laptop, but not on my desktop.
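That existence check could be as simple as the following sketch (the function name is made up for illustration; it assumes the check runs somewhere `prime-run` would be on `PATH`):

```shell
# Sketch: treat the presence of prime-run on PATH as a hint that
# this host is a hybrid GPU setup (distros often auto-install it).
is_hybrid_host() {
    command -v prime-run >/dev/null 2>&1
}
```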
Or maybe another way could be to use `glxinfo`: it outputs a line starting with `OpenGL vendor string:` that lists the GPU manufacturer, like so (this can be done inside the container, too):
```shell
$ glxinfo | grep "OpenGL vendor"
OpenGL vendor string: Intel
$ prime-run glxinfo | grep "OpenGL vendor"
OpenGL vendor string: NVIDIA Corporation
```
Additionally, we can also `grep` for the manufacturers themselves: I get no "Intel" lines when I run `prime-run glxinfo | grep "Intel"` (and likewise no "NVIDIA" lines when I start it on the Intel graphics card). I'm just not sure how that would help us detect the GPU setup. Maybe run `glxinfo` both with the environment variables and without them, and pick whichever configuration prints NVIDIA first? (i.e. if it prints NVIDIA without the variables, go with that; otherwise enable the environment variables.) But what about AMD setups? As far as I know Resolve is not supported on AMD graphics cards on Linux, but someone with an AMD configuration should confirm this.
My idea would be to use a separate shell script as the `RUN` command instead of directly starting Resolve. That way we could perform these additional checks on launch (e.g. using `glxinfo` as mentioned above) instead of at build time (which would have us rebuild the container if anything changes). It could also allow the user to pass arguments like `--force-prime` (which would set the environment variables regardless of what we detect).
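A rough sketch of what that launch-time check might look like -- the `--force-prime` flag and the "export unless glxinfo already reports NVIDIA" rule are assumptions taken from this thread, not an existing script:

```shell
# Sketch of a launch-time GPU check for the proposed run script.
# Exports the PRIME offload variables when --force-prime is passed,
# or when a bare glxinfo run does not already report an NVIDIA vendor.
set_gpu_env() {
    local vendor
    vendor="$(glxinfo 2>/dev/null | sed -n 's/^OpenGL vendor string: //p')"
    if [ "${1:-}" = "--force-prime" ] || [ "${vendor#NVIDIA}" = "$vendor" ]; then
        export __NV_PRIME_RENDER_OFFLOAD=1
        export __VK_LAYER_NV_optimus=NVIDIA_only
        export __GLX_VENDOR_LIBRARY_NAME=nvidia
    fi
}
```

The run script would call `set_gpu_env "$@"` before `exec`-ing Resolve, so the decision happens at container start rather than at build time.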
Annoyingly, I don't appear to be able to reproduce this now. The PC I was using before broke, and I'm now using a slightly different one. Once I get another machine identical to the first I'll dig into it more :-/
I'm going to close this for now. Feel free to re-open @gordonjcp if you're able to duplicate it again... thx!
The splash screen appears and disappears almost instantly. I bisected the issue to commit bba7db4 ("Force rendering on dedicated GPU"); d881adb does not suffer from this.
Intel onboard graphics (not in use), GT1030 card in slot.