games-on-whales / wolf

Stream virtual desktops and games running in Docker
https://games-on-whales.github.io/wolf/stable/
MIT License

Stuck on deployment #61

Closed by Ramaddan 2 weeks ago

Ramaddan commented 5 months ago

Hi,

I was trying to deploy Wolf, and everything seemed fine until I reached the Docker CLI or Compose instructions.

I get the following error and cannot continue:

docker: Error response from daemon: error gathering device information while adding custom device "/dev/nvidia-caps/nvidia-cap1": no such file or directory.
ERRO[0000] error waiting for container: context canceled

I have an RTX nVidia Quadro GPU with driver version: 545.29.06

The nvidia-caps devices do not exist on my system.
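The error above means Docker was given a --device path that does not exist on the host. A minimal sketch of a pre-flight check, assuming a POSIX shell; the function name missing_devices is made up for illustration and is not part of Wolf or Docker:

```shell
#!/bin/sh
# Hypothetical helper (not part of Wolf or Docker): print every path
# given as an argument that does not exist on the host.
missing_devices() {
    for dev in "$@"; do
        # -e is true for any existing file type, including device nodes
        # and directories such as /dev/dri/.
        [ -e "$dev" ] || printf '%s\n' "$dev"
    done
}

# Example call, using the device named in the error message:
# missing_devices /dev/nvidia-caps/nvidia-cap1 /dev/nvidia-caps/nvidia-cap2
```

An empty result means every listed path exists; any path printed would make docker run fail in the same way.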

Running on Ubuntu 22.04 Kernel: Linux 5.15.0-92-lowlatency x86_64

Thanks

Ramaddan commented 4 months ago

I wonder if it has anything to do with this:

https://github.com/NVIDIA/nvidia-docker?tab=readme-ov-file

DEPRECATION NOTICE

This project has been superseded by the NVIDIA Container Toolkit.

Update (Feb 5, 2024): I installed the NVIDIA Container Toolkit, but it did not fix my specific problem. I am still stuck on the missing nvidia-caps devices.

What is that device anyway? Is it for capture?

ABeltramo commented 4 months ago

Since you now have the NVIDIA Container Toolkit installed, could you try running

sudo nvidia-container-cli --load-kmods info

and see if the devices "magically appear"?

You can also try running Wolf without those extra devices but I fear that it'll probably fail. Btw, which distro are you using?

Also, beware of https://github.com/games-on-whales/wolf/issues/60 with that driver version. I'm sorry for all the trouble, but Nvidia is just plainly hostile on Linux.

Ramaddan commented 4 months ago

Hi. Thanks for the help. No problem, used to nVidia.

Here is what I get

NVRM version: 545.29.06
CUDA version: 12.3

Device Index: 0
Device Minor: 0
Model: Quadro RTX 5000
Brand: QuadroRTX
GPU UUID: GPU-(some number)
Bus Location: 00000000:01:00.0
Architecture: 7.5

Ramaddan commented 4 months ago

Not sure if that was it, but I have the devices now :-)

Both show now: nvidia-cap1 nvidia-cap2

But I think they really showed after a restart.

There was also a note about NVIDIA Container Toolkit versions prior to 1.12 having issues: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html#installing-with-yum-or-dnf

Thanks

Will try to continue now

Ramaddan commented 4 months ago

Getting stuck here now:

22:03:16.577852995 INFO | Gstreamer version: 1.22.0-0
22:03:16.578928224 INFO | Reading config file from: /wolf/cfg/config.toml
22:03:16.801477822 INFO | Selected H264 encoder: nvcodec
22:03:16.801576288 INFO | Selected HEVC encoder: nvcodec
22:03:16.803287280 INFO | RTSP server started on port: 48010
22:03:16.803337562 INFO | Control server started on port: 47999
22:03:16.803286092 INFO | HTTP server listening on port: 47989
22:03:16.803451998 WARN | [PULSE] Unable to connect, Access denied
22:03:16.803492654 INFO | Starting PulseAudio docker container
22:03:16.804202617 WARN | [DOCKER] Container WolfPulseAudio already present, removing first
22:03:16.805050646 INFO | HTTPS server listening on port: 47984
22:03:18.250209710 WARN | [PULSE] Unable to connect, Access denied

This was my docker CLI command:

docker run \
    --name wolf \
    --network=host \
    -e XDG_RUNTIME_DIR=/tmp/sockets \
    -v /tmp/sockets:/tmp/sockets:rw \
    -e NVIDIA_DRIVER_VOLUME_NAME=nvidia-driver-vol \
    -v nvidia-driver-vol:/usr/nvidia:rw \
    -e HOST_APPS_STATE_FOLDER=/etc/wolf \
    -v /etc/wolf/wolf:/wolf/cfg \
    -v /var/run/docker.sock:/var/run/docker.sock:rw \
    --device-cgroup-rule "c 13:* rmw" \
    --device /dev/nvidia-uvm \
    --device /dev/nvidia-uvm-tools \
    --device /dev/dri/ \
    --device /dev/nvidia-caps/nvidia-cap1 \
    --device /dev/nvidia-caps/nvidia-cap2 \
    --device /dev/nvidiactl \
    --device /dev/nvidia0 \
    --device /dev/nvidia-modeset \
    --device /dev/uinput \
    -v /dev/shm:/dev/shm:rw \
    -v /dev/input:/dev/input:rw \
    -v /run/udev:/run/udev:rw \
    ghcr.io/games-on-whales/wolf:stable

I did not change anything from what was given on the Wolf website

ABeltramo commented 4 months ago

It's not stuck, just waiting for a Moonlight client to connect!
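The log in the previous comment already names the ports Wolf is waiting on. A small sketch that pulls them out of log text; wolf_ports is a hypothetical name, and it only matches the "on port: N" wording seen in this thread:

```shell
#!/bin/sh
# Hypothetical helper: read Wolf log text on stdin and print the port
# number from every "on port: <number>" occurrence, one per line.
wolf_ports() {
    # grep -o prints only the matched part; awk keeps the last field,
    # which is the port number.
    grep -oE 'on port: [0-9]+' | awk '{ print $NF }'
}
```

With the container named wolf as in the command above, something like `docker logs wolf 2>&1 | wolf_ports` should list the ports a Moonlight client needs to reach.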

As for the Pulse warning, it's fixed in the upcoming release; as a quick workaround, you can manually clean the /tmp/sockets folder before launching Wolf.
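A possible way to script that workaround, as a sketch only: the thread does not say exactly which files need to go, so this assumes everything inside the folder can be removed while Wolf is stopped. The name clean_sockets_dir is hypothetical, and -mindepth and -delete assume GNU or BSD find.

```shell
#!/bin/sh
# Hypothetical helper: remove everything inside a directory but keep the
# directory itself, so the -v /tmp/sockets bind mount still has a target.
clean_sockets_dir() {
    dir="${1:-/tmp/sockets}"
    # Refuse to run on a missing directory rather than guessing.
    [ -d "$dir" ] || return 1
    # -mindepth 1 skips the directory itself; -delete removes the rest.
    find "$dir" -mindepth 1 -delete
}
```

Run it (with sudo if the files are owned by root) before starting the Wolf container, for example `clean_sockets_dir /tmp/sockets`.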

Ramaddan commented 4 months ago

OK, still stuck.

Moonlight freezes

First, I could not run Wolf as a daemon, because I need the terminal output to see the PIN link. Is there a way to see it without running Wolf in a terminal?

Second, Moonlight connected and offered some apps to run, but then it froze with RetroArch.

Here are the terminal messages:

[2024-02-07 21:10:32] [ /etc/cont-init.d/10-setup_user.sh: executing... ]
[2024-02-07 21:10:32] Configure default user
[2024-02-07 21:10:32] Container running as root. Nothing to do.
[2024-02-07 21:10:32] DONE
[2024-02-07 21:10:32]
[2024-02-07 21:10:32] [ /etc/cont-init.d/15-setup_devices.sh: executing... ]
[2024-02-07 21:10:32] Configure devices
[2024-02-07 21:10:32] Exec device groups
[2024-02-07 21:10:33] Adding user 'root' to groups: gow-gid-107,root
[2024-02-07 21:10:33] DONE
[2024-02-07 21:10:33]
[2024-02-07 21:10:33] [ /etc/cont-init.d/30-nvidia.sh: executing... ]
[2024-02-07 21:10:33] Nvidia driver detected
[2024-02-07 21:10:33] [nvidia] Add Vulkan ICD
[2024-02-07 21:10:33] [nvidia] Add EGL external platform
[2024-02-07 21:10:33] [nvidia] Add egl-vendor
[2024-02-07 21:10:33] [nvidia] Add gbm backend
[2024-02-07 21:10:33]
[2024-02-07 21:10:33]
[2024-02-07 21:10:33] [ /etc/cont-init.d/init-gamescope.sh: executing... ]
[2024-02-07 21:10:33] Setting up Gamescope
[2024-02-07 21:10:33] Launching the container's startup script as user 'root'
0:00:00.030193650 221 0x55b836e3d4e0 WARN vadisplay gstvadisplay.c:316:gst_va_display_initialize: vaInitialize: unknown libva error
libva info: VA-API version 1.17.0
libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/iHD_drv_video.so
libva info: Found init function __vaDriverInit_1_17
libva info: va_openDriver() returns 0
0:00:00.321987870 221 0x55b836e3d4e0 WARN default gstvaapi.c:231:plugin_init: Cannot create a VA display
0:00:00.329842668 221 0x55b836e3d4e0 WARN default ges-meta-container.c:236:_set_value:GESAsset@0x55b837d04470 Could not set value on item: format-version
0:00:00.329868543 221 0x55b836e3d4e0 WARN default ges-meta-container.c:236:_set_value:GESAsset@0x55b837d04d30 Could not set value on item: format-version
0:00:00.329889973 221 0x55b836e3d4e0 WARN default ges-meta-container.c:236:_set_value:GESAsset@0x55b837d05540 Could not set value on item: format-version
0:00:00.330187729 221 0x55b836e3d4e0 WARN structure gststructure.c:2334:priv_gst_structure_parse_fields: Failed to find delimiter, r=mimetype
0:00:00.339279852 221 0x55b836e3d4e0 WARN vadisplay gstvadisplay.c:316:gst_va_display_initialize: vaInitialize: unknown libva error
0:00:00.364229073 221 0x55b836e3d4e0 WARN GST_PLUGIN_LOADING gstplugin.c:534:gst_plugin_register_func: plugin "/usr/local/lib/x86_64-linux-gnu/gstreamer-1.0/validate/libgstvalidatessim.so" failed to initialise
0:00:00.366679137 221 0x55b836e3d4e0 WARN adaptivedemux2 gstadaptivedemuxelement.c:41:adaptivedemux2_base_element_init: Failed to load libsoup library
0:00:00.366825376 221 0x55b836e3d4e0 WARN adaptivedemux2 gstadaptivedemuxelement.c:41:adaptivedemux2_base_element_init: Failed to load libsoup library
0:00:00.366961813 221 0x55b836e3d4e0 WARN adaptivedemux2 gstadaptivedemuxelement.c:41:adaptivedemux2_base_element_init: Failed to load libsoup library
0:00:00.395852104 221 0x55b836e3d4e0 WARN GST_PLUGIN_LOADING gstplugin.c:534:gst_plugin_register_func: plugin "/usr/local/lib/x86_64-linux-gnu/gstreamer-1.0/validate/libgstvalidatessim.so" failed to initialise
21:10:33.750481303 INFO | Gstreamer version: 1.22.0-0
21:10:33.751558448 INFO | Reading config file from: /wolf/cfg/config.toml
21:10:33.995697704 INFO | Selected H264 encoder: nvcodec
21:10:33.995813322 INFO | Selected HEVC encoder: nvcodec
21:10:33.997481417 INFO | RTSP server started on port: 48010
21:10:33.997484979 INFO | HTTP server listening on port: 47989
21:10:33.997507926 INFO | Control server started on port: 47999
21:10:33.997713651 WARN | [PULSE] Unable to connect, Access denied
21:10:33.997752878 INFO | Starting PulseAudio docker container
21:10:33.999269304 INFO | HTTPS server listening on port: 47984
21:11:03.007751489 INFO | RTP server started on port: 48100
21:11:03.007884712 INFO | RTP server started on port: 48200
0:00:29.777981244 1 0x7ff954001060 WARN cudaconvertscale gstcudaconvertscale.c:1265:gst_cuda_base_convert_set_info: Can't calculate borders
21:11:03.317384145 INFO | Starting container: /WolfRetroarch_17305313217028394902
21:11:03.932710183 WARN | [INPUT] Unable to find controller 4
21:11:03.932810773 WARN | [INPUT] Unable to find controller 5
21:11:03.932950399 WARN | [INPUT] Unable to find controller 6
21:11:03.933056161 WARN | [INPUT] Unable to find controller 7
0:00:31.650606677 1 0x7ff9540015e0 WARN audioencoder gstaudioencoder.c:1014:gst_audio_encoder_finish_frame: Can't copy metadata because input buffer disappeared
0:01:13.030246307 1 0x7ff924002680 WARN audiosrc gstaudiosrc.c:227:audioringbuffer_thread_func: error reading data -1 (reason: Success), skipping segment
21:11:56.530056785 INFO | Stopped container: /WolfRetroarch_17305313217028394902
21:13:19.472040181 INFO | RTP server started on port: 48100
21:13:19.472177838 INFO | RTP server started on port: 48200
0:02:46.215739598 1 0x7ff9540015e0 WARN cudaconvertscale gstcudaconvertscale.c:1265:gst_cuda_base_convert_set_info: Can't calculate borders
21:13:19.737998689 INFO | Starting container: /WolfRetroarch_17305313217028394902
21:13:20.399408939 WARN | [INPUT] Unable to find controller 4
21:13:20.399510081 WARN | [INPUT] Unable to find controller 5
21:13:20.399664170 WARN | [INPUT] Unable to find controller 6
21:13:20.399784263 WARN | [INPUT] Unable to find controller 7
0:02:48.106770298 1 0x7ff954000da0 WARN audioencoder gstaudioencoder.c:1014:gst_audio_encoder_finish_frame: Can't copy metadata because input buffer disappeared
0:02:56.929274749 1 0x7ff934003d70 WARN audiosrc gstaudiosrc.c:227:audioringbuffer_thread_func: error reading data -1 (reason: Success), skipping segment
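On the first question in this comment: a container started detached (docker run -d) still keeps its output, and docker logs prints it. A sketch of a filter for finding the pairing message in that output; pin_lines is a hypothetical name, and matching on the whole word "pin" is an assumption, since the exact wording of Wolf's PIN message does not appear in this thread.

```shell
#!/bin/sh
# Hypothetical helper: keep only log lines that contain "pin" as a whole
# word, case-insensitively. The pattern is an assumption about how the
# pairing message is worded.
pin_lines() {
    # -w avoids false hits such as "skipping"; "|| true" keeps the exit
    # status at 0 when nothing matches.
    grep -iw 'pin' || true
}
```

With the container named wolf, `docker logs wolf 2>&1 | pin_lines` would show any such line, and `docker logs -f wolf` follows the full output live.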

Ramaddan commented 4 months ago

1) I tried to stream to an android phone instead.

2) It somewhat works, at least Firefox does, but I cannot seem to bring up any keyboard layout.

RetroArch also opens, but the graphics are too poor for it to be usable.

So it seems the problem is still there

3) The other thing I noticed is that it controls the mouse cursor on the host machine, is that normal?

Shouldn't the container be isolated so that it can only stream, without affecting the host, especially when running multiple instances?

4) I also noticed that I have to run this command again after every reboot or shutdown:

sudo nvidia-container-cli --load-kmods info

so that the /dev/nvidia-caps/nvidia-cap1 and /dev/nvidia-caps/nvidia-cap2 devices show up again.

Is there a way to make this more permanent?
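One possible approach, sketched under assumptions rather than confirmed anywhere in this thread: a systemd oneshot unit that runs the same command at every boot. The function name print_kmods_unit, the unit file name used below, the ordering before docker.service, and the default path /usr/bin/nvidia-container-cli are all assumptions; check the real path with `command -v nvidia-container-cli`.

```shell
#!/bin/sh
# Hypothetical helper: print a systemd unit that runs the kmod-loading
# command once per boot. Nothing is installed; the text goes to stdout.
print_kmods_unit() {
    # Assumed default path; pass the real one as the first argument.
    cli="${1:-/usr/bin/nvidia-container-cli}"
    cat <<EOF
[Unit]
Description=Load NVIDIA kernel modules and create device nodes
Before=docker.service

[Service]
Type=oneshot
ExecStart=$cli --load-kmods info

[Install]
WantedBy=multi-user.target
EOF
}
```

To try it, the output could be written to a unit file, for example `print_kmods_unit | sudo tee /etc/systemd/system/nvidia-load-kmods.service`, followed by `sudo systemctl daemon-reload` and `sudo systemctl enable nvidia-load-kmods.service`.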

Thanks