Xpra-org / xpra

Persistent remote applications for X11; screen sharing for X11, MacOS and MSWindows.
https://xpra.org/
GNU General Public License v2.0

Advice on solid 60fps streaming #3518

Open SuperDisk opened 2 years ago

SuperDisk commented 2 years ago

I originally emailed this to @totaam but he pointed me here.

I've been working on a virtualization platform for VFX software and I've been using Xpra as the front end. Ultimately, the goal is to maintain a solid 60fps video stream with no stutter and minimal latency. I'm currently getting the best performance from the NVEnc video encoder, but over a connection with about 30ms of ping, the performance comes and goes: at its best it's decently smooth, but it will often begin to stutter for seconds at a time before recovering.

First: are there any settings I can tweak which can help accomplish the goal of constant 60fps?

Second: I'm interested in doing some hacking on Xpra to improve latency/performance, but I wanted to reach out and find where the true bottlenecks are.

Some questions

  1. It seems to me like having the packet encoding/decoding/serialization logic be in pure Python must incur a pretty big performance hit, right? Maybe I'm wrong (I haven't profiled yet) but would converting this to something like Cython be a worthwhile effort?
  2. Would something like the QUIC protocol mitigate stuttering/recovering video? I see there's an open issue to implement QUIC, but I'm wondering if you think this would actually improve latency.
  3. Is there any advantage to having image encoders that get switched to and from dynamically? It seems to me like the best course would be to just constantly send h264 and forget about JPEG/PNG.
  4. On the latency graph while I stream, it seems like "batch delay" takes up a sizable chunk of the budget. I'm not quite sure what this is, could anyone point me in the right direction?
  5. I've noticed lots of discussion around "damage areas." Again, I'm not 100% sure what this is, but I assume it's something like dirty rectangles: only the changed part gets sent, in order to decrease bandwidth. Again I have to ask: why not just skip all this and exclusively let h264 take care of it, just send a constant video stream?

Generally speaking, I'm just wondering what the bottlenecks are that are currently causing jitter/lag, so I can focus effort on them. Parsec, for example, works mindblowingly well in terms of latency; it really feels like you are physically in front of the remote computer. They do use a custom proprietary protocol on top of UDP to get there, but it would be awesome if Xpra could reach the same level (or at least get close to it).

totaam commented 2 years ago

Ultimately the goal is to be able to maintain a solid 60fps video stream with no stutter and minimal latency.

60fps is achievable with NVENC, but not at 4K, only at 1080p.

but over a connection with about 30ms ping or so

How much bandwidth do you have? 100Mbps, 1Gbps, more?

First: are there any settings I can tweak which can help accomplish the goal of constant 60fps?

--speed=100 and lower min-quality. Make sure NVENC is enabled. Ensure that your application is detected as video: https://github.com/Xpra-org/xpra/tree/master/fs/etc/xpra/content-type
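For instance, a starting point might look like this (an illustrative invocation - the display number, port and quality floor are placeholders to adjust):

xpra start :10 --speed=100 --min-quality=30 --video-encoders=nvenc --bind-tcp=0.0.0.0:10000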

It seems to me like having the packet encoding/decoding/serialization logic be in pure Python must incur a pretty big performance hit, right?

It is not in pure python at all: packet encoding and decoding is in C (Cython) via rencode and python-lz4. The part that makes the syscall to read from the socket is in python, but this just forwards to the glibc API and so is insignificant (compared to the cost of the IO itself).
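A minimal sketch of that path, assuming the rencode and python-lz4 packages are installed (the packet contents here are made up; this is not xpra's actual wire code):

import rencode    # C extension: packet serialization
import lz4.block  # C extension: compression

packet = ["draw", 1, 0, 0, 640, 480, "h264", b"frame-data", 0, {}]
data = rencode.dumps(packet)            # serialize in C, not pure python
compressed = lz4.block.compress(data)   # compress in C
restored = rencode.loads(lz4.block.decompress(compressed))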

Would something like the QUIC protocol mitigate stuttering/recovering video?

Yes: #3376

Is there any advantage to having image encoders that get switched to and from dynamically? It seems to me like the best course would be to just constantly send h264 and forget about JPEG/PNG.

This naive approach was tried initially and it was an unmitigated disaster. And don't forget about webp, transparency, plain rgb, etc..

On the latency graph while I stream, it seems like "batch delay" takes up a sizable chunk of the budget. I'm not quite sure what this is, could anyone point me in the right direction?

The batch delay is how long the server normally waits between each screen update it sends (when it needs to). If your client's display is 50Hz, we try to use a 20ms budget for frames. (but this will also vary to accommodate bandwidth issues)
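As a back-of-the-envelope check of that budget (illustrative numbers only):

refresh_hz = 50
budget_ms = 1000 / refresh_hz   # 20ms between screen updates at 50Hz
# at 60Hz the target would be ~16.7ms; the batch delay grows beyond
# this budget whenever the network or the client falls behind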

but I assume it's something like dirty rectangles in that only the changed part gets sent in order to decrease bandwidth

Correct.

Again I have to ask, why not just skip all this and exclusively let h264 take care of it, just send a constant video stream?

In some cases it does work, which is why we have video region detection. In others, it is just abysmal. (ie: type one character in a large terminal window)
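A rough comparison for that terminal example (made-up cell size, raw bytes before any compression):

glyph_bytes = 8 * 16 * 3           # one 8x16 character cell at 24-bit: 384 bytes
frame_bytes = 1920 * 1080 * 3      # a full 1080p frame: ~6.2MB
print(frame_bytes // glyph_bytes)  # the full frame is ~16000x more raw data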

SuperDisk commented 2 years ago

How much bandwidth do you have? 100Mbps, 1Gbps, more?

I'm running on an AWS machine with 1Gbps+ so I don't think bandwidth is an issue.

It is not in pure python at all

The encode function in net is Python, although the latency graph does show that network operations don't seem to be very important in the grand scheme.

QUIC

Well, I just wonder how effective this would be at reducing jitter. I also looked at SRT, which is another UDP-based protocol meant for video streaming. It's unreliable (like UDP), but perhaps we could split the packets into separate streams of video/input updates, so that input is reliable over TCP and the video comes via SRT or something.

This naive approach has been tried initially and it was an unmitigated disaster.

How long ago? Just as a test, I used ffmpeg on my AWS instance to encode a 1080p 60fps video with nvenc, and piped it to my local computer over TCP; I was able to get a rock-solid 60fps framerate. What caused the slowdown for the "naive approach" in Xpra? I imagine the only difference between Xpra and the ffmpeg experiment is the acquisition of the window frame versus the source being a video file.

totaam commented 2 years ago

I'm running on an AWS machine with 1Gbps+ so I don't think bandwidth is an issue.

Sure, but I don't think your client also runs there, does it? So unless you have a solid Gbps from AWS to your client, the available bandwidth could be lower than that.

The encode function in net is Python, although the latency graph does show that network operations don't seem to be very important in the grand scheme.

Yes, that one really doesn't matter. It just dispatches to rencode. (there's some old profiling data somewhere)

I also looked at SRT, .. we could possibly split the packets into separate streams of video/input updates so that input is reliable over TCP and the video comes via SRT ..

Yes, that sounds good but it dramatically increases the logic needed for managing those network flows. QUIC just gives it to you for free with a higher level API.

How long ago?

You can still test this by using strict encoding mode and forcing h264 for a specific window. One thing we could definitely do better here is NVENC context allocation: at the moment, it will start with a jpeg or software x264 context for the first few frames and switch to NVENC afterwards; this could explain the jitter you're seeing - there's a PR somewhere that does this better.

I imagine the only difference between Xpra and the ffmpeg experiment is the acquisition of the window frame versus the source being a video file.

The acquisition is one thing, but it's not a huge cost at 1080p - not enough to drop below 60fps (run your server with -d damage,encoding). The problem is that you're comparing a video file with windows that come and go (think xterms, popup windows and what not). They're only comparable for some very specific use-cases; xpra caters to a much wider variety of applications (and perhaps we can tune it better for this particular case).

SuperDisk commented 2 years ago

Sure, but I don't think your client also runs there, does it?

I have a fiber connection and get about 170Mbps down on Google's speed test, so I don't think bandwidth is an issue.

QUIC just gives it to you for free with a higher level API.

My only concern is that it seems to be an event-based API rather than a stream-based one, which will be sort of annoying to integrate. (you mention this in the issue as well)

The problem is that you're comparing a video file with windows that come and go (think xterms, popup windows and what not).

True, although I ran this test:

Xpra

  1. On the server side, started Xpra with xpra start :10 --video-encoders=nvenc --start=xterm --bind-tcp=0.0.0.0:xxxx
  2. On the client side, connected with xpra_cmd attach tcp://x.x.x.x:xxx/10
  3. On the server side, ran DISPLAY=:10 ffplay bbb_sunflower_1080p_60fps_normal.mp4

The FFPlay window appears on my client, and works "okay," but is pushing about 30fps if I had to guess. There are little stutters and in general it's workable, but not ideal. I also tried running xpra start with --speed=100 and --encoder=h264 but it didn't really seem to change anything. I also occasionally get the "your network can't keep up" dialog box, but I think it must be erroneous.

FFmpeg directly

  1. On the server side, ran ffmpeg -re -i bbb_sunflower_1080p_60fps_normal.mp4 -c:v h264_nvenc -c:a copy -f mpegts tcp://x.x.x.x:xxxx
  2. On the client side, ran ffplay tcp://0.0.0.0:xxxx?listen

The FFPlay window appears and the video plays back at 60fps, 100% smoothly-- zero stutter or framerate decrease.

So, both approaches use TCP and NVENC. Something must be going on in Xpra that's introducing stutter/delay. Any idea on what it could be/ways to debug?

totaam commented 2 years ago

I have a fiber connection and get about 170Mbps down on Google's speed test, so I don't think bandwidth is an issue. (..) bbb_sunflower_1080p_60fps_normal.mp4

Raw 1080p60 consumes roughly 3Gbps (120 MPixels/s x 24-bit). Theoretically, you would need to compress it by a factor of about 17 (down to under 6% of its original size) just to fit in the 170Mbps link using CBR. In practice, we can't do CBR: we don't actually know the amount of bandwidth available, and 170Mbps is likely to be an average. My guesstimate is that you need a compression ratio of 90% or more. This is achievable with NVENC - though the low-latency setting that we use by default does consume a lot more bandwidth than the other options.
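Spelling out that arithmetic:

raw_bps = 1920 * 1080 * 60 * 24   # ~2.99 Gbps for raw 1080p60
link_bps = 170 * 1000 * 1000      # the measured 170Mbps downlink
print(link_bps / raw_bps)         # ~0.057: must compress to under 6% of raw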

Based on this, I would not be saying that bandwidth is not an issue at this point.

True, although I ran this test:

This is apples to oranges, but let's continue.

but is pushing about 30fps if I had to guess

You can see exactly how many frames per second you're getting, and which encoding is actually being used to update your screen, with: XPRA_PAINT_BOX=1 xpra attach ...
https://github.com/Xpra-org/xpra/blob/f67fbd80046cc52283c5d2c6bfee372bb7dd3ed9/xpra/client/paint_colors.py#L8-L26

I also occasionally get the "your network can't keep up" dialog box, but I think it must be erroneous.

I don't think it is. Your ffmpeg command lines will have buffering and extra latency; xpra cannot tolerate high latency because users want their sessions to be interactive: keyboard and pointer events need to show up immediately, not half a dozen frames later. You can increase the latency tolerance with XPRA_ACK_TOLERANCE=2000 xpra start .. but this will only get you so far.

So, both approaches use TCP and NVENC

As per above, they're completely different. ffmpeg with NVENC does not do any bandwidth control, which is something that you absolutely must do for xpra to be usable. Also, the ffmpeg option can use multiple b-frames; xpra cannot. etc..

Any idea on what it could be/ways to debug?

See above, and the comment before that (-d damage,encoding). There are other debug switches that allow you to see when events occur, how well screen updates are compressed (-d compress), bandwidth issues (-d bandwidth), video detection (-d regiondetect), etc.. Also, ffplay was not detected as a video application; it is now: 3f341b2eab8484d55c71b0043c3ce78070106e01 (this gives me a modest ~40% framerate increase) - you can easily apply this one-liner by hand. Alternatively, you can disable scroll encoding with --encodings=-scroll (see #3519).
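The categories can be combined into a single switch, for example (illustrative; any subset works):

xpra start :10 --start=xterm -d damage,encoding,compress,bandwidth,regiondetect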

But the starting point is always xpra info of your session.

SuperDisk commented 2 years ago

Based on this, I would not be saying that bandwidth is not an issue at this point.

Ok, fair enough.

You can see exactly how many frames per second you're getting and which encoding is actually being used[...]

Using XPRA_PAINT_BOX=1 is super helpful, thanks. Just for simplicity, I ran xpra attach with --encodings=h264, but I get a message in the log saying this:

2022-04-15 03:28:56,964 Error: processing new connection from Protocol(tcp socket: 172.31.22.179:5545 <- 63.134.185.50:50624):
2022-04-15 03:28:56,968  no common encodings found (server: rgb24, rgb32, png, png/L, png/P, jpeg, webp vs client: h264, excluding: )

xpra info for client: https://gist.github.com/SuperDisk/0c3b888a17cad2e8e20679c3f8871940
xpra info for server: https://gist.github.com/SuperDisk/456d927b860fa2189b6b1456daee6135

I also used that configuration file which recognizes FFPlay as video, but it didn't seem to improve things.

SuperDisk commented 2 years ago

When I force NVENC, I get this error message in the logs:

2022-04-14 23:15:56,819 Warning: failed to initialize device:
Exception in thread threaded-clean:
Traceback (most recent call last):
  File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "xpra/codecs/nvenc/encoder.pyx", line 2164, in xpra.codecs.nvenc.encoder.Encoder.threaded_clean
  File "xpra/codecs/nvenc/encoder.pyx", line 2165, in xpra.codecs.nvenc.encoder.Encoder.threaded_clean
  File "xpra/codecs/nvenc/encoder.pyx", line 2171, in xpra.codecs.nvenc.encoder.Encoder.do_clean
  File "/usr/lib/python3/dist-packages/xpra/codecs/cuda_common/cuda_context.py", line 451, in __enter__
    assert self.lock.acquire(False), "failed to acquire cuda device lock"
AssertionError: failed to acquire cuda device lock

2022-04-14 23:15:56,820  failed to acquire cuda device lock
2022-04-14 23:15:56,864 Error: failed to create data packet
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/xpra/server/window/window_source.py", line 2041, in make_data_packet_cb
    packet = self.make_data_packet(damage_time, process_damage_time, image, coding, sequence, options, flush)
  File "/usr/lib/python3/dist-packages/xpra/server/window/window_source.py", line 2550, in make_data_packet
    ret = encoder(coding, image, options)
  File "/usr/lib/python3/dist-packages/xpra/server/window/window_video_source.py", line 2074, in video_encode
    return self.do_video_encode(encoding, image, options)
  File "/usr/lib/python3/dist-packages/xpra/server/window/window_video_source.py", line 2168, in do_video_encode
    return self.video_fallback(image, options, warn=False)
  File "/usr/lib/python3/dist-packages/xpra/server/window/window_video_source.py", line 2070, in video_fallback
    return encode_fn(encoding, image, options)
  File "xpra/codecs/nvjpeg/encoder.pyx", line 684, in xpra.codecs.nvjpeg.encoder.encode
  File "xpra/codecs/nvjpeg/encoder.pyx", line 362, in xpra.codecs.nvjpeg.encoder.Encoder.init_context
  File "/usr/lib/python3/dist-packages/xpra/codecs/cuda_common/cuda_context.py", line 451, in __enter__
    assert self.lock.acquire(False), "failed to acquire cuda device lock"
AssertionError: failed to acquire cuda device lock

When running both Xpra and ffplay however, I checked nvidia-smi and it shows this:

ubuntu@ip-172-31-22-179:~$ nvidia-smi
Thu Apr 14 23:19:26 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
| N/A   46C    P0    28W /  70W |    369MiB / 15360MiB |      4%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     21287      C   ...45 --video-encoders=nvenc      365MiB |
+-----------------------------------------------------------------------------+

Looks like only Xpra is attempting to use the GPU, so I wonder what's going wrong such that it can't use CUDA. Do you think it's misconfigured drivers or something?

The entire server log is here, with more CUDA related errors: https://gist.github.com/SuperDisk/d9a575245020ebb3304038a9309b7453

SuperDisk commented 2 years ago

While messing around with XPRA_PAINT_BOX, I found that sometimes the video detection will (technically correctly) find only the video region, and send that as h264 while sending the rest as (I think) webp.

[screenshot: "bad"]

This actually results in really choppy/low-framerate video in the video region, and the video scrubber is also quite choppy (due to being sent as webp, I suppose).

If I grab one of the nodes in the bottom part and jiggle it around, the entire region is detected as video and the framerate magically increases dramatically in both regions.

[screenshot: "good"]

The right and bottom edges flicker a different color, which I think might be causing a small additional bit of stutter. I've noticed that when the full region is h264, and I'm moving the node around in a circle just to keep the thing detected as video, the FPS will start at around 30, slowly "walk" up to about 50ish, and then fall back down again once the bottom edge flickers pink. It continues in this cycle. Is there some sort of overhead from switching between video/image - does it spin the h264 encoder down and up or something? You mentioned something similar about creating an NVENC context in your second comment, but I didn't know if that's what you meant.

Amazingly, just running with --encodings=jpeg seems to crank out a solid 30fps even when the window is at maximum res (3440x1440).

I found that when setting --encodings=h264,rgb32, it seems that the entire window gets captured as h264, yielding really good performance (it floats around 45, but stays at a nearly solid 60 when I move the node around).

[screenshot: "nuke"]

Notice that the chosen encoding seems to be flickering between h264 and rgb32. I would completely disable rgb32 if I could and just do only h264, but I can't due to the (I think) bug I mentioned in the previous issue comment.

totaam commented 2 years ago

Just for simplicity, I ran xpra attach with --encodings=h264

Don't do that. As per above, you need more than just h264 to be able to run.

but I get a message in the log saying this: (..) no common encodings found (server: rgb24, rgb32, png, png/L, png/P, jpeg, webp vs client: h264, excluding: )

Your server has no video encodings available, for whatever reason. Please go back to the very beginning and submit the information needed for bug reports: https://github.com/Xpra-org/xpra/wiki/Reporting-Bugs
My guess is that you have not installed from an official package and are therefore missing key features, or you are using a custom configuration that disables those features. Or the server is taking a long time to initialize the video encoders (unlikely but possible).

When I force NVENC

What do you mean? There is no way to force NVENC, you can request h264 but that's not exactly the same thing.

I found that sometimes the video detection will (technically correctly) find only the video region, and send that as h264 while sending the rest as (I think) webp. This actually results in really choppy/low framerate video in the video region, and the video scrubber is also quite choppy (due to being sent as webp I suppose). If I grab one of the nodes in the bottom part and jiggle it around, the entire region is detected as video and the framerate magically increases dramatically in both regions.

This is a tricky case for the video region detection heuristics; my guess is that the top part updates more than the bottom one. I'm also guessing that this application is not amongst the known apps (like ffplay was), so the server is having to guess and is erring on the side of not giving you too many lossy screen updates.

The right and bottom edges flicker a different color which I think might be causing a small additional bit of stutter.

That's probably just the border of the video area, which has odd dimensions and is therefore sent as something else (usually just rgb since it's tiny). You can use XPRA_VIDEO_SKIP_EDGE=1 xpra start to skip it completely, but I doubt that this makes any real difference. (though there are a number of fixes since 4.3.2 that you are missing out on - try the beta channel)

does it spin down/spin up the h264 encoder or something?

Yes. The speed, quality and batch-delay all vary, trying to best adapt to the type of content, amount of activity, network conditions, etc.. The video codec used is based on those attributes, as well as the colourspace subsampling and any picture downscaling it might choose to apply. Then, as I mentioned, NVENC is costly to set up, so the server will start with software h264 for the first few frames.

Amazingly, just running with --encodings=jpeg seems to crank out a solid 30fps even when the window is at maximum res (3440x1440).

You have nvjpeg enabled, which is very fast and doesn't have the same high setup costs as NVENC. (you probably wouldn't be able to reach 20fps with the plain jpeg encoder at this resolution)

I found that when setting --encodings=h264,rgb32, it seems that the entire window gets captured as h264, yielding really good performance (it floats around 45, but stays at a nearly solid 60 when I move the node around).

45 - you can't be idle, surely? What is updating the screen? 60 is good; my guess is that NVENC is being used then. If you map this application type to video then you may be able to get 60fps without any other command line options. (it is somewhat surprising that --speed=100 was not enough to get there on its own)

But that's still not 100% ideal: unless the other areas are also updating regularly, what you want is just the middle section as video and the rest as lossless.

Just beware that these settings may be a good fit for this specific application, on this specific hardware, running over this specific connection... but that's not everyone's use case.


Your logs have some serious NVENC errors, which is not going to help.

This can cause choppiness, crashes, etc. The server will spend time trying to use the NVENC encoder before falling back to something else, again and again.

SuperDisk commented 2 years ago

Please go back to the very beginning and submit the information needed for bug reports

Sure, will do tomorrow.

My guess is that you have not installed from an official package and therefore are missing key features, or you are using a custom configuration that disables those features.

I'm using the package that you get from following the steps in the Ubuntu installation section. I'll check out the beta version though, or maybe build from source.

What do you mean? There is no way to force NVENC

I specified --video-encoders=nvenc, I thought this made NVENC the only available video encoder but maybe I'm incorrect.

As for speed, quality, and batch-delay, to be perfectly honest I haven't noticed much of a change when adjusting these parameters via the context menu on the tray icon. I noticed that the server is reporting an error about failing to decode the quality-change request packet, so I think maybe my requests to modify those parameters just aren't going through. I'll look more into it and create a proper bug report tomorrow as well.

45 - you can't be idle surely? what is updating the screen?

Oh, well there is a 24fps video playing in the panel which is updating the screen. The FPS goes up to 60 when I start dragging the UI in a circle, which is causing more repainting I'm sure.

Your logs have some serious NVENC errors which is not going to help.

Do you think it would be viable to use NVIDIA's Video Processing Framework instead of the homegrown bindings to NVENC? Not sure if that would solve any problems, but their API certainly looks extremely simple.

totaam commented 2 years ago

I specified --video-encoders=nvenc, I thought this made NVENC the only available video encoder but maybe I'm incorrect.

It does, but you should:

Do you think it would be viable to use NVIDIA's Video Processing Framework instead of the homegrown bindings to NVENC?

Perhaps. Xpra's Cython bindings were the first open-source implementation of NVENC, over 8 years ago. From past experience, any higher-level API (ie: GTK for X11, ffmpeg for video codecs, gstreamer for audio codecs) makes it more difficult to talk directly to the codec / hardware: the abstraction can hide the gory details sometimes needed for fully utilizing the resources.

totaam commented 2 years ago

The ticket with changes that would help with NVENC is #3286. Ideally, we would instantiate a pool of NVENC encoders, ready to serve, so that we don't suffer the warm-up penalty.
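For illustration, a minimal sketch of such a pool (hypothetical names, not the actual implementation in #3286 - real NVENC contexts are also tied to specific dimensions and pixel formats, which a real pool would have to account for):

from queue import Queue, Empty
from threading import Thread

class EncoderPool:
    def __init__(self, make_encoder, size=2):
        # make_encoder is the expensive call: it opens an NVENC session
        self.make_encoder = make_encoder
        self.pool = Queue(maxsize=size)
        for _ in range(size):
            self._warm_up()

    def _warm_up(self):
        # initialize a context in the background so one is ready when needed
        Thread(target=lambda: self.pool.put(self.make_encoder()), daemon=True).start()

    def get(self):
        try:
            encoder = self.pool.get_nowait()  # already warmed up: no penalty
        except Empty:
            encoder = self.make_encoder()     # pool exhausted: pay the warm-up cost
        self._warm_up()                       # replenish for the next request
        return encoder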

ehfd commented 2 years ago

Hi, @SuperDisk I don't mean to hijack you to a different project, but https://github.com/selkies-project/selkies-gstreamer is meant to do this well. We consider ourselves a sister project to Xpra, as our project admins @danisla and @JanCVanB use Xpra in solutions where we don't use full screen streaming, and we also cooperate in developments.

SuperDisk commented 1 year ago

@totaam I've been trying to build Xpra from source off the fork in #3286 using NVENC but I'm running into issues.

The Ubuntu build instructions mention installing the following packages:

apt-get install libnvidia-encode1 python3-numpy

I'm getting this error:

Package libnvidia-encode1 is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source

E: Package 'libnvidia-encode1' has no installation candidate

Unfortunately I can't really find much information online about how to get this damn thing. It seems like at one point it was in the Ubuntu repo but now isn't?

Searching for the filename yields this:

user@rec7nucamkq2vopcw:~/xpra$ find / -name 'libnvidia-encode*' -print 2>/dev/null
/opt/nvenc/Lib/linux/stubs/x86_64/libnvidia-encode.so
/opt/nvenc/Lib/linux/stubs/aarch64/libnvidia-encode.so
/opt/nvenc/Lib/linux/stubs/ppc64le/libnvidia-encode.so

But that's just some stub I think, not the real thing. I've successfully built Xpra using those stubs but when I run Xpra, I see this error in the logs:

2022-11-30 08:02:31,219 nvenc.init_module()
2022-11-30 08:02:31,219 NVENC encoder API version 12.0
2022-11-30 08:02:31,219 init_nvencode_library() will try to load libcuda.so
2022-11-30 08:02:31,219 init_nvencode_library() <bound method LibraryLoader.LoadLibrary of <ctypes.LibraryLoader object at 0x7f4141093df0>>(libcuda.so)=<CDLL 'libcuda.so', handle 7f413030b690 at 0x7f413d173820>
2022-11-30 08:02:31,219 init_nvencode_library() libcuda.cuCtxGetCurrent=<_FuncPtr object at 0x7f413d0ee440>

2022-11-30 08:02:31,219 init_nvencode_library() will try to load libnvidia-encode.so.1
2022-11-30 08:02:31,220 failed to load 'libnvidia-encode.so.1'
Traceback (most recent call last):
  File "xpra/codecs/nvenc/encoder.pyx", line 1127, in xpra.codecs.nvenc.encoder.init_nvencode_library
  File "/usr/lib/python3.10/ctypes/__init__.py", line 452, in LoadLibrary
    return self._dlltype(name)
  File "/usr/lib/python3.10/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libnvidia-encode.so.1: cannot open shared object file: No such file or directory
2022-11-30 08:02:31,222 nvenc.cleanup_module()
2022-11-30 08:02:31,222 call encoder clean_instance() waiting for lock
2022-11-30 08:02:31,223 call encoder clean_instance()

So, I'm still missing libnvidia-encode. I know it's not really your jurisdiction, but do you have any pointers on how to get this? How does Xpra's automated build system get it? I tried looking at repo-build-scripts but I couldn't really get it to work; it doesn't seem like it's meant for building straight out of source anyway? (There needs to be a .tar.xz file containing the source somewhere.)

SuperDisk commented 1 year ago

Ah, I found libnvidia-encode here: https://packages.ubuntu.com/jammy/libnvidia-encode

Seems you have to manually pick one of the packages that provide it.
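For example, to match the 510 driver series shown in the nvidia-smi output below (the exact package name depends on the driver series installed):

apt-get install libnvidia-encode-510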

What I'm running into now is this:

2022-11-30 09:42:27,551 nvEncOpenEncodeSessionEx(..)=15
2022-11-30 09:42:27,551 failed to test encoder with cuda_device_context(0 - True, has_context:True, instances_to_cleanup:0)
Traceback (most recent call last):
  File "xpra/codecs/nvenc/encoder.pyx", line 3427, in xpra.codecs.nvenc.encoder.init_module
  File "xpra/codecs/nvenc/encoder.pyx", line 3346, in xpra.codecs.nvenc.encoder.Encoder.open_encode_session_impl
Exception: cannot open encoding session: This indicates that an invalid struct version was used by the client., 0 contexts are in use
2022-11-30 09:42:27,551  device Quadro RTX 5000 is not supported: cannot open encoding session: This indicates that an invalid struct version was used by the client., 0 contexts are in use
2022-11-30 09:42:27,552 clean() cuda_context=cuda_device_context(0 - True, has_context:True, instances_to_cleanup:0), encoder context=0x0
2022-11-30 09:42:27,552 cuda_clean() actualClean=True
2022-11-30 09:42:27,552 buffer_clean()
2022-11-30 09:42:27,552 clean() done
2022-11-30 09:42:27,552 dealloc() on encoder base
2022-11-30 09:42:27,552 call encoder clean_instance() waiting for lock
2022-11-30 09:42:27,552 call encoder clean_instance()
2022-11-30 09:42:27,552 done dealloc()
2022-11-30 09:42:27,552 no valid NVENC devices found

Here's the output of my nvidia-smi

user@rec7nucamkq2vopcw:~$ nvidia-smi
Wed Nov 30 09:48:27 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.85.02    Driver Version: 510.85.02    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro RTX 5000     Off  | 00000000:05:00.0 Off |                  Off |
| 33%   28C    P8    11W / 230W |      3MiB / 16384MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

totaam commented 1 year ago

@SuperDisk all these questions have nothing to do with the OP's issue, please use a separate ticket for that in the future.

E: Package 'libnvidia-encode1' has no installation candidate

Looks like they've moved it: https://packages.ubuntu.com/search?suite=jammy&section=all&arch=any&keywords=libnvidia-encode&searchon=names

But that's just some stub I think, not the real thing.

Correct.

So, I'm missing libnvidia-encode still. (..) How does Xpra's automated build system get it?

It uses the stubs for building and we get them from the CUDA SDK to avoid this madness with package names on all the different distros supported. We always build using the latest SDK to support all the latest GPUs.

I tried looking at repo-build-scripts but I couldn't really get it to work-- it doesn't seem like it's meant for building straight out of source anyway

It is. You place your source in pkgs/ and run build. You must have installed the CUDA sdk in opt/ if you want the nvidia codecs.
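Something like this layout (illustrative, with a placeholder tarball name):

repo-build-scripts/
  pkgs/xpra-x.y.z.tar.xz   # your source tarball goes here
  opt/cuda/                # the CUDA SDK, needed for the nvidia codecs

and then run build from that directory.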

Exception: cannot open encoding session: This indicates that an invalid struct version was used by the client., 0 contexts are in use

Perhaps the structs have changed and there is a mistmatch between the version of the stubs or headers and the version of libnvidia-encode you have installed. That's the price you pay for using proprietary out-of-tree drivers..