Threaded Lock Contention

Describe the bug

The CUDA device lock is contended while the threaded-clean thread is cleaning up. As a result, NVENC setup fails and Xpra uses a temporary fallback (see the log below).

How I noticed this (see https://github.com/orgs/Xpra-org/discussions/4343#discussioncomment-10554736): I set --min-quality=100 on the client side, but YUV420 downsampling still occurs, so I cannot see rectangles clearly.

To Reproduce

Steps to reproduce the behavior:

server command: xpra start :A_DISPLAY -d video,cuda
client command: Xpra_cmd.exe attach ssh://Server/A_DISPLAY --min-quality=100
specific action to trigger the bug: none; it happens while simply running software in the session.

System Information (please complete the following information):

Server OS: Rocky Linux 8
Client OS: Windows 11
Xpra Server Version 6.2.0
Xpra Client Version 6.2.0

Debug Code

Change https://github.com/Xpra-org/xpra/blob/v6.2.x/xpra/codecs/nvidia/cuda/context.py#L524 to the following (also add import threading and add holding_thread to __slots__):

    def __enter__(self):
        # Non-blocking acquire: fail immediately if another thread holds the lock.
        if not self.lock.acquire(False):
            log(f"Current thread: {threading.current_thread().name}")
            log(f"Lock is held by: {self.holding_thread.name}")
            raise TransientCodecException("failed to acquire cuda device lock")
        # Debug only: remember which thread now owns the lock.
        self.holding_thread = threading.current_thread()
        if not self.context:
            self.make_context()
        return self.push_context()
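
For anyone who wants to see the same pattern without a GPU, here is a minimal standalone sketch of the debug change above. It is not Xpra code: TrackedLock, TransientLockError and demo are hypothetical names, and the exception only stands in for TransientCodecException. It assumes the lock is a plain threading.Lock acquired without blocking, as in the snippet above.

```python
import threading


class TransientLockError(Exception):
    """Hypothetical stand-in for xpra's TransientCodecException."""


class TrackedLock:
    """Non-blocking lock context manager that remembers its holder."""

    def __init__(self):
        self.lock = threading.Lock()
        self.holding_thread = None

    def __enter__(self):
        # Fail immediately instead of waiting, and report who holds the lock.
        if not self.lock.acquire(False):
            holder = self.holding_thread
            name = holder.name if holder else "unknown"
            raise TransientLockError(
                f"{threading.current_thread().name} blocked by {name}")
        self.holding_thread = threading.current_thread()
        return self

    def __exit__(self, *exc_info):
        self.holding_thread = None
        self.lock.release()


def demo():
    """Hold the lock in one thread and try to take it from another."""
    lock = TrackedLock()
    held, release = threading.Event(), threading.Event()
    result = []

    def clean():
        with lock:
            held.set()        # signal that the lock is now held
            release.wait(5)   # keep holding it until the encoder has tried

    def encode():
        try:
            with lock:
                result.append("acquired")
        except TransientLockError as e:
            result.append(str(e))

    cleaner = threading.Thread(target=clean, name="threaded-clean")
    encoder = threading.Thread(target=encode, name="encode")
    cleaner.start()
    held.wait(5)
    encoder.start()
    encoder.join()
    release.set()
    cleaner.join()
    return result[0]
```

Calling demo() returns the message raised in the "encode" thread, which names "threaded-clean" as the holder, the same pairing as in my log below.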

Additional context

// My debug Code Print
2024-10-19 20:45:06,699 Current thread: encode
2024-10-19 20:45:06,699 Lock is held by: threaded-clean
// End of my debug code
2024-10-19 20:45:06,700 Warning: setup_pipeline failed for
2024-10-19 20:45:06,700  (260, (1, 1), None, 0, 0, None, 'BGRX', (1, 1), 613, 673, nvjpeg(BGRX to jpeg)):
2024-10-19 20:45:06,700  failed to acquire cuda device lock
2024-10-19 20:45:06,700 setup_pipeline: trying (241, (1, 1), None, 0, 0, None, 'BGRX', (1, 1), 613, 673, nvenc(BGRX to h264))
2024-10-19 20:45:06,700 setup_pipeline: csc=None, video encoder=nvenc(BGRX/NV12/H264 - None -  613x673 ), info: {'version': (11, 0), 'device_count': 1, 'context_count': 3, 'generation': 31, 'cards': {0: {'name': b'NVIDIA GeForce RTX 4090', 'uuid': b'GPU-UUID', 'pci': {'domain': 0, 'bus': 3, 'device': 0, 'pci-device-id': SOMEID, 'pci-subsystem-id': SOMEID, 'bus-id': '00000000:03:00.0'}, 'memory': {'total': 25757220864, 'free': 24575606784, 'used': 1181614080}, 'pcie-link': {'generation-max': 4, 'width-max': 16, 'generation': 4, 'width': 8}, 'clock-info': {'graphics': 2625, 'sm': 2625, 'mem': 10251, 'graphics-max': 3165, 'sm-max': 3165, 'mem-max': 10501}, 'fan-speed': 30, 'temperature': 29, 'power-state': 2, 'vbios-version': b'95.02.3C.00.40'}}, 'kernel_module_version': (535, 183, 1), 'kernel_version': '5.15.0-119-generic', 'width': 613, 'height': 673, 'frames': 0, 'codec': 'H264', 'encoder_width': 640, 'encoder_height': 704, 'bitrate': 1000000, 'quality': 100, 'speed': 1, 'lossless': {'': 0, 'supported': 1, 'threshold': 100}, 'yuv444': {'supported': True, 'threshold': 85}, 'cuda-device': {}, 'cuda': {}, 'pycuda': {}, 'src_format': 'BGRX', 'pixel_format': 'NV12', 'total_time_ms': 0, 'free_memory': 0, 'total_memory': 0}, setup took 0.29ms
2024-10-19 20:45:06,700 video encoder nvenc(BGRX/NV12/H264 - None -  613x673 ) is not ready yet, using temporary fallback
2024-10-19 20:45:06,713 get_CUDA_function(BGRX_to_NV12) module=<pycuda._driver.Module object at 0x7f733a35f760>
2024-10-19 20:45:06,713 loading function 'BGRX_to_NV12' from pre-compiled cubin took 0.2ms
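
To check how often this happens over a longer session, the two debug lines can be counted from a saved server log. This is only a sketch: count_contention and SAMPLE are hypothetical names, and it assumes the log contains the exact "Current thread:" and "Lock is held by:" messages printed by my debug code above.

```python
import re
from collections import Counter

# Patterns for the two debug messages added in __enter__.
CURRENT = re.compile(r"Current thread: (\S+)")
HOLDER = re.compile(r"Lock is held by: (\S+)")


def count_contention(lines):
    """Count (waiting thread, holding thread) pairs in log lines."""
    counts = Counter()
    waiting = None
    for line in lines:
        m = CURRENT.search(line)
        if m:
            waiting = m.group(1)
            continue
        m = HOLDER.search(line)
        if m and waiting is not None:
            counts[(waiting, m.group(1))] += 1
            waiting = None
    return counts


SAMPLE = [
    "2024-10-19 20:45:06,699 Current thread: encode",
    "2024-10-19 20:45:06,699 Lock is held by: threaded-clean",
    "2024-10-19 20:45:06,700 Warning: setup_pipeline failed for",
]
```

count_contention(SAMPLE) gives one ("encode", "threaded-clean") event; passing an open log file instead of SAMPLE works the same way, since it only iterates over lines.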

Extra Comment

I would really like a way to avoid YUV420 with the Xpra 6.x Windows client, but H264 with YUV444 is currently not supported there. Should I consider porting ffmpeg from the 5.x branch, or just use a 5.x client with the 6.x server?

Xpra-org / xpra

Threaded Lock Contention #4397