CESNET / UltraGrid

UltraGrid low-latency audio and video network transmission system
http://www.ultragrid.cz

Full Range video signal clipped in yuv codecs with NVENC HEVC #244

Closed erichorwitz closed 1 year ago

erichorwitz commented 2 years ago

This was working in a previous version of the software. I believe we did most of our testing with CUDA 10 and UltraGrid 1.5. However, I know we have a working system running CUDA 11.0 and UltraGrid 1.6+ (tags/continuous rev a76979ab built Aug 18 2020 13:30:19).

Now however it seems that any yuv pixel format is clipping the full range signal.

Full range (0-1023) RGB 10bit 444 Input

lavc-use-codec

Sender Settings: uv -t decklink:codec=R10k -c libavcodec:encoder=hevc_nvenc:bitrate=15M:GOP=48 -s embedded --audio-codec=AAC --audio-capture-format channels=2 -f rs:200:240 -f A:mult:2 --verbose -m 1390 192.168.0.11 -P 5004:5004:5006:5006 --param lavc-use-codec=yuv444p16le --control-port=8001 --param control-accept-global --verbose

Receiver Settings: uv -d decklink:device=0 -r embedded -m 1390 --param force-lavd-decoder=hevc_cuvid --param decoder-use-codec=R10k 192.168.0.10

jonathan-catt commented 2 years ago

Following up on Eric's post, here are the two scope readings showing the YUV pixel formats clipping while the RGB formats do not.

Any of the YUV pixel formats: clipping (image IMG_6227)

rgb0/bgr0 pixel formats: not clipped, but limited to 8 bit (image IMG_6226)

erichorwitz commented 2 years ago

Just tested with the latest continuous AppImage builds and with R12L. This works correctly in full range.

alatteri commented 2 years ago

There have been some recent changes to accommodate strange things FFmpeg has done to nvenc* in recent builds, and also to accommodate the fact that the BMD SDK does not output Full Range when using R10K output. For instance, Flame (with BMD out) at 10bit444 can only output Limited (a BMD problem, not inherent to Flame). And yet Resolve must be using a private function, as its 10bit444 out can be Full. Very recent builds of UG assume that R10K input is limited and internally expand it to Full. That is probably why you are seeing clipping: it would be expanding a signal that is already 0-1023.
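To illustrate why expanding an already-full signal clips, here is a minimal Python sketch. It assumes the conventional 10-bit limited range of 64-940 and a simple linear expansion with clamping; the exact arithmetic UltraGrid uses is not shown in this thread, and the function name is hypothetical.

```python
def expand_limited_to_full(value, bits=10):
    """Map a limited-range code value to full range, clamping the result.

    Assumption: limited range is 16..235 scaled to the bit depth
    (64..940 for 10 bit). This is a generic formula, not UltraGrid's code.
    """
    shift = bits - 8
    black = 16 << shift          # 64 for 10 bit
    white = 235 << shift         # 940 for 10 bit
    max_code = (1 << bits) - 1   # 1023 for 10 bit
    scaled = round((value - black) * max_code / (white - black))
    # Anything that lands outside the legal code range is clamped (clipped).
    return min(max(scaled, 0), max_code)


if __name__ == "__main__":
    # Feed full-range input (0..1023) through the limited->full expansion.
    for v in (0, 32, 64, 502, 940, 980, 1023):
        print(v, "->", expand_limited_to_full(v))
```

With full-range input, every value below 64 collapses to 0 and every value above 940 collapses to 1023, which would be consistent with the clipped scope reading reported above.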

You can follow this thread of thought here: https://github.com/CESNET/UltraGrid/discussions/241

I have not had time to test the recent builds (from the last week or so) which include this new functionality.

MartinPulec commented 2 years ago

This is correct, UltraGrid uses limited range for YCbCr formats and full range for RGB. However, I don't think that there has been any change. But keep in mind that this applies only to YCbCr<->RGB conversions; if the color space is kept, the range is not touched in any way.

MartinPulec commented 2 years ago

I am sorry, I wrote the previous comment in a rush yesterday and didn't properly read the discussion. Alan is right, this was changed with 144997a2 and the fix is present in continuous builds. Anyway, as the R10k signal is now passed to the BMD API in limited range, I assume that DeckLink will output it as such, but without clipping.

erichorwitz commented 1 year ago

I tried to test the bmd-r10k-full-range param, but I am running into an error with the latest AppImage build. The same command worked on a previous AppImage build that did not have the bmd-r10k-full-range option.


~/UltraGrid-continuous-x86_64.AppImage -o uv -t decklink:codec=R10K -c libavcodec:encoder=hevc_nvenc:bitrate=10M:GOP=24 -s embedded --audio-codec=PCM --audio-capture-format channels=2 -f rs:192:240 -f A:mult:2 --verbose -m 1420 192.168.100.19 -P 5004:5004:5006:5006 --param lavc-use-codec=yuv444p16le --control-port=8001 --param control-accept-global
UltraGrid 1.7+ (master rev cee4f37 built Dec  1 2022 09:57:46)

Display device   : none
Capture device   : decklink
Audio capture    : embedded
Audio playback   : none
MTU              : 1420 B
Video compression: libavcodec:encoder=hevc_nvenc:bitrate=10M:GOP=24
Audio codec      : PCM
Network protocol : UltraGrid RTP
Audio FEC        : mult:2
Video FEC        : rs:192:240

[1669959353.150] [NAT] Private outbound IPv4 address detected and binding as a receiver. Consider adding '-N' option for NAT traversal.
[1669959353.150] Connected IP version 6
    Last message repeated 1 times
[1669959353.155] Socket recv buffer size set to 524288 B.
[1669959353.168] [Decklink capture] Using codec: R10k
[1669959353.168] [DeckLink capture] Using limited range R10k as specified by BMD, use '--param bmd-r10k-full-range' to override.
[1669959353.169] [DeckLink capture] Using device DeckLink Mini Recorder 4K
[1669959353.169] [DeckLink capture] Unable to set bmdDeckLinkConfigCapturePassThroughMode: not implemented (0x80000001)
[1669959353.169] The desired display mode is supported: 1080p23.98
[1669959353.169] [DeckLink capture] Enable video input: 1080p23.98
[1669959353.169] [DeckLink] Trying to autodetect format.
[1669959353.184] [DeckLink capture] Audio input set to: embedded
[1669959353.185] [DeckLink capture] EnableAudioInput: Decklink audio capture initialized sucessfully: 2 channels, 4 Bps, 48000 Hz, codec: PCM
[1669959353.192] Control socket listening on port 8001
[1669959353.201] Connected IP version 6
    Last message repeated 1 times
[1669959353.206] Socket recv buffer size set to 18247680 B.
[1669959353.211] [control] Fec changed successfully
Audio sending started.
[1669959353.281] Frame received (#0) - No input signal detected
[1669959353.281] [Decklink capture] Format change detected (color space - YCbCr422, 10bit).
[1669959353.281] [Decklink capture] Using codec: v210
[1669959353.281] [DeckLink capture] Enable video input: 1080p23.98
[1669959353.364] Waiting for new frame timed out!
[1669959353.458] [lavc] Using codec: H.265, encoder: hevc_nvenc
[1669959353.474] [lavc] Setting bitrate to 10000.0 kbps.
[1669959353.474] [lavc] Setting NVENC preset to p4.
[1669959353.478] [lavc] Slice-based or external multithreading not available, encoding won't be parallel. You may select frame-based paralellism if needed.
[1669959353.478] [lavc] Trying pixfmt: yuv444p16le
[1669959353.487] [lavc hevc_nvenc @ 0x7f349c000d00] Loaded Nvenc version 11.0
[1669959353.487] [lavc hevc_nvenc @ 0x7f349c000d00] Nvenc initialized successfully
[1669959353.531] [lavc hevc_nvenc @ 0x7f349c000d00] 1 CUDA capable devices found
[1669959353.531] [lavc hevc_nvenc @ 0x7f349c000d00] [ GPU #0 - < Quadro P400 > has Compute SM 6.1 ]
[1669959353.652] [lavc hevc_nvenc @ 0x7f349c000d00] B frames as references are not supported
[1669959353.684] [lavc hevc_nvenc @ 0x7f349c000d00] No capable devices found
[1669959353.684] [lavc hevc_nvenc @ 0x7f349c000d00] Nvenc unloaded
[1669959353.684] [lavc] Could not open codec for pixel format yuv444p16le
[lavc] Codec supported pixel formats: yuv420p nv12 p010le yuv444p p016le yuv444p16le bgr0 bgra rgb0 rgba x2rgb10le x2bgr10le gbrp gbrp16le cuda
[lavc] Usable pixel formats: yuv444p16le
[1669959353.684] [lavc] No direct decoder format for: v210. Trying to convert with swscale instead.
[1669959353.684] [lavc] Setting bitrate to 10000.0 kbps.
[1669959353.684] [lavc] Setting NVENC preset to p4.
[1669959353.684] [lavc] Slice-based or external multithreading not available, encoding won't be parallel. You may select frame-based paralellism if needed.
[1669959353.684] [lavc] Trying pixfmt: yuv420p
[1669959353.685] [lavc hevc_nvenc @ 0x7f349c5d6080] Loaded Nvenc version 11.0
[1669959353.685] [lavc hevc_nvenc @ 0x7f349c5d6080] Nvenc initialized successfully
[1669959353.685] [lavc hevc_nvenc @ 0x7f349c5d6080] 1 CUDA capable devices found
[1669959353.685] [lavc hevc_nvenc @ 0x7f349c5d6080] [ GPU #0 - < Quadro P400 > has Compute SM 6.1 ]
[1669959353.800] [lavc hevc_nvenc @ 0x7f349c5d6080] B frames as references are not supported
[1669959353.859] [DeckLink capture] Dropping audio packet, queue full.
[1669959353.874] [lavc hevc_nvenc @ 0x7f349c5d6080] No capable devices found
[1669959353.874] [lavc hevc_nvenc @ 0x7f349c5d6080] Nvenc unloaded
[1669959353.874] [lavc] Could not open codec for pixel format yuv420p
[1669959353.874] [lavc] Setting bitrate to 10000.0 kbps.
[1669959353.874] [lavc] Setting NVENC preset to p4.
[1669959353.874] [lavc] Slice-based or external multithreading not available, encoding won't be parallel. You may select frame-based paralellism if needed.
[1669959353.875] [lavc] Trying pixfmt: nv12
[1669959353.875] [lavc hevc_nvenc @ 0x7f349c02f800] Loaded Nvenc version 11.0
[1669959353.875] [lavc hevc_nvenc @ 0x7f349c02f800] Nvenc initialized successfully
[1669959353.875] [lavc hevc_nvenc @ 0x7f349c02f800] 1 CUDA capable devices found
[1669959353.876] [lavc hevc_nvenc @ 0x7f349c02f800] [ GPU #0 - < Quadro P400 > has Compute SM 6.1 ]
[1669959353.901] [DeckLink capture] Dropping audio packet, queue full.
    Last message repeated 4 times
[1669959354.095] [lavc hevc_nvenc @ 0x7f349c02f800] B frames as references are not supported
[1669959354.110] [DeckLink capture] Dropping audio packet, queue full.
    Last message repeated 1 times

Working:

~/UltraGrid-continuous-x86_64.AppImage2 -o uv -t decklink:codec=R10K -c libavcodec:encoder=hevc_nvenc:bitrate=10M:GOP=24 -s embedded --audio-codec=PCM --audio-capture-format channels=2 -f rs:192:240 -f A:mult:2 --verbose -m 1420 192.168.100.19 -P 5004:5004:5006:5006 --param lavc-use-codec=yuv444p16le --control-port=8001 --param control-accept-global
UltraGrid 1.7+ (tags/continuous rev dd9a53cf built Aug 16 2022 13:32:17)

Display device   : none
Capture device   : decklink
Audio capture    : embedded
Audio playback   : none
MTU              : 1420 B
Video compression: libavcodec:encoder=hevc_nvenc:bitrate=10M:GOP=24
Audio codec      : PCM
Network protocol : UltraGrid RTP
Audio FEC        : mult:2
Video FEC        : rs:192:240

[1669959652.699] [NAT] Private outbound IPv4 address detected and binding as a receiver. Consider adding '-N' option for NAT traversal.
[1669959652.702] Connected IP version 6
Created new RTP session with SSRC 0x14d28623.

[1669959652.706] Socket recv buffer size set to 524288 B.
Display initialized-none
[1669959652.720] Using device DeckLink Mini Recorder 4K
[1669959652.720] [DeckLink capture] Unable to set bmdDeckLinkConfigCapturePassThroughMode: not implemented (0x80000001)
[1669959652.720] The desired display mode is supported: 1080p23.98
Enable video input: 1080p23.98
[1669959652.720] [DeckLink] Trying to autodetect format.
[Decklink capture] Audio input set to: embedded
[1669959652.735] [DeckLink capture] EnableAudioInput: Decklink audio capture initialized sucessfully: 2 channels, 4 Bps, 48000 Hz, codec: PCM
Start capture
DeckLink capture device enabled
Video capture initialized-decklink
[1669959652.752] Connected IP version 6
Created new RTP session with SSRC 0x15725f8a.

[1669959652.757] Socket recv buffer size set to 18247680 B.
[1669959652.761] [control] Fec changed successfully
Audio sending started.
[1669959652.826] Frame received (#0) - No input signal detected
[1669959652.826] [Decklink capture] Format change detected (color space).
[1669959652.826] [Decklink capture] Using codec: v210
Enable video input: 1080p23.98
[1669959652.907] Waiting for new frame timed out!
[1669959652.995] [lavc] Using codec: H.265, encoder: hevc_nvenc
[1669959653.008] [lavc] Setting bitrate to 10000.0 kbps.
[1669959653.008] [lavc] Setting NVENC preset to p7.
[1669959653.008] [lavc] Slice-based or external multithreading not available, encoding won't be parallel. You may select frame-based paralellism if needed.
[1669959653.008] [lavc] Trying pixfmt: yuv444p16le
[hevc_nvenc @ 0x7fd9a4000d00] Loaded Nvenc version 11.0
[hevc_nvenc @ 0x7fd9a4000d00] Nvenc initialized successfully
[hevc_nvenc @ 0x7fd9a4000d00] 1 CUDA capable devices found
[hevc_nvenc @ 0x7fd9a4000d00] [ GPU #0 - < Quadro P400 > has Compute SM 6.1 ]
[hevc_nvenc @ 0x7fd9a4000d00] supports NVENC
[lavc] Codec supported pixel formats: yuv420p nv12 p010le yuv444p p016le yuv444p16le bgr0 bgra rgb0 rgba x2rgb10le x2bgr10le gbrp gbrp16le cuda
[lavc] Usable pixel formats: yuv444p16le
[1669959653.221] [lavc] Codec hevc_nvenc capabilities: 0x00240022 using thread type 0, count 1
[1669959653.221] [lavc] Selected pixfmt: yuv444p16le
[1669959653.221] [lavc] Selected pixfmt has not 4:2:0 subsampling, which is usually not supported by hw. decoders
[1669959653.231] [DeckLink] Audio frame too small!
    Last message repeated 3 times
[1669959653.244] FEC symbol size: 9, symbols per packet: 151, payload size: 1359
[1669959653.249] FEC symbol size: 5, symbols per packet: 272, payload size: 1360
[1669959657.038] [Audio sender] Sent 194194 samples in last 5.010000 seconds.
[1669959657.040] [Audio sender] Volume: -inf/-inf -inf/-inf dBFS RMS/peak
[1669959657.787] [decklink] 112 frames in 5.02101 seconds = 22.3063 FPS
[1669959662.043] [Audio sender] Sent 240240 samples in last 5.005326 seconds.
[1669959662.046] [Audio sender] Volume: -inf/-inf -inf/-inf dBFS RMS/peak
[1669959662.792] [decklink] 120 frames in 5.00499 seconds = 23.9761 FPS

erichorwitz commented 1 year ago

I was able to run the Sep 30th build:


~/UltraGrid-continuous-x86_64_20221001.AppImage -o uv -t decklink:codec=R10K -c libavcodec:encoder=hevc_nvenc:bitrate=10M:GOP=24 -s embedded --audio-codec=PCM --audio-capture-format channels=2 -f rs:192:240 -f A:mult:2 --verbose -m 1420 192.168.100.19 -P 5004:5004:5006:5006 --param lavc-use-codec=yuv444p16le --control-port=8001 --param control-accept-global --param bmd-r10k-full-range
UltraGrid 1.7+ (tags/continuous rev 928249cf built Sep 30 2022 12:04:16)

Display device   : none
Capture device   : decklink
Audio capture    : embedded
Audio playback   : none
MTU              : 1420 B
Video compression: libavcodec:encoder=hevc_nvenc:bitrate=10M:GOP=24
Audio codec      : PCM
Network protocol : UltraGrid RTP
Audio FEC        : mult:2
Video FEC        : rs:192:240

[1669960172.015] [NAT] Private outbound IPv4 address detected and binding as a receiver. Consider adding '-N' option for NAT traversal.
[1669960172.018] Connected IP version 6
Created new RTP session with SSRC 0x31ad28d2.

[1669960172.019] Socket recv buffer size set to 524288 B.
Display initialized-none
[1669960172.031] [Decklink capture] Using codec: R10k
[1669960172.032] [DeckLink capture] Using device DeckLink Mini Recorder 4K
[1669960172.032] [DeckLink capture] Unable to set bmdDeckLinkConfigCapturePassThroughMode: not implemented (0x80000001)
[1669960172.032] The desired display mode is supported: 1080p23.98
[1669960172.032] [DeckLink capture] Enable video input: 1080p23.98
[1669960172.032] [DeckLink] Trying to autodetect format.
[1669960172.046] [DeckLink capture] Audio input set to: embedded
[1669960172.047] [DeckLink capture] EnableAudioInput: Decklink audio capture initialized sucessfully: 2 channels, 4 Bps, 48000 Hz, codec: PCM
Video capture initialized-decklink
[1669960172.062] Connected IP version 6
Created new RTP session with SSRC 0x4cf0c39f.

[1669960172.067] Socket recv buffer size set to 18247680 B.
[1669960172.072] [control] Fec changed successfully
Audio sending started.
[1669960172.134] Frame received (#0) - No input signal detected
[1669960172.134] [Decklink capture] Format change detected (color space - YCbCr422, 10bit).
[1669960172.134] [Decklink capture] Using codec: v210
[1669960172.134] [DeckLink capture] Enable video input: 1080p23.98
[1669960172.215] Waiting for new frame timed out!
[1669960172.307] [lavc] Using codec: H.265, encoder: hevc_nvenc
[1669960172.323] [lavc] Setting bitrate to 10000.0 kbps.
[1669960172.323] [lavc] Setting NVENC preset to p7.
[1669960172.326] [lavc] Slice-based or external multithreading not available, encoding won't be parallel. You may select frame-based paralellism if needed.
[1669960172.326] [lavc] Trying pixfmt: yuv444p16le
[1669960172.338] [lavc hevc_nvenc @ 0x7f27c0000d00] Loaded Nvenc version 11.0
[1669960172.338] [lavc hevc_nvenc @ 0x7f27c0000d00] Nvenc initialized successfully
[1669960172.384] [lavc hevc_nvenc @ 0x7f27c0000d00] 1 CUDA capable devices found
[1669960172.384] [lavc hevc_nvenc @ 0x7f27c0000d00] [ GPU #0 - < Quadro P400 > has Compute SM 6.1 ]
[1669960172.497] [lavc hevc_nvenc @ 0x7f27c0000d00] supports NVENC
[lavc] Codec supported pixel formats: yuv420p nv12 p010le yuv444p p016le yuv444p16le bgr0 bgra rgb0 rgba x2rgb10le x2bgr10le gbrp gbrp16le cuda
[lavc] Usable pixel formats: yuv444p16le
[1669960172.542] [lavc] Codec hevc_nvenc capabilities: 0x00240022 using thread type 0, count 1
[1669960172.542] [lavc] Selected pixfmt: yuv444p16le
[1669960172.542] [lavc] Selected pixfmt has not 4:2:0 subsampling, which is usually not supported by hw. decoders
[1669960172.551] [DeckLink] Audio frame too small!
    Last message repeated 4 times
[1669960172.563] FEC symbol size: 9, symbols per packet: 151, payload size: 1359
[1669960176.096] [Audio sender] Sent 180179 samples in last 5.005789 seconds.
[1669960176.098] [Audio sender] Volume: -inf/-inf -inf/-inf dBFS RMS/peak
[1669960177.095] [DeckLink capture] 111 frames in 5.01872 seconds = 22.1172 FPS
[1669960181.101] [Audio sender] Sent 240240 samples in last 5.004788 seconds.
[1669960181.104] [Audio sender] Volume: -inf/-inf -inf/-inf dBFS RMS/peak
[1669960182.100] [DeckLink capture] 120 frames in 5.00496 seconds = 23.9762 FPS
[1669960186.106] [Audio sender] Sent 240240 samples in last 5.005080 seconds.
[1669960186.109] [Audio sender] Volume: -inf/-inf -inf/-inf dBFS RMS/peak
[1669960187.105] [DeckLink capture] 120 frames in 5.00501 seconds = 23.976 FPS
[1669960191.111] [Audio sender] Sent 240240 samples in last 5.004910 seconds.
[1669960191.114] [Audio sender] Volume: -inf/-inf -inf/-inf dBFS RMS/peak
[1669960192.110] [DeckLink capture] 120 frames in 5.00495 seconds = 23.9763 FPS

erichorwitz commented 1 year ago

The Nov 4th build also runs:


~/UltraGrid-continuous-x86_64_20221105.AppImage -o uv -t decklink:codec=R10K -c libavcodec:encoder=hevc_nvenc:bitrate=10M:GOP=24 -s embedded --audio-codec=PCM --audio-capture-format channels=2 -f rs:192:240 -f A:mult:2 --verbose -m 1420 192.168.100.19 -P 5004:5004:5006:5006 --param lavc-use-codec=yuv444p16le --control-port=8001 --param control-accept-global --param bmd-r10k-full-range
UltraGrid 1.7+ (master rev 9de21a2 built Nov  4 2022 16:28:16)

Display device   : none
Capture device   : decklink
Audio capture    : embedded
Audio playback   : none
MTU              : 1420 B
Video compression: libavcodec:encoder=hevc_nvenc:bitrate=10M:GOP=24
Audio codec      : PCM
Network protocol : UltraGrid RTP
Audio FEC        : mult:2
Video FEC        : rs:192:240

[1669960548.550] [NAT] Private outbound IPv4 address detected and binding as a receiver. Consider adding '-N' option for NAT traversal.
[1669960548.553] Connected IP version 6
    Last message repeated 1 times
[1669960548.555] Socket recv buffer size set to 524288 B.
[1669960548.566] [Decklink capture] Using codec: R10k
[1669960548.567] [DeckLink capture] Using device DeckLink Mini Recorder 4K
[1669960548.567] [DeckLink capture] Unable to set bmdDeckLinkConfigCapturePassThroughMode: not implemented (0x80000001)
[1669960548.567] The desired display mode is supported: 1080p23.98
[1669960548.567] [DeckLink capture] Enable video input: 1080p23.98
[1669960548.567] [DeckLink] Trying to autodetect format.
[1669960548.582] [DeckLink capture] Audio input set to: embedded
[1669960548.583] [DeckLink capture] EnableAudioInput: Decklink audio capture initialized sucessfully: 2 channels, 4 Bps, 48000 Hz, codec: PCM
[1669960548.590] Control socket listening on port 8001
[1669960548.596] Connected IP version 6
    Last message repeated 1 times
[1669960548.601] Socket recv buffer size set to 18247680 B.
[1669960548.606] [control] Fec changed successfully
Audio sending started.
[1669960548.674] Frame received (#0) - No input signal detected
[1669960548.674] [Decklink capture] Format change detected (color space - YCbCr422, 10bit).
[1669960548.674] [Decklink capture] Using codec: v210
[1669960548.674] [DeckLink capture] Enable video input: 1080p23.98
[1669960548.756] Waiting for new frame timed out!
[1669960548.850] [lavc] Using codec: H.265, encoder: hevc_nvenc
[1669960548.865] [lavc] Setting bitrate to 10000.0 kbps.
[1669960548.865] [lavc] Setting NVENC preset to p7.
[1669960548.866] [lavc] Slice-based or external multithreading not available, encoding won't be parallel. You may select frame-based paralellism if needed.
[1669960548.866] [lavc] Trying pixfmt: yuv444p16le
[1669960548.875] [lavc hevc_nvenc @ 0x7f10c0000d00] Loaded Nvenc version 11.0
[1669960548.875] [lavc hevc_nvenc @ 0x7f10c0000d00] Nvenc initialized successfully
[1669960548.919] [lavc hevc_nvenc @ 0x7f10c0000d00] 1 CUDA capable devices found
[1669960548.919] [lavc hevc_nvenc @ 0x7f10c0000d00] [ GPU #0 - < Quadro P400 > has Compute SM 6.1 ]
[1669960549.036] [lavc hevc_nvenc @ 0x7f10c0000d00] supports NVENC
[lavc] Codec supported pixel formats: yuv420p nv12 p010le yuv444p p016le yuv444p16le bgr0 bgra rgb0 rgba x2rgb10le x2bgr10le gbrp gbrp16le cuda
[lavc] Usable pixel formats: yuv444p16le
[1669960549.081] [lavc] Codec hevc_nvenc capabilities: 0x00240022 using thread type 0, count 1
[1669960549.081] [lavc] Selected pixfmt: yuv444p16le
[1669960549.081] [lavc] Selected pixfmt has not 4:2:0 subsampling, which is usually not supported by hw. decoders
[1669960549.091] [DeckLink] Audio frame too small!
    Last message repeated 4 times
[1669960549.103] FEC symbol size: 9, symbols per packet: 151, payload size: 1359
[1669960552.721] [Audio sender] Sent 184184 samples in last 5.004059 seconds.
[1669960552.722] [Audio sender] Volume: -inf/-inf -inf/-inf dBFS RMS/peak
[1669960553.636] [DeckLink capture] 111 frames in 5.02616 seconds = 22.0844 FPS
[1669960557.726] [Audio sender] Sent 240240 samples in last 5.004999 seconds.
[1669960557.730] [Audio sender] Volume: -inf/-inf -inf/-inf dBFS RMS/peak

alatteri commented 1 year ago

Hi Eric,

A few thoughts... there has been a decent amount of color space work happening over the past few months, which is why you are seeing a difference in behaviour from the 2 builds you are testing with.

  1. While I don't use NVENC since BMD is broken with 12bit 12G SDI, I always specify decklink:codec=R12L to ensure proper Full-Range video.
  2. I do not believe it is necessary to specify lavc-use-codec=yuv444p16le anymore. Try letting UG auto-choose, which I believe will now pick gbrp16le by itself and give you the best color.
  3. If you are doing HD resolutions only, NVENC is a bit of overkill. x265 can do this easily and will have much better color accuracy, as the path stays RGB 12bit the whole way through, whereas NVENC will be a 10bit YUV encode. Additionally, with NVENC the decoding requirements are higher, while x265 decode will be parallel. If I recall correctly, NVENC also had higher latency than x265 at HD.
  4. You won't get the GOP pulse with x265; you will with NVENC, as the FFmpeg implementation of NVENC does not support intra-refresh.
  5. The defaults for x265 are now very good.
  6. If color-critical accuracy isn't an issue, use x264 for the lowest latency. Even then you can still get 10bit YUV 444, and the decode requirements are so low that you could probably even use a Raspberry Pi 4 as the receiver.

Alan

erichorwitz commented 1 year ago

Unfortunately the CPUs I have on the encoders and decoders do not seem to be able to keep up, and I get a lot of dropped frames and glitches. Although it has been a very long time since I tried CPU-only encoding/decoding.

alatteri commented 1 year ago

> unfortunately the cpus I have on the encoders and decoders do not seem to be able to keep up and I get a lot of dropped frames and glitches... Although it has been a very very long time since I tried only cpu encoding/decoding...

I've found that on the same hardware, Ubuntu 22 with a recent UG is much faster than previous versions. But really, if your workload isn't color critical, just switch to H264: the color would be good enough, and everything else would be much easier.

Also for reasons I do not understand, I've found FEC to be very effective with H264, and pretty much useless for H265 which is why I encapsulate in SRT, which then adds varying latencies.

alatteri commented 1 year ago

Also, some distributions use the "powersave" CPU governor by default, which keeps the CPU in its lowest-performance, lowest-MHz state.
Check with: cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

The best governor for this use case is performance; if heat/noise is an issue, an OK compromise is ondemand or the newer schedutil.

And to squeeze out even more performance on modern kernels, boot with mitigations=off.
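The governor check above can also be scripted. The following is a minimal Python sketch that reads the same sysfs files as the cat command; the function names are hypothetical, and it assumes a Linux system that exposes cpufreq under /sys/devices/system/cpu.

```python
from pathlib import Path


def read_governors(base="/sys/devices/system/cpu"):
    """Return a mapping such as {"cpu0": "powersave"} for every core found."""
    governors = {}
    for path in sorted(Path(base).glob("cpu*/cpufreq/scaling_governor")):
        # path is <base>/cpuN/cpufreq/scaling_governor, so cpuN is two levels up.
        governors[path.parent.parent.name] = path.read_text().strip()
    return governors


def cores_not_using(governors, wanted="performance"):
    """List the cores whose governor differs from the wanted one."""
    return sorted(core for core, gov in governors.items() if gov != wanted)


if __name__ == "__main__":
    found = read_governors()
    if not found:
        print("No cpufreq information found (not Linux, or cpufreq is disabled).")
    else:
        slow = cores_not_using(found)
        print("Cores not on the performance governor:", slow or "none")
```

The script only reports the current state; changing the governor needs root privileges and is left to the usual system tools.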

MartinPulec commented 1 year ago

Hi @erichorwitz, I've created issue #273 because it is a separate problem, so please refer to it there.

Some other remarks to the above:

@erichorwitz I don't know if you noticed, but the detected signal in all runs above falls back to v210, so tweaking full/limited range won't do anything here. I think that the RGB/YCbCr format decision cannot be forced on DeckLink; it takes what it has on input. Are you sure that the signal is in RGB? If so, DeckLink must have made a mistake with the detection. As I said, I think that the format cannot be enforced, but I am not 100% sure.
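A fallback like this is easy to miss in a long log. As a rough aid, the Python sketch below scans log text for the "Using codec:" lines printed by the DeckLink capture module in the runs above and reports whether the last one differs from the requested codec. The function names are hypothetical, and the pattern is based only on the log lines quoted in this thread, so other UltraGrid versions may word them differently.

```python
import re

# Matches e.g. "[1669959353.281] [Decklink capture] Using codec: v210"
CODEC_LINE = re.compile(r"\[decklink capture\] Using codec: (\S+)", re.IGNORECASE)


def capture_codecs(log_text):
    """Return every capture codec reported in the log, in order of appearance."""
    return CODEC_LINE.findall(log_text)


def fell_back(log_text, requested="R10k"):
    """True if the last reported capture codec is not the requested one."""
    codecs = capture_codecs(log_text)
    return bool(codecs) and codecs[-1].lower() != requested.lower()


if __name__ == "__main__":
    sample = (
        "[1669959353.168] [Decklink capture] Using codec: R10k\n"
        "[1669959353.281] [Decklink capture] Format change detected "
        "(color space - YCbCr422, 10bit).\n"
        "[1669959353.281] [Decklink capture] Using codec: v210\n"
    )
    print(capture_codecs(sample), fell_back(sample))
```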

@alatteri:

  2. Correct, provided that gbrp16le is present and usable. But it should be, regardless of the device: I believe that a conversion to YCbCr is implemented for that RGB pixel format in CUDA (or similar), so it is a matter of the driver rather than of the encoding chip (I've tested that it works with GP102).
  3. The part about x265 is correct. Intra-refresh is actually implemented for NVENC, even in FFmpeg, but differently than in x265. I still don't know how x264/x265 do it, because the pulse in NVENC is IMHO caused by the insertion of IDR frames (which are unfortunately almost essential for UG), which should invalidate all previous references... Seemingly this doesn't hold for x264/x265.

alatteri commented 1 year ago

I found this in nvenc.c in ffmpeg

"Single slice intra refresh needs SDK 11.1 at build time\n");

I believe that UG is currently being built against SDK 11.0. Maybe updating to a newer SDK will fix the current NVENC GOP pulsing?

> 3. Intra-refresh is actually implemented for NVENC, even in FFmpeg, but differently than x265. I still don't know how x264/x265 does that because the pulse in NVENC is IMHO caused by insertion of IDR frame (which are unfortunately almost essential for UG), which should invalidate all previous references... Seemingly this doesn't hold for x264/5.

MartinPulec commented 1 year ago

> Maybe updating to a newer SDK will fix the current NVENC GOP pulsing?

Unfortunately no; --param lavc-rc-buffer-size-factor=0 would. But the price for this is that the IDR frames (which are still I-frames) would become e.g. 6 times larger than the others. So with a bitrate of 20 Mbps, the peak bitrate will be 120 Mbps. I am not saying that it is unusable: it depends on whether you are on a broad link and just conserving bandwidth, or an exposed network element can buffer that frame.

I am afraid that there is no "good" solution here - either the stream will pulsate or there will be bitrate peaks.
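The trade-off can be put into numbers with a small Python sketch. The 20 Mbps bitrate and the factor of 6 come from the example above; the 24 fps frame rate, the link capacity, and the function names are illustrative assumptions, and the model simply treats ordinary frames as average-sized.

```python
def average_frame_bits(bitrate_bps, fps):
    """Average size of one frame in bits at the given bitrate and frame rate."""
    return bitrate_bps / fps


def idr_peak_bps(bitrate_bps, idr_factor):
    """Instantaneous bitrate while an IDR frame idr_factor times larger is sent
    within a single frame interval."""
    return bitrate_bps * idr_factor


def idr_send_time_ms(bitrate_bps, fps, idr_factor, link_bps):
    """Time in ms to push one enlarged IDR frame through a link of link_bps."""
    idr_bits = average_frame_bits(bitrate_bps, fps) * idr_factor
    return idr_bits / link_bps * 1000.0


if __name__ == "__main__":
    bitrate = 20e6   # 20 Mbps, from the example above
    factor = 6       # IDR frames e.g. 6 times larger
    fps = 24         # assumption: roughly the 23.98 fps used in this thread
    print("peak bitrate (Mbps):", idr_peak_bps(bitrate, factor) / 1e6)
    # If the link itself is capped at the average bitrate, the IDR frame
    # occupies it for several frame intervals, which something must buffer.
    print("IDR send time on a 20 Mbps link (ms):",
          idr_send_time_ms(bitrate, fps, factor, 20e6))
```

On a link with ample headroom the enlarged frame passes within one frame interval; on a link sized to the average bitrate it has to be buffered somewhere, which adds latency.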

UPDATE: just FYI, the slice intra refresh is described here. It basically means that the slice count is reduced to 1 while doing intra refresh (the intra refresh wave generally doesn't need to run all the time; there can be "ordinary" P-frames in between).

MartinPulec commented 1 year ago

I am closing this issue now. Reading back through the discussion, the original issue was perhaps already resolved/clarified? After that, NVENC pulsation was mentioned, which is probably a persisting issue. It is unclear whether it can be solved while keeping the video bitrate strictly rate-limited. But it was slightly off-topic to the original issue, so it would perhaps be better to create a new issue if still needed.