CESNET / UltraGrid

UltraGrid low-latency audio and video network transmission system
http://www.ultragrid.cz

10-Bit 4:2:2 Question #212

Closed: TheSashmo closed this issue 2 years ago

TheSashmo commented 2 years ago

I have been using UG for a while now, and it works extremely well, but only in 8-bit 4:2:0. I can encode and decode easily, and CPU usage is around 20-25% with a load average of 2.0.

When I switch to 10-bit 4:2:2, encoding works fine without any dropped frames, but at the decoder I constantly see 1-4 dropped frames. The CPU load is barely 50%, with a load average of 3.8.

Any suggestions?

MartinPulec commented 2 years ago

Hi, what exactly is the setup? It's hard to say without details.

TheSashmo commented 2 years ago

So here is my setup:

Two machines: Intel(R) Core(TM) i7-10710U CPU @ 1.10GHz

Basic encode setup, all in auto mode:

`/usr/bin/uv -t decklink:connection=SDI:device=0 -c libavcodec:codec=H.264:encoder=libx264:bitrate=20000k -s embedded --audio-codec=MP3:sample_rate=48000:bitrate=128k --audio-capture-format channels=8 -l unlimited -m 1316 -P 50000 127.0.0.1`

Basic decode setup:

`/usr/bin/uv -d decklink:device=0 -r embedded --control-port=2136 127.0.0.1 -P 50000`

Testing on a local network. The headache for me is that the encoded vs. decoded frame counts are fine, but when I watch the decoded video I can clearly see it skip frames during horizontal motion, and the logs don't report anything wrong. On top of that, I don't see it all the time or in all content. I suspect it has something to do with H.264 encoding and motion vectors, but I am really pulling my hair out over this. Changing the preset option in H.264 seems to make it better, but the load average is still very low, so changing the preset doesn't make sense to me: there is a lot of horsepower left over to handle the encode. The decode barely registers on the CPU, yet I can clearly see skipped frames, and the DeckLink display is set up properly...

MartinPulec commented 2 years ago

I see. Would it be possible to debug it further by appending the following to the decoder command?

`--verbose=7 2>&1 | grep "Decompress duration"`

I've tried a sample 1920p image (fed with testcard) and got something like 4.5±0.5 ms for 10-bit v210 and 2.5±0.5 ms for UYVY. That is quite a significant difference, and depending on the content/CPU/resolution/etc., if your number is an order of magnitude higher, it could be the cause. If, on the other hand, it is significantly lower than the frame time, the cause is probably something else.
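To compare the grep output against the frame time without eyeballing it, the matching lines can be summarized with a small awk script. This is a sketch under assumptions: the function name is hypothetical, and it assumes the first number after "Decompress duration" on each line is the duration in milliseconds (check this against your own log). The frame rate is passed in; for the 1080i59.94 signal in this thread that is 29.97 frames/s, i.e. roughly 33.4 ms per frame.

```sh
# Hypothetical helper: summarize "Decompress duration" values read from stdin.
# ASSUMPTION: the first number after "Decompress duration" is a time in ms.
# $1 = frame rate in frames per second (default 30).
decompress_stats() {
    fps=${1:-30}
    awk -v fps="$fps" '
    /Decompress duration/ {
        s = $0
        sub(/.*Decompress duration[^0-9]*/, "", s)   # drop everything before the number
        if (match(s, /^[0-9]+(\.[0-9]+)?/)) {
            v = substr(s, 1, RLENGTH) + 0
            n++; sum += v; sq += v * v
            if (v > max) max = v
        }
    }
    END {
        if (n == 0) { print "no samples"; exit 1 }
        mean = sum / n
        var = sq / n - mean * mean                   # population variance
        if (var < 0) var = 0                         # guard against rounding
        budget = 1000 / fps                          # ms available per frame
        printf "samples=%d mean=%.2f stddev=%.2f max=%.2f budget=%.2f\n", n, mean, sqrt(var), max, budget
        if (max > budget) print "WARNING: some frames took longer than the frame time"
    }'
}
```

For example: `uv -d decklink:device=0 ... --verbose=7 2>&1 | decompress_stats 29.97`. Looking at the maximum rather than only the mean matters here, since occasional slow frames are exactly what would show up as visible skips.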

TheSashmo commented 2 years ago

I am seeing much more than .5 variance...

Localhost machine, same machine encode and decode: https://pastebin.com/jTqvaC85

One machine encode, network, another machine decode: https://pastebin.com/RudzHv7v

Intel(R) Core(TM) i7-10710U CPU @ 1.10GHz

`/usr/bin/uv -t decklink:connection=SDI:device=0:codec=UYVY -c libavcodec:codec=H.264:encoder=libx264:bitrate=20000k:preset=superfast:subsampling=420 -s embedded --audio-codec=MP3:sample_rate=48000:bitrate=128k --audio-capture-format channels=8 -l unlimited -m 1316 -P 50000 127.0.0.1
UltraGrid binary older than 90 days, consider checking updates:

/usr/bin/uv -u

Hint: you can set environment variable ULTRAGRID_AUTOUPDATE to 1 for automatic update or 0 to suppress the above message.

UltraGrid 1.6+ (tags/continuous rev d7da9c6f built Jul 2 2021 15:40:48)

Display device   : none
Capture device   : decklink
Audio capture    : embedded
Audio playback   : none
MTU              : 1316 B
Video compression: libavcodec:codec=H.264:encoder=libx264:bitrate=20000k:preset=superfast:subsampling=420
Audio codec      : MP3
Network protocol : UltraGrid RTP
Audio FEC        : none
Video FEC        : none

[lavcd aud.] Using audio encoder: libmp3lame
Created new RTP session with SSRC 0x6c65fab8.
Display initialized-none
Using device DeckLink SDI Micro
[DeckLink capture] bmdDeckLinkConfigVideoInputConnection: Input set to: 1
[DeckLink capture] Unable to set conversion mode: not implemented
[DeckLink capture] Setting single link by default.
The desired display mode is supported: 525i59.94 NTSC
Enable video input: 525i59.94 NTSC
[DeckLink] Trying to autodetect format.
[Decklink capture] Audio input set to: embedded
[DeckLink capture] EnableAudioInput: Decklink audio capture initialized sucessfully: 8 channels, 4 Bps, 48000 Hz, codec: PCM
Start capture
DeckLink capture device enabled
Video capture initialized-decklink
Created new RTP session with SSRC 0x74c7e0e3.
Audio sending started.
Frame received (#0) - No input signal detected
[Decklink capture] Format change detected (color space).
[Decklink capture] Using codec: UYVY
Enable video input: 525i59.94 NTSC
Frame received (#0) - No input signal detected
[Decklink capture] Format change detected (color space).
[lavcd aud.] Using audio encoder: libmp3lame
Last message repeated 6 times
Frame received (#0) - No input signal detected
Last message repeated 2 times
[DeckLink] Audio frame too small!
Last message repeated 1 times
Frame received (#0) - No input signal detected
Last message repeated 4 times
[DeckLink] Audio frame too small!
Frame received (#0) - No input signal detected
Last message repeated 2 times
[DeckLink] Audio frame too small!
Frame received (#0) - No input signal detected
Last message repeated 2 times
[DeckLink] Audio frame too small!
Frame received (#0) - No input signal detected
Last message repeated 2 times
[DeckLink] Audio frame too small!
Frame received (#0) - No input signal detected
Last message repeated 2 times
[DeckLink] Audio frame too small!
Frame received (#0) - No input signal detected
Last message repeated 2 times
[DeckLink] Audio frame too small!
Frame received (#0) - No input signal detected
Last message repeated 2 times
[DeckLink] Audio frame too small!
Frame received (#0) - No input signal detected
Last message repeated 1 times
[Decklink capture] Format change detected (display mode, color space).
[Decklink capture] Using codec: UYVY
Enable video input: 1080i59.94
Frame received (#0) - No input signal detected
[Decklink capture] Format change detected (color space).
Frame received (#0) - No input signal detected
Last message repeated 1 times
[DeckLink] Audio frame too small!
Frame received (#0) - No input signal detected
Last message repeated 2 times
[DeckLink] Audio frame too small!
Frame received (#0) - No input signal detected
Last message repeated 2 times
[DeckLink] Audio frame too small!
Frame received (#0) - No input signal detected
Last message repeated 2 times
[DeckLink] Audio frame too small!
Frame received (#0) - No input signal detected
Last message repeated 4 times
[DeckLink] Audio frame too small!
Frame received (#0) - No input signal detected
[Decklink capture] Format change detected (color space).
[lavc] Using codec: H.264, encoder: libx264
[lavc] Setting bitrate to 20000000 bps.
[lavc] Warning: Codec doesn't support slice-based multithreading.
[libx264 @ 0x7f8ac4001bc0] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2
[libx264 @ 0x7f8ac4001bc0] profile High, level 4.0, 4:2:0, 8-bit
[lavc] Selected pixfmt: yuv420p`

MartinPulec commented 2 years ago

> I am seeing much more than .5 variance...

Sure, this is mostly content-dependent. I've just tried with ShakeNDry and the variance was also somewhat higher. (I can provide commands to replicate it if you'd like to have the same reference.)

However, would it be possible to paste the results for 10-bit instead? I'd say those values are more or less acceptable; or do you experience the glitches with those settings as well?

TheSashmo commented 2 years ago

I was just going by your previous post saying it should be ± a certain amount. To be honest, that is not what bothers me. There seems to be so much CPU left over, yet I can visually see frames skipping in the output. I was using a hockey game as my reference, since the issue is very noticeable during left and right motion: you can see the player skip ahead 1-2 frames, as if the encoder or the decoder couldn't keep up.

I would be more than happy to try your example. Please share.

Let me set up the 10-bit test and send the results back over to you.

TheSashmo commented 2 years ago

So in my current setup (the same CPU I listed), it can't do 150 frames in 5 seconds, yet the load average is only 4 on a 12-thread box. Am I missing something?

TheSashmo commented 2 years ago

OK, by setting my preset down one notch I get an "almost" consistent encoding rate, but my decoder is a different story. DeckLink reports 150 frames, and CPU usage is negligible at a 1.5 load average, but I can visually see frames skipping. Here is the decompress duration: https://pastebin.com/xPUgUnSy

TheSashmo commented 2 years ago

OK, so I think I found my issue. Since I have a lower-powered machine, I was chasing the best test setup and found that forcing 8-bit 4:2:0 gave better, but not perfect, results. That led me down the path of setting the preset to the lowest available, i.e. ultrafast. The issue with this is that it forces the encoding profile down to baseline, which is just too low in quality.

During testing, I decided to leave format detection on automatic and let it do its own thing, changing only the preset to ultrafast. Now I have smooth video encoding and decoding, and everything seems to be fine.
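
The two configurations compared above differ only in a few `-c` options, all of which appear in the commands earlier in this thread. The helper below is a hypothetical sketch (its name is not part of UltraGrid) that assembles the spec either with a forced `subsampling` or without one. It assumes the working setup dropped both `subsampling=420` and the capture-side `:codec=UYVY`, as the original auto-mode command did.

```sh
# Hypothetical helper: build the libavcodec "-c" spec used in this thread.
# $1 = x264 preset; $2 = optional subsampling (e.g. 420). Leaving $2 empty
# omits the option, so the pixel format is not forced.
lavc_spec() {
    spec="libavcodec:codec=H.264:encoder=libx264:bitrate=20000k:preset=$1"
    if [ -n "${2:-}" ]; then
        spec="$spec:subsampling=$2"
    fi
    printf '%s\n' "$spec"
}

# Forced 8-bit 4:2:0 (earlier attempt) vs. automatic format (working setup):
#   uv -t decklink:connection=SDI:device=0:codec=UYVY -c "$(lavc_spec superfast 420)" ...
#   uv -t decklink:connection=SDI:device=0 -c "$(lavc_spec ultrafast)" ...
```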

I would still like to know how you run your test with that sample file.

MartinPulec commented 2 years ago

Ok, perfect, thanks for the info.

Well, I originally used raw YUV + ImageMagick + UG conversion (UYVY->v210, because v210 is unknown to IM). However, while pasting the output, I realized this was a bit complicated, so I simplified it with FFmpeg, which should be able to do the conversion to v210 directly.

So, assuming you have downloaded and extracted (un-7zipped) the Y4M file in 3840x2160 (Full HD was not available), run the following script:

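#!/bin/sh -eu
# Convert a Y4M sequence into per-frame v210 files for UltraGrid --playback.
W=1920
H=1080
INFILE=${1-ShakeNDry_3840x2160_120fps_420_10bit_YUV.y4m}
mkdir -p out
# Scale the frames and pack them as raw v210 (10-bit 4:2:2) in one file.
ffmpeg -i "$INFILE" -vcodec v210 -f rawvideo -vf scale=${W}x${H} out/src.yuv
cd out
# One v210 frame is W*H*8/3 bytes (6 pixels per 16 bytes); this is exact
# for W=1920, whose rows need no extra padding.
split -b $((W*H*8/3)) src.yuv
rm src.yuv
# Rename the chunks xaa, xab, ... to 00000001.v210, 00000002.v210, ...
a=1; for n in x??; do mv "$n" "$(printf %08d "$a").v210"; a=$((a + 1)); done
COUNT=$(ls *.v210 | wc -l)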

cat<<EOF > video.info
version 1
width $W
height $H
fourcc v210
fps 30
interlacing 0
count $COUNT
EOF
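
The `split -b $((W*H*8/3))` step relies on v210 packing 6 pixels into 16 bytes, which is exact for a width of 1920. In the common v210 layout, however, each row is padded to a multiple of 48 pixels (128 bytes), so other widths need the padded size. A sketch assuming that padding rule (the function name is hypothetical):

```sh
# Hypothetical helper: bytes in one v210 frame, assuming each row is
# padded to a multiple of 48 pixels (= 128 bytes), as in the usual layout.
v210_frame_size() {
    w=$1
    h=$2
    row=$(( (w + 47) / 48 * 128 ))   # padded bytes per row
    echo $(( row * h ))
}
```

For 1920x1080 this gives 5529600 bytes, the same as W*H*8/3; for 1280x720 it gives 2488320 rather than the unpadded 2457600. In the script, the split line would then read `split -b "$(v210_frame_size "$W" "$H")" src.yuv`.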

Then the out directory is playable with UG:

$ uv --playback out
$ # or
$ uv --playback out:loop
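
Because the script cuts one long file into fixed-size chunks, a wrong frame size or a truncated last chunk would only show up as garbage during playback. A quick pre-flight check can catch that. This is a hypothetical helper, not part of UltraGrid; it assumes the `count N` line format that the script writes into `video.info`.

```sh
# Hypothetical helper: sanity-check a playback directory.
# $1 = directory, $2 = expected bytes per frame.
check_v210_dir() {
    dir=$1
    size=$2
    count=$(sed -n 's/^count //p' "$dir/video.info")
    [ -n "$count" ] || { echo "no count line in $dir/video.info"; return 1; }
    n=0
    for f in "$dir"/*.v210; do
        [ -e "$f" ] || break          # the glob matched nothing
        if [ "$(wc -c < "$f")" -ne "$size" ]; then
            echo "unexpected size: $f"
            return 1
        fi
        n=$((n + 1))
    done
    if [ "$n" -ne "$count" ]; then
        echo "count mismatch: $n files, video.info says $count"
        return 1
    fi
    echo "ok: $n frames of $size bytes"
}
```

For the 1920x1080 case above: `check_v210_dir out $((1920*1080*8/3))`.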

Note: You could also use file capture to play the file, but you probably won't be able to get v210 out of it, because from FFmpeg's perspective, v210 is a codec rather than a pixel format.

TheSashmo commented 2 years ago

Thank you I will try this out.