fraunhoferhhi / vvenc

VVenC, the Fraunhofer Versatile Video Encoder
https://www.hhi.fraunhofer.de/en/departments/vca/technologies-and-solutions/h266-vvc.html
BSD 3-Clause Clear License

Massive lag when playing video encoded by vvenc 1.11 #360

Closed birdie-github closed 7 months ago

birdie-github commented 7 months ago

A six-second stall with 100% CPU use across all cores (8 physical, 16 SMT, Ryzen 5800X) can be observed when trying to play a clip encoded by vvenc 1.11 this way:

ffmpeg -i *webm -r 60 -f yuv4mpegpipe -hide_banner -loglevel error - \
| vvencapp --y4m -i - --preset slow -q 29 -o y4m.266

BTW, vvenc did not accept the source video frame rate, so I had to use the -r 60 option for ffmpeg. Maybe you could look into that as well.

I'm using mpv 0.35.1 + ffmpeg 6.0.1 + vvdec 2.2.0. Could be an issue with ffmpeg + vvdec integration, of course.

There's no "still image" for the first 6 seconds. It's static for two seconds maybe.

All the pertinent files are available here: https://mega.nz/folder/rh9GQK7I#d1-4AZ4Wx59FahbgHieibw

jungleboynx commented 7 months ago

When playing the webm/vp9 file with mpv, the video starts at 6.838 sec. MediaInfo shows the frame rate as 59.94fps. The problem is probably caused by the audio offset of -6.838 sec: MediaInfo shows "Delay relative to video : -6 s 838 ms". So the player freezes the video for the first 6.838 sec until it comes into sync with the audio.

It's best to get the audio offset down to zero if possible, which means editing the audio in Audacity and muxing the edited audio with the final encoded file. Also make sure the audio runtime matches the video runtime, which may mean adding blank audio to the end to bring it up to the video runtime; otherwise audio sync will gradually be lost as you play the file.
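As a cross-check that does not depend on MediaInfo, the per-stream start times reported by ffprobe can be compared. The following is a minimal sketch, assuming ffprobe is on the PATH and is run with `-show_entries stream=codec_type,start_time -of json`. The function name `av_start_offset` and the sample JSON are hypothetical; the sample only mirrors the 6.838 sec figure quoted above and is not real output for the file in this thread.

```python
import json


def av_start_offset(ffprobe_json: str) -> float:
    """Return the audio start time minus the video start time, in seconds."""
    streams = json.loads(ffprobe_json)["streams"]
    starts = {}
    for stream in streams:
        kind = stream.get("codec_type")
        # Keep only the first stream of each type; a missing start_time counts as 0.
        if kind in ("audio", "video") and kind not in starts:
            starts[kind] = float(stream.get("start_time", 0.0))
    return starts["audio"] - starts["video"]


# Hypothetical output of:
#   ffprobe -v error -show_entries stream=codec_type,start_time -of json input.webm
sample = (
    '{"streams": ['
    '{"codec_type": "video", "start_time": "6.838000"},'
    '{"codec_type": "audio", "start_time": "0.000000"}]}'
)

print(f"audio relative to video: {av_start_offset(sample):+.3f} s")
```

A negative result means the audio timeline starts before the video timeline, which matches the sign MediaInfo uses for "Delay relative to video".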

jungleboynx commented 7 months ago

I prefer to use AviSynth so I created an AVS file to decode your source file. To start with I re-muxed the webm into mkv using a frame rate of "60000/1001p".

SetFilterMTMode("DEFAULT_MT_MODE", MT_MULTI_INSTANCE)
LoadPlugin("C:\Program Files\MeGUI-64\tools\lsmash\LSMASHSource.dll")
LWLibavVideoSource("D:\DVD[B] PGL CS2 EU 1 RMR 2024 - Day 1 [i8fYyImYfjc].mkv")
AssumeFPS(60000,1001)
Spline64Resize(1920,1080)
Trim(0,6661)
Prefetch(6)

When importing the AVS file into avsPmod, AviSynth complained about the frame rate: it was 59.94fps, but it should be 60/1.001, so I added AssumeFPS(60000,1001) to stop AviSynth complaining. avsPmod shows movement from frame 0, so there's no delay.
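The difference between the rounded 59.94 and the exact 60000/1001 is small but not zero. A minimal sketch of the arithmetic, using exact fractions; the names `NTSC_60`, `ROUNDED` and `frames_in` are only illustrative, and the one-hour span is an arbitrary choice:

```python
from fractions import Fraction

NTSC_60 = Fraction(60000, 1001)  # exact rate, as passed to AssumeFPS(60000,1001)
ROUNDED = Fraction("59.94")      # the rounded value many tools display


def frames_in(seconds: int, fps: Fraction) -> Fraction:
    """Number of frames (possibly fractional) shown in the given time."""
    return fps * seconds


# Drift between the two rates after one hour of playback, in frames.
drift = frames_in(3600, NTSC_60) - frames_in(3600, ROUNDED)

# Duration of the Trim(0,6661) clip above: 6662 frames at the exact rate.
clip_seconds = Fraction(6662) / NTSC_60

print(f"exact rate: {float(NTSC_60):.6f} fps")
print(f"drift after 1 hour: {float(drift):.4f} frames")
print(f"clip duration: {float(clip_seconds):.3f} s")
```

The drift is a fraction of a frame per hour, so for a short clip it is negligible, but a tool that compares the rate exactly will still flag the mismatch.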

I then used a batch file with the piping tool avs2yuv to pipe the AVS output to y4m on the fly and feed it into vvenc:

set P1=-q %1 --preset fast -c yuv420 -rs 1 -t 12 --sdr sdr_709 --y4m
avs2yuv -depth 8 ..\video1.avs - | vvenc %P1% -i - -o ..\video1.vvc

I also need to strip out the Opus audio from the Webm or mkv file and edit it with Audacity to bring it into sync and then finally mux the video and audio together using mp4box.

ffmpeg is ok if the video and audio streams are nice and clean. If you need to do strange things like cropping or inverse telecine or change the gamma (tone map) or upscale then AviSynth or VapourSynth is the thing to use.

jungleboynx commented 7 months ago

Here's the 1080p/h266/mp4 file (53MB): https://www.dropbox.com/scl/fo/j6r4po3naxgsa5z6vk961/h?rlkey=1a0y3i488qdqt226zcssalxfb&dl=0

I removed 6.838 seconds of audio from the start, applied a gain of 3dB, cut the audio runtime to 18ms below the video runtime and saved it as 24bit FLAC. I then encoded the FLAC to AAC-LC using the Apple AAC encoder in VBR q91 mode, which gives 136Kbps average and 194Kbps peak. Opus could be used instead.

K-os commented 7 months ago

Thank you very much for the analysis and the help, @jungleboynx .

If you don't want to go the AviSynth route, you can also use ffmpeg directly to trim the offset (piping it to ffplay for testing):

ffmpeg -i *webm -an -ss 6.838 -f yuv4mpegpipe -hide_banner -loglevel error - |ffplay -

I don't know if it is intentional that ffmpeg adds this delay, even if you explicitly drop the audio track (-an). Feels like a bug to me. You might want to open a bug report there.

Concerning your problem with the frame rate, @lehmann-c gave you an answer in VVdeC issue 343.

I guess the issue is resolved now?

birdie-github commented 7 months ago

I don't know what everyone is talking about.

There's no audio track for either the raw .266 file (it can't have it even theoretically), or the muxed mp4 file:

Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '29.mp4':
  Metadata:
    major_brand     : iso4
    minor_version   : 1
    compatible_brands: iso4iso6
    creation_time   : 2024-02-24T18:10:25.000000Z
    encoder         : GPAC-2.3-DEV-revrelease
  Duration: 00:01:57.98, start: 0.000000, bitrate: 6854 kb/s
  Stream #0:0[0x1](und): Video: vvc (Main 10) (vvc1 / 0x31637676), yuv420p10le(tv), 3840x2160, 6848 kb/s, 60 fps, 60 tbr, 60 tbn (default)
    Metadata:
      creation_time   : 2024-02-24T18:10:25.000000Z
      handler_name    : 266@GPAC2.3-DEV-revrelease
      vendor_id       : [0][0][0][0]

K-os commented 7 months ago

Your input, the webm file, has an audio track.

birdie-github commented 7 months ago

"When playing the webm/vp9 file with mpv the video starts at 6.838 sec."

The WEBM file starts playing for me immediately. There's no audio/video desync of any kind.

I've tried mpv, vlc, mpc and ffplay. Not a single player exhibits any issues.

MediaInfo is MediaInfo. It's neither a full-featured decoding library nor a player. It simply has a parser for audio/video files which is far from perfect.

And lastly my encoding command omits the audio track completely.

K-os commented 7 months ago

Yes, but ffmpeg has issues with it, when extracting the video.

Just test it like this:

ffmpeg -i *webm -an -f yuv4mpegpipe  - |ffplay -

That's why I suggested you open a bug report there.

birdie-github commented 7 months ago

"Yes, but ffmpeg has issues with it, when extracting the video."

Confirmed. Could be an ffmpeg bug indeed. Sorry for the noise. I didn't realize it earlier.

jungleboynx commented 7 months ago

No, it's not a bug in ffmpeg - it's just how the WEBM file was created with a very big audio offset.

It's pretty common for broadcast files to have an audio offset. For HD broadcast it's usually between +/-2 sec max. For 4K broadcast it can be a bit higher. Blu-ray usually has an audio offset of 0 sec. SD broadcast, DVD and web files sometimes have a small offset like +/-40ms. Also, the act of editing the video can introduce a small audio offset, as the position of the first audio packet relative to the first video packet is often not taken into account.

It's likely that the WEBM was created by some kind of HDMI capture device in real-time. HDMI capture re-encodes the video on the fly - in this case VP9. The PCM audio has been encoded to OPUS but it doesn't need to be synced to the video as long as an offset is added to the file. Although the WEBM displays video immediately with mpv it starts from a time of 6.838 sec (not 0) to get into sync with the audio. When the video was encoded to 266, ffmpeg used the audio offset to offset the video by +6.838 sec with the assumption that the original audio would be muxed with it.

What I did was delete the first 6.838 sec of audio using Audacity which brings it into sync with the video. The same problem would occur if any video codec was used.

For testing audio sync it's best to use some video with someone talking. If the audio is early or late you'll soon see the sync error, as lip movement will be out of sync with the audio. This only works if the sync error is larger than +/-50ms, since human ears/eyes/brains aren't that good. The BBC had a test video with a progress bar which moves left to right with a click sound that occurs at the centre. It might be available on YouTube, in which case it could be downloaded with yt-dlp. This would allow you to measure the time the click occurs and compare this to the time of the progress bar at the point of the click.

The AAC encoder has a priming delay at the start which can cause sync problems. The Apple AAC encoder has an option to remove this delay, but only in AAC-LC mode. In HE-AAC mode the priming delay still exists. mp4box also has a bug where it assumes that AAC audio never has a priming delay. If HE-AAC (with priming delay) is used with mp4box then the audio will be out of sync by +88ms. If AAC-LC is used (with mp4box) then the audio will be in sync. This problem doesn't exist in mkvtoolnix, but this tool currently doesn't support the 266/VVC codec.

Most lossy audio codecs have a priming delay; for AC3 it is 5ms. I don't know anything about Opus, but I wouldn't be surprised if it also has a small delay. One solution is to convert the audio into a lossless intermediate format like 24bit FLAC (with sync errors corrected) and then encode the FLAC to the final codec. WAV can also be used as an intermediate format, but it has a 4GB filesize limit unless the "--ignore-length" parameter is used by the target encoder. PCM could be used but it doesn't have metadata and the byte order is big endian. After performing any sync correction in Audacity, just save the audio as 24bit FLAC using compression mode 0 (fast) and then encode the FLAC to the final codec.
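To get a feel for when the 4GB WAV limit matters, the maximum runtime can be estimated from the audio format. This is a rough sketch that assumes the limit comes from the 32-bit size field in a classic WAV header and ignores the few bytes of header overhead. The function name `max_wav_seconds` and the default format (48 kHz, stereo, 24-bit) are assumptions chosen for illustration.

```python
def max_wav_seconds(sample_rate: int = 48000, channels: int = 2, bits: int = 24) -> float:
    """Approximate longest runtime that fits under a 32-bit WAV size field."""
    bytes_per_second = sample_rate * channels * (bits // 8)
    return 2**32 / bytes_per_second


stereo = max_wav_seconds()
surround = max_wav_seconds(channels=6)

print(f"48 kHz / 24-bit stereo: about {stereo / 3600:.2f} hours")
print(f"48 kHz / 24-bit 5.1:    about {surround / 3600:.2f} hours")
```

Under these assumptions a stereo track only hits the limit after several hours, while a 5.1 track reaches it in roughly a third of that time.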

Watch out for codecs with remainder time like AAC. Remainder time for AAC can be up to 1 packet for AAC-LC (21.333ms) or 2 packets for HE-AAC (85.333ms). I normally cut the FLAC runtime to be either 18ms or 64ms shorter than the video runtime, depending on whether AAC-LC or HE-AAC is used, but it's still possible for the audio runtime to end up slightly longer than the video runtime. Worst case audio runtime is: AAC-LC (-18+21=+3ms) or HE-AAC (-64+85=+21ms). Normally the audio runtime is just below the video runtime.
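The packet figures above can be reproduced with a few lines of arithmetic. This is a sketch assuming 48 kHz audio with 1024 samples per AAC-LC packet and 2048 samples per HE-AAC packet, which is what yields 21.333ms and 85.333ms; the names are illustrative only.

```python
SAMPLE_RATE = 48000  # assumption: 48 kHz audio


def packet_ms(samples_per_packet: int) -> float:
    """Duration of one audio packet in milliseconds."""
    return samples_per_packet * 1000 / SAMPLE_RATE


aac_lc_remainder = packet_ms(1024)      # up to 1 packet for AAC-LC
he_aac_remainder = 2 * packet_ms(2048)  # up to 2 packets for HE-AAC

# Worst case: audio trimmed 18 ms / 64 ms short, then padded by a full remainder.
worst_lc = -18 + aac_lc_remainder
worst_he = -64 + he_aac_remainder

print(f"AAC-LC: remainder {aac_lc_remainder:.3f} ms, worst case {worst_lc:+.3f} ms")
print(f"HE-AAC: remainder {he_aac_remainder:.3f} ms, worst case {worst_he:+.3f} ms")
```

At other sample rates the packet durations change, so the 18ms and 64ms trims would need to be adjusted accordingly.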