bluenviron / mediamtx

Ready-to-use SRT / WebRTC / RTSP / RTMP / LL-HLS media server and media proxy that allows to read, publish, proxy, record and playback video and audio streams.
MIT License

"DTS is not monotonically increasing" error with SRT playback and Intel QSV HEVC encoder #3285

Closed foraphe closed 1 month ago

foraphe commented 2 months ago

Which version are you using?

v1.4.2. The replication steps also reproduce the issue on v1.8.0, but I haven't done more tests on v1.8.0 yet.

Which operating system are you using?

Describe the issue

(I first thought the problem had something to do with #1002, but I now find that the two issues are likely unrelated.)

When streaming Intel QSV HEVC encoded video to MediaMTX, SRT playback doesn't work at all: playback fails without a single frame being displayed, and MediaMTX complains "DTS is not monotonically increasing, was (some number)s, now is (some number)s", with the first number being 0.05 to 0.5 larger than the second. This only happens with QSV HEVC; libx265, QSV AVC (H.264) and AMD's AMF HEVC all work without issues. Using ffprobe, I can confirm that packet DTS is indeed monotonically increasing in the video clip, and the same clip plays fine locally as well as when streamed through some other SRT streaming servers, so the encoder itself is unlikely to be the problem.

Only SRT playback is affected:

  - RTSP publish, SRT read -> error
  - RTSP publish, RTSP read -> OK
  - SRT publish, SRT read -> error
  - SRT publish, RTSP read -> OK

I suspect it has something to do with B frames, but I have no idea how this could lead to a DTS-related error. Intel's QSV HEVC encoder on DG2 (Arc A380 in my case) has GPB (Generalized P and B pictures) enabled by default, so every non-intra frame looks like a B frame, at least in FFmpeg. With FFmpeg setting the default GOP size to 248, video streams look like long runs of B frames (a few hundred of them) with single I frames in between. A workaround is to disable GPB (-gpb 0) and set the GOP size to an extremely small value (e.g. -g 5; values of 10 or more won't solve the problem) when encoding with FFmpeg. This brings P frames back and cuts the length of consecutive B-frame runs down to single digits, at the cost of terrible efficiency (almost 4x the bitrate for the same quality).

Describe how to replicate the issue

  1. Start MediaMTX with default configuration.
  2. Publish to MediaMTX using SRT or RTSP with OBS Studio or FFmpeg, using the Intel QSV HEVC encoder at default settings (e.g. ffmpeg -re -i /path/to/video.mkv -an -c:v hevc_qsv -f mpegts "srt://example.com:8890/?streamid=publish:stream").
  3. Try reading the stream using ffplay or VLC (e.g. In VLC: Media -> Open Network Stream -> enter srt://example.com:8890/?streamid=read:stream -> Play).

An example of a non-working sample (encoded using QSV) and a working sample (encoded using x265) can be found here. If these samples are used, -c:v hevc_qsv can be replaced with -c:v copy in the command line example above, so there's no need for an Intel GPU.

Did you attach the server logs?

Yes, but there's probably not much information. logLevel is set to debug

2024/04/23 15:23:06 INF [SRT] [conn [some-ip]:34237] is publishing to path 'srt', 1 track (H265)
2024/04/23 15:23:10 INF [SRT] [conn [some-ip]:42477] opened
2024/04/23 15:23:10 INF [SRT] [conn [some-ip]:42477] is reading from path 'srt', 1 track (H265)
2024/04/23 15:23:14 INF [SRT] [conn [some-ip]:42477] closed: DTS is not monotonically increasing, was 8.4084s, now is 8.341666666s
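
For readers unfamiliar with this kind of check, here is a minimal sketch in Go of a monotonic DTS guard that yields a message of the same shape as the log line above. It is not MediaMTX's actual code: the dtsChecker type, the framePeriods helper and the use of a strict "less than" comparison are all assumptions made for illustration. As a side observation, the two values in the log are exactly 252 and 250 frame periods at 30000/1001 fps; that is arithmetic on the logged numbers, not something the server reports.

```go
package main

import (
	"fmt"
	"time"
)

// dtsChecker is a hypothetical helper (not a MediaMTX type) that remembers
// the previous DTS and rejects any value that goes backwards.
type dtsChecker struct {
	prev  time.Duration
	valid bool
}

// Check returns an error when dts is lower than the previously seen DTS.
// Whether equal values should also be rejected is an assumption left open here.
func (c *dtsChecker) Check(dts time.Duration) error {
	if c.valid && dts < c.prev {
		return fmt.Errorf("DTS is not monotonically increasing, was %v, now is %v", c.prev, dts)
	}
	c.prev, c.valid = dts, true
	return nil
}

// framePeriods converts a frame count into a duration at 30000/1001 fps.
func framePeriods(n int64) time.Duration {
	return time.Duration(n) * 1001 * time.Second / 30000
}

func main() {
	var c dtsChecker
	// Timestamps of 251, 252 and then 250 frame periods: the last one goes backwards.
	for _, n := range []int64{251, 252, 250} {
		if err := c.Check(framePeriods(n)); err != nil {
			fmt.Println(err)
		}
	}
}
```

Running it prints a single error for the third timestamp, since 250 frame periods is earlier than 252.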

Did you attach a network dump?

No, it's unlikely to be a network-related problem, and happens on localhost as well.

foraphe commented 1 month ago

After some more digging, it seems that the problem is with SPS data reading instead of DTS extraction.

When publishing a stream encoded using QSV HEVC encoder, VUI is somehow not correctly read (d.spsp.VUI is nil), code execution goes into this branch in bluenviron/mediacommon/pkg/codecs/h265/dts_extractor.go, and PTS is returned.

I can confirm that VUI data is present on both x265 and QSV HEVC streams, with the only difference being video_format (5 vs 0). x265 stream works, QSV HEVC stream doesn't. x265: X265 qsv: QSV

GPB is probably related to the problem, but I'm not sure why it could lead to SPS not being correctly read.

aler9 commented 1 month ago

Hello, thanks for the detailed analysis. I can confirm that the SPS parser has some trouble reading the SPS produced by QSV; in particular, there's a missing algorithm that is documented in ITU-T H.265:

When inter_ref_pic_set_prediction_flag is equal to 1, the variables DeltaPocS0[ stRpsIdx ][ i ], UsedByCurrPicS0[ stRpsIdx ][ i ], NumNegativePics[ stRpsIdx ], DeltaPocS1[ stRpsIdx ][ i ], UsedByCurrPicS1[ stRpsIdx ][ i ] and NumPositivePics[ stRpsIdx ] are derived as follows:

    i = 0
    for( j = NumPositivePics[ RefRpsIdx ] − 1; j >= 0; j−− ) {
      dPoc = DeltaPocS1[ RefRpsIdx ][ j ] + deltaRps
      if( dPoc < 0 && use_delta_flag[ NumNegativePics[ RefRpsIdx ] + j ] ) {
        DeltaPocS0[ stRpsIdx ][ i ] = dPoc
        UsedByCurrPicS0[ stRpsIdx ][ i++ ] = used_by_curr_pic_flag[ NumNegativePics[ RefRpsIdx ] + j ]
      }
    }
    if( deltaRps < 0 && use_delta_flag[ NumDeltaPocs[ RefRpsIdx ] ] ) {        (7-61)
      DeltaPocS0[ stRpsIdx ][ i ] = deltaRps
      UsedByCurrPicS0[ stRpsIdx ][ i++ ] = used_by_curr_pic_flag[ NumDeltaPocs[ RefRpsIdx ] ]
    }
    for( j = 0; j < NumNegativePics[ RefRpsIdx ]; j++ ) {
      dPoc = DeltaPocS0[ RefRpsIdx ][ j ] + deltaRps
      if( dPoc < 0 && use_delta_flag[ j ] ) {
        DeltaPocS0[ stRpsIdx ][ i ] = dPoc
        UsedByCurrPicS0[ stRpsIdx ][ i++ ] = used_by_curr_pic_flag[ j ]
      }
    }
    NumNegativePics[ stRpsIdx ] = i

    i = 0
    for( j = NumNegativePics[ RefRpsIdx ] − 1; j >= 0; j−− ) {
      dPoc = DeltaPocS0[ RefRpsIdx ][ j ] + deltaRps
      if( dPoc > 0 && use_delta_flag[ j ] ) {
        DeltaPocS1[ stRpsIdx ][ i ] = dPoc
        UsedByCurrPicS1[ stRpsIdx ][ i++ ] = used_by_curr_pic_flag[ j ]
      }
    }
    if( deltaRps > 0 && use_delta_flag[ NumDeltaPocs[ RefRpsIdx ] ] ) {        (7-62)
      DeltaPocS1[ stRpsIdx ][ i ] = deltaRps
      UsedByCurrPicS1[ stRpsIdx ][ i++ ] = used_by_curr_pic_flag[ NumDeltaPocs[ RefRpsIdx ] ]
    }
    for( j = 0; j < NumPositivePics[ RefRpsIdx ]; j++ ) {
      dPoc = DeltaPocS1[ RefRpsIdx ][ j ] + deltaRps
      if( dPoc > 0 && use_delta_flag[ NumNegativePics[ RefRpsIdx ] + j ] ) {
        DeltaPocS1[ stRpsIdx ][ i ] = dPoc
        UsedByCurrPicS1[ stRpsIdx ][ i++ ] = used_by_curr_pic_flag[ NumNegativePics[ RefRpsIdx ] + j ]
      }
    }
    NumPositivePics[ stRpsIdx ] = i

(Quoted from Rec. ITU-T H.265 v8 (08/2021).)

I'm working on implementing it.

aler9 commented 1 month ago

The fix is ready (https://github.com/bluenviron/mediacommon/pull/128), please test this nightly release and let me know if the error is gone (click on "Artifacts", "Binaries"):

https://github.com/bluenviron/mediamtx/actions/runs/9132458367

foraphe commented 1 month ago

The error is gone using the nightly release. Thanks very much for the fix.