angelcam / rust-ac-ffmpeg

Simple and safe Rust interface for FFmpeg libraries.

Error when muxing two separate video and audio files into one #25

Closed · niklaskorz closed this issue 1 year ago

niklaskorz commented 3 years ago

I am trying to mux a video track from a video file and an audio track from an audio file into a combined video file. Both input files are MP4 and the output file is MP4 as well.

The following error occurs no matter which video and audio files I use:

[mp4 @ 0000015c865a31c0] Application provided invalid, non monotonically increasing dts to muxer in stream 1: 287744 >= 287744

Code (open_input and open_output are taken from examples/muxing.rs):

pub fn mux_files(video_path: &str, audio_path: &str, out_path: &str) -> Result<()> {
    let mut video_demuxer = open_input(video_path)?;
    // pick the first video stream in the first input
    let (video_pos, video_codec) = video_demuxer
        .streams()
        .iter()
        .enumerate()
        .find_map(|(pos, stream)| {
            let params = stream.codec_parameters();
            if params.is_video_codec() {
                return Some((pos, params));
            }
            None
        })
        .context("video codec not found")?;
    let mut audio_demuxer = open_input(audio_path)?;
    // pick the first audio stream in the second input
    let (audio_pos, audio_codec) = audio_demuxer
        .streams()
        .iter()
        .enumerate()
        .find_map(|(pos, stream)| {
            let params = stream.codec_parameters();
            if params.is_audio_codec() {
                return Some((pos, params));
            }
            None
        })
        .context("audio codec not found")?;

    let mut muxer = open_output(out_path, &[video_codec, audio_codec])?;

    // write all video packets first, then all audio packets
    while let Some(packet) = video_demuxer.take()? {
        if packet.stream_index() == video_pos {
            muxer.push(packet.with_stream_index(0))?;
        }
    }

    while let Some(packet) = audio_demuxer.take()? {
        if packet.stream_index() == audio_pos {
            muxer.push(packet.with_stream_index(1))?;
        }
    }

    muxer.flush()?;

    Ok(())
}
niklaskorz commented 3 years ago

The following approach works, but I'm not sure it is the best way:

    let mut last_dts = i64::MIN;
    let mut last_pts = i64::MIN;
    // force strictly increasing PTS and DTS on the video packets
    while let Some(packet) = video_demuxer.take()? {
        if packet.stream_index() == video_pos {
            let mut pts = packet.pts().timestamp();
            if pts <= last_pts {
                pts = last_pts + 1
            }
            last_pts = pts;
            let mut dts = packet.dts().timestamp();
            if dts <= last_dts {
                dts = last_dts + 1
            }
            last_dts = dts;
            let time_base = packet.time_base();
            let packet = packet
                .with_pts(Timestamp::new(pts, time_base))
                .with_dts(Timestamp::new(dts, time_base))
                .with_stream_index(0);
            muxer.push(packet)?;
        }
    }

    let mut last_dts = i64::MIN;
    let mut last_pts = i64::MIN;
    // force strictly increasing PTS and DTS on the audio packets
    while let Some(packet) = audio_demuxer.take()? {
        if packet.stream_index() == audio_pos {
            let mut pts = packet.pts().timestamp();
            if pts <= last_pts {
                pts = last_pts + 1
            }
            last_pts = pts;
            let mut dts = packet.dts().timestamp();
            if dts <= last_dts {
                dts = last_dts + 1
            }
            last_dts = dts;
            let time_base = packet.time_base();
            let packet = packet
                .with_pts(Timestamp::new(pts, time_base))
                .with_dts(Timestamp::new(dts, time_base))
                .with_stream_index(1);
            muxer.push(packet)?;
        }
    }

Edit: it "works" in that it doesn't produce any errors, but the resulting video is a stuttering mess.

operutka commented 3 years ago

Hi @niklaskorz!

Sorry for the delay. The first version of your code is a bit more correct than the second one. The second one messes up the timestamps big time. That's why the result is so messy.

It is important to know that if you want to put multiple elementary streams into a single container, you need to interleave all frames/packets. So, for example, if you have the following DTS sequences:

# Stream #0
25 50 75 100

# Stream #1
10 20 30 40 50 60 70 80 90

You need to mux them in this order:

Stream:    1  1  0  1  1  0  1  1  1  0  1  1   0
DTS:      10 20 25 30 40 50 50 60 70 75 80 90 100

You can set up the muxer to do the interleaving for you, but that has a big disadvantage: the muxer will buffer packets as needed to correct the difference in packet interleaving, which can lead to high memory consumption. In your case, the muxer would have to buffer all video frames because you feed it all video frames first and only then the audio frames.
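
If you'd rather drive the interleaving yourself and avoid that buffering, you can merge the two inputs on the caller side. Here is a minimal sketch, continuing inside the mux_files function from the first post in place of the two sequential while loops (it reuses only the calls already shown there): hold one pending packet per input and always push the one with the smaller DTS. Comparing raw DTS values like this assumes both inputs use the same time base; if they don't, rescale to a common time base before comparing.

    let mut pending_video = None;
    let mut pending_audio = None;

    loop {
        // refill the video slot if it is empty
        if pending_video.is_none() {
            while let Some(packet) = video_demuxer.take()? {
                if packet.stream_index() == video_pos {
                    pending_video = Some(packet);
                    break;
                }
            }
        }

        // refill the audio slot if it is empty
        if pending_audio.is_none() {
            while let Some(packet) = audio_demuxer.take()? {
                if packet.stream_index() == audio_pos {
                    pending_audio = Some(packet);
                    break;
                }
            }
        }

        // push whichever pending packet has the smaller DTS; comparing the raw
        // timestamps assumes both inputs share a time base (see note above)
        let push_video = match (&pending_video, &pending_audio) {
            (Some(v), Some(a)) => v.dts().timestamp() <= a.dts().timestamp(),
            (Some(_), None) => true,
            (None, Some(_)) => false,
            (None, None) => break, // both inputs are exhausted
        };

        if push_video {
            muxer.push(pending_video.take().unwrap().with_stream_index(0))?;
        } else {
            muxer.push(pending_audio.take().unwrap().with_stream_index(1))?;
        }
    }

    muxer.flush()?;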

Now, the error message you mentioned is related to a different problem. Most muxers have the constraint that packets within an elementary stream must be muxed in increasing DTS order. So if we take stream #1 from the previous example, this is a correct muxing order:

10 20 30 40 ...

this is an incorrect muxing order:

20 10 30 40 ...

and this is also an incorrect muxing order:

10 10 20 30 40 ...

I can only speculate about the reason why you see this error message; it's hard to say more without further analysis of your input streams. But it really isn't that important, these errors simply happen. It may be caused by an incorrectly muxed input stream. It may also be caused by the ADTS-to-ASC bitstream filter that FFmpeg applies automatically for AAC audio in MP4 (and all other MOV formats). There isn't a single correct way to remedy this issue; it always depends on the exact cause of the problem. You have several options:

  1. You can drop all packets that would break the increasing DTS sequence. If all consecutive packets with the same DTS are duplicates, then it's OK to drop them. In other cases, dropping packets may create gaps in your stream.
  2. You can join all consecutive packets with the same DTS and stream index by appending their data. This may work in cases where the input packets were split incorrectly for some reason (e.g. slices of a single H.264 frame spread across multiple packets). But be aware that it will probably not work for audio, because audio codecs usually require a constant number of audio samples per packet.
  3. You can try to correct the DTS of any packet that breaks the increasing sequence. You can use the DTS of the last packet with the same stream index, incremented by one. You can also wait for the next packet with the same stream index and place the "outlier" exactly in the middle between the last packet and the next packet.

The first option (i.e. duplicate packets) is usually quite unlikely to apply. I'd try either the second or the third option.
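
A minimal sketch of the simple variant of the third option, using only the packet calls already shown in the snippets above (the fix_dts helper name and the per-stream counters are just illustrative, and the import paths assume ac-ffmpeg's packet and time modules): track the last DTS pushed for each output stream and bump any packet that would break the sequence. Only the DTS is adjusted; the PTS is left untouched.

use ac_ffmpeg::{packet::Packet, time::Timestamp};

/// If `packet` would break the increasing DTS sequence of its output stream,
/// bump its DTS to one tick past the last pushed DTS. Only the DTS is touched;
/// the PTS is left as-is.
fn fix_dts(packet: Packet, last_dts: &mut i64) -> Packet {
    let time_base = packet.time_base();
    let mut dts = packet.dts().timestamp();

    if dts <= *last_dts {
        dts = *last_dts + 1;
    }

    *last_dts = dts;

    packet.with_dts(Timestamp::new(dts, time_base))
}

It would be called right before pushing, with one counter per output stream, e.g.:

    let mut last_video_dts = i64::MIN;
    // ...
    muxer.push(fix_dts(packet.with_stream_index(0), &mut last_video_dts))?;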

operutka commented 3 years ago

And, of course, you can take inspiration from the FFmpeg app itself, but it's usually quite hard to find what you're looking for there; the app is quite complex.

emarsden commented 2 years ago

I ran into this same problem and have implemented some of the workarounds for invalid DTS/PTS timestamps that ffmpeg.c uses. This is probably imperfect, but it works at least on some of the media streams where this problem is present (e.g. Vevo MPEG-DASH streams). See

https://github.com/emarsden/dash-mpd-rs/blob/main/src/libav.rs

operutka commented 1 year ago

I'm closing the issue due to inactivity. Feel free to reopen it if you need to.