bytedeco / javacv

Java interface to OpenCV, FFmpeg, and more

Audio and video are out of sync during continuous streaming #1474

Open kzxuan opened 4 years ago

kzxuan commented 4 years ago

When javacv FFmpegFrameGrabber is used for continuous streaming and the streaming source is switched in real time among multiple short videos (each less than 10 seconds), the audio and video go out of sync after a period of time (roughly 40 seconds). How do I set timestamps to keep them synchronized?

saudet commented 4 years ago

Have you tried setTimestamp()? That will set it for the video frame. Audio frames can't typically be skipped, so it's often not possible to set them, but we can usually set the timestamp of video frames to match the audio frames.

kzxuan commented 4 years ago

@saudet Thank you for your advice, but I need more information about how to use this function. I have tried recorder.setTimestamp(System.currentTimeMillis()), recorder.setTimestamp(System.currentTimeMillis() - startTime), and recorder.setTimestamp(0) before grabbing frames from each source video, but I get errors like "Application provided invalid, non monotonically increasing dts to muxer in stream 0".

saudet commented 4 years ago

Use the timestamps of the audio frames that you received from FFmpegFrameGrabber.grab().
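
For a single source, a minimal sketch of this idea, assuming a long-lived recorder. Note that Frame.timestamp and setTimestamp() work in microseconds relative to the start of the stream, not wall-clock milliseconds, which is why the System.currentTimeMillis() attempts fail:

Frame grabFrame;
while ((grabFrame = grabber.grab()) != null) {
    // Reuse the source's own monotonic timestamps instead of wall-clock time.
    recorder.setTimestamp(grabFrame.timestamp);
    recorder.record(grabFrame);
}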

kzxuan commented 4 years ago

@saudet Thanks so much. For the grabber of each source video, I have tried the following code:

while ((grabFrame = grabber.grab()) != null) {
    // Shift every frame by the accumulated duration of the previous sources.
    recorder.setTimestamp(totalTimestamp + grabFrame.timestamp);
    recorder.record(grabFrame);
}
// After the source is exhausted, add its duration to the running offset.
totalTimestamp += grabber.getTimestamp();

The variable totalTimestamp adds up the durations of all the source videos that have been recorded so far.

But this doesn't work.

kzxuan commented 4 years ago

Code like this alleviates the problem, but does not solve it:

long maxTimestamp = -1;

while ((grabFrame = grabber.grab()) != null) {
    // Track the largest timestamp seen in this source.
    maxTimestamp = Math.max(grabFrame.timestamp, maxTimestamp);
    grabFrame.timestamp += totalTimestamp;
    recorder.setTimestamp(grabFrame.timestamp);
    recorder.record(grabFrame);
}
// Advance the offset past this source, plus a 10 ms safety gap.
totalTimestamp += maxTimestamp + 10000;

anotherche commented 3 years ago

I guess this could be caused by a small difference between the durations of the video and audio streams in each source. Say, if the audio is always slightly shorter than the video, such a concatenation will make the audio run further and further ahead of the video; with a 40 ms difference per clip, for example, the drift would reach a full second after 25 clips.

anotherche commented 3 years ago

@kzxuan, if the desync is caused by the difference in stream durations, you can change your code to:

long lastAudio_ts = Long.MIN_VALUE;
long lastVideo_ts = Long.MIN_VALUE;
while ((grabFrame = grabber.grab()) != null) {
    if (grabFrame.image != null) lastVideo_ts = grabFrame.timestamp;
    if (grabFrame.samples != null) lastAudio_ts = grabFrame.timestamp;
    recorder.setTimestamp(totalTimestamp + grabFrame.timestamp);
    recorder.record(grabFrame);
}
// Advance the offset by the longer of the two streams.
totalTimestamp += Math.max(lastVideo_ts, lastAudio_ts);

anotherche commented 3 years ago

Oh, I'm sorry, I realize now that this does not solve the problem at all. But what if we advance totalTimestamp by lastAudio_ts instead? Perhaps the missing video will simply be skipped if the video stream is shorter than the audio. But what happens if the video stream is longer? Then video frames from consecutive sources will overlap, with intersecting timestamps.

long lastAudio_ts = Long.MIN_VALUE;
while ((grabFrame = grabber.grab()) != null) {
    if (grabFrame.samples != null) lastAudio_ts = grabFrame.timestamp;
    recorder.setTimestamp(totalTimestamp + grabFrame.timestamp);
    recorder.record(grabFrame);
}
// Advance the offset by the audio length only.
totalTimestamp += lastAudio_ts;

anotherche commented 3 years ago

Unfortunately, this option won't work either, as overlapping timestamps are not allowed when encoding. I checked it out. But I have another idea that should definitely work: we have to make sure that at the end of each recorded section (source) the durations of the audio and video differ by no more than one frame, and that this difference does not accumulate from source to source. There are two options.

The first, more difficult one, is to record the source to the end and then append the missing video frames (repeating the last image) or audio frames (silence?).

The second option is much simpler, but it loses the last 1-2 video frames, which does not look so terrible (it works as long as the difference between the stream lengths does not exceed a few frames). We stop recording video frames in the loop when we approach the end of the source, about two video frames early (tracking the source video by its timestamp and comparing it with the total video duration). At the same time, on every audio frame we record, we check that its timestamp does not exceed the length of the video we have kept by more than the duration of one audio frame. Thus a couple of frames are cut from the end of each source, and the difference between the audio and video lengths never exceeds the length of one audio frame (plus or minus). So that this difference still does not add up to a large deviation when combining many sources, we must track not just the difference within one source, but the difference in the resulting recording. If the difference between the audio and video lengths within a source is greater than a couple of video frames, you can increase the indent from the end, or use the first method and insert audio frames of silence. A sketch of the second option is below.
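
A minimal sketch of this second option, assuming one long-lived recorder and a list of source files. The two-frame indent and the use of the video frame duration as the audio tolerance are illustrative choices, not library API; setup and exception handling are omitted:

long totalTimestamp = 0; // accumulated output length, in microseconds

for (String source : sources) {
    FFmpegFrameGrabber grabber = new FFmpegFrameGrabber(source);
    grabber.start();

    // Stop writing video about two frames before the end of this source.
    long frameDuration = (long) (1_000_000 / grabber.getFrameRate());
    long videoCutoff = grabber.getLengthInTime() - 2 * frameDuration;

    long lastVideoTs = 0;
    Frame grabFrame;
    while ((grabFrame = grabber.grab()) != null) {
        if (grabFrame.image != null) {
            if (grabFrame.timestamp > videoCutoff) continue; // drop trailing video frames
            lastVideoTs = grabFrame.timestamp;
        } else if (grabFrame.samples != null) {
            // Keep audio no further than roughly one frame past the video we kept.
            if (grabFrame.timestamp > lastVideoTs + frameDuration) continue;
        }
        recorder.setTimestamp(totalTimestamp + grabFrame.timestamp);
        recorder.record(grabFrame);
    }

    // Advance the offset by the video length actually written, so the
    // audio/video difference cannot accumulate across sources.
    totalTimestamp += lastVideoTs + frameDuration;
    grabber.stop();
    grabber.release();
}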

kzxuan commented 3 years ago

@anotherche Thank you very much for your attention. Indeed, the difference in frame counts between audio and video leads to an accumulating offset during continuous streaming. My previous plan was to forcibly align the timestamps before each push, but it was unstable. I will try your second option.