bytedeco / javacv

Java interface to OpenCV, FFmpeg, and more

Synchronization between multiple streamed audio RTP and low latency. #1091

Open teocci opened 5 years ago

teocci commented 5 years ago

Hello, everyone. I have been using JavaCV for almost 5 months, and I was able to implement a basic application that receives multiple RTSP streams from Android phones, as shown in this image: [screenshot from 2018-11-20 18-06-49]

My implementation creates a thread for each device and instantiates an FFmpegFrameGrabber object with the RTSP address and port of the device stream. The delay is noticeable and varies between approximately 0.8 and 1.2 seconds.

So I have two questions. First, is it possible to achieve a real-time stream, or to reduce the delay to 0.2 seconds? I can achieve a latency of 0.2 to 0.3 seconds when I run this command:

ffplay -fflags nobuffer -flags low_delay -framedrop \
 -strict experimental -rtsp_transport tcp rtsp://<host>:<port>

For example, the AudioPlayer has this code:

public AudioPlayer(String source, GrabberListener grabberListener) {
    if (grabberListener == null) return;
    if (source.isEmpty()) return;

    Thread playThread = new Thread(() -> {
        try {
            FFmpegFrameGrabber grabber = new FFmpegFrameGrabber(source);
            grabber.setOption("rtsp_transport", "tcp");
            // The stream must be opened before the sample rate and channel count are known.
            grabber.start();

            playing = true;
            LogHelper.e(TAG, "audioPlaying(on)");

            if (grabber.getSampleRate() > 0 && grabber.getAudioChannels() > 0) {
                // 16-bit, signed, big-endian PCM (matches ByteBuffer's default byte order)
                AudioFormat audioFormat = new AudioFormat(grabber.getSampleRate(), 16, grabber.getAudioChannels(), true, true);

                DataLine.Info info = new DataLine.Info(SourceDataLine.class, audioFormat);
                soundLine = (SourceDataLine) AudioSystem.getLine(info);
                soundLine.open(audioFormat);
                soundLine.start();

                gainControl = (FloatControl) soundLine.getControl(FloatControl.Type.MASTER_GAIN);
            }

            ExecutorService executor = Executors.newSingleThreadExecutor();
            // Kept across frames so that the peak meter can decay
            float lastPeak = 0f;
            while (running.get()) {
                Frame frame = grabber.grab();
                if (frame == null) break;

                if (frame.samples != null) {
                    ShortBuffer channelSamplesShortBuffer = (ShortBuffer) frame.samples[0];

                    ByteBuffer outBuffer = ByteBuffer.allocate(channelSamplesShortBuffer.capacity() * 2);
                    float[] samples = new float[channelSamplesShortBuffer.capacity()];

                    // The buffer already holds 16-bit samples, so one pass both
                    // copies them for playback and normalizes them for metering.
                    for (int i = 0; i < channelSamplesShortBuffer.capacity(); i++) {
                        short val = channelSamplesShortBuffer.get(i);
                        outBuffer.putShort(val);

                        // Normalize to range of +/-1.0f
                        samples[i] = val / 32768f;
                    }

                    float rms = 0f;
                    float peak = 0f;
                    for (float sample : samples) {
                        float abs = Math.abs(sample);
                        if (abs > peak) {
                            peak = abs;
                        }
                        rms += sample * sample;
                    }
                    rms = (float) Math.sqrt(rms / samples.length);

                    if (lastPeak > peak) {
                        peak = lastPeak * 0.875f;
                    }
                    lastPeak = peak;

                    grabberListener.onAudioSpectrum(rms, peak);

                    if (soundLine == null) return;
                    try {
                        // get() waits for the write and is what can throw InterruptedException
                        executor.submit(() -> {
                            soundLine.write(outBuffer.array(), 0, outBuffer.capacity());
                        }).get();
                    } catch (InterruptedException interruptedException) {
                        Thread.currentThread().interrupt();
                    }
                }
            }

            executor.shutdown();
            executor.awaitTermination(1, TimeUnit.SECONDS);

            playing = false;
            LogHelper.e(TAG, "audioPlaying(off)");

            if (soundLine != null) {
                soundLine.stop();
            }
            grabber.stop();
        } catch (Exception e) {
            LogHelper.e(TAG, e);
            grabberListener.onError(new RuntimeException("Could not open input stream.", e));
        }
    });
    playThread.start();
}

My second question is: is it possible to synchronize these RTP streams, perhaps using the NTP timestamp? However, the NTP timestamp is carried in the RTCP Sender Report packet. Can I get access to that packet from the FFmpegFrameGrabber?
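For context, the NTP timestamp mentioned here sits at a fixed offset in the RTCP Sender Report defined by RFC 3550 (section 6.4.1). The sketch below only illustrates how that packet maps an RTP timestamp to wall-clock time. The class `RtcpSenderReport` and its methods are hypothetical names made up for this example, and it assumes the raw RTCP bytes have already been obtained by some other means; nothing here implies that FFmpegFrameGrabber exposes them.

```java
import java.nio.ByteBuffer;

/** Minimal reader for an RTCP Sender Report (RFC 3550, section 6.4.1); illustration only. */
final class RtcpSenderReport {
    /** Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01). */
    static final long NTP_TO_UNIX_SECONDS = 2_208_988_800L;
    static final int PACKET_TYPE_SR = 200;

    final long ssrc;          // synchronization source id of the sender
    final long ntpUnixMillis; // wall-clock time of the report, converted to Unix milliseconds
    final long rtpTimestamp;  // RTP timestamp that corresponds to that wall-clock time

    private RtcpSenderReport(long ssrc, long ntpUnixMillis, long rtpTimestamp) {
        this.ssrc = ssrc;
        this.ntpUnixMillis = ntpUnixMillis;
        this.rtpTimestamp = rtpTimestamp;
    }

    static RtcpSenderReport parse(byte[] packet) {
        if (packet.length < 28) {
            throw new IllegalArgumentException("too short for a sender report");
        }
        // RTCP uses network byte order, which is ByteBuffer's default (big-endian).
        ByteBuffer buf = ByteBuffer.wrap(packet);
        int version = (buf.get(0) & 0xFF) >>> 6;
        int packetType = buf.get(1) & 0xFF;
        if (version != 2 || packetType != PACKET_TYPE_SR) {
            throw new IllegalArgumentException("not an RTCP sender report");
        }
        long ssrc = buf.getInt(4) & 0xFFFFFFFFL;
        long ntpSeconds = buf.getInt(8) & 0xFFFFFFFFL;   // most significant word
        long ntpFraction = buf.getInt(12) & 0xFFFFFFFFL; // units of 1/2^32 second
        long rtpTimestamp = buf.getInt(16) & 0xFFFFFFFFL;
        long millis = (ntpSeconds - NTP_TO_UNIX_SECONDS) * 1000L
                + ((ntpFraction * 1000L) >>> 32);
        return new RtcpSenderReport(ssrc, millis, rtpTimestamp);
    }

    /** Maps an RTP timestamp of the same stream to Unix milliseconds, given the clock rate in Hz. */
    long toUnixMillis(long rtp, int clockRate) {
        // The cast to int yields a signed 32-bit difference, which tolerates wrap-around.
        long delta = (int) (rtp - rtpTimestamp);
        return ntpUnixMillis + delta * 1000L / clockRate;
    }
}
```

With one such report per stream, frames from different senders could in principle be placed on a shared wall-clock timeline, provided the senders' clocks are themselves synchronized.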

saudet commented 5 years ago

If we set the nobuffer and low_delay flags with JavaCV as well, it should behave the same; see issue #862. Something like FFmpegFrameGrabber.setOption("nobuffer", "1") or FFmpegFrameGrabber.setOption("fflags", "nobuffer") might work, but we might need to modify the grabber with something like what is proposed at [link]. If you give it a try and it works, please send a pull request! Thanks
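As a rough sketch of that idea, the flags from the ffplay command can be grouped by the FFmpeg layer they target. `LowLatencyOptions` is a hypothetical helper written for this example, not part of JavaCV. It assumes each demuxer entry would be passed to the grabber via `setOption(key, value)` before `start()`; whether JavaCV forwards these the same way ffplay does is exactly what is left open above. `-framedrop` is an ffplay player option with no demuxer equivalent, and `-strict experimental` is left out for brevity.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that groups the ffplay low-latency flags as key/value pairs. */
final class LowLatencyOptions {

    /** Demuxer-level options: "-rtsp_transport tcp" and "-fflags nobuffer". */
    static Map<String, String> formatOptions() {
        Map<String, String> options = new LinkedHashMap<>();
        options.put("rtsp_transport", "tcp");
        options.put("fflags", "nobuffer");
        return options;
    }

    /** Decoder-level option: "-flags low_delay" is a codec flag in FFmpeg. */
    static Map<String, String> codecOptions() {
        Map<String, String> options = new LinkedHashMap<>();
        options.put("flags", "low_delay");
        return options;
    }

    public static void main(String[] args) {
        // Assumed usage (untested): formatOptions().forEach(grabber::setOption);
        formatOptions().forEach((k, v) -> System.out.println("format: " + k + "=" + v));
        codecOptions().forEach((k, v) -> System.out.println("codec: " + k + "=" + v));
    }
}
```

Keeping the two groups separate makes it easier to test which layer actually accounts for the latency difference.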

About accessing the NTP timestamps, that sounds like a question about FFmpeg. You'll probably get more feedback by asking upstream about that. I wouldn't count on FFmpeg offering this kind of thing though, see issue #996.

saudet commented 5 years ago

Well, it won't be exactly the same, you have audio streams, but yes that's the general idea.

teocci commented 5 years ago

@saudet The implementation proposed at lu-zero/bmdtools#58 can be done starting at line 767 by adding this:

if (maxDelay == -1) {
    oc.max_delay(500); // for low latency
} else {
    oc.max_delay(maxDelay);
}
oc.flags(oc.flags() | AVFMT_FLAG_NOBUFFER);

Then, at line 783, add this:

for (int i = 0; i < nb_streams; i++) {
    AVStream st = oc.streams(i);
    // Get a pointer to the codec context for the video or audio stream
    // ...
}

teocci commented 5 years ago

@saudet I think you have far more experience than I do. Do you think it is possible to synchronize multiple FFmpegFrameGrabber instances of RTSP streams?

saudet commented 5 years ago

If we have access to accurate timestamps, sure, that sounds good.

teocci commented 5 years ago

@saudet OK, I'm trying to do that. When I call grabber.getTimestamp(), I get the "position" of the grabbed frame.

Suppose we have three streams A, B, and C. A starts first, B starts 60000 ms later, and C starts 180000 ms after A. If we attempt to synchronize A, B, and C 180500 ms after A started, the only information we have is that grabber A started 180500 ms ago, grabber B started 120500 ms ago, and grabber C started 500 ms ago. How can I compare these times to sync the streams? Do you have any ideas? T_T I'm trying to find a solution to this.

Is there any way to get access to the start_time_realtime variable?
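To make the A, B, and C scenario concrete, here is a minimal sketch of the bookkeeping once a wall-clock start time is known for each stream. `StreamAligner` and its methods are hypothetical names for this example. It assumes frame positions are reported in microseconds, as `getTimestamp()` does, and that a trustworthy start time is available per stream. That start time is the missing piece discussed here: taking the local clock when each grabber starts would fold network and buffering delay into it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that places per-stream frame positions on one wall-clock timeline. */
final class StreamAligner {
    /** Wall-clock time (ms) at which each stream's position 0 occurred. */
    private final Map<String, Long> startWallClockMs = new LinkedHashMap<>();

    void register(String streamId, long startMs) {
        startWallClockMs.put(streamId, startMs);
    }

    /** Converts a stream-relative position (microseconds) to shared wall-clock milliseconds. */
    long toWallClockMs(String streamId, long positionMicros) {
        return startWallClockMs.get(streamId) + positionMicros / 1000L;
    }

    /** Position (microseconds) a stream should be at for a wall-clock instant, or -1 if not started. */
    long positionAtMicros(String streamId, long wallClockMs) {
        long elapsedMs = wallClockMs - startWallClockMs.get(streamId);
        return elapsedMs < 0 ? -1L : elapsedMs * 1000L;
    }

    public static void main(String[] args) {
        StreamAligner aligner = new StreamAligner();
        aligner.register("A", 0L);
        aligner.register("B", 60_000L);
        aligner.register("C", 180_000L);
        // Positions each grabber should be showing 180500 ms after A started.
        for (String id : new String[] {"A", "B", "C"}) {
            System.out.println(id + " -> " + aligner.positionAtMicros(id, 180_500L) + " us");
        }
    }
}
```

Frames from different streams whose `toWallClockMs` values match would then be presented together; the quality of the result depends entirely on how accurate the registered start times are.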

saudet commented 5 years ago

We'll probably need to enhance FFmpeg first. I believe there is a patch for this at [link], so we could try to apply it and see what that gives. Let me know if you have any questions and I will help. Thanks!

srajca commented 5 years ago

@teocci I am following your posts and am trying to achieve similar results, but in a different area. Would you be willing to get in touch with me? I have some questions.

@saudet will this patch be available soon?

thank you all

saudet commented 5 years ago

@srajca The patch is already available for download. Feel free to try it anytime you want!