bytedeco / javacv

Java interface to OpenCV, FFmpeg, and more

Failed to mix 2 pcm mono audio files using FFmpegFrameFilter class and "amix" filter option #1315

Open debjdutta opened 5 years ago

debjdutta commented 5 years ago

Hello, I am not able to mix two mono PCM ulaw files using the FFmpegFrameFilter class with the "amix" option. The output file is much smaller (78 bytes) than the inputs (225 KB and 225 KB) and does not play at all.

Initially I created a mixed audio file by running ffmpeg on the command line so that I could compare the results with the output from the Java program:

ffmpeg -i xmit.wav -i recv.wav -filter_complex [0:a][1:a]amix=inputs=2 ffmpeg-amix-audio.wav

Command

ffmpeg -i ffmpeg-amix-audio.wav

Console output

Output #0, wav, to 'ffmpeg-amix-audio.wav':
  Metadata:
    ISFT            : Lavf58.12.100
    Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 8000 Hz, mono, s16, 128 kb/s (default)
    Metadata:
      encoder         : Lavc58.18.100 pcm_s16le
size=     451kB time=00:00:28.88 bitrate= 128.0kbits/s speed= 310x
video:0kB audio:451kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.016880%

My sample Java program to mix two audio files using javacv-1.5.1.jar:

public static void amixTest() throws Exception, org.bytedeco.javacv.FrameFilter.Exception, org.bytedeco.javacv.FrameRecorder.Exception {
        String xmit = "C:/media/audio/xmit.wav"; //mono, 8KHz, 64 kb/s PCM ulaw
        String recv = "C:/media/audio/recv.wav"; //mono, 8KHz, 64 kb/s PCM ulaw
        String output = "C:/media/audio/javacv-amix-audio.wav";

        // Grab input samples
        FFmpegFrameGrabber gbXmit = new FFmpegFrameGrabber(xmit);
        FFmpegFrameGrabber gbRecv = new FFmpegFrameGrabber(recv);
        gbXmit.setAudioChannels(1);
        gbXmit.start();
        gbRecv.setAudioChannels(1);
        gbRecv.start();

        // Start the recorder
        FFmpegFrameRecorder recorder = new FFmpegFrameRecorder(output,1); // output is a mono audio channel
        recorder.setFormat("wav");
        recorder.start();

        // Filter the Input
        // Note: the filter string from ffmpeg official site did not work (https://ffmpeg.org/ffmpeg-filters.html#amix)
        // e.g. amix=inputs=2:duration=first
        // Refer to discussion https://github.com/bytedeco/javacv/issues/1082
        // New filter string "[0:a][1:a]amix=inputs=2[a]"
        FFmpegFrameFilter filter = new FFmpegFrameFilter("[0:a][1:a]amix=inputs=2[a]", 1);
        filter.setAudioInputs(2);
        filter.start();

        Frame outFrame = null; Frame xmitFrame = null; Frame recvFrame = null;
        long xmitFrCt = 0; long recvFrCt = 0; long recFrCt = 0;
        boolean readAllXmitFrames= false; boolean readAllRecvFrames = false;

        // process frames
        while(true) {

            xmitFrame = gbXmit.grabSamples();
            recvFrame = gbRecv.grabSamples();
            if(xmitFrame!=null) {
                xmitFrCt++;
                System.out.println("Xmit frame timestamp: " + xmitFrame.timestamp);
                filter.push(0, xmitFrame);
            } else {
                readAllXmitFrames = true;
            }
            if(recvFrame!=null) {
                recvFrCt++;
                System.out.println("Recv frame timestamp: " + recvFrame.timestamp);
                filter.push(1, recvFrame);
            } else {
                readAllRecvFrames = true;
            }

            while ((outFrame = filter.pullSamples()) != null) {
                recFrCt++;
                System.out.println("Output frame timestamp: " + outFrame.timestamp);
                recorder.record(outFrame);
            }

            if(readAllXmitFrames && readAllRecvFrames) {
                System.out.println("Xmit Frame Count: " + xmitFrCt + ", Recv Frame Count: " + recvFrCt + ", Recording Frame count: " + recFrCt);
                break;
            }
        }

The Eclipse console output does not show any errors; however, the program fails to record the last sample from each input, and the output frame/sample timestamps differ from the input timestamps. Also, the output stream info does not match the ffmpeg output

Xmit frame timestamp: 0
Recv frame timestamp: 0
Output frame timestamp: 0
Xmit frame timestamp: 512000
Recv frame timestamp: 512000
…..
….
Xmit frame timestamp: 28160000
Recv frame timestamp: 28160000
Output frame timestamp: 5108390
Xmit frame timestamp: 28672000
Recv frame timestamp: 28672000
Xmit Frame Count: 57, Recv Frame Count: 57, Recording Frame count: 56
Input #0, wav, from 'C:/media/audio/xmit.wav':
  Duration: 00:00:28.88, bitrate: 64 kb/s
    Stream #0:0: Audio: pcm_mulaw ([7][0][0][0] / 0x0007), 8000 Hz, 1 channels, s16, 64 kb/s
Input #0, wav, from 'C:/media/audio/recv.wav':
  Duration: 00:00:28.68, bitrate: 64 kb/s
    Stream #0:0: Audio: pcm_mulaw ([7][0][0][0] / 0x0007), 8000 Hz, 1 channels, s16, 64 kb/s
Output #0, wav, to 'C:/media/audio/javacv-amix-audio.wav':
  Metadata:
    ISFT            : Lavf58.20.100
    Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, mono, s16, 705 kb/s

Here is the ffmpeg output from directly querying the JavaCV-generated file.

Command

ffmpeg -i javacv-amix-audio.wav

Console output

Input #0, wav, from 'javacv-amix-audio.wav':
  Metadata:
    encoder         : Lavf58.20.100
  Duration: 00:00:05.20, bitrate: 705 kb/s
    Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, mono, s16, 705 kb/s

Please suggest how to fix the Java program.

saudet commented 5 years ago

You're not calling recorder.stop()...

debjdutta commented 5 years ago

Sorry, I forgot to copy the close()/stop()/release() statements for the frame recorder, filter, and grabbers. Below is the code snippet from my editor.

    public static void amixTest() throws Exception, org.bytedeco.javacv.FrameFilter.Exception, org.bytedeco.javacv.FrameRecorder.Exception {
        String xmit = "C:/media/audio/xmit.wav"; //mono, 8KHz, 64 kb/s PCM ulaw
        String recv = "C:/media/audio/recv.wav"; //mono, 8KHz, 64 kb/s PCM ulaw
        String output = "C:/media/audio/javacv-amix-audio.wav";

        // Grab input samples
        FFmpegFrameGrabber gbXmit = new FFmpegFrameGrabber(xmit);
        FFmpegFrameGrabber gbRecv = new FFmpegFrameGrabber(recv);
        gbXmit.setAudioChannels(1);
        gbXmit.start();
        gbRecv.setAudioChannels(1);
        gbRecv.start();

        // Start the recorder
        FFmpegFrameRecorder recorder = new FFmpegFrameRecorder(output,1); // output is a mono audio channel
        recorder.setFormat("wav");
        recorder.start();

        // Filter the Input
        // Note: the filter string from ffmpeg official site did not work (https://ffmpeg.org/ffmpeg-filters.html#amix)
        // e.g. amix=inputs=2:duration=first
        // Refer to discussion https://github.com/bytedeco/javacv/issues/1082
        // New filter string "[0:a][1:a]amix=inputs=2[a]"
        FFmpegFrameFilter filter = new FFmpegFrameFilter("[0:a][1:a]amix=inputs=2[a]", 1);
        filter.setAudioInputs(2);
        filter.start();

        Frame outFrame = null; Frame xmitFrame = null; Frame recvFrame = null;
        long xmitFrCt = 0; long recvFrCt = 0; long recFrCt = 0;
        boolean readAllXmitFrames= false; boolean readAllRecvFrames = false;

        // process frames
        while(true) {

            xmitFrame = gbXmit.grabSamples();
            recvFrame = gbRecv.grabSamples();
            if(xmitFrame!=null) {
                xmitFrCt++;
                System.out.println("Xmit frame timestamp: " + xmitFrame.timestamp);
                filter.push(0, xmitFrame);
            } else {
                readAllXmitFrames = true;
            }
            if(recvFrame!=null) {
                recvFrCt++;
                System.out.println("Recv frame timestamp: " + recvFrame.timestamp);
                filter.push(1, recvFrame);
            } else {
                readAllRecvFrames = true;
            }

            while ((outFrame = filter.pullSamples()) != null) {
                recFrCt++;
                System.out.println("Output frame timestamp: " + outFrame.timestamp);
                recorder.record(outFrame);
            }

            if(readAllXmitFrames && readAllRecvFrames) {
                System.out.println("Xmit Frame Count: " + xmitFrCt + ", Recv Frame Count: " + recvFrCt + ", Recording Frame count: " + recFrCt);
                break;
            }
        }

        // recorder stop
        recorder.stop(); recorder.close(); recorder.release();

        // filter stop
        filter.stop(); filter.close(); filter.release();

        // grabber stop 
        gbXmit.stop(); gbXmit.close(); gbXmit.release();
        gbRecv.stop(); gbRecv.close(); gbRecv.release();

    }
saudet commented 5 years ago

Ok, next thing, if you'd like the output to have the same sample rate as the inputs, you'll need to set it with setSampleRate() somewhere before start().

debjdutta commented 5 years ago

Thank you for the tips. I had to set the sample rate to 8000, both in the filter and the recorder. Now I get an output that is almost the same as ffmpeg's, except that the output file is smaller (and hence shorter in duration). FFmpeg generates a 451 KB file, but the Java program creates a 448 KB file. It still fails to record the last sample. Please suggest how to improve the Java program.
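
For reference, here is a minimal sketch of that change, reusing the variable names from the snippet above (the 8000 Hz value matches these particular inputs, not a general default):

        // Sketch: make the filter and recorder match the 8 kHz inputs, before start()
        FFmpegFrameFilter filter = new FFmpegFrameFilter("[0:a][1:a]amix=inputs=2[a]", 1);
        filter.setAudioInputs(2);
        filter.setSampleRate(8000);      // same rate as the PCM ulaw inputs
        filter.start();

        FFmpegFrameRecorder recorder = new FFmpegFrameRecorder(output, 1);
        recorder.setFormat("wav");
        recorder.setSampleRate(8000);    // otherwise the output ends up at 44100 Hz
        recorder.start();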

saudet commented 5 years ago

Your inputs are not of the same length. If you need to have everything, you'll need to lengthen the shorter one of them.

debjdutta commented 5 years ago

How can I do that without changing the original audio files? In ffmpeg the amix filter has a duration parameter, but I could not find any example with JavaCV FrameFilters. Also, while running my program I keep separate counters for the number of samples grabbed from the xmit.wav and recv.wav files and the number of samples pulled from the filter. Even though the files are not exactly the same length, the grabber for each file gives me 57 samples; however, the number of samples pulled from the filter is always 56. I could not understand why the last input samples were discarded.

saudet commented 5 years ago

The number of samples for each is different; you'll need to give it more samples. The ffmpeg program probably just extends them with 0s, so try to do that.

debjdutta commented 5 years ago

I have verified with the ffmpeg filter -filter_complex [0:a][1:a]amix=inputs=2:duration=shortest that when the duration of the mixed audio is set to the shortest input, the output file size is the same as the JavaCV output. FFmpeg by default sets the duration to the longest input, so I guess JavaCV sets it to the shortest input.

saudet commented 5 years ago

JavaCV doesn't do anything. That's how the amix filter works by default.

saudet commented 5 years ago

Ah, no, amix should pick the longest stream by default:

The duration of the longest input. (default)

https://ffmpeg.org/ffmpeg-filters.html#amix

So your code isn't sending all existing samples to amix...

debjdutta commented 5 years ago

I will check the code again. FYI, ffmpeg in debug mode shows 57 packets being read from each input, which is exactly the same number of frames/samples that my program grabs from each input using FFmpegFrameGrabber. All these frames are fed to the filter. FFmpeg console output

Input file #0 (xmit.wav):
  Input stream #0:0 (audio): 57 packets read (231040 bytes); 57 frames decoded (231040 samples);
  Total: 57 packets (231040 bytes) demuxed
Input file #1 (recv.wav):
  Input stream #1:0 (audio): 57 packets read (229440 bytes); 57 frames decoded (229440 samples);
  Total: 57 packets (229440 bytes) demuxed
Output file #0 (ffmpeg-amix-audio.wav):
  Output stream #0:0 (audio): 58 frames encoded (231040 samples); 58 packets muxed (462080 bytes);
  Total: 58 packets (462080 bytes) muxed

Also, I noticed that ffmpeg auto-inserts an extra filter (named auto_resampler_2) that is missing from the DEBUG logs of my program. FFmpeg console output

[Parsed_amix_0 @ 000001be1916d440] auto-inserting filter 'auto_resampler_0' between the filter 'graph_0_in_0_0' and the filter 'Parsed_amix_0'
[Parsed_amix_0 @ 000001be1916d440] auto-inserting filter 'auto_resampler_1' between the filter 'graph_0_in_1_0' and the filter 'Parsed_amix_0'
[format_out_0_0 @ 000001be1914cd40] auto-inserting filter 'auto_resampler_2' between the filter 'Parsed_amix_0' and the filter 'format_out_0_0'

Java console output (using JavaCV in DEBUG mode)

[Parsed_amix_0 @ 000000001f938900] auto-inserting filter 'auto_resampler_0' between the filter 'asetpts0' and the filter 'Parsed_amix_0'
[Parsed_amix_0 @ 000000001f938900] auto-inserting filter 'auto_resampler_1' between the filter 'asetpts1' and the filter 'Parsed_amix_0'

Do you think this could be the issue?

saudet commented 5 years ago

The number of samples is not the same; that's not necessarily related to the number of frames. Each audio frame may contain a different number of samples.

debjdutta commented 5 years ago

Could you please suggest how I can get the raw data from each frame (org.bytedeco.javacv.Frame)?

saudet commented 5 years ago

That's in Frame.samples.
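
For example, a minimal sketch of inspecting it, reusing the gbXmit grabber from your code and assuming mono 16-bit samples as in these inputs (so samples[0] is a ShortBuffer):

        // Sketch: look at the raw audio data of a grabbed frame
        Frame f = gbXmit.grabSamples();
        if (f != null && f.samples != null) {
            java.nio.ShortBuffer sb = (java.nio.ShortBuffer) f.samples[0];
            System.out.println("samples in this frame: " + sb.remaining());
        }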

debjdutta commented 5 years ago

Based on your previous input, I found that during the last iteration the grabbed input sample sizes are different for the xmit (1664 bytes) and recv (64 bytes) audio (the recv.wav file is 200 milliseconds shorter). For these samples, when the public Frame pullSamples() method in FFmpegFrameFilter.java is called, the av_buffersink_get_frame(abuffersink_ctx, filt_frame) call returns -11 (AVERROR_EAGAIN()), which causes the pullSamples() method to return null and the loss of the last frame from each input. I guess this is expected behavior? However, after padding the smaller frame I am getting the same output size as ffmpeg. Thank you very much for pointing me in the right direction.
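
For anyone hitting the same thing, here is a rough sketch of the zero-padding I used. It assumes mono 16-bit samples (a ShortBuffer in samples[0]), and the helper name padToLength is just mine:

        // Sketch: zero-pad a short audio frame so both inputs feed amix the same number of samples.
        // Assumes targetSamples >= the number of samples already in the frame.
        static Frame padToLength(Frame frame, int targetSamples) {
            java.nio.ShortBuffer in = (java.nio.ShortBuffer) frame.samples[0];
            java.nio.ShortBuffer out = java.nio.ShortBuffer.allocate(targetSamples);
            out.put(in);                      // copy the samples that are there
            while (out.hasRemaining()) {
                out.put((short) 0);           // fill the rest with silence
            }
            out.rewind();
            frame.samples[0] = out;
            return frame;
        }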

saudet commented 5 years ago

EAGAIN means that we must provide more input to the filter, or we won't be getting more output: https://www.ffmpeg.org/doxygen/trunk/group__lavfi__buffersink.html#ga653228f4cbca427c654d844a5fc59cfa There's not much we can do other than add more input there. I'm not sure how the ffmpeg program handles this, but the source code is there and we could add similar logic to FFmpegFrameGrabber.

saudet commented 5 years ago

Actually, I think I know what we can do about this. I've just noticed there's a missing call to av_buffersrc_add_frame_flags() with a null frame that should indicate the end of the stream to the filter, so I've added that in commit 5868333caf49e9aabed6d0a60993f8052ff1bc4a. With this change, try to call push(0, null) and push(1, null) when you're out of frames, and that should make it work.

debjdutta commented 5 years ago

Sorry for the late response. I have tested the changes with the following logic for reading the audio samples, but the output file is as short as before (without padding). Using the debugger I made sure that av_buffersrc_add_frame_flags(abuffersrc_ctx[n], null, 0); is getting called. I also noticed that a direct call to push(0, null) or push(1, null) throws a NullPointerException, so I had to include the AV_PIX_FMT_NONE parameter.

            // AV_PIX_FMT_NONE comes from a static import of org.bytedeco.ffmpeg.global.avutil
            if(xmitFrame!=null) {
                xmitFrCt++;
                filter.push(0, xmitFrame);
            } else {
                // reached EOF
                filter.push(0, null, AV_PIX_FMT_NONE);
                readAllXmitFrames = true;
            }
            if(recvFrame!=null) {
                recvFrCt++;
                filter.push(1, recvFrame);              
            } else {
                // reached EOF
                filter.push(1, null, AV_PIX_FMT_NONE);
                readAllRecvFrames = true;
            }
saudet commented 5 years ago

Ah, I see, there is another issue there, so you're doing it right I think. There must be some other way the ffmpeg program tells the filter about the end of the stream...

saudet commented 5 years ago

Looking briefly at ffmpeg.c it looks like we should be using the AV_BUFFERSRC_FLAG_PUSH flag so I've added that to the last commit.

debjdutta commented 5 years ago

Today I tested the previous audio amix filter example with your recent changes in FFmpegFrameFilter.java; however, there is no difference in the output, and the file is still shorter than what I get by running the ffmpeg command. Going through the source code of ffmpeg.c, I see the code below where, for each filtergraph, if the audio encoder does not support receiving samples of different sizes (CODEC_CAP_VARIABLE_FRAME_SIZE), the frame size of the audio buffer sink is set to a fixed size (the output codec's frame size) by calling the av_buffersink_set_frame_size() method. I do not see FFmpegFrameFilter doing something similar. In my test program, the number of samples read from each input is different for the last iteration of the frame grabber. Could you please suggest? Reference: https://www.ffmpeg.org/doxygen/trunk/ffmpeg_8c-source.html

                for (j = 0; j < fg->nb_outputs; j++) {
                    OutputStream *ost = fg->outputs[j]->ost;
                    if (ost->enc->type == AVMEDIA_TYPE_AUDIO &&
                        !(ost->enc->capabilities & CODEC_CAP_VARIABLE_FRAME_SIZE))
                        av_buffersink_set_frame_size(ost->filter->filter,
                                                     ost->st->codec->frame_size);
                }

Also, from the API documentation of the av_buffersink_set_frame_size() method. Reference: https://www.ffmpeg.org/doxygen/trunk/group__lavfi__buffersink.html#ga359d7d1e42c27ca14c07559d4e9adba7

void av_buffersink_set_frame_size(AVFilterContext * ctx, unsigned frame_size)       
Set the frame size for an audio buffer sink.
All calls to av_buffersink_get_buffer_ref will return a buffer with exactly the specified number of samples, or AVERROR(EAGAIN) if there is not enough. The last buffer at EOF will be padded with 0.
saudet commented 5 years ago

We need to set a fixed frame size for the output when the encoder doesn't support a variable frame size. That's something we should add, I suppose, but it's not something anyone has had problems with yet, and it's not related to the issue you're having.