audeering / opensmile

The Munich Open-Source Large-Scale Multimedia Feature Extractor
https://audeering.github.io/opensmile/

Use ffmpeg to send input to opensmile to get features? #35

Open aniketzz opened 2 years ago

aniketzz commented 2 years ago

I want to use FFmpeg to send input to openSMILE and generate features from eGeMAPS, prosody, or MFCC configs. I am able to modify the config files to get live input, but now I want to take input from a video source, extract the audio via FFmpeg, and send it to openSMILE.

chausner-audeering commented 2 years ago

There is a cFFmpegSource component but it only supports input from a file. If you want to use FFmpeg for live audio recording, you will need to do the recording outside of openSMILE and pass the data via SMILEapi and cExternalAudioSource to openSMILE. For more information, see https://audeering.github.io/opensmile/reference.html#smileapi-c-api-and-wrappers.
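As a sketch, the cPortaudioSource section of a live-input config could be swapped for a cExternalAudioSource section along these lines. The instance name `extaudiosource` is a placeholder, and the option names are assumed to be analogous to the other wave sources; check the component help (`SMILExtract -H cExternalAudioSource`) for the exact options:

```
[componentInstances:cComponentManager]
instance[extaudiosource].type = cExternalAudioSource

[extaudiosource:cExternalAudioSource]
writer.dmLevel = wave
sampleRate = 16000
channels = 1
nBits = 16
```

With this in place, the application feeds raw audio buffers into the `extaudiosource` instance through SMILEapi instead of openSMILE recording from a device itself.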

aniketzz commented 2 years ago

Can you please elaborate? I am having some trouble understanding where and what to change. For example, when I looked at SMILEapi, I did not understand where the input was coming from. How do I call cExternalAudioSource? For the local device microphone, I am using the following section in my config:

[waveIn:cPortaudioSource]
writer.dmLevel=wave
monoMixdown = 0
 ; -1 is the default device, set listDevices=1 to see a device list
device = -1
listDevices = 0
sampleRate = 16000
 ; if your soundcard only supports stereo (2-channel) recording, 
 ; use channels=2 and set monoMixdown=1
channels = 1
nBits = 16
audioBuffersize_sec = 0.050000
buffersize_sec = 2.0

chausner-audeering commented 2 years ago

Documentation on SMILEapi is unfortunately rather sparse. Basically, it boils down to:

- SMILEapi is a C API, designed for maximum compatibility with other languages. openSMILE includes a Python wrapper which is recommended if you are working in Python.

- You might also want to take a look at the implementation of https://github.com/audeering/opensmile-python, which uses SMILEapi under the hood via the Python wrapper.

aniketzz commented 2 years ago

Is there any way to get the data per frameTime in real time for prosody, MFCC, and eGeMAPS in openSMILE? I am able to configure the API to generate the features for prosody, MFCC, and eGeMAPS, with a file as the current input. How do I get the features in real time using the API? Currently, it generates the data as one series in a single pass.

Also, what will be the way to use FFmpeg with the API? Do I have to pass the data (an audio file) generated by FFmpeg, or can I stream data via FFmpeg and pass it directly?

chausner-audeering commented 2 years ago

When using SMILEapi in combination with cExternalSink, you will get the features in real time as soon as they are generated.
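A minimal sketch of wiring a cExternalSink onto a feature level might look like the following. The instance name `externalSink` and the level name `func` are placeholders; point `reader.dmLevel` at whatever level your prosody/MFCC/eGeMAPS config actually writes:

```
[componentInstances:cComponentManager]
instance[externalSink].type = cExternalSink

[externalSink:cExternalSink]
reader.dmLevel = func
```

Before starting the run via SMILEapi, the application registers a data callback on the `externalSink` instance; the callback is then invoked once per feature frame as soon as the frame is computed.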

> Also, what will be the way to use ffmpeg with the api?

You can stream audio in real time from FFmpeg to openSMILE. You will need to set up the audio recording with FFmpeg and then pass each individual buffer of audio received from FFmpeg to openSMILE via the SMILEapi function smile_extaudiosource_write_data.
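A rough sketch of that loop in Python, assuming FFmpeg writes raw 16 kHz mono s16le PCM to stdout and the SMILEapi Python wrapper exposes a method mirroring smile_extaudiosource_write_data (the method name `external_audio_source_write_data`, the input file, and the instance name `extaudiosource` are assumptions; check SMILEapi.py for the exact names):

```python
import subprocess

FRAME_BYTES = 16000 * 2 // 10  # 100 ms of 16 kHz mono s16le audio

def pcm_chunks(stream, frame_bytes=FRAME_BYTES):
    """Yield successive buffers of raw PCM read from a byte stream."""
    while True:
        buf = stream.read(frame_bytes)
        if not buf:
            break
        yield buf

def stream_to_opensmile(smile, source_name="extaudiosource"):
    # FFmpeg decodes the input and emits raw signed 16-bit PCM on stdout.
    proc = subprocess.Popen(
        ["ffmpeg", "-i", "input.mp4", "-f", "s16le", "-ar", "16000",
         "-ac", "1", "-"],
        stdout=subprocess.PIPE)
    for buf in pcm_chunks(proc.stdout):
        # Hypothetical wrapper call; the underlying C function is
        # smile_extaudiosource_write_data.
        smile.external_audio_source_write_data(source_name, buf)
```

The same pattern works for any FFmpeg input (file, device, or network stream), since openSMILE only sees the decoded PCM buffers.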

aniketzz commented 2 years ago

What will be the way to use FFmpeg with the Python API? How do I get the features in real time using the Python API? I have changed the config to:

[waveIn:cFFmpegSource]
writer.dmLevel = wave
blocksize_sec = 1.0
filename = \cm[inputfile(I){test.wav}:name of input file]
monoMixdown = 1.0
outFieldName = pcm

However, this takes input from a file, while I want to take input from a port. For example, I will be sending audio through port 8000 and I want to pass this input to the openSMILE Python API.

chausner-audeering commented 2 years ago

cFFmpegSource only supports input from files. If you need to receive an audio stream via the network and want to decode it using FFmpeg, I suggest asking in the FFmpeg forums or on Stack Overflow for help. I can help you with passing the audio to openSMILE via the SMILEapi interface.

To get started with SMILEapi, see the API definition and comments in https://github.com/audeering/opensmile/blob/master/progsrc/smileapi/python/opensmile/SMILEapi.py. See also the sections on the cExternalAudioSource and cExternalSink components in the openSMILE documentation.
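Putting the pieces together, the overall SMILEapi flow could be sketched as below. The wrapper method names (`external_sink_set_callback`, `external_audio_source_write_data`, `external_audio_source_set_external_eoi`) and the instance names `externalSink`/`extaudiosource` are assumptions modeled on the C API; verify them against SMILEapi.py:

```python
def block_bytes(seconds, sample_rate=16000, channels=1, nbits=16):
    """Size in bytes of one raw PCM audio block of the given duration."""
    return int(seconds * sample_rate) * channels * (nbits // 8)

def run_pipeline(smile, audio_blocks):
    # 1. Register a callback on the cExternalSink instance so each
    #    feature frame is delivered as soon as it is computed.
    def on_features(frame):
        print(frame)  # one feature vector per frameTime, in real time
    smile.external_sink_set_callback("externalSink", on_features)

    # 2. Feed raw audio blocks to the cExternalAudioSource instance.
    for block in audio_blocks:
        smile.external_audio_source_write_data("extaudiosource", block)

    # 3. Signal end of input so openSMILE can flush remaining frames.
    smile.external_audio_source_set_external_eoi("extaudiosource")
```

`block_bytes(0.1)` gives the buffer size for 100 ms of 16 kHz mono 16-bit audio; feeding blocks of roughly that size keeps latency low without calling the write function for every sample.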

aniketzz commented 2 years ago

We have the FFmpeg command ready to decode the audio coming in on the UDP port, but how do we integrate the command into the openSMILE Python API?
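One way to integrate an existing FFmpeg command is to launch it with `subprocess` and read the decoded PCM from its stdout, then forward those bytes to openSMILE through SMILEapi. A sketch of building such a command (the port, sample rate, and UDP bind address are placeholders to adapt to your setup):

```python
import subprocess

def ffmpeg_udp_cmd(port, sample_rate=16000, channels=1):
    """FFmpeg argv: listen on a UDP port, emit raw s16le PCM on stdout."""
    return [
        "ffmpeg",
        "-i", f"udp://0.0.0.0:{port}",
        "-f", "s16le",            # raw signed 16-bit little-endian PCM
        "-ar", str(sample_rate),  # resample to the rate openSMILE expects
        "-ac", str(channels),     # mono
        "-",                      # write decoded audio to stdout
    ]

# Hypothetical usage: read proc.stdout in fixed-size blocks and pass each
# block to the cExternalAudioSource instance via SMILEapi:
# proc = subprocess.Popen(ffmpeg_udp_cmd(8000), stdout=subprocess.PIPE)
```

The key point is that openSMILE never talks to the network itself: FFmpeg handles the UDP stream and decoding, and the Python process only shuttles raw PCM buffers between the two.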

aniketzz commented 2 years ago


Can anyone help me with the above query?