Closed ridasaleem0 closed 3 years ago
Hi there,
Thanks for using our model.
Regarding real-time applications: it is totally possible. First, you need a single-audio-file inference script; @JeffC0628 contributed one here. Note it is designed for our pretrained model, so you might need to adjust the normalization stats at line 40 and the model loading at lines 83-87. Second, you need to build a recording script. You can use something like PyAudio; if your model takes 10-second audio as input, you can feed it a 10-second clip every 10 seconds, or at your desired interval.
-Yuan
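A minimal sketch of the recording side Yuan describes, assuming a 16 kHz mono input and PyAudio; the clip length, chunk size, and the idea of passing the saved WAV path to your inference wrapper are illustrative, not part of the repo:

```python
# Sketch of a "record a clip every N seconds" loop (assumptions: 16 kHz mono
# mic, 10-second clips). PyAudio is imported lazily inside record_clip() so
# the framing helper can be used without a microphone attached.
import wave

RATE = 16000          # sample rate assumed by the pretrained model
CLIP_SECONDS = 10     # length of each clip fed to the model
CHUNK = 1024          # frames read from the mic per stream.read() call

def chunks_per_clip(rate=RATE, clip_seconds=CLIP_SECONDS, chunk=CHUNK):
    """Number of chunk reads needed to cover one clip (rounded up)."""
    frames_needed = rate * clip_seconds
    return -(-frames_needed // chunk)  # ceiling division

def record_clip(path):
    """Record one CLIP_SECONDS clip from the default mic into a WAV file."""
    import pyaudio  # lazy import: only needed when actually recording
    pa = pyaudio.PyAudio()
    stream = pa.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                     input=True, frames_per_buffer=CHUNK)
    frames = [stream.read(CHUNK) for _ in range(chunks_per_clip())]
    stream.stop_stream(); stream.close(); pa.terminate()
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)        # 16-bit samples (paInt16)
        wf.setframerate(RATE)
        wf.writeframes(b"".join(frames))
```

In a driver loop you would call `record_clip("clip.wav")` and then hand `"clip.wav"` to your (modified) inference script each cycle.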
Thank you for such a quick response. I was wondering, do we just need to call the inference.py script inside our recording script? If yes, how exactly can we use the --audio_file parameter for live inference?
Hi,
You need to modify inference.py into a function that takes audio_path as input and outputs the prediction. That should be straightforward; you mainly need to modify the main function. Note that you should avoid loading the model for every audio sample.
If you use our AudioSet pretrained model, that is all you need. If your task is different, you need to modify make_features to adjust the target_length and normalization stats; they are hard-coded in inference.py.
-Yuan
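The refactor described here (load the model once, then call a predict function per file) can be sketched as below. The heavy parts (building ASTModel, loading the checkpoint, computing features) are hidden behind injected callables so the call pattern is clear; the stand-in lambdas are purely illustrative:

```python
# Sketch: wrap inference.py's main() logic so the model is loaded exactly once
# and reused for every audio sample. load_model/make_features stand in for the
# real AST checkpoint loading and feature code in inference.py.

class Predictor:
    """Load the model on first use, then reuse it for every audio file."""

    def __init__(self, load_model, make_features):
        self._load_model = load_model        # e.g. builds ASTModel, loads checkpoint
        self._make_features = make_features  # e.g. path -> fbank spectrogram
        self._model = None
        self.load_count = 0                  # demonstrates single loading

    def predict(self, audio_path):
        if self._model is None:              # model is loaded exactly once
            self._model = self._load_model()
            self.load_count += 1
        feats = self._make_features(audio_path)
        return self._model(feats)

# Usage with stand-in callables (a real script would pass the AST versions):
p = Predictor(load_model=lambda: (lambda feats: "speech"),
              make_features=lambda path: path)
labels = [p.predict(f"clip_{i}.wav") for i in range(3)]
```

The key point is the `if self._model is None` guard: however many clips you classify, the checkpoint is read only once.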
Okay got it, thank you.
Also, is there a way to send continuous audio for prediction (every 10 sec) rather than an audio_path? Probably waveforms?
Yes, but you need to modify the inference.py feature function further yourself. AST in fact takes a waveform and converts it to a spectrogram, so it is totally possible. I would encourage you to read inference.py in detail and understand how it works; it is not complex.
-Yuan
Basically my scenario is to classify a continuous audio stream (every 5 secs) from the microphone without saving the audio file. I've explored inference.py and the feature function; it takes an audio_path as input and uses the waveform from it. I was wondering if there is a possibility to use the microphone audio stream directly to run my model every 5 seconds and generate results periodically.
You should not expect to use inference.py without modification. In line 26 of inference.py, the audio waveform is converted to a spectrogram. You can modify the make_features function to take a waveform rather than a path as input.
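A sketch of such a waveform-based make_features. The fbank settings below mirror the typical ones hard-coded in inference.py, but treat them (and target_length) as assumptions to check against your own copy; torchaudio is imported inside the function so the padding helper stays testable on its own:

```python
# Sketch of a make_features variant that accepts a waveform tensor instead of
# an audio path (assumptions: torchaudio installed, target_length matching
# your model, fbank args matching the hard-coded ones in inference.py).

def pad_or_truncate(n_frames, target_length):
    """Frames to pad (positive) or cut (negative) to reach target_length."""
    return target_length - n_frames

def make_features_from_waveform(waveform, sr, mel_bins=128, target_length=1024):
    import torch
    import torchaudio
    # Same filterbank settings style as the path-based make_features.
    fbank = torchaudio.compliance.kaldi.fbank(
        waveform, htk_compat=True, sample_frequency=sr, use_energy=False,
        window_type='hanning', num_mel_bins=mel_bins, dither=0.0,
        frame_shift=10)
    p = pad_or_truncate(fbank.shape[0], target_length)
    if p > 0:
        fbank = torch.nn.functional.pad(fbank, (0, 0, 0, p))  # pad rows
    elif p < 0:
        fbank = fbank[:target_length, :]                      # cut rows
    return fbank
```

With this in place, the rest of inference.py stays the same: normalize the fbank with your dataset's stats and feed it to the model.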
Okay thank you so much for clarification.
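The 5-second no-file scenario then reduces to windowing the raw sample stream. A sketch, using a stand-in generator in place of the microphone (with a real mic, the source would be a PyAudio `stream.read()` loop, and each window would go through your waveform-based feature function and model):

```python
# Sketch: group a continuous sample stream into fixed 5-second windows and
# classify each window in memory, without writing files. fake_mic is a
# stand-in for a microphone stream.
import itertools

RATE = 16000
WINDOW_SECONDS = 5

def windows(samples_iter, window_size=RATE * WINDOW_SECONDS):
    """Yield successive fixed-length windows of samples from an iterator."""
    buf = []
    for s in samples_iter:
        buf.append(s)
        if len(buf) == window_size:
            yield buf
            buf = []

# Stand-in "microphone": 12 seconds of silence -> two full 5-second windows
# (the trailing 2 seconds are held until the next window fills).
fake_mic = itertools.repeat(0, RATE * 12)
window_lengths = [len(w) for w in windows(fake_mic)]
```

Each yielded window is one model input; for overlapping predictions you would keep a sliding buffer instead of clearing it.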
Hi, I've been using your model for classification and audio analysis and it works great. I have trained my own model and was wondering if there's a way to test it in real time with a microphone rather than an audio file. If you could suggest a way forward, it would be great.