YuanGongND / ast

Code for the Interspeech 2021 paper "AST: Audio Spectrogram Transformer".
BSD 3-Clause "New" or "Revised" License
1.13k stars 212 forks

Real-time microphone testing #23

Closed — ridasaleem0 closed this issue 3 years ago

ridasaleem0 commented 3 years ago

Hi, I've been using your model for classification and audio analysis and it works great. I have trained my own model and was wondering if there is a way to test it in real time with a microphone rather than an audio file. If you could point me in the right direction, that would be great.

YuanGongND commented 3 years ago

Hi there,

Thanks for using our model.

Regarding real-time applications: it is totally possible. First, you need a single-audio-file inference script; @JeffC0628 contributed one here. Note that it is designed for our pretrained model, so you might need to adjust the normalization stats at line 40 and the model loading at lines 83-87. Second, you need to build a recording script, e.g., with PyAudio. Say your model also takes 10-second audio as input; you can then feed it a 10-second clip every 10 seconds, or at whatever interval you prefer.
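A minimal sketch of the recording loop described above, assuming a hypothetical `classify(path)` wrapper around inference.py (the chunk size, sample rate, and clip length here are illustrative, not from the repo):

```python
import tempfile
import wave

CHUNK = 1024          # frames per PyAudio read
RATE = 16000          # AST pretrained models expect 16 kHz mono audio
CLIP_SECONDS = 10     # length of each clip handed to the model

def num_reads(rate=RATE, chunk=CHUNK, seconds=CLIP_SECONDS):
    """Number of CHUNK-sized reads that make up one clip."""
    return int(rate / chunk * seconds)

def record_and_classify(classify):
    """Record CLIP_SECONDS of microphone audio with PyAudio, write it
    to a temporary WAV file, and pass the path to `classify` (your
    modified inference.py entry point). Requires a working input device."""
    import pyaudio  # pip install pyaudio
    pa = pyaudio.PyAudio()
    stream = pa.open(format=pyaudio.paInt16, channels=1,
                     rate=RATE, input=True, frames_per_buffer=CHUNK)
    try:
        while True:
            frames = [stream.read(CHUNK) for _ in range(num_reads())]
            with tempfile.NamedTemporaryFile(suffix='.wav', delete=False) as f:
                with wave.open(f, 'wb') as wf:
                    wf.setnchannels(1)
                    wf.setsampwidth(pa.get_sample_size(pyaudio.paInt16))
                    wf.setframerate(RATE)
                    wf.writeframes(b''.join(frames))
                classify(f.name)  # run AST inference on this clip
    finally:
        stream.stop_stream()
        stream.close()
        pa.terminate()
```

Each pass through the loop blocks for roughly 10 seconds of recording, then runs one inference, which gives the "10-second clip every 10 seconds" cadence described above.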

-Yuan

ridasaleem0 commented 3 years ago

Thank you for such a quick response. Do we just need to call the inference.py script inside our recording script? If so, how exactly should we use the --audio_file parameter for live inference?

YuanGongND commented 3 years ago

Hi,

You need to modify inference.py into a function that takes audio_path as input and outputs the prediction. That should be straightforward; you mainly need to modify the main function. Note that you should avoid reloading the model for every audio sample.

If you use our AudioSet pretrained model, that is all you need. If your task is different, you also need to modify make_features to adjust the target_length and the normalization stats; they are hard-coded in inference.py.
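A minimal sketch of the refactor described above: load the model once, then call a predict method per clip. The class and method names here are hypothetical; the model object and label list would come from your modified inference.py, and AST's multi-label output is mapped through a sigmoid.

```python
import torch

class ASTPredictor:
    """Wraps a loaded AST model so inference can be called repeatedly
    without reloading the checkpoint for every audio sample."""

    def __init__(self, model, labels):
        self.model = model.eval()   # load the checkpoint once, up front
        self.labels = labels        # list of class names, e.g. from a label CSV

    @torch.no_grad()
    def predict(self, feats, top_k=5):
        """feats: (target_length, mel_bins) tensor from make_features()."""
        logits = self.model(feats.unsqueeze(0))   # add the batch dimension
        probs = torch.sigmoid(logits)[0]          # multi-label probabilities
        scores, idx = torch.topk(probs, top_k)
        return [(self.labels[i], s.item()) for i, s in zip(idx, scores)]
```

The recording loop then constructs one `ASTPredictor` at startup and calls `predict` on each new clip's features, which avoids the per-sample model-loading cost mentioned above.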

-Yuan

ridasaleem0 commented 3 years ago

Okay got it, thank you.

Also, is there a way to send continuous audio for prediction (every 10 seconds) rather than an audio_path? Waveforms, perhaps?

YuanGongND commented 3 years ago

Yes, but you need to modify the feature function in inference.py further yourself. AST in fact takes a waveform and converts it to a spectrogram, so it is totally possible. I would encourage you to read inference.py in detail and understand how it works; it is not complex.

-Yuan

ridasaleem0 commented 3 years ago

Basically, my scenario is to classify a continuous audio stream from the microphone (every 5 seconds) without saving audio files. I have explored inference.py and the feature function; it takes an audio_path as input and uses the waveform read from it. I was wondering whether it is possible to run my model directly on the microphone audio stream every 5 seconds and generate results periodically.
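The windowing part of this is independent of the model. A minimal sketch, assuming raw 16-bit mono PCM bytes as produced by successive PyAudio `stream.read()` calls (the function name and parameters are illustrative):

```python
def stream_windows(chunks, rate=16000, sample_width=2, seconds=5):
    """Yield fixed-length windows of raw audio bytes from an iterable of
    byte chunks, e.g. successive PyAudio stream.read() results. Each
    window covers `seconds` of int16 mono audio at `rate` Hz."""
    window_bytes = rate * sample_width * seconds
    buf = bytearray()
    for chunk in chunks:
        buf.extend(chunk)
        while len(buf) >= window_bytes:
            yield bytes(buf[:window_bytes])   # one complete 5-second window
            del buf[:window_bytes]            # keep any leftover samples
```

Each yielded window can then be converted to a tensor and passed to a waveform-based feature function, so no intermediate file is ever written.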

YuanGongND commented 3 years ago

You should not expect to use inference.py without modification. At line 26 of inference.py, the audio waveform is converted to a spectrogram. You can modify the make_features function so that it takes a waveform rather than a path as input.

ridasaleem0 commented 3 years ago

Okay thank you so much for clarification.