About resample 44100 to 22050

gabemagee / gunshot_detection

Building a model that can detect gunshots from audio and that can also be scalably deployed to a Raspberry Pi cluster.

About resample 44100 to 22050 #9

Closed: jayer95 closed this issue 3 years ago

jayer95 commented 3 years ago

@gabemagee

Hi, I would like to ask you a question.

You set the microphone's sample rate to 44100.

https://github.com/gabemagee/gunshot_detection/blob/master/raspberry_pi/gunshot_detection.py#L38

AUDIO_RATE = 44100

At inference time, you use librosa to resample the audio to 22050.

https://github.com/gabemagee/gunshot_detection/blob/master/raspberry_pi/gunshot_detection.py#L436

modified_microphone_data = librosa.resample(y = microphone_data, orig_sr = AUDIO_RATE, target_sr = 22050)

Will this resampling distort the audio when the model runs inference?

amorehead commented 3 years ago

@jayer95,

This is a great question. Resampling is a common procedure in signal processing, and in this case, it should not affect the quality of the data on which the model is training or doing inference by any considerable margin. If you would like, I would recommend looking through some of the Librosa library's resources and discussions on the pros and cons of resampling between 44,100 Hz and 22,050 Hz (i.e. two very common audio standards). One resource, in particular, is https://librosa.org/blog/2019/07/17/resample-on-load/. Let me know if this helps clarify your questions.
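To make the effect of the 44,100 Hz to 22,050 Hz conversion concrete, here is a small arithmetic sketch. It processes no audio and is not how librosa resamples internally; the function name `resample_summary` is hypothetical, and the two-second clip length is an assumption taken from the capture length discussed in this thread. It only shows that halving the sample rate halves the number of samples and lowers the highest representable frequency (the Nyquist frequency) from 22,050 Hz to 11,025 Hz.

```python
def resample_summary(orig_sr: int, target_sr: int, duration_s: float) -> dict:
    """Describe how a clip changes when resampled (arithmetic only)."""
    return {
        # Number of samples in the clip before and after resampling.
        "orig_samples": int(round(orig_sr * duration_s)),
        "target_samples": int(round(target_sr * duration_s)),
        # Nyquist frequency: the highest frequency each rate can represent.
        "orig_nyquist_hz": orig_sr / 2,
        "target_nyquist_hz": target_sr / 2,
        # Ratio of the new rate to the old rate.
        "ratio": target_sr / orig_sr,
    }


# Assumed example: a two-second clip captured at 44,100 Hz.
summary = resample_summary(44100, 22050, 2.0)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Under these assumptions, content above 11,025 Hz is what the resampled signal can no longer represent; whether that matters depends on how much of the relevant sound energy lies above that frequency.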

jayer95 commented 3 years ago

@amorehead

Thank you very much for your professional reply. The reference link you gave is very helpful. To build up background knowledge on topics such as audio sampling rates, Mel spectrograms, and MFCCs, I have been studying your paper and reading other papers to improve my understanding of audio recognition. I have also been referring to many of the works and practices related to DCASE 2017 and 2019.

Regarding capturing audio every two seconds for inference, is there a reason why you used two seconds? Why not one second or three seconds?

I'm planning to use two threads so that, in addition to the consecutive two-second windows, a second two-second window that straddles the boundary between one window and the next is also processed. Do you have any suggestions before I start?

amorehead commented 3 years ago

With regard to capturing audio every two seconds for inference, we surveyed the literature and decided that two seconds should, at least in our context, be enough to capture the sound of most gunshots. When we captured audio every one second, we found that our model was often fed poorly timed slices from the middle of a gunshot (from the perspective of a spectrogram).

As for integrating the parallel threads, @laurenogd gave something like this a shot, but I do not remember exactly where we ended up with it.
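For anyone exploring the overlapping-window idea raised above, here is a minimal single-threaded sketch of the slicing logic only. It is not code from this repository: the function name `overlapping_windows`, the one-second hop, and the placeholder audio are all assumptions made for illustration. It yields two-second windows that start one second apart, so a sound cut in half at the edge of one window falls inside the next.

```python
from typing import Iterator, Sequence, Tuple


def overlapping_windows(
    samples: Sequence[float],
    sample_rate: int,
    window_s: float = 2.0,  # window length in seconds (from the thread)
    hop_s: float = 1.0,     # assumed hop: half a window, i.e. 50% overlap
) -> Iterator[Tuple[int, Sequence[float]]]:
    """Yield (start_index, window) pairs of overlapping fixed-length windows."""
    window = int(window_s * sample_rate)
    hop = int(hop_s * sample_rate)
    if window <= 0 or hop <= 0:
        raise ValueError("window_s and hop_s must cover at least one sample")
    # Stop once a full window no longer fits in the remaining samples.
    for start in range(0, len(samples) - window + 1, hop):
        yield start, samples[start:start + window]


# Assumed example: 5 seconds of placeholder (silent) audio at 22,050 Hz.
sr = 22050
audio = [0.0] * (5 * sr)
windows = list(overlapping_windows(audio, sr))
starts = [start for start, _ in windows]
print(starts)  # start indices of the windows covering 0-2 s, 1-3 s, 2-4 s, 3-5 s
```

Each yielded window could then be handed to the model. Whether that hand-off happens on a second thread or in the capture loop is a separate design decision that this sketch does not address.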