Open 009deep opened 6 years ago
AudioLoader and MonoLoader internally convert the sample format to AV_SAMPLE_FMT_FLT with the libavresample library. The output is a vector of float numbers within the interval -1 to 1, not the raw PCM values, and this is the format that all Essentia analysis algorithms expect to receive.
Thanks, Dmitry. Is there any guide or sample code on how to convert raw PCM values to the format Essentia algorithms expect?
Also, regarding your following comment on the Google group:
For your application, you need to convert the PCM data buffer to the std::vector values yourself as a previous step to feeding this data to Essentia algorithms.
Just putting short PCM data into float values won't bring them into the range -1 to 1, so features derived that way aren't correct.
@dbogdanov Is there any sample code showing how to convert PCM data to a format usable by Essentia?
There are no examples, as we only do that using libavresample internally inside AudioLoader. Are you interested in C++, or Python too? There's some related discussion here.
You should scale the PCM data: depending on the bit depth, there are different minimum and maximum integer values that you want to scale down to -1 and 1. As you mentioned, libsndfile does that, so maybe you can search there for examples.
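The scaling described above can be sketched in Python with NumPy. This is a minimal illustration, not Essentia's or libsndfile's actual code: the helper name `pcm_to_float` is hypothetical, and it assumes the samples are already in an integer NumPy array (signed for 16/32-bit, unsigned for 8-bit, as is conventional for WAV).

```python
import numpy as np

def pcm_to_float(samples):
    """Scale integer PCM samples to float32 in the range [-1.0, 1.0)."""
    samples = np.asarray(samples)
    if samples.dtype == np.uint8:
        # 8-bit WAV PCM is conventionally unsigned, centred on 128
        return (samples.astype(np.float32) - 128.0) / 128.0
    if not np.issubdtype(samples.dtype, np.signedinteger):
        raise TypeError("expected an integer PCM array, got %s" % samples.dtype)
    # Signed PCM: divide by 2**(bits - 1), e.g. 32768 for 16-bit samples
    full_scale = float(2 ** (8 * samples.dtype.itemsize - 1))
    # Go through float64 so 32-bit samples are not rounded before scaling
    return (samples.astype(np.float64) / full_scale).astype(np.float32)

pcm16 = np.array([-32768, 0, 16384, 32767], dtype=np.int16)
print(pcm_to_float(pcm16))
```

Dividing by the full-scale value of the integer type, rather than simply casting to float, is what brings the values into the -1 to 1 interval. The same arithmetic carries over to C++ by dividing each sample by 32768.0f for 16-bit input.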
Thanks, I finally figured it out. I am using a combination of librosa and Essentia now. I went through the librosa code and learned how they do the conversion (it's similar to what is discussed in the link you shared above), tried the same with Essentia, and it works. Thanks for the help.
My final requirement is in C++, but I can use Python for part of it.
Can you share the link to the conversion in librosa, so that we have it registered here?
There is no link to a discussion on librosa; I manually went through the librosa code step by step to see what they are doing, and it's exactly what is mentioned in the link you shared above.
But buf_to_float in the librosa code can help in understanding how to manipulate raw PCM data.
I found this issue while looking for a way to implement MonoLoader for a bytes object instead of a file reference. The comments here were useful. I tried librosa, but I found pydub to work better. In my case I'm working with MP3s and wanted to keep everything in memory and not on disk.
import io
import numpy as np
from pydub import AudioSegment

def monoloaderfromblob(f):
    # f = bytes object (mp3 audio as blob; change `format` for wav etc.)
    f_as = AudioSegment.from_file(io.BytesIO(f), format="mp3")
    f_asfr = f_as.set_frame_rate(16000)
    ch_snd = f_asfr.split_to_mono()
    samples = [s.get_array_of_samples() for s in ch_snd]
    # downmix to mono by averaging the channels
    # (same as /2 then sum for stereo; also works for mono input)
    fp_arr = np.array(samples).astype(np.float32)
    summed = fp_arr.mean(axis=0)
    # scale by the max value of the integer sample type, to roughly [-1, 1]
    summed /= np.iinfo(samples[0].typecode).max
    return summed
Is there any way to perform the following with the Essentia library?
wav file. (There are multiple libraries available, such as libsndfile, which can do this, but I wanted to see if I can avoid them while using Essentia; in most cases, a wav file is 44 bytes of header + PCM data.)
Thanks, Navdeep
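For the Python side, one way to get at the PCM data without an extra dependency is the standard-library `wave` module. The sketch below does not use Essentia; it only illustrates the WAV layout mentioned above. The function name `wav_bytes_to_mono_float` is hypothetical, and the code assumes 16-bit PCM input. It parses the header rather than skipping a fixed 44 bytes, since WAV files can carry extra chunks that make the header longer.

```python
import io
import wave
import numpy as np

def wav_bytes_to_mono_float(data):
    """Decode 16-bit PCM WAV bytes into (float32 mono samples, sample rate)."""
    with wave.open(io.BytesIO(data), "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        n_channels = wf.getnchannels()
        rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    # WAV PCM is little-endian, with samples interleaved by channel
    pcm = np.frombuffer(raw, dtype="<i2").reshape(-1, n_channels)
    # Average the channels, then scale 16-bit integers to [-1.0, 1.0)
    mono = pcm.astype(np.float32).mean(axis=1) / 32768.0
    return mono.astype(np.float32), rate
```

The returned float32 array is in the -1 to 1 range described earlier in this thread, so it should be suitable as input wherever a MonoLoader output would otherwise be used, provided the sample rate matches what the downstream algorithm expects.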