MTG / essentia

C++ library for audio and music analysis, description and synthesis, including Python bindings
http://essentia.upf.edu
GNU Affero General Public License v3.0

Operate on PCM data #775

Open 009deep opened 6 years ago

009deep commented 6 years ago

Is there any way to do the following with the Essentia library?

  1. Extract PCM data from a WAV file. (Several libraries, such as libsndfile, can do this, but I wanted to see whether I can avoid them when using Essentia. In most cases a WAV file is a 44-byte header followed by PCM data.)
  2. Extract features from PCM input given as a signal vector. I see that MonoLoader's output isn't PCM, and if I feed PCM data to the same algorithms, the output differs from what I get when I feed them MonoLoader's output.
  3. If it is not PCM, what is the output of MonoLoader?

Thanks, Navdeep

dbogdanov commented 6 years ago

AudioLoader and MonoLoader internally convert the sample format to AV_SAMPLE_FMT_FLT using the libavresample library. The output is a vector of floats in the interval -1 to 1, not the raw PCM values, and this is the format that all Essentia analysis algorithms expect to receive.

009deep commented 6 years ago

Thanks, Dmitry. Is there any guide or sample code on how to convert raw PCM values to the format Essentia algorithms expect?

Also, regarding your comment on the Google group:

For your application, you need to convert the PCM data buffer to the std::vector values yourself as a previous step to feeding this data to Essentia algorithms.

Just copying short PCM values into floats won't put them in the range -1 to 1, so features derived that way aren't correct.

009deep commented 6 years ago

@dbogdanov Is there any sample code for converting PCM data to a format Essentia can use?

dbogdanov commented 6 years ago

There are no examples, as we only do that with libavresample internally inside AudioLoader. Are you interested in C++, or in Python too? There's some related discussion here.

You should scale the PCM data. Depending on the bit depth, the samples have different minimum and maximum integer values, which you want to scale down to -1 and 1. As you mentioned, libsndfile does that, so maybe you can search there for examples.
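
A minimal sketch of that scaling, in Python with only the standard library. The helper name `pcm_to_float` is made up for illustration and is not part of Essentia. It assumes little-endian integer PCM as typically stored in WAV files (unsigned for 8-bit, signed for 16/24/32-bit) and divides by 2^(bits-1), which is one common convention and not necessarily the exact factor libavresample applies.

```python
import struct


def pcm_to_float(raw: bytes, sample_width: int) -> list:
    """Scale little-endian integer PCM bytes to floats in [-1.0, 1.0).

    sample_width is the number of bytes per sample (1, 2, 3 or 4).
    Hypothetical helper, not part of Essentia.
    """
    if sample_width not in (1, 2, 3, 4):
        raise ValueError("unsupported sample width")
    if len(raw) % sample_width:
        raise ValueError("buffer length is not a multiple of sample width")

    count = len(raw) // sample_width
    if sample_width == 1:
        # Assumption: 8-bit data is unsigned with the zero level at 128.
        ints = [b - 128 for b in raw]
    elif sample_width == 3:
        # struct has no 24-bit code, so decode each sample by hand.
        ints = [
            int.from_bytes(raw[i:i + 3], "little", signed=True)
            for i in range(0, len(raw), 3)
        ]
    else:
        code = "h" if sample_width == 2 else "i"
        ints = struct.unpack("<%d%s" % (count, code), raw)

    # Full-scale magnitude for this bit depth, e.g. 32768 for 16-bit.
    full_scale = float(1 << (8 * sample_width - 1))
    return [v / full_scale for v in ints]
```

To hand the result to Essentia's Python bindings you would most likely wrap it as a float32 NumPy array first, for example `np.array(samples, dtype=np.float32)`. Interleaved multi-channel data would still need to be split or downmixed before that.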

009deep commented 6 years ago

Thanks, I finally figured it out. I am now using a combination of librosa and Essentia. I went through the librosa code and learned how they do the conversion (it's similar to what is discussed in the link you shared above), tried the same with Essentia, and it works. Thanks for the help.

My final requirement is in C++, but I can use Python for part of it.

dbogdanov commented 6 years ago

Can you share the link to the conversion in librosa, so that we have it registered here?

009deep commented 6 years ago

There is no link to a discussion on librosa. I manually went through the librosa code step by step to see what they are doing, and it's exactly what is described in the link you shared above.

That said, buf_to_float in the librosa code can help in understanding how to handle raw PCM data.

jcurtis-cc commented 3 years ago

I found this issue while looking for a way to implement MonoLoader for a bytes object instead of a file reference. The comments here were useful. I tried librosa, but I found pydub to work better. In my case I'm working with mp3s and wanted to keep everything in memory and not on disk.

import io

import numpy as np
from pydub import AudioSegment

def monoloaderfromblob(f, fmt="mp3"):
    # f = bytes object holding an encoded audio file (mp3, wav, etc.);
    # fmt must name the container/codec actually stored in f
    f_as = AudioSegment.from_file(io.BytesIO(f), format=fmt)
    f_asfr = f_as.set_frame_rate(16000)
    ch_snd = f_asfr.split_to_mono()
    samples = [s.get_array_of_samples() for s in ch_snd]
    # downmix to mono by averaging the channels (also works for mono input)
    fp_arr = np.array(samples).astype(np.float32)
    summed = fp_arr.mean(axis=0)
    # scale the integer samples down to roughly [-1, 1]
    summed /= np.iinfo(samples[0].typecode).max
    return summed
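
For uncompressed WAV input specifically, the same in-memory approach can be sketched without pydub, using Python's standard `wave` module. The helper name `wav_bytes_to_mono_float` is hypothetical. This sketch assumes 16-bit PCM, does no resampling (unlike the 16 kHz conversion above), and scales by 32768, which is a common convention and not necessarily what MonoLoader does internally.

```python
import io
import struct
import wave


def wav_bytes_to_mono_float(blob: bytes):
    """Decode a 16-bit PCM WAV held in memory to mono floats in [-1.0, 1.0).

    Returns (samples, sample_rate). Hypothetical helper; no resampling.
    """
    with wave.open(io.BytesIO(blob), "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        channels = wf.getnchannels()
        rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())

    # Samples are interleaved per frame: L0 R0 L1 R1 ... for stereo.
    ints = struct.unpack("<%dh" % (len(raw) // 2), raw)
    mono = []
    for i in range(0, len(ints), channels):
        frame = ints[i:i + channels]
        # Average the channels, then scale by the 16-bit full-scale value.
        mono.append(sum(frame) / channels / 32768.0)
    return mono, rate
```

The returned list can be converted with `np.array(mono, dtype=np.float32)` before being passed to Essentia algorithms, and the returned sample rate tells you whether a resampling step is still needed.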