MTG / essentia

C++ library for audio and music analysis, description and synthesis, including Python bindings
http://essentia.upf.edu
GNU Affero General Public License v3.0

Improve documentation for FrameGenerator #424

Closed: great-thoughts closed this issue 8 years ago

great-thoughts commented 8 years ago

I load a file into Essentia; it has a sampling rate of 16 kHz. After segmenting it into frames and playing the individual frames, the frames sound slower. I suspect that the sampling rate of my signal is being modified. How can I prevent FrameGenerator from changing the sampling rate of my frames?

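import numpy as np
import essentia.standard
from essentia.standard import FrameGenerator
from IPython.display import Audio

loader = essentia.standard.AudioLoader(filename=source1)  # source1: path to the input file
y = loader()
audio = y[0].transpose()[0]  # first (left) channel of the loaded audio
fs = y[1]                    # sample rate reported by AudioLoader
frames = np.array([frame for frame in FrameGenerator(audio, frameSize=50*1024, hopSize=512)])
Audio(data=frames[20], rate=fs)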
dbogdanov commented 8 years ago

FrameGenerator only cuts the signal into frames of the specified number of samples; it does not modify the sample rate. Check whether your Audio function works correctly.
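Because FrameGenerator only slices the sample array, the duration of a frame depends solely on frameSize and the rate used for playback. A quick arithmetic check for the settings in the first snippet, assuming fs = 16000 Hz as stated in the question (plain Python, no Essentia needed):

```python
# FrameGenerator does not resample, so durations follow directly from
# sample counts and the sample rate used for playback.
fs = 16000              # sample rate stated in the question (Hz)
frame_size = 50 * 1024  # frameSize used in the first snippet
hop_size = 512          # hopSize used in the first snippet

frame_duration = frame_size / fs  # seconds of audio in one frame
hop_duration = hop_size / fs      # seconds between consecutive frame starts

print(f"frame length: {frame_duration:.3f} s")  # frame length: 3.200 s
print(f"hop: {hop_duration:.3f} s")             # hop: 0.032 s
```

Played at the loader's rate, frames[20] should therefore last about 3.2 s, and with hopSize = 512 consecutive frames share all but 512 of their samples. A frame that plays noticeably longer would suggest a mismatch between the playback rate and the rate at which the samples were loaded.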

great-thoughts commented 8 years ago

The Audio command is working correctly; I imported it from IPython.display:

from IPython.display import Audio
import essentia.standard

fs = 16000
loader2 = essentia.standard.MonoLoader(filename=source1, sampleRate=fs, downmix='left')
audio = loader2()  # mono signal resampled to fs
Audio(data=audio, rate=fs)

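To debug, I am trying the following:

import numpy as np
from essentia.standard import FrameGenerator

frameSize = 1024
hopSize = 1024
frames = FrameGenerator(audio, frameSize=frameSize, hopSize=hopSize)
new = []
for frame in frames:
    a = 1.0 * frame  # copy of the current frame
    new.append(a)
new = np.array(new)  # shape: (number of frames, frameSize)

new[10*1024:11*1024]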

array([[ 0.06491089,  0.09188843,  0.1036377 , ..., -0.01620483,
        -0.01132202, -0.02697754],
       [-0.02648926, -0.02746582, -0.04260254, ...,  0.21658325,
         0.18286133,  0.15548706],
       [ 0.10946655,  0.07275391,  0.06491089, ..., -0.0352478 ,
        -0.02496338, -0.02255249],
       ..., 
       [-0.18890381, -0.1796875 , -0.14453125, ...,  0.03564453,
         0.01751709,  0.05767822],
       [-0.00787354, -0.00299072,  0.04052734, ...,  0.01116943,
         0.02090454,  0.00088501],
       [-0.04510498, -0.04901123, -0.03433228, ...,  0.25271606,
         0.25912476,  0.25424194]], dtype=float32)

audio[10*1024:11*1024]

array([-0.10491943, -0.11447144, -0.1244812 , ..., -0.04129028,
       -0.01248169, -0.03842163], dtype=float32)
dbogdanov commented 8 years ago

There is a mismatch between new and audio because FrameGenerator is configured with startFromZero=False by default.
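Essentia's FrameCutter documentation describes startFromZero=False as starting the first frame at -frameSize/2 (zero-centered on sample 0, so its first half is padding), while startFromZero=True starts it at sample 0. Frame k therefore begins at k*hopSize - frameSize/2 by default and at k*hopSize otherwise. Note also that new is a 2-D array of shape (number of frames, frameSize), so new[10*1024:11*1024] selects whole frames rather than samples; the frame to compare with audio[10*1024:11*1024] is new[10]. The numpy sketch below imitates both alignments on a synthetic signal. The helper cut_frames is hypothetical: it reproduces only the frame start offsets, not every rule Essentia applies to the last frames of a signal.

```python
import numpy as np


def cut_frames(signal, frame_size, hop_size, start_from_zero):
    """Hypothetical helper imitating FrameGenerator's frame alignment.

    start_from_zero=False: frame k starts at k*hop_size - frame_size//2
    start_from_zero=True:  frame k starts at k*hop_size
    Samples outside the signal are zeros. The stopping rule (cut frames while
    the start lies inside the signal) is a simplification, not Essentia's.
    """
    offset = 0 if start_from_zero else -(frame_size // 2)
    pad = np.zeros(frame_size, dtype=signal.dtype)
    padded = np.concatenate([pad, signal, pad])  # room for frames that overhang
    frames = []
    start = offset
    while start < len(signal):
        p = start + frame_size  # index of this frame's start inside `padded`
        frames.append(padded[p:p + frame_size])
        start += hop_size
    return np.array(frames)


# Synthetic stand-in for `audio`: sample i holds the value i + 1 (0 marks padding)
audio = np.arange(1, 20 * 1024 + 1, dtype=np.float32)
frame_size = hop_size = 1024

default = cut_frames(audio, frame_size, hop_size, start_from_zero=False)
aligned = cut_frames(audio, frame_size, hop_size, start_from_zero=True)

k = 10
# Default: frame 10 is centered on sample 10*1024, so it begins half a frame early
print(np.array_equal(default[k], audio[k * hop_size - frame_size // 2:k * hop_size + frame_size // 2]))  # True
# startFromZero=True: frame 10 lines up with audio[10*1024:11*1024]
print(np.array_equal(aligned[k], audio[k * hop_size:k * hop_size + frame_size]))  # True
```

With the real library, passing startFromZero=True to FrameGenerator should likewise make new[10] line up with audio[10*1024:11*1024].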

The documentation for FrameGenerator should be updated. Furthermore, FrameGenerator should be explained in more detail in the Python tutorial (in a new subsection dedicated to it).

The documentation should explain how to compute the time position of each frame.
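A minimal sketch of that computation, assuming an even frameSize and the alignment given in the FrameCutter parameter description (frame k centered on sample k*hopSize when startFromZero=False, starting on sample k*hopSize when startFromZero=True). frame_times is a hypothetical helper, not part of Essentia:

```python
def frame_times(num_frames, frame_size, hop_size, sample_rate, start_from_zero=False):
    """Hypothetical helper: (start, center) time in seconds of each frame.

    Assumes an even frame_size and the FrameCutter alignment:
    start_from_zero=False -> frame k is centered on sample k*hop_size;
    start_from_zero=True  -> frame k starts on sample k*hop_size.
    """
    times = []
    for k in range(num_frames):
        start = k * hop_size if start_from_zero else k * hop_size - frame_size // 2
        center = start + frame_size // 2
        times.append((start / sample_rate, center / sample_rate))
    return times


print(frame_times(3, 1024, 1024, 16000))
# [(-0.032, 0.0), (0.032, 0.064), (0.096, 0.128)]
print(frame_times(2, 1024, 1024, 16000, start_from_zero=True))
# [(0.0, 0.032), (0.064, 0.096)]
```

With the default setting the center time reduces to k*hopSize/sampleRate, which makes a natural timestamp for per-frame descriptors; the negative start of frame 0 reflects its zero-padded first half.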

dbogdanov commented 8 years ago

In Python, it might make sense to use startFromZero=True by default, but this could break backward compatibility with existing code that uses FrameGenerator.

dbogdanov commented 8 years ago

After reviewing the topic, keeping startFromZero=False seems to be the best option.

dbogdanov commented 8 years ago

Pending TODOs:

dbogdanov commented 8 years ago

Related to #174