MaigoAkisame / cmu-thesis

Code for Yun Wang's PhD Thesis: Polyphonic Sound Event Detection with Weak Labeling
MIT License

Data processing #4

Closed: findmenowhere closed this issue 3 years ago

findmenowhere commented 3 years ago

Hi Maigo,

How do you extract filterbank features from the audio? I couldn't find any data-processing code, and the data downloaded by the bash script is already prepared.

MaigoAkisame commented 3 years ago

Here's my code to extract features from a single recording:

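import librosa
import numpy

def extract(wav):
    # Takes a waveform (160,000 samples at a 16,000 Hz sampling rate, i.e. 10 seconds)
    # and returns log mel filterbank features of shape (400, 64): 400 frames by 64 mel bands.
    # STFT: 1024-sample Hann window, 400-sample hop, zero-padded to a 4096-point FFT
    spec = librosa.core.stft(wav, n_fft = 4096,
                             hop_length = 400, win_length = 1024,
                             window = 'hann', center = True, pad_mode = 'constant')
    # Apply a 64-band mel filterbank (up to 8000 Hz) to the magnitude spectrogram
    mel = librosa.feature.melspectrogram(S = numpy.abs(spec), sr = 16000, n_mels = 64, fmax = 8000)
    # Keep the first 400 frames and convert to decibels
    logmel = librosa.core.power_to_db(mel[:, :400])
    return logmel.T.astype('float32')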
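
To see where the 400 * 64 size comes from, the frame arithmetic can be sketched as below. This is only an illustration derived from the parameters in the code above; the constant and function names are made up for the example, and it relies on the fact that a centered STFT (center = True) yields 1 + num_samples // hop_length frames.

```python
# Illustrative arithmetic only; names here are hypothetical, values come from extract().
SAMPLE_RATE = 16000    # Hz
NUM_SAMPLES = 160000   # 10 seconds of audio
HOP_LENGTH = 400       # samples between consecutive frames
WIN_LENGTH = 1024      # samples per analysis window
N_FFT = 4096           # FFT size after zero-padding the window
N_MELS = 64            # mel bands

def centered_frame_count(num_samples, hop_length):
    # With center = True the signal is padded on both sides,
    # so the STFT produces one frame per hop plus one extra.
    return 1 + num_samples // hop_length

stft_frames = centered_frame_count(NUM_SAMPLES, HOP_LENGTH)  # 401
kept_frames = min(stft_frames, 400)                          # mel[:, :400] drops the last frame
frames_per_second = SAMPLE_RATE / HOP_LENGTH                 # 40.0
window_ms = 1000 * WIN_LENGTH / SAMPLE_RATE                  # 64.0
freq_bins = 1 + N_FFT // 2                                   # 2049 rows in the STFT output
feature_shape = (kept_frames, N_MELS)                        # (400, 64)

print(stft_frames, kept_frames, frames_per_second, window_ms, freq_bins, feature_shape)
```

So each recording becomes 40 frames per second over 10 seconds, with one 64-dimensional vector per frame.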

After feature extraction, I normalized each dimension to have zero mean and unit variance globally (i.e. across all training recordings).

If you downloaded the data from my GitHub repo, you should find "normalizer.pkl" files that contain the "mu" and "sigma" for normalization.
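
For anyone recomputing the statistics on their own data, global normalization could look roughly like the sketch below. It is a minimal example, not the code used for the thesis: the function names, the eps guard, and the random stand-in features are all assumptions, and it does not read "normalizer.pkl" because the exact layout of that file is not described here.

```python
import numpy as np

def fit_normalizer(train_features):
    # train_features: list of (frames, 64) arrays, one per training recording.
    # Statistics are pooled over every frame of every recording ("globally").
    stacked = np.concatenate(train_features, axis=0).astype('float64')
    mu = stacked.mean(axis=0)      # shape (64,)
    sigma = stacked.std(axis=0)    # shape (64,)
    return mu, sigma

def normalize(feat, mu, sigma, eps=1e-8):
    # eps is an assumed safeguard against division by zero for constant dimensions.
    return ((feat - mu) / (sigma + eps)).astype('float32')

# Random stand-in data in place of real log mel features (hypothetical values).
rng = np.random.default_rng(0)
train = [rng.normal(loc=-30.0, scale=12.0, size=(400, 64)).astype('float32')
         for _ in range(8)]

mu, sigma = fit_normalizer(train)
normed = [normalize(f, mu, sigma) for f in train]
```

The same mu and sigma computed on the training set would then be applied unchanged to validation and test recordings.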

findmenowhere commented 3 years ago

Thanks! That's what I need.