Data processing - Githubissues

findmenowhere commented 3 years ago

Hi Maigo,

How do you extract filterbank features from the audio? I couldn't find any data-processing code, and the data downloaded by the bash script is already prepared.

MaigoAkisame commented 3 years ago

Here's my code to extract features from a single recording:

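import librosa
import numpy

def extract(wav):
    # Takes a waveform (length 160,000, sampling rate 16,000) and extracts filterbank features (size 400 * 64)
    # STFT: 1024-sample Hann window zero-padded to a 4096-point FFT, hop of 400 samples (25 ms)
    spec = librosa.core.stft(wav, n_fft = 4096,
                             hop_length = 400, win_length = 1024,
                             window = 'hann', center = True, pad_mode = 'constant')
    # Map the magnitude spectrogram onto 64 mel bands up to 8 kHz
    mel = librosa.feature.melspectrogram(S = numpy.abs(spec), sr = 16000, n_mels = 64, fmax = 8000)
    # center = True yields 401 frames; keep the first 400 and convert to dB
    logmel = librosa.core.power_to_db(mel[:, :400])
    return logmel.T.astype('float32')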
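
For readers wondering where the 400 * 64 shape comes from, the arithmetic can be sketched as below. The helper names `num_frames` and `num_stft_bins` are made up for illustration and are not part of librosa; the formulas assume librosa's usual framing behaviour for `stft`.

```python
def num_frames(n_samples, hop_length, n_fft, center=True):
    # With center=True the signal is padded by n_fft // 2 on both sides,
    # so the frame count depends only on the hop length.
    if center:
        return 1 + n_samples // hop_length
    # Without centering, only full windows that fit in the signal are counted.
    return 1 + (n_samples - n_fft) // hop_length


def num_stft_bins(n_fft):
    # A real-valued FFT of length n_fft keeps the non-negative frequencies only.
    return 1 + n_fft // 2


frames = num_frames(160000, hop_length=400, n_fft=4096)  # 10 s at 16 kHz
bins = num_stft_bins(4096)
frames_per_second = 16000 // 400

print(frames, bins, frames_per_second)
```

A 10-second clip therefore gives 401 STFT frames of 2049 frequency bins each, at 40 frames per second. The mel filterbank reduces the 2049 bins to 64 bands, and slicing with `[:, :400]` drops the extra final frame.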

After feature extraction, I normalized each dimension to have zero mean and unit variance globally (i.e. across all training recordings).

If you downloaded the data from my GitHub repo, you should find "normalizer.pkl" files that contain the "mu" and "sigma" for normalization.
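
A minimal sketch of that normalization step, under stated assumptions: the statistics are pooled over every frame of every training recording, giving one "mu" and one "sigma" per filterbank dimension. The function names `fit_normalizer` and `normalize` and the small `eps` guard are hypothetical. The internal layout of "normalizer.pkl" is not shown in this thread, so the sketch does not try to load it.

```python
import numpy as np


def fit_normalizer(features):
    # features: iterable of arrays shaped (n_frames, n_dims), e.g. (400, 64).
    # Assumption: statistics are pooled over all frames of all recordings.
    stacked = np.concatenate(list(features), axis=0).astype(np.float64)
    mu = stacked.mean(axis=0)     # one mean per filterbank dimension
    sigma = stacked.std(axis=0)   # one standard deviation per dimension
    return mu, sigma


def normalize(feat, mu, sigma, eps=1e-8):
    # eps is a hypothetical guard against division by zero for constant dimensions.
    return ((feat - mu) / (sigma + eps)).astype('float32')


# Toy stand-in for real features: 4 "recordings" of shape (400, 64).
rng = np.random.default_rng(0)
train = [rng.normal(5.0, 3.0, size=(400, 64)).astype('float32') for _ in range(4)]

mu, sigma = fit_normalizer(train)
normalized = [normalize(f, mu, sigma) for f in train]
```

The same `mu` and `sigma` computed on the training set would then be applied unchanged to validation and test recordings, which is presumably why they are shipped alongside the data.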

findmenowhere commented 3 years ago

Thanks! That's what I need.

MaigoAkisame / cmu-thesis

Data processing #4