I want to reproduce, using torchaudio, the result I get when computing log filterbank energies with the python_speech_features library.
This is my code:
import librosa
import python_speech_features
import torchaudio
# FilterbankFeatures comes from the audio preprocessing code in NeMo's ASR collection
# (the import path depends on the NeMo version)

# Load the audio with librosa: 0.4 s starting at 0.5 s, resampled to 16 kHz
path_audio = "audio_a.wav"
y, sr = librosa.load(path_audio, sr=16000, offset=0.5, duration=0.4)
# Load the same audio with torchaudio and cut the same 0.5 s to 0.9 s segment
audio_ft, sr = torchaudio.load(path_audio)
audio_ft = audio_ft.squeeze(0)
y_torch = audio_ft[int(0.5 * 16000):int(0.9 * 16000)]
# The two signals are the same; next I compute the log filterbank energies
# Log filterbank energies computed by python_speech_features
ft_f_bank = python_speech_features.logfbank(y, samplerate=16000, winlen=0.025, winstep=0.01, nfilt=64, nfft=512)
print(ft_f_bank.shape)  # result: (39, 64)
# Log filterbank energies computed by the FilterbankFeatures module (NeMo ASR collection)
self.featurizer = FilterbankFeatures(sample_rate=16000, n_window_size=int(0.025 * 16000), n_window_stride=int(0.01 * 16000), n_fft=64, log=True)
ft_by_f_bank_nemo = self.featurizer(y_torch)  # result shape: (41, 64)
# Log filterbank energies computed by torchaudio's Kaldi compliance module
ft_f_bank_by_torch = torchaudio.compliance.kaldi.fbank(y_torch, sample_frequency=16000.0, frame_length=25.0, frame_shift=10.0, use_log_fbank=True, use_energy=True, num_mel_bins=64)
print(ft_f_bank_by_torch.shape)  # result: (38, 65)
How can I make the output of NeMo's FilterbankFeatures module, or of torchaudio, match the output of python_speech_features? I don't have a deep understanding of speech features, so the question may sound strange, sorry.
Thank you.
My versions:
torchaudio: 0.6.0
pytorch: 1.6.0
In the FilterbankFeatures class we pad the time dimension of the features to a multiple of the pad_to value (16 by default) for faster processing. Try changing that to -1 and run it again.
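To see where the different frame counts can come from, here is a minimal sketch of the framing arithmetic, assuming a 6400-sample input (0.4 s at 16 kHz), a 400-sample window and a 160-sample hop. The function names are hypothetical helpers written for illustration, not part of any of the three libraries. The formulas are my understanding of each convention: python_speech_features zero-pads the last partial frame, Kaldi-style framing with edge snipping drops it, and a centered STFT pads half a window on both sides. The last helper only illustrates rounding up to a multiple of pad_to, treating any non-positive value as "no padding".

```python
import math


def frames_zero_padded(n_samples: int, win: int, hop: int) -> int:
    """Frame count when the last partial frame is zero-padded and kept
    (assumed python_speech_features convention)."""
    if n_samples <= win:
        return 1
    return 1 + math.ceil((n_samples - win) / hop)


def frames_snip_edges(n_samples: int, win: int, hop: int) -> int:
    """Frame count when only complete frames are kept
    (assumed Kaldi-style convention with edge snipping enabled)."""
    if n_samples < win:
        return 0
    return 1 + (n_samples - win) // hop


def frames_centered(n_samples: int, hop: int) -> int:
    """Frame count for a centered STFT, where the signal is padded by
    half a window on each side before framing."""
    return 1 + n_samples // hop


def pad_to_multiple(n_frames: int, pad_to: int) -> int:
    """Round a frame count up to the next multiple of pad_to.
    A non-positive pad_to is treated here as 'no padding'."""
    if pad_to <= 0:
        return n_frames
    return math.ceil(n_frames / pad_to) * pad_to


n_samples, win, hop = 6400, 400, 160
print(frames_zero_padded(n_samples, win, hop))  # 39
print(frames_snip_edges(n_samples, win, hop))   # 38
print(frames_centered(n_samples, hop))          # 41
print(pad_to_multiple(41, 16))                  # 48
print(pad_to_multiple(41, -1))                  # 41
```

These three counts line up with the (39, 64), (38, 65) and (41, 64) shapes reported in the question. The extra 65th column in the torchaudio output comes from use_energy=True, which adds an energy term to the mel bins, so use_energy=False should give 64 columns. Note also that the NeMo call passes n_fft=64 while the python_speech_features call uses nfft=512, so the values would likely differ even once the shapes agree.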