SeanNaren / deepspeech.pytorch

Speech Recognition using DeepSpeech2.
MIT License

Question regarding feature extraction #245

Closed SarthakYadav closed 4 years ago

SarthakYadav commented 6 years ago

Hello everyone. This is not an issue but rather a question. As seen in the implementation, feature extraction is done as follows:

    n_fft = int(self.sample_rate * self.window_size)
    win_length = n_fft
    hop_length = int(self.sample_rate * self.window_stride)
    D = librosa.stft(audio, n_fft=n_fft, hop_length=hop_length,
                     win_length=win_length, window=self.window)
    spect, phase = librosa.magphase(D)

I have two questions:

  1. Are these features log-power mel spectrograms? I understand what the code does, but I would like to know the exact term for the extracted features.
  2. librosa has a "melspectrogram" extraction function. What is the difference between the features it produces and the features extracted here?

Any resources to read on these feature extraction methods would be appreciated as well. Thanks!
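To make the comparison concrete, here is a minimal sketch in plain Python. The quoted lines stop at the magnitude of the STFT, which is a linear-frequency magnitude spectrogram; no mel filterbank appears in them. librosa's melspectrogram, by default, squares the magnitude to get power and then projects the linear-frequency bins onto a smaller set of mel bands. The values below (16 kHz sample rate, 20 ms window, 10 ms stride, 40 mel bands) are assumptions chosen for illustration, and the filterbank uses the simple HTK mel formula, so it will not match librosa's default filter weights exactly.

```python
import math


def stft_params(sample_rate, window_size, window_stride):
    """Mirror the arithmetic in the quoted snippet."""
    n_fft = int(sample_rate * window_size)
    hop_length = int(sample_rate * window_stride)
    n_bins = 1 + n_fft // 2  # rows of a one-sided STFT
    return n_fft, hop_length, n_bins


def hz_to_mel(hz):
    # HTK-style mel formula (an assumption; librosa defaults to a different variant)
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters mapping linear-frequency bins to mel bands."""
    n_bins = 1 + n_fft // 2
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    top = hz_to_mel(sample_rate / 2)
    # n_mels + 2 band edges, evenly spaced on the mel scale
    edges = [mel_to_hz(top * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bank = []
    for m in range(n_mels):
        lo, mid, hi = edges[m], edges[m + 1], edges[m + 2]
        row = []
        for f in bin_hz:
            rising = (f - lo) / (mid - lo)
            falling = (hi - f) / (hi - mid)
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank


def to_mel(magnitudes, bank):
    """Square one magnitude frame to power, then apply the filterbank."""
    power = [m * m for m in magnitudes]
    return [sum(w * p for w, p in zip(row, power)) for row in bank]


# Hypothetical settings, not taken from the thread
n_fft, hop_length, n_bins = stft_params(16000, 0.02, 0.01)
bank = mel_filterbank(40, n_fft, 16000)

frame = [1.0] * n_bins          # stand-in for one column of `spect`
mel_frame = to_mel(frame, bank)

print(n_fft, hop_length, n_bins, len(mel_frame))  # 320 160 161 40
```

Under these assumptions, each frame from the quoted code has 161 linear-frequency values, while the mel version of the same frame has 40 values on a perceptually spaced axis. Taking a logarithm of either result is a separate step.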

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
