Question regarding feature extraction

SarthakYadav commented 6 years ago

Hello everyone. This is not an issue but rather a question. As seen in the implementation, feature extraction is done as follows:

    n_fft = int(self.sample_rate * self.window_size)
    win_length = n_fft
    hop_length = int(self.sample_rate * self.window_stride)
    D = librosa.stft(audio, n_fft=n_fft, hop_length=hop_length, win_length=win_length, window=self.window)
    spect, phase = librosa.magphase(D)

I have two questions:

1. Are these features log-power mel spectrograms? I understand what the code is doing, but I would like to know the exact terminology for the extracted features.
2. librosa has a "melspectrogram" extraction function. What is the difference between the features extracted using that function and the features extracted here?

Any resources to read on these feature extraction methods would be appreciated as well. Thanks!
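For readers comparing the two feature types, the difference can be sketched in plain NumPy. The quoted lines take the magnitude of an STFT, so `spect` there is a linear-frequency magnitude spectrogram; the lines shown apply no mel filterbank and no log. A mel spectrogram adds one step: the squared magnitudes are projected onto a bank of triangular filters spaced on the mel scale (librosa's `melspectrogram` uses power 2.0 by default). Everything below is an illustrative assumption rather than the repository's code: the helper names are hypothetical, the sample rate, window size, stride, Hamming window, and 40 mel bands are example values, and the filterbank uses the HTK mel formula without the padding and filter normalization that librosa applies, so the numbers will not match librosa exactly.

```python
import numpy as np


def stft_magnitude(audio, n_fft, hop_length):
    """Linear-frequency magnitude spectrogram, shape (n_fft // 2 + 1, n_frames)."""
    window = np.hamming(n_fft)
    n_frames = 1 + (len(audio) - n_fft) // hop_length
    frames = np.stack([
        audio[i * hop_length: i * hop_length + n_fft] * window
        for i in range(n_frames)
    ])
    # rfft keeps only the non-negative frequencies of each windowed frame.
    return np.abs(np.fft.rfft(frames, axis=1)).T


def hz_to_mel(hz):
    # HTK-style mel formula (an assumption; librosa defaults to the Slaney variant).
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(sample_rate, n_fft, n_mels):
    """Triangular mel filters, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    bank = np.zeros((n_mels, len(fft_freqs)))
    for m in range(n_mels):
        lower, center, upper = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (fft_freqs - lower) / (center - lower)
        falling = (upper - fft_freqs) / (upper - center)
        bank[m] = np.maximum(0.0, np.minimum(rising, falling))
    return bank


# Example values only; they stand in for self.sample_rate, self.window_size
# and self.window_stride in the quoted snippet.
sample_rate = 16000
window_size, window_stride = 0.02, 0.01
n_fft = int(sample_rate * window_size)
hop_length = int(sample_rate * window_stride)

# One second of a 440 Hz sine as a stand-in for real audio.
t = np.arange(sample_rate) / sample_rate
audio = np.sin(2 * np.pi * 440.0 * t)

# What the quoted snippet produces (up to padding details): |STFT|.
spect = stft_magnitude(audio, n_fft, hop_length)

# The extra steps behind a log-power mel spectrogram.
mel_spect = mel_filterbank(sample_rate, n_fft, 40) @ (spect ** 2)
log_mel = np.log(mel_spect + 1e-10)

print(spect.shape, mel_spect.shape)
```

The first array keeps every FFT bin on a linear frequency axis, while the second compresses them into 40 perceptually spaced bands, which is the main practical difference between the two feature types.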

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

SeanNaren / deepspeech.pytorch

Question regarding feature extraction #245