rbracco closed this issue 3 years ago.
The type error says that `input_signal_length` has no corresponding input type; the two valid input names are `input_signal` and `length`.
So if you rename the keyword argument in your call to `empty_model.preprocessor()` from `input_signal_length` to `length`, it should work.
Thank you, I got this working with your help, and I'm sharing the code in case anyone else needs to replicate it. That said, I'm still hoping there is a better way to do this. I also had to change the normalization in the config from `per_feature` to `all_features` to avoid a NaN issue.
Any insight into why the preprocessor returns two channels, one of which is empty? Is it because I pass the length as a 2D tensor (shape `[1, num_samples]`) instead of a 1D tensor? Thank you.
```python
from IPython.display import Audio, display
import librosa.display
import matplotlib.pyplot as plt
import torch
import torchaudio

plt.rcParams["figure.figsize"] = (12, 9)

@torch.no_grad()  # inference only; no gradients needed for plotting
def display_specs(model, audio_file):
    # Play the clip inline in the notebook
    display(Audio(audio_file))
    model.to('cpu')
    # Load the waveform, shape [channels, num_samples], and resample to 16 kHz
    y0, sr = torchaudio.load(audio_file)
    y0r = torchaudio.transforms.Resample(sr, 16000)(y0)
    # Length tensor built from the waveform shape, i.e. [channels, num_samples]
    y0_len = torch.tensor(y0r.shape)
    fig, ax = plt.subplots(nrows=3, ncols=1, sharex=True)
    # spec_result is (spectrogram batch, lengths in frames)
    spec_result = model.preprocessor(input_signal=y0r, length=y0_len)
    ax[0].set(title="First channel of spec")
    librosa.display.specshow(spec_result[0][0].numpy(), ax=ax[0])
    ax[1].set(title="Second channel of spec")
    librosa.display.specshow(spec_result[0][1].numpy(), ax=ax[1])
    # SpecAugment expects a batch, so add the batch dimension back
    aug_result = model.spec_augmentation(input_spec=spec_result[0][1].unsqueeze(0))
    ax[2].set(title="Augmented second channel of spec")
    librosa.display.specshow(aug_result[0].numpy(), ax=ax[2])
```
Use it by calling:

```python
display_specs(<your_model_instance>, <path_to_audiofile>)
```
> I also had to change normalization in config from per_feature to all_features to avoid a NaN issue.

This should not be needed.
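A plausible source of the NaN, assuming normalization works roughly as sketched here: with `per_feature`, each mel bin is normalized by the mean and standard deviation of that bin over the utterance's valid frames. If the length tensor says a sequence has only about one valid frame (as the first entry of a `[1, num_samples]` length tensor does), the unbiased standard deviation of a single value is undefined and comes out as NaN. `all_features` pools every bin and frame, so it still has many values to work with. The numpy code below is an illustration built around a hypothetical `normalize` helper, not NeMo's implementation.

```python
import warnings

import numpy as np

def normalize(spec, valid_frames, mode):
    """Hypothetical helper: normalize a (n_mels, T) spectrogram, computing
    the statistics from its first `valid_frames` frames only."""
    valid = spec[:, :valid_frames]
    if mode == "per_feature":
        # One mean/std per mel bin, taken across time (unbiased std)
        mean = valid.mean(axis=1, keepdims=True)
        std = valid.std(axis=1, ddof=1, keepdims=True)
    elif mode == "all_features":
        # A single mean/std pooled over every bin and valid frame
        mean = valid.mean()
        std = valid.std(ddof=1)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return (spec - mean) / (std + 1e-5)

rng = np.random.default_rng(0)
spec = rng.normal(size=(64, 100))  # stand-in for a 64-bin log-mel spectrogram

with warnings.catch_warnings():
    # numpy warns when ddof >= number of values; silence it for the demo
    warnings.simplefilter("ignore", RuntimeWarning)
    one_frame_pf = normalize(spec, valid_frames=1, mode="per_feature")
    one_frame_af = normalize(spec, valid_frames=1, mode="all_features")

full_pf = normalize(spec, valid_frames=100, mode="per_feature")

print(np.isnan(one_frame_pf).all())  # True: std of a single value is NaN
print(np.isnan(one_frame_af).any())  # False: 64 pooled values
print(np.isnan(full_pf).any())       # False: all frames valid
```

Under this reading, switching to `all_features` hides the symptom, while passing a correct length tensor removes the cause.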
> Any insight into why there are two channels returned by the preprocessor and one of them is empty? Is it because I pass the length as a 2D tensor (shape: [1, num_samples]) instead of a 1D tensor?
`length` is supposed to be a 1D tensor in which each element is the length (in audio samples) of one signal in the batch. So `len(length) == batch_size` and `length[0] == len(sample_0)`.
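To see why `torch.tensor(y0r.shape)` yields two outputs, here is a numpy re-creation of a padding-mask step. It assumes, as a simplification, that the preprocessor zeroes every frame at or beyond each entry of the length tensor and broadcasts that mask against the batch, roughly in the spirit of FilterbankFeatures. For a mono clip, `y0r.shape` is `(1, num_samples)`, so the length tensor is `[1, num_samples]`: two "sequences" for a batch of one signal. Broadcasting then gives a batch of two spectrograms (what looked like two channels), the first masked almost entirely and the second intact. `mask_padding` is a hypothetical helper, and lengths are given directly in frames to keep it short.

```python
import numpy as np

def mask_padding(spec, lengths, pad_value=0.0):
    """Zero out frames past each sequence's length.

    spec:    (batch, n_mels, T) spectrogram batch
    lengths: 1D array with one valid-frame count per batch element
    """
    T = spec.shape[-1]
    # (len(lengths), T): True where the frame index is >= that sequence's length
    mask = np.arange(T)[None, :] >= lengths[:, None]
    # mask[:, None, :] broadcasts against spec, so a batch of 1 paired
    # with 2 lengths silently becomes a batch of 2
    return np.where(mask[:, None, :], pad_value, spec)

spec = np.ones((1, 4, 6))  # one mono utterance: 4 mel bins, 6 frames

wrong = mask_padding(spec, np.array([1, 6]))  # like torch.tensor(y0r.shape)
right = mask_padding(spec, np.array([6]))     # one length per batch element

print(wrong.shape)     # (2, 4, 6): two outputs for one utterance
print(wrong[0].sum())  # 4.0: only frame 0 survives in the first copy
print(right.shape)     # (1, 4, 6)
```

Applied to `display_specs`, this suggests passing `length=torch.tensor([y0r.shape[1]])` for a mono clip and plotting only `spec_result[0][0]`.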
You can take a look at `FilterbankFeatures` to see what's happening inside the preprocessor module.
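For orientation, here is a heavily simplified numpy sketch of the kind of steps a filterbank featurizer performs: centered framing, windowing, power spectrum, mel filterbank, log, and a frame count per signal. The settings (16 kHz, 512-point FFT, 400-sample window, 160-sample hop, 64 mel bins) and all helper names are assumptions for illustration. `FilterbankFeatures` also handles dither, pre-emphasis, normalization and padding, among other things, and its exact formulas may differ.

```python
import numpy as np

SR, N_FFT, WIN, HOP, N_MELS = 16000, 512, 400, 160, 64  # assumed settings

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr=SR, n_fft=N_FFT, n_mels=N_MELS):
    """Triangular HTK-style mel filters, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sr / 2, n_fft // 2 + 1)
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel(signal):
    """Return a (n_mels, n_frames) log-mel spectrogram of a 1D signal."""
    # Center the frames by reflect-padding n_fft // 2 samples on each side
    padded = np.pad(signal, N_FFT // 2, mode="reflect")
    n_frames = 1 + (padded.size - N_FFT) // HOP
    # Hann window of WIN samples, zero-padded to N_FFT and centered
    window = np.zeros(N_FFT)
    offset = (N_FFT - WIN) // 2
    window[offset:offset + WIN] = np.hanning(WIN)
    frames = np.stack([padded[t * HOP:t * HOP + N_FFT] * window
                       for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=N_FFT, axis=1)) ** 2  # (frames, bins)
    mel = mel_filterbank() @ power.T                             # (mels, frames)
    return np.log(mel + 2.0 ** -24)  # small guard against log(0)

def num_frames(num_samples):
    """Frame count for num_samples samples under the framing above."""
    return 1 + (num_samples + 2 * (N_FFT // 2) - N_FFT) // HOP

t = np.arange(SR) / SR                # one second of audio
tone = np.sin(2 * np.pi * 440.0 * t)  # 440 Hz test tone
spec = log_mel(tone)
print(spec.shape)     # (64, 101)
print(num_frames(1))  # 1: a one-sample "signal" still maps to a single frame
```

The last line shows why, under these assumptions, a length entry of 1 sample turns into a one-frame sequence inside the preprocessor.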
Describe your question
I would like to display the spectrograms generated by QuartzNet and SpecAugment, but I'm having trouble doing so.
Things I tried:
EncDecCTCModel
and paste in code, but I was hoping there was a better way to view spectrograms, since they are very important.