NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html
Apache License 2.0

How to view raw and augmented spectrograms?[Question] #1645

Closed rbracco closed 3 years ago

rbracco commented 3 years ago

Describe your question

I would like to display the spectrograms generated by QuartzNet and SpecAugment, but I'm having trouble doing so.

Things I tried:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-198-408d2f6bb357> in <module>()
----> 1 empty_model.preprocessor(input_signal=y_resampled, input_signal_length=y_resampled.size(-1))

2 frames
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    725             result = self._slow_forward(*input, **kwargs)
    726         else:
--> 727             result = self.forward(*input, **kwargs)
    728         for hook in itertools.chain(
    729                 _global_forward_hooks.values(),

/usr/local/lib/python3.6/dist-packages/nemo/core/classes/common.py in __call__(self, wrapped, instance, args, kwargs)
    506 
    507         # Perform rudimentary input checks here
--> 508         instance._validate_input_types(input_types=input_types, **kwargs)
    509 
    510         # Call the method - this can be forward, or any other callable method

/usr/local/lib/python3.6/dist-packages/nemo/core/classes/common.py in _validate_input_types(self, input_types, **kwargs)
     96                 if key not in input_types:
     97                     raise TypeError(
---> 98                         f"Input argument {key} has no corresponding input_type match. "
     99                         f"Existing input_types = {input_types.keys()}"
    100                     )

TypeError: Input argument input_signal_length has no corresponding input_type match. Existing input_types = dict_keys(['input_signal', 'length'])


titu1994 commented 3 years ago

The TypeError says that "input_signal_length" has no corresponding input type, and that the two valid input names are input_signal and length.

So if you rename the keyword argument in the empty_model.preprocessor() call from input_signal_length to just length, it should work.
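To make the fix concrete, here is a minimal sketch of the corrected call. The argument names come from the error message above; `empty_model` stands in for a loaded QuartzNet instance, and the model call itself is left commented since it needs a NeMo checkpoint.

```python
import torch

# Batch of one 1-second clip at 16 kHz (random data, for illustration only)
signal = torch.randn(1, 16000)

# `length` is a 1D tensor: one entry per batch item, holding that clip's
# number of samples (NOT the old keyword `input_signal_length`)
length = torch.tensor([signal.shape[-1]])

# processed, processed_len = empty_model.preprocessor(
#     input_signal=signal, length=length)
```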

rbracco commented 3 years ago

Thank you, I was able to get this working with your help, and I'll share the code in case anyone else needs to replicate it. That said, I'm still hoping there is a better way to do this. I also had to change normalization in the config from per_feature to all_features to avoid a NaN issue.

Any insight into why the preprocessor returns two channels, one of which is empty? Is it because I pass the length as a 2D tensor (shape: [1, num_samples]) instead of a 1D tensor? Thank you.

import torch
import torchaudio
import librosa.display
import matplotlib.pyplot as plt
from IPython.display import Audio, display

plt.rcParams["figure.figsize"] = (12, 9)

def display_specs(model, audio_file):
    display(Audio(audio_file))
    model.to('cpu')
    y0, sr = torchaudio.load(audio_file)
    # Resample to the 16 kHz rate the model expects
    y0r = torchaudio.transforms.Resample(sr, 16000)(y0)
    # `length` must be a 1D tensor: one entry per batch item (here, per channel)
    y0_len = torch.full((y0r.shape[0],), y0r.shape[-1])
    with torch.no_grad():
        spec_result = model.preprocessor(input_signal=y0r, length=y0_len)
        aug_result = model.spec_augmentation(input_spec=spec_result[0][1].unsqueeze(0))
    fig, ax = plt.subplots(nrows=3, ncols=1, sharex=True)
    ax[0].set(title="First channel of spec")
    librosa.display.specshow(spec_result[0][0].numpy(), ax=ax[0])
    ax[1].set(title="Second channel of spec")
    librosa.display.specshow(spec_result[0][1].numpy(), ax=ax[1])
    ax[2].set(title="Augmented second channel of spec")
    librosa.display.specshow(aug_result[0].numpy(), ax=ax[2])

Use it by calling display_specs(<your_model_instance>, <path_to_audiofile>)

titu1994 commented 3 years ago

> I also had to change normalization in the config from per_feature to all_features to avoid a NaN issue.

This should not be needed

> Any insight into why the preprocessor returns two channels, one of which is empty? Is it because I pass the length as a 2D tensor (shape: [1, num_samples]) instead of a 1D tensor?

Length is supposed to be a 1D tensor with each element representing the duration per sample. So len(length) == batch size; length[0] = len(sample_0).
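A small sketch of that contract, with two padded clips of different durations (the shapes here are illustrative, not from the thread):

```python
import torch

# Two clips zero-padded to the longest one; `length` records the true
# number of samples per batch item before padding
batch = torch.zeros(2, 16000)
length = torch.tensor([16000, 12000])   # clip 0 is full length, clip 1 is shorter

assert len(length) == batch.shape[0]    # len(length) == batch size
assert length[0].item() == 16000        # length[0] = len(sample_0)
```

Note also that torchaudio loads a stereo file as a [2, num_samples] tensor, so the preprocessor would treat each channel as a separate batch item, which may explain the two "channels" in the output above.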

You can take a look at FilterbankFeatures to see what's happening inside the preprocessor module.