Closed rbracco closed 3 years ago
You should not be overriding your decoder like that. You should use a custom neural module.
1) Make your custom decoder extend the NeuralModule class. 2) Override the nemo.collections.asr.modules.ConvASRDecoder class path in the config with the class path of your new decoder. 3) Create the model with this config and train.
During evaluation, make sure to import the custom decoder before you call restore_from.
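The import matters because a config that stores a class path can only be turned back into an object if that path is importable at restore time. The sketch below is not NeMo's or Hydra's actual loader: `resolve_class_path` is a hypothetical helper built on `importlib`, and `collections.OrderedDict` merely stands in for a decoder class.

```python
import importlib


def resolve_class_path(class_path: str):
    """Turn a dotted path such as 'package.module.ClassName' into the class object.

    Hypothetical helper for illustration; NeMo/Hydra use their own machinery.
    """
    module_name, _, class_name = class_path.rpartition(".")
    if not module_name:
        raise ValueError(f"Expected 'module.ClassName', got {class_path!r}")
    module = importlib.import_module(module_name)  # fails if the module is not importable
    return getattr(module, class_name)


# A stand-in for the decoder section of a model config.
decoder_cfg = {"_target_": "collections.OrderedDict"}
decoder_cls = resolve_class_path(decoder_cfg["_target_"])
decoder = decoder_cls()

# A path that cannot be imported fails before any weights could be loaded.
try:
    resolve_class_path("my_missing_package.TwoLayerDecoder")
    resolved_missing = True
except ModuleNotFoundError:
    resolved_missing = False
```

If the class path in the saved config points at a module that is not on the Python path in the evaluation environment, the lookup fails in the same way as the last call here.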
Thank you, I will try this and follow up if I have any questions. If I get it working I will leave an example for future people with the same issue.
Okay so I worked on this for a while and then got stuck. I wrote a TwoLayerDecoder class that extends NeuralModule but I'm not sure how to use it with a pretrained model. I have two questions.
nemo.collections.asr.modules.ConvASRDecoder and then change what I need? Or is it okay to inherit from nemo.collections.asr.modules.ConvASRDecoder and just write a new __init__ and forward? Thank you!
For changing the decoder without changing vocabulary size (keeping the 28-char vocab size of QN), it's done as follows:
For 1) you should be able to do quartznet.decoder = MyNewTwoLayerDecoder().
Don't forget to update quartznet.cfg.decoder._target_ with the new class path to your file, otherwise restore_from won't work.
That's about it.
For a decoder that changes the vocabulary size, first use the quartznet.change_vocabulary() method then do the above steps.
For 2) inheriting is fine, but only if you override the decoder layers correctly so that the previously created weights are replaced with your own. For a clean approach, I'd say copy-paste the code and edit the portions you wish to change.
Thank you, so I actually tried 3 different ways...
quartznet.decoder = MyNewTwoLayerDecoder(*args)
quartznet.decoder = new_model.decoder (this one mainly to make sure the config file would work)
All 3 methods successfully combine the encoder pretrained weights and the new decoder. I then freeze the encoder layers and fit, but in each case I get KeyError: 30 for the line reference = ''.join([self.labels_map[c] for c in target]) in wer.py. I have 41 classes (including the blank label), and the correct vocab and number of classes appear when I run quartznet.decoder.vocabulary, but when I enter the debugger the labels_map attribute of WER (a dict mapping int values to labels) is still using the original English alphabet. I did make sure to call quartznet.change_vocabulary(new_vocabulary=my_new_vocab) but somehow it doesn't fix it.
Any ideas? Please let me know if this warrants a new issue and in the meantime I'll keep digging. Thanks so much for your time and help.
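One way to see how a `KeyError: 30` can arise: if the metric's `labels_map` was built from the original 28-character vocabulary, any target ID of 28 or higher has no entry. The sketch below mimics that with plain Python. The vocabularies and the `build_labels_map` and `ids_to_text` helpers are illustrative assumptions, not the code in NeMo's `wer.py`.

```python
def build_labels_map(vocabulary):
    """Map integer class IDs to labels, like the dict described above."""
    return {i: label for i, label in enumerate(vocabulary)}


def ids_to_text(labels_map, target):
    # Same shape as the failing line: look up every target ID in labels_map.
    return "".join(labels_map[c] for c in target)


old_vocab = list(" abcdefghijklmnopqrstuvwxyz'")  # 28 labels (assumed English set)
new_vocab = [f"<{i}>" for i in range(40)]         # 40 hypothetical labels

target = [3, 30, 12]  # valid IDs for the 40-label vocabulary

stale_map = build_labels_map(old_vocab)
try:
    ids_to_text(stale_map, target)
    error = None
except KeyError as exc:
    error = exc  # the stale map only has IDs 0..27, so ID 30 is missing

fresh_map = build_labels_map(new_vocab)
decoded = ids_to_text(fresh_map, target)
```

The lookup succeeds once the map is rebuilt from the new vocabulary, which is why the metric has to be refreshed along with the decoder.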
Oh wow, so 2 seconds after posting this, I reran the code but swapping the order of the execution to be
quartznet.change_vocabulary(new_vocabulary=my_new_vocab)
quartznet.decoder = ConvASRDecoderTwo(1024, 40, vocabulary=my_new_vocab)
instead of
quartznet.decoder = ConvASRDecoderTwo(1024, 40, vocabulary=my_new_vocab)
quartznet.change_vocabulary(new_vocabulary=my_new_vocab)
and the KeyError disappeared. It appears that you need to change the vocabulary on the pretrained model prior to instantiating the new decoder.
I'm still not sure everything is worked out because the loss is coming down much more slowly than when I overwrote the decoder manually with quartznet.decoder.decoder_layers = nn.Sequential(nn.Conv1d(1024, 256, kernel_size=1, stride=1), nn.ReLU(), nn.Conv1d(256, 41, kernel_size=1, stride=1)). I'll keep digging and report back.
The error disappears, but now WER will actually be incorrect: it will assume the CTC blank ID is 29, but you have 41 labels.
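To illustrate why a stale blank ID corrupts the metric even when no exception is raised, here is a minimal greedy CTC decode over a toy three-label vocabulary. It assumes the blank is the last class index (`len(vocabulary)`), in line with the "add 1 for blank char" convention in the decoder code posted in this thread; the function and data are hypothetical.

```python
def greedy_ctc_decode(frame_ids, vocabulary, blank_id):
    """Collapse repeated frame predictions, then drop blanks."""
    out = []
    previous = None
    for idx in frame_ids:
        if idx != previous and idx != blank_id:
            # IDs with no label are skipped here; a real metric might raise instead.
            if idx < len(vocabulary):
                out.append(vocabulary[idx])
        previous = idx
    return "".join(out)


vocabulary = ["a", "b", "c"]
correct_blank = len(vocabulary)    # 3: blank is the extra, last class
frames = [0, 0, 3, 0, 1, 3, 2, 2]  # "a", "a", blank, "a", "b", blank, "c", "c"

right = greedy_ctc_decode(frames, vocabulary, blank_id=correct_blank)

# A metric still configured for a smaller vocabulary treats a real label as blank.
stale_blank = 2
wrong = greedy_ctc_decode(frames, vocabulary, blank_id=stale_blank)
```

With the correct blank the frames decode to "aabc"; with the stale one the label "c" is silently dropped, so the hypothesis text (and any error rate computed from it) is wrong without any error being raised.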
I would rather suggest this (apologies for the roundabout way above):
1) Create the neural module.
2) Change QuartzNet.cfg.decoder._target_ to the class path of the new decoder.
3) Simply call change_vocabulary.
That should be all that's actually needed. Please let me know if this works.
Thank you for this. I'm trying it now but am running into some issues setting the config. I started with code from the ASR with NeMo tutorial and they appear to use Hydra 1.0. I see that in Hydra 1.1 cls is replaced with _target_ in the config files, so I followed your instructions but for step 2 I changed quartznet.cfg.decoder.cls to the classpath of my new decoder. When I call change_vocabulary, however, my decoder doesn't change.
Exactly what I did is below, any ideas?
All I changed here is: A. Changed the decoder layers from a single 1D conv to 1D conv -> ReLU -> 1D conv. B. Commented out the line that does weight initialization for NeMo.
class ConvASRDecoderTwo(NeuralModule, Exportable):
    """Simple ASR Decoder for use with CTC-based models such as JasperNet and QuartzNet

     Based on these papers:
        https://arxiv.org/pdf/1904.03288.pdf
        https://arxiv.org/pdf/1910.10261.pdf
        https://arxiv.org/pdf/2005.04290.pdf
    """

    def save_to(self, save_path: str):
        pass

    @classmethod
    def restore_from(cls, restore_path: str):
        pass

    @property
    def input_types(self):
        return OrderedDict({"encoder_output": NeuralType(('B', 'D', 'T'), AcousticEncodedRepresentation())})

    @property
    def output_types(self):
        return OrderedDict({"logprobs": NeuralType(('B', 'T', 'D'), LogprobsType())})

    def __init__(self, feat_in, num_classes, init_mode="xavier_uniform", vocabulary=None):
        super().__init__()

        if vocabulary is not None:
            if num_classes != len(vocabulary):
                raise ValueError(
                    f"If vocabulary is specified, it's length should be equal to the num_classes. Instead got: num_classes={num_classes} and len(vocabulary)={len(vocabulary)}"
                )
            self.__vocabulary = vocabulary
        self._feat_in = feat_in
        # Add 1 for blank char
        self._num_classes = num_classes + 1

        self.decoder_layers = torch.nn.Sequential(
            torch.nn.Conv1d(self._feat_in, 256, kernel_size=1, bias=True),
            torch.nn.ReLU(),
            torch.nn.Conv1d(256, self._num_classes, kernel_size=1, bias=True),
        )
        #self.apply(lambda x: init_weights(x, mode=init_mode))

    @typecheck()
    def forward(self, encoder_output):
        return torch.nn.functional.log_softmax(self.decoder_layers(encoder_output).transpose(1, 2), dim=-1)

    def input_example(self):
        """
        Generates input examples for tracing etc.
        Returns:
            A tuple of input examples.
        """
        bs = 8
        seq = 64
        input_example = torch.randn(bs, self._feat_in, seq).to(next(self.parameters()).device)
        return tuple([input_example])

    def _prepare_for_export(self):
        m_count = 0
        for m in self.modules():
            if type(m).__name__ == "MaskedConv1d":
                m.use_mask = False
                m_count += 1
        if m_count > 0:
            logging.warning(f"Turned off {m_count} masked convolutions")
        Exportable._prepare_for_export(self)

    @property
    def vocabulary(self):
        return self.__vocabulary

    @property
    def num_classes_with_blank(self):
        return self._num_classes
QuartzNet.cfg.decoder.cls to classpath of new decoder, call change_vocabulary
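Since the key holding the class path differs between config versions (`cls` in the tutorial config mentioned above, `_target_` in later ones), a small helper can update whichever one is present. This is only a sketch on a plain dict: `set_decoder_class_path` and the `my_decoders.ConvASRDecoderTwo` path are hypothetical, and a real NeMo config is an OmegaConf object rather than a dict.

```python
def set_decoder_class_path(cfg: dict, class_path: str) -> str:
    """Point the decoder section at a new class; return the key that was updated."""
    decoder_cfg = cfg["decoder"]
    for key in ("_target_", "cls"):
        if key in decoder_cfg:
            decoder_cfg[key] = class_path
            return key
    raise KeyError("decoder config has neither '_target_' nor 'cls'")


# Stand-ins for the two config styles discussed in this thread.
old_style = {"decoder": {"cls": "nemo.collections.asr.modules.ConvASRDecoder"}}
new_style = {"decoder": {"_target_": "nemo.collections.asr.modules.ConvASRDecoder"}}

new_path = "my_decoders.ConvASRDecoderTwo"  # hypothetical module path
old_key = set_decoder_class_path(old_style, new_path)
new_key = set_decoder_class_path(new_style, new_path)
```

Returning the updated key makes it easy to log which config style was detected before calling change_vocabulary.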
Describe your question
I would like to implement my own decoder for transfer learning to experiment with multiple linear layers. I did so by overriding quartznet.decoder.decoder_layers with my own nn.Sequential using the code below
This worked and it trains well, but when I save the model and try to load it, I get the following error:
This is because my underlying config file still specifies the decoder as being from the class nemo.collections.asr.modules.ConvASRDecoder. I have no idea how to update the config file to use my new decoder, or how to bypass the config file altogether, and couldn't find a way to do so in the docs. Even if I load up quartznet and manually overwrite the decoder, then try to load the saved checkpoint, it fails because it seems to be using the config file behind the scenes.
Environment overview (please complete the following information)
Colab using nemo-toolkit[all]==1.0.0b1 and config from https://raw.githubusercontent.com/NVIDIA/NeMo/main/examples/asr/conf/config.yaml
Environment details
If NVIDIA docker image is used you don't need to specify these.
Python 3.6.9
PyTorch 1.7
OS: Ubuntu 18.04.5 LTS