OndrejGl closed this issue 4 years ago.
Hi,
I'm not sure about your exact problem, but are you sure you still have any trainable parameters? Something like this has worked fine for me so far. Here I freeze only part of the encoder graph, but the approach should be the same. I also don't load the decoder at all, because I use a different vocabulary.
import re

import nemo.collections.asr as nemo_asr  # NeMo 0.x-style import

# quartz_params: the parsed QuartzNet YAML config; vocab: the new target labels
encoder = nemo_asr.JasperEncoder(
    feat_in=quartz_params["AudioToMelSpectrogramPreprocessor"]["features"],
    **quartz_params["JasperEncoder"],
)
encoder.restore_from("quartznet15x5/JasperEncoder-STEP-247400.pt")

# Everything EXCEPT the patterns below gets frozen; here only the
# encoder.17 block stays trainable.
not_freeze_list = ["encoder.17.*"]
freeze_reg = '(?:%s)' % '|'.join(not_freeze_list)
variables = [name for name, param in encoder.named_parameters() if not re.match(freeze_reg, name)]
print(f"Will freeze:\n {variables}")
encoder.freeze(variables)
print(f"Trainable encoder params:\n {[name for name, param in encoder.named_parameters() if param.requires_grad]}")

# Fresh decoder sized for the new vocabulary; no checkpoint is loaded for it.
decoder = nemo_asr.JasperDecoderForCTC(
    feat_in=quartz_params["JasperEncoder"]["jasper"][-1]["filters"],
    num_classes=len(vocab),
)
Also, I have never used torch.distributed.launch. Maybe try to run things without it first and see if you still have a similar problem.
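For reference, the two ways to launch (train.py is just a placeholder name for your training script):

# multi-process DDP launch (here: 2 processes/GPUs)
python -m torch.distributed.launch --nproc_per_node=2 train.py
# plain single-process run, no DistributedDataParallel wrapping involved
python train.py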
This should be fixed in the latest master. However, I strongly recommend that you do not freeze the encoder weights while fine-tuning (we are getting pretty good results when fine-tuning without freezing).
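A minimal sketch of that, reusing the names from the snippet above (quartz_params, vocab, and the checkpoint path are assumptions carried over from it):

encoder = nemo_asr.JasperEncoder(
    feat_in=quartz_params["AudioToMelSpectrogramPreprocessor"]["features"],
    **quartz_params["JasperEncoder"],
)
encoder.restore_from("quartznet15x5/JasperEncoder-STEP-247400.pt")
# No encoder.freeze() call: every restored parameter keeps requires_grad=True,
# so the whole encoder is fine-tuned together with the fresh decoder,
# and DistributedDataParallel still has gradients to sync.
decoder = nemo_asr.JasperDecoderForCTC(
    feat_in=quartz_params["JasperEncoder"]["jasper"][-1]["filters"],
    num_classes=len(vocab),
)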
Hi, I would like to re-use a trained QuartzNet encoder and train the decoder on new data. After the encoder is defined, I call:
encoder.freeze()
However, when I run the training via torch.distributed.launch, I get: AssertionError: DistributedDataParallel is not needed when a module doesn't have any parameter that requires a gradient.
Here is the full traceback:
What am I doing wrong? Thanks!
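If it helps narrow things down, my understanding is that a check like this would show the problem (encoder is the module above):

# Expected to print False: freeze() with no arguments left no trainable
# parameters, which is exactly what the DDP assertion complains about.
print(any(p.requires_grad for p in encoder.parameters()))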