NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html
Apache License 2.0

100% WER when finetuning from custom checkpoint #849

jonaskratochvil closed this issue 3 years ago

jonaskratochvil commented 4 years ago

Hello,

I have used the QuartzNet pretrained checkpoint to fine-tune the ASR model on my custom data. This fine-tuning works fine, but when I use the newly obtained checkpoint to fine-tune the model on yet another dataset, I get 100% WER and a validation loss of nan from the first evaluation onward, throughout the whole training. Is there any specific reason why this should happen? I left the training script untouched between the two fine-tuning runs. When I use the original QuartzNet checkpoint and fine-tune directly on my second dataset, both the WER and the loss decrease as expected. Any help would be appreciated.

Jonas

Jovianan commented 4 years ago

I have exactly the same issue. Have you managed to resolve it?

vsl9 commented 4 years ago

What about the training loss? Is it decreasing? Can you please double-check that NeMo restores only the encoder and decoder checkpoints and not TRAINER (the optimizer's state)? For example, if you are using NeMo v0.10 with the quartznet.py script, please make sure that load_dir doesn't contain TRAINER*.pt files.
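
A minimal sketch of that cleanup, assuming the v0.10-style layout where the optimizer state lives in TRAINER*.pt files next to the encoder/decoder weights (the load_dir path below is just a placeholder):

```python
# Remove optimizer-state files from the restore directory so that only
# the encoder/decoder weights are restored on the next fine-tuning run.
# Assumes NeMo v0.10-style checkpoints named TRAINER*.pt; the directory
# path is hypothetical, adjust it to your setup.
import glob
import os

load_dir = "/path/to/finetuned_checkpoints"  # hypothetical path

trainer_ckpts = glob.glob(os.path.join(load_dir, "TRAINER*.pt"))
for path in trainer_ckpts:
    print(f"Removing optimizer state: {path}")
    os.remove(path)

if not trainer_ckpts:
    print("No TRAINER*.pt files found; optimizer state will not be restored.")
```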

Jovianan commented 4 years ago

I've used both v0.10 (jasper.py) and v0.11 (speech2text.py), through the latest NGC container, and both behave the same. In both cases I started training from the latest v2 pre-trained multidataset QuartzNet. Training and validation loss both go down (lr=1.5e-4) up to a certain point, then the validation WER shoots up to 100% while the training loss continues to decrease normally; predictions on training samples also remain good. Using the saved checkpoints with speech2text_infer.py returns an empty string for every dev_data sample.

Fine-tuning with a lower lr (1e-5) seems to fix the problem.
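
In other words, the only change between the broken and the working run is the learning rate. A generic PyTorch sketch of the idea (the placeholder model and the Adam optimizer are assumptions for illustration, not what the NeMo scripts actually build):

```python
# Generic PyTorch sketch of the fix: same fine-tuning setup, but with the
# reduced learning rate (1e-5 instead of 1.5e-4) that kept the validation
# WER from blowing up. `model` is a stand-in for the restored ASR model,
# and Adam is only an example optimizer.
import torch

model = torch.nn.Linear(64, 29)  # placeholder for the restored encoder/decoder

optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)  # was lr=1.5e-4
```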

raziel130889 commented 3 years ago

@Jovianan What code are you using to re-train (fine-tune)? I have a problem with Hydra:

raise ValueError(f'Invalid Datatype for loaders: {type(self.loaders).__name__}')
ValueError: Invalid Datatype for loaders: NoneType