NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal models, and Speech AI (Automatic Speech Recognition and Text-to-Speech).
https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html
Apache License 2.0

Fine-Tuning Fast Conformer CTC on Other Language #10112

duckyngo closed this issue 3 weeks ago

duckyngo commented 1 month ago

First of all, thank you for creating and maintaining this incredible framework. It has been a valuable tool for our work.

I am currently attempting to fine-tune the Fast Conformer CTC model on a Korean language dataset (1000 hours), using the pretrained English model as the starting point.

Issue:

After 21 epochs of training, the validation WER remains at 1, and the learning rate does not seem to decrease.

I would greatly appreciate any guidance on what might be going wrong during the training process.
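A validation WER pinned at exactly 1 usually means the decoder is producing empty (all-blank) hypotheses, so every reference word counts as a deletion. A minimal, standalone sketch of the word-error-rate arithmetic (not NeMo's implementation) shows why an empty output yields WER = 1.0:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate = word-level edit distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# A CTC model that collapses to all blanks emits an empty hypothesis:
print(wer("안녕하세요 반갑습니다", ""))  # every word deleted -> 1.0
```

If the WER sits at 1 for many epochs, it is worth decoding a few validation samples directly to confirm whether the model is emitting blanks rather than wrong-but-nonempty text.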

Additional Information: I attach a screenshot from W&B here for reference.

Environment details

Additional context: We have access to a 20,000-hour dataset, but since training on the full dataset would be very time-consuming, we decided to start with 1,000 hours to see whether the model can converge before scaling up.

nithinraok commented 3 weeks ago

Could you also plot the lr graph? Your validation loss kept increasing. Could you start with 1024 tokens? Also, you mentioned the same tokens work well for Conformer: does that mean you tried the same setup with Conformer and it trained well, and you are only seeing issues with FastConformer?

If possible share complete config.

duckyngo commented 3 weeks ago

Thank you for your support!

I managed to resolve the issue, and I wanted to share the solution in case others encounter a similar problem. The root cause was related to the batch size and learning rate. Since my batch size was relatively small, I found it necessary to reduce the learning rate accordingly. The default configuration’s learning rate parameters are optimized for a global batch size of 2K, so using a smaller batch size requires a lower learning rate.
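The adjustment described above is essentially the linear scaling heuristic: keep the learning rate roughly proportional to the global batch size. A rough sketch, where the reference batch of 2048 and the peak lr of 1e-3 are illustrative assumptions (not the actual NeMo config values):

```python
# Linear scaling heuristic: lr should shrink in proportion to batch size.
# Reference values below are illustrative assumptions, not NeMo defaults.
REF_GLOBAL_BATCH = 2048  # batch size the default schedule was tuned for
REF_PEAK_LR = 1e-3       # peak learning rate paired with that batch size

def scaled_lr(global_batch: int) -> float:
    """Peak lr scaled linearly with the actual global batch size."""
    return REF_PEAK_LR * global_batch / REF_GLOBAL_BATCH

# Global batch = micro batch * gradient accumulation * number of GPUs.
print(scaled_lr(2048))  # unchanged at the reference batch: 0.001
print(scaled_lr(256))   # an 8x smaller batch -> 8x smaller peak lr
```

This is a heuristic starting point, not a guarantee; the issue author's empirical tuning (lowering lr until training stabilized) is still the deciding test.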

Initially, the model converged well during the early stages when the learning rate was low. However, as the learning rate increased due to the warm-up settings, the training became unstable. By further reducing the learning rate, I was able to stabilize the training, and the model began converging as expected.
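The pattern described, stable early training followed by divergence, matches how warm-up schedules behave: the lr ramps up for `warmup_steps` before decaying, so the first epochs run at a tiny lr and the instability only appears near the peak. A sketch of a Noam-style schedule (the `d_model` and `warmup_steps` values here are illustrative, not the config's actual settings):

```python
def noam_lr(step: int, d_model: int = 512,
            warmup_steps: int = 10_000, scale: float = 1.0) -> float:
    """Noam-style schedule: linear warm-up, then inverse-sqrt decay."""
    step = max(step, 1)
    return scale * d_model ** -0.5 * min(step ** -0.5,
                                         step * warmup_steps ** -1.5)

# lr climbs during warm-up, peaks at warmup_steps, then decays:
for s in (100, 1_000, 10_000, 40_000):
    print(s, noam_lr(s))
```

Because the peak lr is reached only after warm-up, a run can look healthy for thousands of steps and still diverge later; lowering the overall `scale` (or the configured peak lr) caps the maximum of this curve, which is consistent with the fix reported above.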


I hope this information helps others who might be facing similar challenges with smaller batch sizes. Thanks again for your support!