What is your question?

I am trying to reproduce the HuBERT Base results with the 100h fine-tuning data, but my final result is not very good: WER on test-clean/test-other = 8.2300 / 17.3509.
Training details
First iteration
100 clusters on MFCC features, following the hubert_base_librispeech.yaml config, trained with 6 GPUs.
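For context, a minimal sketch of how I launch this first iteration, following the fairseq HuBERT recipe. All paths are placeholders, and the last two overrides are my own assumption for a 6-GPU setup (the stock config assumes more GPUs, so update_freq is raised to roughly preserve the effective batch size):

```bash
# First-iteration pre-training on 100-cluster MFCC k-means labels (a sketch, not an exact command).
# Paths are placeholders; the world-size/update_freq overrides are assumptions for 6 GPUs.
python fairseq_cli/hydra_train.py \
  --config-dir examples/hubert/config/pretrain \
  --config-name hubert_base_librispeech \
  task.data=/path/to/tsv_dir \
  task.label_dir=/path/to/mfcc_km100_labels \
  task.labels='["km"]' \
  model.label_rate=100 \
  distributed_training.distributed_world_size=6 \
  optimization.update_freq='[5]'
```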
training loss and correct rate
250k step: loss_m_0=3.329, correct_m_0=0.364995, correct_u_0=0.0493537
400k step: loss_m_0=3.144, correct_m_0=0.38992, correct_u_0=0.0888545
The 400k-step result is clearly better than the 250k-step one. Should I pick the 400k-step checkpoint, or keep training for more steps?
Second iteration
500 clusters on 6th-layer transformer features (extracted with the 250k-step checkpoint, as described in the paper), trained with 8 GPUs.
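For the 500-cluster labels, a rough sketch using the simple_kmeans scripts from examples/hubert (argument order as I understand it from that README; the paths, the single-shard setup, and the --percent value are placeholders/assumptions):

```bash
# Dump 6th-layer features from the iteration-1 250k-step checkpoint, fit 500-cluster
# k-means, then write labels. Single shard shown for brevity; paths are placeholders.
python examples/hubert/simple_kmeans/dump_hubert_feature.py \
  /path/to/tsv_dir train /path/to/iter1_checkpoint_250k.pt 6 1 0 /path/to/feat_dir
python examples/hubert/simple_kmeans/learn_kmeans.py \
  /path/to/feat_dir train 1 /path/to/km500.bin 500 --percent 0.1
python examples/hubert/simple_kmeans/dump_km_label.py \
  /path/to/feat_dir train /path/to/km500.bin 1 0 /path/to/label_dir
```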
training loss and correct rate
250k step: loss_m_0=3.712, correct_m_0=0.382826, correct_u_0=0.635071
400k step: loss_m_0=3.422, correct_m_0=0.417456, correct_u_0=0.662054
Is loss_m_0=3.422 good enough?
Finetune
Followed base_10h.yaml, trained with 8 GPUs, stopped at 216,000 updates (epoch 600).
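A sketch of the fine-tuning launch (paths are placeholders; model.w2v_path points at the second-iteration pre-trained checkpoint):

```bash
# Fine-tune on labelled data with the base_10h recipe (paths are placeholders).
python fairseq_cli/hydra_train.py \
  --config-dir examples/hubert/config/finetune \
  --config-name base_10h \
  task.data=/path/to/tsv_dir \
  task.label_dir=/path/to/transcripts \
  model.w2v_path=/path/to/hubert_iter2.pt
```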
Decode
Viterbi decoding with the 100k-step (epoch 280) checkpoint; WER on test-clean/test-other = 8.2300 / 17.3509.
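A sketch of the Viterbi decoding step, roughly following the fairseq HuBERT decode recipe (paths and the subset name are placeholders; exact override names may differ slightly across fairseq versions):

```bash
# Viterbi (no-LM) decoding of the fine-tuned 100k-step checkpoint on a test split.
# Paths and the subset name are placeholders.
python examples/speech_recognition/new/infer.py \
  --config-dir examples/hubert/config/decode \
  --config-name infer_viterbi \
  task.data=/path/to/tsv_dir \
  task.normalize=false \
  common_eval.path=/path/to/finetune_checkpoint_100k.pt \
  dataset.gen_subset=test_clean \
  decoding.exp_dir=/path/to/exp_dir
```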
What have you tried?
What's your environment?
fairseq Version: 0.12.1
PyTorch Version: 1.10.0
OS (e.g., Linux):
How you installed fairseq (pip, source): source
Build command you used (if compiling from source):