What is your question?

I am trying to reproduce the HuBERT Base results with the 100h fine-tuning data, but my final result is not very good: WER on test-clean/test-other = 8.2300 / 17.3509.
Training details
First iteration
100 clusters on MFCC features, following the hubert_base_librispeech.yaml config, trained with 6 GPUs.
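For context, a minimal sketch of how I launch this first iteration, following the fairseq HuBERT recipe. All paths are placeholders, and the last two overrides are my own assumption for a 6-GPU setup (the stock config assumes more GPUs, so update_freq is raised to roughly preserve the effective batch size):

```bash
# First-iteration pre-training on 100-cluster MFCC k-means labels (a sketch, not an exact command).
# Paths are placeholders; the world-size/update_freq overrides are assumptions for 6 GPUs.
python fairseq_cli/hydra_train.py \
  --config-dir examples/hubert/config/pretrain \
  --config-name hubert_base_librispeech \
  task.data=/path/to/tsv_dir \
  task.label_dir=/path/to/mfcc_km100_labels \
  task.labels='["km"]' \
  model.label_rate=100 \
  distributed_training.distributed_world_size=6 \
  optimization.update_freq='[5]'
```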
training loss and correct rate
250k step: loss_m_0=3.329, correct_m_0=0.364995, correct_u_0=0.0493537
400k step: loss_m_0=3.144, correct_m_0=0.38992, correct_u_0=0.0888545
The 400k-step result is clearly better than the 250k-step one. Should I pick the 400k-step checkpoint, or keep training for more steps?
Second iteration
500 clusters on 6th-layer transformer features (extracted with the 250k-step checkpoint, as described in the paper), trained with 8 GPUs.
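For the 500-cluster labels, a rough sketch using the simple_kmeans scripts from examples/hubert (argument order as I understand it from that README; the paths, the single-shard setup, and the --percent value are placeholders/assumptions):

```bash
# Dump 6th-layer features from the iteration-1 250k-step checkpoint, fit 500-cluster
# k-means, then write labels. Single shard shown for brevity; paths are placeholders.
python examples/hubert/simple_kmeans/dump_hubert_feature.py \
  /path/to/tsv_dir train /path/to/iter1_checkpoint_250k.pt 6 1 0 /path/to/feat_dir
python examples/hubert/simple_kmeans/learn_kmeans.py \
  /path/to/feat_dir train 1 /path/to/km500.bin 500 --percent 0.1
python examples/hubert/simple_kmeans/dump_km_label.py \
  /path/to/feat_dir train /path/to/km500.bin 1 0 /path/to/label_dir
```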
training loss and correct rate
250k step: loss_m_0=3.712, correct_m_0=0.382826, correct_u_0=0.635071
400k step: loss_m_0=3.422, correct_m_0=0.417456, correct_u_0=0.662054
Is loss_m_0=3.422 good enough?
Finetune
Followed base_10h.yaml, trained with 8 GPUs, stopped at 216,000 updates (epoch 600).
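A sketch of the fine-tuning launch (paths are placeholders; model.w2v_path points at the second-iteration pre-trained checkpoint):

```bash
# Fine-tune on labelled data with the base_10h recipe (paths are placeholders).
python fairseq_cli/hydra_train.py \
  --config-dir examples/hubert/config/finetune \
  --config-name base_10h \
  task.data=/path/to/tsv_dir \
  task.label_dir=/path/to/transcripts \
  model.w2v_path=/path/to/hubert_iter2.pt
```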
Decode
Viterbi decoding with the 100k-step (epoch 280) checkpoint; WER on test-clean/test-other = 8.2300 / 17.3509.
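A sketch of the Viterbi decoding step, roughly following the fairseq HuBERT decode recipe (paths and the subset name are placeholders; exact override names may differ slightly across fairseq versions):

```bash
# Viterbi (no-LM) decoding of the fine-tuned 100k-step checkpoint on a test split.
# Paths and the subset name are placeholders.
python examples/speech_recognition/new/infer.py \
  --config-dir examples/hubert/config/decode \
  --config-name infer_viterbi \
  task.data=/path/to/tsv_dir \
  task.normalize=false \
  common_eval.path=/path/to/finetune_checkpoint_100k.pt \
  dataset.gen_subset=test_clean \
  decoding.exp_dir=/path/to/exp_dir
```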
What have you tried?
What's your environment?
fairseq Version: 0.12.1
PyTorch Version: 1.10.0
OS (e.g., Linux):
How you installed fairseq (pip, source): source
Build command you used (if compiling from source):