TensorSpeech / TensorFlowASR

:zap: TensorFlowASR: Almost State-of-the-art Automatic Speech Recognition in Tensorflow 2. Supported languages that can use characters or subwords
https://huylenguyen.com/asr
Apache License 2.0

WER for conformer update #124

gandroz opened this issue 3 years ago (status: Open)

gandroz commented 3 years ago

Hi, I've just finished training a Conformer with the SentencePiece featurizer on LibriSpeech for 50 epochs. Here are the results, in case you want to update your README:

dataset_config:
    train_paths:
      - /data/datasets/LibriSpeech/train-clean-100/transcripts.tsv
      - /data/datasets/LibriSpeech/train-clean-360/transcripts.tsv
      - /data/datasets/LibriSpeech/train-other-500/transcripts.tsv
    eval_paths:
      - /data/datasets/LibriSpeech/dev-clean/transcripts.tsv
      - /data/datasets/LibriSpeech/dev-other/transcripts.tsv
    test_paths:
      - /data/datasets/LibriSpeech/test-clean/transcripts.tsv
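
For illustration, a subword model like the SentencePiece one mentioned above could be trained along these lines. This is only a sketch, not TensorFlowASR's actual featurizer tooling: it assumes each transcripts.tsv has a header row with the transcript text in its last tab-separated column, and the vocabulary size is an arbitrary choice.

```python
# Sketch only: build a SentencePiece subword model from LibriSpeech
# transcripts. The TSV layout (header row, transcript in the last
# column) and vocab_size are assumptions, not taken from this thread.
import csv

import sentencepiece as spm

transcript_tsvs = [
    "/data/datasets/LibriSpeech/train-clean-100/transcripts.tsv",
    "/data/datasets/LibriSpeech/train-clean-360/transcripts.tsv",
    "/data/datasets/LibriSpeech/train-other-500/transcripts.tsv",
]

# Collect plain transcripts into one text file for SentencePiece.
with open("librispeech_train.txt", "w") as out:
    for path in transcript_tsvs:
        with open(path) as f:
            rows = csv.reader(f, delimiter="\t")
            next(rows, None)  # assumed header row; drop if absent
            for row in rows:
                out.write(row[-1].lower() + "\n")

spm.SentencePieceTrainer.train(
    input="librispeech_train.txt",
    model_prefix="librispeech_sp",  # writes .model and .vocab files
    vocab_size=1000,                # illustrative, not from the issue
    model_type="unigram",           # SentencePiece's default algorithm
)
```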

Test results:

| Decoding               | WER (%)    | CER (%)    |
|------------------------|------------|------------|
| Greedy (G)             | 5.22291565 | 1.9693377  |
| Beam search (B)        | 5.19438553 | 1.95449066 |
| Beam search + LM (BLM) | 100        | 100        |

The strange part is that I got the same metrics on the test-other dataset, hmm...
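
For reference, the WER and CER figures above follow the standard definition: Levenshtein edit distance over words (or characters) divided by the reference length. Below is a minimal, self-contained sketch of that computation; it is not TensorFlowASR's own metric code, which may differ in details such as text normalization.

```python
# Sketch of the standard WER/CER computation: Levenshtein distance
# normalized by the reference length, reported as a percentage.
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (rolling 1-D DP)."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, start=1):
            prev, dp[j] = dp[j], min(
                dp[j] + 1,        # deletion
                dp[j - 1] + 1,    # insertion
                prev + (r != h),  # substitution (free if tokens match)
            )
    return dp[-1]

def wer(reference, hypothesis):
    ref_words = reference.split()
    return 100 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference, hypothesis):
    return 100 * edit_distance(list(reference), list(hypothesis)) / len(reference)

print(wer("the cat sat on the mat", "the cat sat on mat"))  # ~16.67 (%)
```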

nglehuy commented 3 years ago

@changji-ustc I haven't supported it in the Keras training loop yet; I'm working on it.
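
For context, supporting something "in the Keras training loop" generally means letting model.fit() drive training by overriding train_step. The sketch below shows that stock TF 2.x pattern (pre-Keras 3); it is not TensorFlowASR's actual implementation, and the dense stand-in model is purely illustrative.

```python
import tensorflow as tf

class ASRModel(tf.keras.Model):
    # Stock TF 2.x pattern: override train_step so model.fit() runs
    # a custom forward/backward pass. Not TensorFlowASR's real code.
    def train_step(self, data):
        features, labels = data
        with tf.GradientTape() as tape:
            predictions = self(features, training=True)
            loss = self.compiled_loss(labels, predictions)  # from compile()
        gradients = tape.gradient(loss, self.trainable_variables)
        self.optimizer.apply_gradients(zip(gradients, self.trainable_variables))
        self.compiled_metrics.update_state(labels, predictions)
        return {m.name: m.result() for m in self.metrics}

# Toy usage with a stand-in dense model, just to show fit() driving it.
inputs = tf.keras.Input(shape=(80,))
outputs = tf.keras.layers.Dense(10)(inputs)
model = ASRModel(inputs, outputs)
model.compile(optimizer="adam", loss="mse")
```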

gcervantes8 commented 2 years ago

@usimarit Have you been able to get a better WER with the Conformer? I see a lot of changes in the word piece branch.

With mixed precision and batch size 16 (effective batch size 96), the best LibriSpeech WER I've gotten is 6.4%.

With a medium Conformer model, mixed precision, and batch size 12 (effective batch size 72), the best WER I've gotten is 4.6% (40k warmup steps, LibriSpeech only). Using a Transformer language model, I'm only able to lower the WER by 0.15% on test-clean.
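
As a footnote on the setup described here: enabling mixed precision in TF 2 is a one-liner, and the quoted effective batch sizes are just the per-replica sizes times a parallelism factor. The sketch below assumes that factor is 6 replicas (or accumulation steps); the comment only gives the products 96 and 72.

```python
import tensorflow as tf

# Compute in float16 where safe, keep variables in float32 (TF >= 2.4).
tf.keras.mixed_precision.set_global_policy("mixed_float16")

num_replicas = 6  # assumed: the thread only states 16 -> 96 and 12 -> 72
for per_replica in (16, 12):
    print(per_replica, "->", per_replica * num_replicas)  # 96, 72
```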