SpeechColab / GigaSpeech2

An evolving, large-scale and multi-domain ASR corpus for low-resource languages with automated crawling, transcription and refinement
Apache License 2.0

Hi, could you open-source the model configuration for the Thai Zipformer that reached the final CER of 12.46? #9

Open alanshaoTT opened 22 hours ago

alanshaoTT commented 22 hours ago

Based on the model configuration in the appendix of the GigaSpeech2 paper, I set up my training config as follows:

  --num-encoder-layers 2,2,4,5,4,2 \
  --feedforward-dims 512,768,1536,2048,1536,768 \
  --nhead 4,4,4,8,4,4 \
  --encoder-dims 192,256,512,768,512,256 \
  --attention-dims 192,256,512,768,512,256 \
  --encoder-unmasked-dims 192,192,256,320,256,192 \
  --zipformer-downsampling-factors 1,2,4,8,4,2 \
  --cnn-module-kernels 31,31,15,15,15,31 \
  --decoder-dim 512 \
  --joiner-dim 512

However, these dimensions do not match those of the epoch-12.pt model released on Hugging Face, so I cannot use this config to load epoch-12.pt.
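One way to pin down exactly which parameters disagree is to compare tensor shapes from the released checkpoint against a freshly built model. A minimal sketch of such a comparison (the helper name and the example shapes are hypothetical; in practice the two dicts would come from `{k: tuple(v.shape) for k, v in state_dict.items()}` on a PyTorch `state_dict`):

```python
def diff_shapes(model_shapes, ckpt_shapes):
    """Return parameters whose shapes differ between a freshly built
    model and a checkpoint. Both arguments are name -> shape dicts,
    e.g. built from a state_dict as
        {k: tuple(v.shape) for k, v in state_dict.items()}
    """
    mismatches = {}
    for name, ckpt_shape in ckpt_shapes.items():
        if name in model_shapes and model_shapes[name] != ckpt_shape:
            mismatches[name] = (model_shapes[name], ckpt_shape)
    return mismatches


# Hypothetical example: a layer built with width 768 where the
# checkpoint stores width 512.
model = {"encoder.layers.3.weight": (768, 768)}
ckpt = {"encoder.layers.3.weight": (512, 512)}
print(diff_shapes(model, ckpt))
# → {'encoder.layers.3.weight': ((768, 768), (512, 512))}
```

Printing this diff alongside the error message usually makes it obvious whether a single flag (e.g. one encoder stack's dimension) or the whole architecture differs.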

yfyeung commented 22 hours ago


@alanshaoTT Hello, the last iter uses the default zipformer-large parameters. If you are getting a specific error, could you paste it here?

./zipformer/train.py \
  --world-size 8 \
  --max-duration 1000 \
  --num-epochs 999 \
  --start-epoch 13 \
  --use-fp16 1 \
  --exp-dir zipformer/exp \
  --num-encoder-layers 2,2,4,5,4,2 \
  --feedforward-dim 512,768,1536,2048,1536,768 \
  --encoder-dim 192,256,512,768,512,256 \
  --encoder-unmasked-dim 192,192,256,320,256,192
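Note that the two commands in this thread use different flag spellings (`--feedforward-dims`/`--encoder-dims` vs. `--feedforward-dim`/`--encoder-dim`), which may indicate they target different recipe versions even where the values agree. A small sketch for diffing two flag sets (the `parse_cli_flags` helper is hypothetical and assumes every flag takes exactly one value):

```python
def parse_cli_flags(cmd):
    """Parse '--flag value' pairs from a shell-style command string
    into a dict (a simplification: assumes one value per flag and no
    negative-number values)."""
    toks = cmd.split()
    return {toks[i].lstrip("-"): toks[i + 1]
            for i in range(len(toks) - 1) if toks[i].startswith("--")}


# Flag values copied from the two commands in this thread.
paper = parse_cli_flags("--num-encoder-layers 2,2,4,5,4,2 "
                        "--encoder-dims 192,256,512,768,512,256")
train = parse_cli_flags("--num-encoder-layers 2,2,4,5,4,2 "
                        "--encoder-dim 192,256,512,768,512,256")

# Flags present in one command but not the other:
print(set(paper) ^ set(train))
# → {'encoder-dims', 'encoder-dim'}
```

Here the values are identical but the flag names are not, so the configs would be consumed by different `train.py` argument parsers.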