Inconsistent config in vocabulary size breaks the eval.py

ZhengkunTian / OpenTransformer

A No-Recurrence Sequence-to-Sequence Model for Speech Recognition

MIT License

372 stars, 66 forks

Inconsistent config in vocabulary size breaks the eval.py #45

Closed by jiamingkong 3 years ago

jiamingkong commented 3 years ago

Hi, while testing some other training pipelines, I wanted to use the aishell dataset as a baseline. I noticed that the two YAML files egs/aishell/conf/conformer_baseline.yaml and egs/aishell/conf/transformer_lm.yaml set the vocabulary size to 4234 and 4233, respectively. This breaks eval.py when I try to run inference on audio with trained models:

batch_log_probs = batch_log_probs + self.lm_weight * batch_lm_log_probs
RuntimeError: The size of tensor a (4234) must match the size of tensor b (4233) at non-singleton dimension 1

I don't think this is a big issue when using our own data, but a proper check before running eval would be much appreciated.
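
A pre-flight check of this kind could be sketched as follows. It assumes both configs have already been loaded into dicts (for example with PyYAML's yaml.safe_load) and that each stores its vocabulary size under a model.vocab_size key. That key layout and the function names are illustrative assumptions, not taken from the OpenTransformer code.

```python
def get_vocab_size(config, name):
    """Read config['model']['vocab_size'] (assumed layout) as an int."""
    try:
        return int(config["model"]["vocab_size"])
    except (KeyError, TypeError) as err:
        # Missing section/key, or a None value where a number was expected.
        raise KeyError(f"{name}: expected a 'model.vocab_size' entry") from err


def check_vocab_sizes(asr_config, lm_config):
    """Fail early if the ASR model and the LM disagree on vocabulary size.

    Both arguments are plain dicts, e.g. the result of loading the two
    YAML files. Returns the shared vocabulary size when they match.
    """
    asr_vocab = get_vocab_size(asr_config, "ASR config")
    lm_vocab = get_vocab_size(lm_config, "LM config")
    if asr_vocab != lm_vocab:
        raise ValueError(
            f"Vocabulary size mismatch: ASR model has {asr_vocab}, "
            f"LM has {lm_vocab}. LM fusion adds the two log-probability "
            "tensors elementwise, so the sizes must be equal."
        )
    return asr_vocab
```

Calling check_vocab_sizes before decoding would turn the tensor-size RuntimeError above into an explicit message that names both values.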

jiamingkong commented 3 years ago

Also, I have implemented the MixSpeech augmentation using your codebase, and I am currently testing it to see whether it brings the relative improvements reported in the paper. If so, I would love to open a PR. I really appreciate your work on Conformer.

Link to MixSpeech paper:

https://arxiv.org/abs/2102.12664

ZhengkunTian commented 3 years ago

Sorry it took me so long to notice this. I'll fix it right away.

ZhengkunTian commented 3 years ago

It would be great if you could push a PR.