Closed: yuntang closed this issue 8 months ago
Thank you very much for raising this! We have fixed it in this PR: https://github.com/NVIDIA/NeMo/pull/8587. The bug was introduced by a large refactor and unification of the ASR metrics, done to make them simpler to extend in the long run.
The patch will be included in the next NeMo release, and we have added a note on the 1.23 release page (https://github.com/NVIDIA/NeMo/releases/tag/v1.23.0) so that users are aware and can obtain correct metrics during evaluation by using the speech-to-text eval script (or by disabling the fused batch explicitly).
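For example, the fused path can be disabled at launch time with a Hydra override (an illustration only: the script path and the remaining arguments are placeholders; the override key is the one referenced in this issue):

```
python examples/asr/asr_transducer/speech_to_text_rnnt_bpe.py \
    <other config overrides...> \
    model.joint.fuse_loss_wer=false
```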
Fixed via #8587
Describe the bug
As shown in https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/asr/metrics/wer.py#L349-L350, the new scores and words are assigned to the object and the previous scores and words are dropped. We might either 1) rename this function to WER.set, or 2) update the code to accumulate the new values, as sketched below.
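A minimal sketch of option 2, assuming the two tensor states shown at the linked lines (exact variable names and devices may differ by NeMo version):

```python
# Suggested fix (sketch): accumulate instead of overwrite, so earlier
# sub-mini-batches are not dropped from the running WER statistics.
self.scores += torch.tensor(scores, device=self.scores.device, dtype=self.scores.dtype)
self.words += torch.tensor(words, device=self.words.device, dtype=self.words.dtype)
```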
The current code can make the WER reported during training inconsistent with inference if fuse_loss_wer is used in Transducer model training, i.e., model.joint.fuse_loss_wer=True and model.joint.fused_batch_size > 1. In this setting, only the last sub-mini-batch's WER is accumulated during the validation stage.
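To make the failure mode concrete, here is a small self-contained repro with a toy stand-in for the metric (not NeMo code; names are illustrative): each update() call models one sub-mini-batch of a fused batch, and only the accumulating variant reports the WER of the whole batch.

```python
import torch
from torchmetrics import Metric


class ToyWER(Metric):
    """Illustrative stand-in for NeMo's WER metric (not the real class)."""

    def __init__(self, accumulate: bool):
        super().__init__()
        self.accumulate = accumulate
        self.add_state("scores", default=torch.tensor(0.0), dist_reduce_fx="sum")
        self.add_state("words", default=torch.tensor(0.0), dist_reduce_fx="sum")

    def update(self, scores: float, words: float) -> None:
        if self.accumulate:
            # Fixed behavior: every sub-mini-batch contributes.
            self.scores += scores
            self.words += words
        else:
            # Buggy behavior: each call overwrites the previous state.
            self.scores = torch.tensor(float(scores))
            self.words = torch.tensor(float(words))

    def compute(self) -> torch.Tensor:
        return self.scores / self.words


# Three sub-mini-batches of one fused batch: (word errors, word count).
sub_batches = [(5, 10), (0, 10), (9, 10)]

for accumulate in (False, True):
    metric = ToyWER(accumulate)
    for errors, words in sub_batches:
        metric.update(errors, words)
    print(f"accumulate={accumulate}: WER = {metric.compute():.2f}")
# accumulate=False -> 0.90 (last sub-batch only)
# accumulate=True  -> 0.47 (14 errors over 30 words)
```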