Open nobel861017 opened 3 years ago
I don't think it's a bug. The mir_eval library is used to calculate separation performance metrics such as SNR, and those computations are complex and can therefore be time-consuming. If you only need the WER metric, you can skip calculating the separation performance metrics such as SNR.
Thanks for your reply, but I would still like the ASR model's forward pass to run on the GPU.
I left COMPUTE_METRICS as an empty list. However, it is still running very slowly:
```
0%|          | 2/3000 [03:17<69:49:42, 83.85s/it]
```
I changed line 231 in metrics.py to

```python
self.asr_model = Speech2Text(**d.download_and_unpack(model_name), device='cuda')
```

and added

```python
wav = torch.from_numpy(wav).cuda()
```

to the `predict_hypothesis` function between lines 344 and 345. It is now able to compute on the GPU.
Yes, ASR takes very long on CPU. I don't remember why we didn't run it on GPU in the first place, maybe memory issues? Can't remember. Maybe @popcornell or @JorisCos would remember?
At the time, full decoding on GPU was not implemented for batched inputs. I recall the reason was something like that: we had problems running it on GPU. It looks like it now runs smoothly on GPU, thanks to the ESPnet gang, and thank you for trying this. If you have time, adding an argument like use_gpu and submitting a PR would be great.
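A minimal sketch of what such a flag might look like (the argument name `use_gpu` and the helper below are hypothetical; the actual PR may be structured differently):

```python
import argparse

def asr_device(use_gpu: bool, cuda_available: bool) -> str:
    """Pick the device string to pass to the ASR model (hypothetical helper).

    Falls back to CPU when no GPU was requested or none is available.
    """
    return "cuda" if (use_gpu and cuda_available) else "cpu"

parser = argparse.ArgumentParser()
parser.add_argument(
    "--use_gpu", type=int, default=0,
    help="If 1, run the ASR forward pass on GPU when CUDA is available.",
)
```

The resolved string could then be handed to the model constructor wherever the device is currently hard-coded.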
@popcornell @mpariente Thanks for your replies. I think the most urgent thing we need now is batch processing for ASR. I think this requires substantial modification of eval.py. I'm working on it. If you have any ideas or suggestions on how to do this, please let me know.
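One possible starting point, just a sketch of the bookkeeping involved (all names hypothetical): group the utterances into fixed-size chunks and zero-pad each chunk to a common length before handing it to the ASR model.

```python
def batched(items, batch_size):
    """Yield successive chunks of at most batch_size utterances."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

def pad_batch(wavs, pad_value=0.0):
    """Zero-pad variable-length waveforms (as plain lists here) to equal length."""
    max_len = max(len(w) for w in wavs)
    return [list(w) + [pad_value] * (max_len - len(w)) for w in wavs]
```

Real batched decoding would also have to keep the original lengths so the padded frames can be masked out during inference, which is where most of the eval.py changes would live.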
@popcornell I have sent a PR allowing ASR to compute on the GPU.
Hi, I am running stage 3 of egs/librimix/ConvTasNet/run.sh. I used
--compute_wer 1 --eval_mode max
to evaluate WER. However, it is running extremely slowly; it takes more than one day to complete. I checked with `nvidia-smi`, and the GPU was being used. However, I think only the separation process runs on the GPU. I looked through eval.py and found that numpy arrays are fed to the wer_tracker, so I think the ASR part is evaluated in CPU mode. Is there any reason this can't be computed on GPUs? By the way, I see that eval.py evaluates with the "Shinji Watanabe/librispeech_asr_train_asr_transformer_e18_raw_bpe_sp_valid.acc.best" ASR model. Is it possible to switch to other ASR models by modifying line 52?
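In principle the hard-coded model tag could be made configurable; a sketch of that idea (the helper name is hypothetical, and any ESPnet model tag that the model downloader can fetch, with a compatible sample rate, should work):

```python
# Hypothetical sketch: make the ASR model tag a parameter instead of a
# hard-coded string in eval.py.
DEFAULT_ASR_MODEL = (
    "Shinji Watanabe/librispeech_asr_train_asr_transformer_e18_raw_bpe_sp_valid.acc.best"
)

def resolve_asr_model(requested=None):
    """Return the requested ESPnet model tag, falling back to the default."""
    return requested or DEFAULT_ASR_MODEL
```

The resolved tag would then be passed where the string is currently hard-coded.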
Thanks