asteroid-team / asteroid

The PyTorch-based audio source separation toolkit for researchers
https://asteroid-team.github.io/
MIT License

Extremely slow when evaluating #540

Open nobel861017 opened 3 years ago

nobel861017 commented 3 years ago

Hi, I am running stage 3 of `egs/librimix/ConvTasNet/run.sh` with `--compute_wer 1 --eval_mode max` to evaluate WER. However, it is running extremely slowly.

```
2%|█▉                                                                                                  | 58/3000 [46:02<29:01:05, 35.51s/it]
```

It takes more than one day to complete. I checked with `nvidia-smi`, and it was computing on the GPU. However, I think only the separation step runs on the GPU. I looked through `eval.py` and found that numpy arrays are fed to the `wer_tracker`, so I think the ASR part is evaluated in CPU mode. Is there any reason this can't be computed on the GPU?

By the way, I see that `eval.py` evaluates with the `Shinji Watanabe/librispeech_asr_train_asr_transformer_e18_raw_bpe_sp_valid.acc.best` ASR model. Is it possible to switch to another kind of ASR model by modifying line 52? I imagine something like the sketch below.
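A minimal sketch of what I have in mind, assuming any ASR model that `espnet_model_zoo` can resolve would work here; the placeholder model name is illustrative only:

```python
from espnet2.bin.asr_inference import Speech2Text
from espnet_model_zoo.downloader import ModelDownloader

d = ModelDownloader()
# Placeholder: any model name resolvable by espnet_model_zoo could go here.
model_name = "<some other espnet_model_zoo ASR model>"
asr_model = Speech2Text(**d.download_and_unpack(model_name))
```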

Thanks

JusperLee commented 3 years ago

I don't think it's a bug. The mir_eval library is used to compute separation metrics such as SNR, and those calculations are complex and therefore time-consuming. If you only need the WER metric, you can skip computing the separation metrics.

nobel861017 commented 3 years ago

Thanks for your reply, but I would still like the ASR model to run its forward pass on the GPU.

nobel861017 commented 3 years ago

I left `COMPUTE_METRICS` as an empty list. However, it is still running very slowly.

```
0%|                                                                                                     | 2/3000 [03:17<69:49:42, 83.85s/it]
```

nobel861017 commented 3 years ago

I changed line 231 in `metrics.py` to `self.asr_model = Speech2Text(**d.download_and_unpack(model_name), device='cuda')` and added `wav = torch.from_numpy(wav).cuda()` to the `predict_hypothesis` function between lines 344 and 345. It is now able to compute on the GPU.
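For reference, here is the change in isolation (a simplified sketch; in `metrics.py` these lines live inside the WER tracker class, which does much more):

```python
import torch
from espnet2.bin.asr_inference import Speech2Text
from espnet_model_zoo.downloader import ModelDownloader

d = ModelDownloader()
model_name = "Shinji Watanabe/librispeech_asr_train_asr_transformer_e18_raw_bpe_sp_valid.acc.best"

# metrics.py line 231: build the ASR model on GPU instead of the CPU default.
asr_model = Speech2Text(**d.download_and_unpack(model_name), device="cuda")

def predict_hypothesis(wav):
    # Added between lines 344 and 345: move the numpy input to the GPU so
    # the ASR forward pass runs there.
    wav = torch.from_numpy(wav).cuda()
    text, *_ = asr_model(wav)[0]  # text of the best hypothesis
    return text
```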

mpariente commented 3 years ago

Yes, ASR is very slow on CPU. I don't remember why we didn't run it on GPU in the first place; maybe memory issues? I can't remember. Maybe @popcornell or @JorisCos would remember?

popcornell commented 3 years ago

At the time, full decoding on GPU with batched inputs was not implemented. I recall the reason was something like that: we had problems running it on GPU. It looks like it now runs smoothly on GPU, thanks to the ESPnet gang, and thank you for trying this. If you have time, adding an argument like `use_gpu` and submitting a PR would be great.
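Something like this, perhaps (a hypothetical sketch of the flag; the class body is illustrative, not asteroid's actual tracker):

```python
import torch
from espnet2.bin.asr_inference import Speech2Text
from espnet_model_zoo.downloader import ModelDownloader

class WERTracker:
    """Illustrative skeleton only; the real tracker does much more."""

    def __init__(self, model_name, use_gpu=False):
        # Fall back to CPU gracefully when no GPU is available.
        self.device = "cuda" if use_gpu and torch.cuda.is_available() else "cpu"
        d = ModelDownloader()
        self.asr_model = Speech2Text(
            **d.download_and_unpack(model_name), device=self.device
        )
```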

nobel861017 commented 3 years ago

@popcornell @mpariente Thanks for your replies. I think the most urgent thing we need now is batched processing for the ASR, which I think will require quite a lot of modification in `eval.py`. I'm trying something like the sketch below; if you have any ideas or pointers on how to do this, please let me know.
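Roughly this shape, assuming a batched decode interface existed (`batch_decode` here is hypothetical; ESPnet's `Speech2Text` decodes one utterance at a time, so the fallback loop is what would actually run today):

```python
import torch
from torch.nn.utils.rnn import pad_sequence

def transcribe_sources(asr_model, wavs):
    """wavs: list of 1-D numpy arrays (the separated sources)."""
    tensors = [torch.from_numpy(w).float() for w in wavs]
    if hasattr(asr_model, "batch_decode"):  # hypothetical batched API
        lengths = torch.tensor([t.shape[0] for t in tensors])
        batch = pad_sequence(tensors, batch_first=True)  # (B, T), zero-padded
        return asr_model.batch_decode(batch, lengths)
    # Fallback: decode one utterance at a time with the existing API.
    return [asr_model(t)[0][0] for t in tensors]
```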

nobel861017 commented 3 years ago

@popcornell I have sent a PR that allows the ASR to run on GPU.