DALI results are different from the PyTorch data loader for ASR models

agemagician commented 2 years ago

Hello,

We are currently testing DALI with NVIDIA NeMo, and we get different results when using DALI compared to the standard PyTorch data loader.

We have created three Colab examples to reproduce our issue:
https://colab.research.google.com/drive/1Rz42EeQVDHhTso3kWvBp1wQc6x1FxXgY?usp=sharing
https://colab.research.google.com/drive/1Bk7fUBnTuIvC7JvDPau0lsuBjCmeOYEx?usp=sharing
https://colab.research.google.com/drive/1eQe7EjLwjGsO7Dktaj2p9uvb91YYpBQb?usp=sharing

We have opened an issue on NVIDIA NeMo: https://github.com/NVIDIA/NeMo/issues/2853

However, since this problem is related to DALI, it would be great if you could give us and the NeMo team some hints about what could have gone wrong.

The code for using DALI can be found here: https://github.com/NVIDIA/NeMo/blob/d04c7e9b4ea7055e7fd9777c6e112b8a42c16130/nemo/collections/asr/data/audio_to_text_dali.py
It should produce the same results as the code here: https://github.com/NVIDIA/NeMo/blob/d04c7e9b4ea7055e7fd9777c6e112b8a42c16130/nemo/collections/asr/data/audio_to_text.py#L762

Any hints would be highly appreciated.

Thanks.

JanuszL commented 2 years ago

Hi @agemagician,

Thank you for reporting the issue. Let us check it and get back to you soon.

jantonguirao commented 2 years ago

Thank you @agemagician for reporting the issue and for the easy-to-follow reproduction steps! I can confirm this is a bug on the NeMo side. Please see https://github.com/NVIDIA/NeMo/issues/2853 for details.

agemagician commented 2 years ago

Thanks a lot, @jantonguirao, for your help and your effort.

I have updated the NeMo issue because there is still a slight difference between the results of the DALI and PyTorch data loaders.

Thanks again.
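
For anyone reproducing this, a slight difference like the one mentioned above can be quantified by comparing the two loaders' outputs element by element. The following is only a minimal sketch under stated assumptions: `compare_features` is a hypothetical helper, not part of NeMo or DALI, and the sample values are made up to stand in for one utterance's flattened features from each loader.

```python
import math
from typing import Sequence


def compare_features(a: Sequence[float], b: Sequence[float],
                     rel_tol: float = 1e-5, abs_tol: float = 1e-6) -> dict:
    """Summarize how far apart two flattened feature vectors are.

    Hypothetical helper for illustration; tolerances are arbitrary defaults.
    """
    if len(a) != len(b):
        raise ValueError(f"length mismatch: {len(a)} vs {len(b)}")
    diffs = [abs(x - y) for x, y in zip(a, b)]
    # Count elements that differ by more than the given tolerances.
    mismatched = sum(
        not math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)
        for x, y in zip(a, b)
    )
    return {
        "max_abs_diff": max(diffs, default=0.0),
        "mean_abs_diff": sum(diffs) / len(diffs) if diffs else 0.0,
        "mismatched": mismatched,
    }


# Made-up values standing in for features produced by each data loader.
pytorch_features = [0.10, 0.20, 0.30, 0.40]
dali_features = [0.10, 0.20, 0.35, 0.40]

report = compare_features(pytorch_features, dali_features)
print(report)
```

Reporting the maximum and mean absolute difference alongside the mismatch count helps distinguish floating-point noise from a real preprocessing discrepancy.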

NVIDIA/DALI issue #3356