zh794390558 opened this issue 2 years ago
But you said that the warp_transducer CPU grad is the same as optimized_transducer's and torchaudio's.
Where did you find that?
The README.md says:
Therefore, optimized_transducer produces the same alpha and beta as warp-transducer for the same input.
It only says alpha and beta, not grad.
It borrows the methods of computing alpha and beta from warp-transducer. Therefore, optimized_transducer produces the same alpha and beta as warp-transducer for the same input.
However, warp-transducer produces different gradients for CPU and CUDA when using the same input. See https://github.com/HawkAaron/warp-transducer/issues/93. I also created a [colab notebook](https://colab.research.google.com/drive/1vMkH8LmiCCOiCo4KTTEcv-NU8_OGn0ie?usp=sharing) to reproduce that issue.
This project produces consistent gradients on CPU and CUDA for the same input, just like torchaudio. (We borrow the gradient computation formula from torchaudio.)
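To make the alpha/beta claim concrete, here is a minimal NumPy sketch of the standard RNN-T forward/backward recursions (my own illustration, not code from optimized_transducer, warp-transducer, or torchaudio). It checks that the forward (alpha) and backward (beta) recursions agree on the total log-likelihood for a random input:

```python
import numpy as np

def rnnt_alpha_beta(logp, labels, blank=0):
    """Forward (alpha) and backward (beta) variables of the RNN-T loss
    for one utterance. logp has shape (T, U+1, V) and holds the joint
    network's log-probabilities; labels is the length-U target sequence."""
    T, U1, _ = logp.shape
    U = U1 - 1

    alpha = np.full((T, U1), -np.inf)
    alpha[0, 0] = 0.0
    for t in range(T):
        for u in range(U1):
            if t == 0 and u == 0:
                continue
            # Reach (t, u) by emitting blank at (t-1, u) or label y[u-1] at (t, u-1).
            from_blank = alpha[t - 1, u] + logp[t - 1, u, blank] if t > 0 else -np.inf
            from_label = alpha[t, u - 1] + logp[t, u - 1, labels[u - 1]] if u > 0 else -np.inf
            alpha[t, u] = np.logaddexp(from_blank, from_label)

    beta = np.full((T, U1), -np.inf)
    beta[T - 1, U] = logp[T - 1, U, blank]  # final blank ends the utterance
    for t in range(T - 1, -1, -1):
        for u in range(U, -1, -1):
            if t == T - 1 and u == U:
                continue
            via_blank = logp[t, u, blank] + beta[t + 1, u] if t < T - 1 else -np.inf
            via_label = logp[t, u, labels[u]] + beta[t, u + 1] if u < U else -np.inf
            beta[t, u] = np.logaddexp(via_blank, via_label)

    return alpha, beta

rng = np.random.default_rng(0)
T, U, V = 5, 3, 4
logits = rng.normal(size=(T, U + 1, V))
logp = logits - np.logaddexp.reduce(logits, axis=-1, keepdims=True)  # log-softmax
labels = [1, 3, 2]

alpha, beta = rnnt_alpha_beta(logp, labels)
loglik_fwd = alpha[T - 1, U] + logp[T - 1, U, 0]
loglik_bwd = beta[0, 0]
assert np.isclose(loglik_fwd, loglik_bwd)
```

Any two implementations that use these same recursions should reproduce the same alpha and beta values, up to floating-point summation order.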
Sorry, I got it wrong. So the known conclusion is: torchaudio is aligned with optimized_transducer. Will warp_transducer GPU have the same grad result as optimized_transducer, with warp_transducer CPU being the exception, since its gradient formula is not right?
Why are the CPU and GPU losses for warp_transducer not equal in the colab?
I think I got the wrong conclusion above from here:
The warp_transducer GPU will have the same grad result as optimized_transducer
No. You can find the conclusions in the colab (listed in the README.md).
Why are the CPU and GPU losses for warp_transducer not equal in the colab?
Please ask the author of warp-transducer.
Using the case from the colab, with espnet's RNN-T, the results are consistent. Am I doing something wrong?
I just ran the colab notebook above again and found that I can no longer reproduce the previous results. Not sure what went wrong.
So does this issue still exist? Could it be a CUDA version problem?
BTW, can the torch version in the colab be pinned? Last time I ran it, it failed to run.
The colab notebook given in the README.md used a Tesla K80 GPU.
The colab notebook I tried today was assigned a Tesla T4, so the test environment is different.
If you can reproduce it on a Tesla K80 GPU, then the issue still exists; if you cannot, then it probably no longer does.
(I will try later to see whether I can reproduce it on a local V100 GPU.)
BTW, can the torch version in the colab be pinned? Last time I ran it, it failed to run.
Yes, that can be done.
The gradient formula below is used in warprnnt_numba and warp_transducer CPU; it is not the same as the one in torchaudio, optimized_transducer, and warp_transducer GPU. But you said that the warp_transducer CPU grad is the same as optimized_transducer's and torchaudio's. How is that achieved?
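As background for this question: two algebraically different-looking gradient formulas can still be equivalent, because any correct formula must equal the true derivative of the loss, and a finite-difference check can tell a correct formula from an incorrect one. Below is a small NumPy sketch (my own illustration, using the standard alpha/beta "occupancy" formula for the gradient with respect to the joint-network log-probabilities; it is not the actual code of warprnnt_numba, warp-transducer, or torchaudio):

```python
import numpy as np

def rnnt_loglik(logp, labels, blank=0):
    """Log-likelihood of the target sequence under the RNN-T model,
    computed with the forward (alpha) recursion only."""
    T, U1, _ = logp.shape
    U = U1 - 1
    alpha = np.full((T, U1), -np.inf)
    alpha[0, 0] = 0.0
    for t in range(T):
        for u in range(U1):
            if t == 0 and u == 0:
                continue
            a = alpha[t - 1, u] + logp[t - 1, u, blank] if t > 0 else -np.inf
            b = alpha[t, u - 1] + logp[t, u - 1, labels[u - 1]] if u > 0 else -np.inf
            alpha[t, u] = np.logaddexp(a, b)
    return alpha[T - 1, U] + logp[T - 1, U, blank], alpha

rng = np.random.default_rng(1)
T, U, V = 4, 2, 3
# Entries of logp are treated as free variables for this derivative check.
logp = rng.normal(size=(T, U + 1, V))
labels = [1, 2]

loglik, alpha = rnnt_loglik(logp, labels)

# Backward variable with one row of padding at t = T, so the gradient
# formula below needs no special case for the final blank.
beta = np.full((T + 1, U + 1), -np.inf)
beta[T, U] = 0.0
for t in range(T - 1, -1, -1):
    for u in range(U, -1, -1):
        via_blank = logp[t, u, 0] + beta[t + 1, u]
        via_label = logp[t, u, labels[u]] + beta[t, u + 1] if u < U else -np.inf
        beta[t, u] = np.logaddexp(via_blank, via_label)
assert np.isclose(beta[0, 0], loglik)

def fd(t, u, k, eps=1e-6):
    """Central finite difference of -loglik w.r.t. logp[t, u, k]."""
    p = logp.copy(); p[t, u, k] += eps
    m = logp.copy(); m[t, u, k] -= eps
    return -(rnnt_loglik(p, labels)[0] - rnnt_loglik(m, labels)[0]) / (2 * eps)

# Occupancy formula: the gradient of -loglik w.r.t. an arc's log-probability
# is minus the posterior probability of paths using that arc.
t, u = 1, 0
g_an_blank = -np.exp(alpha[t, u] + logp[t, u, 0] + beta[t + 1, u] - loglik)
g_fd_blank = fd(t, u, 0)
assert np.isclose(g_an_blank, g_fd_blank, atol=1e-5)

t, u = 0, 0
g_an_label = -np.exp(alpha[t, u] + logp[t, u, labels[u]] + beta[t, u + 1] - loglik)
g_fd_label = fd(t, u, labels[u])
assert np.isclose(g_an_label, g_fd_label, atol=1e-5)
```

If a library's formula disagrees with such a finite-difference check on the same input, its gradient is not an equivalent rewriting but an actual deviation from the loss's derivative.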