1ytic / warp-rnnt

CUDA-Warp RNN-Transducer
MIT License

warning that forward/backward mismatch #24

Open maxwellzh opened 2 years ago

maxwellzh commented 2 years ago

The following warning messages are occasionally printed during training:

...
WARNING: sample 10 [81, 25] has a forward/backward mismatch -0.000083 / -0.000083
...
WARNING: sample 11 [62, 28] has a forward/backward mismatch -0.000188 / -0.000188

The source code raises this warning when abs(a-b)/abs(max(a,b)) > 0.001. I'm having difficulty reading core_gather.cu. Could you explain in more detail what the function kernel_fill_costs() does, and what alphas and betas are?
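To make the condition concrete, here is a minimal Python sketch of the relative check as quoted above. The function name `relative_mismatch` and the two sample values are hypothetical; they were chosen only so that both print as -0.000083 at six decimals, like the warning for sample 10. It illustrates one way a pair that looks identical in the log could still exceed the 0.001 ratio. It is not a trace of what the kernel actually computed.

```python
def relative_mismatch(a, b, tol=1e-3):
    """Relative check as quoted in the issue: abs(a-b)/abs(max(a,b)) > tol.

    Note: divides by zero if max(a, b) is exactly 0.
    """
    return abs(a - b) / abs(max(a, b)) > tol


# Hypothetical values: both round to -0.000083 when printed with "%f".
a, b = -0.0000831, -0.0000832
print("%f / %f" % (a, b))
print(relative_mismatch(a, b))

# The same absolute gap (1e-7) on values of larger magnitude passes the check.
print(relative_mismatch(-50.0, -50.0000001))
```

With values this close to zero, an absolute gap of about 1e-7 is already more than 0.1% of the magnitude, so a purely relative test is very sensitive there.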

1ytic commented 2 years ago

These variables come from the classical forward/backward algorithm. alphas and betas must be equal up to small numerical errors. For some reason the values look very small. Please check that you are providing the right input data.
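For readers unfamiliar with the algorithm, the following is a minimal pure-Python sketch of the textbook RNN-Transducer forward/backward recursion in log space. It does not reproduce core_gather.cu or kernel_fill_costs(); the function names, the `log_probs[t][u][v]` layout, and the blank index 0 are assumptions made for illustration. It shows why the forward total (from alphas) and the backward total (from betas) should agree: both sum the same set of alignment paths.

```python
import math
import random


def logaddexp(a, b):
    """log(exp(a) + exp(b)), safe for -inf inputs."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def log_softmax(xs):
    m = max(xs)
    lse = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - lse for x in xs]


def rnnt_forward_backward(log_probs, labels, blank=0):
    """Return (forward, backward) log-likelihoods for one sample.

    log_probs[t][u][v] has shape T x (len(labels)+1) x V (assumed layout).
    """
    T, U = len(log_probs), len(labels) + 1
    neg = -math.inf

    # alpha[t][u]: log-prob of reaching node (t, u) from the start.
    alpha = [[neg] * U for _ in range(T)]
    alpha[0][0] = 0.0
    for t in range(T):
        for u in range(U):
            if t == 0 and u == 0:
                continue
            via_blank = alpha[t - 1][u] + log_probs[t - 1][u][blank] if t > 0 else neg
            via_label = alpha[t][u - 1] + log_probs[t][u - 1][labels[u - 1]] if u > 0 else neg
            alpha[t][u] = logaddexp(via_blank, via_label)
    forward = alpha[T - 1][U - 1] + log_probs[T - 1][U - 1][blank]

    # beta[t][u]: log-prob of finishing the sequence from node (t, u).
    beta = [[neg] * U for _ in range(T)]
    beta[T - 1][U - 1] = log_probs[T - 1][U - 1][blank]
    for t in reversed(range(T)):
        for u in reversed(range(U)):
            if t == T - 1 and u == U - 1:
                continue
            via_blank = beta[t + 1][u] + log_probs[t][u][blank] if t < T - 1 else neg
            via_label = beta[t][u + 1] + log_probs[t][u][labels[u]] if u < U - 1 else neg
            beta[t][u] = logaddexp(via_blank, via_label)
    backward = beta[0][0]
    return forward, backward


# Random toy input: T=5 frames, 3 labels, vocabulary of 4 (index 0 is blank).
random.seed(0)
labels = [1, 2, 3]
log_probs = [[log_softmax([random.gauss(0, 1) for _ in range(4)])
              for _ in range(len(labels) + 1)] for _ in range(5)]
forward, backward = rnnt_forward_backward(log_probs, labels)
print(forward, backward)
```

In double precision on a toy input the two totals agree to within rounding error. In single precision on a GPU the gap is larger, which is why implementations typically compare the two with a tolerance.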

maxwellzh commented 2 years ago

If this were an error in the input data, it would repeat in every epoch of training, but no warning is thrown at the beginning. And as you can see, all the warnings involve small values, so I wonder whether something is causing underflow in the computation.

1ytic commented 2 years ago

I can't remember these values ever being so small in my practice. Maybe we should add a check not only on the ratio, but on the absolute value as well. Feel free to change this condition and recompile the package from source.
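As a rough illustration of that suggestion, here is a Python sketch of a check that combines the ratio test with an absolute floor. The function name `is_mismatch` and both thresholds are hypothetical and would need tuning; the actual condition lives in the CUDA source and is not reproduced here.

```python
def is_mismatch(a, b, rel_tol=1e-3, abs_tol=1e-4):
    """Flag a mismatch only if the gap is large both absolutely and relatively.

    rel_tol matches the 0.001 ratio quoted in this thread; abs_tol is an
    assumed value, not one taken from the library.
    """
    diff = abs(a - b)
    if diff <= abs_tol:
        # Tiny absolute gaps are treated as rounding noise.
        # This also avoids dividing by zero when both values are 0.
        return False
    return diff / max(abs(a), abs(b)) > rel_tol


print(is_mismatch(-0.0000831, -0.0000832))  # near-zero values, tiny gap
print(is_mismatch(-50.0, -51.0))            # genuinely different totals
```

Python's standard `math.isclose(a, b, rel_tol=..., abs_tol=...)` expresses the same combination of tolerances from the opposite direction, returning True when the values are considered equal.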