csukuangfj / transducer-loss-benchmarking


Add benchmark for github.com/awni/transducer #13

Open · ronggong opened this issue 2 years ago

ronggong commented 2 years ago

The github.com/awni/transducer implementation seems to use much less memory and runs 20x faster than torchaudio's RNN-T loss. Would it be possible to add it to the benchmark? Thanks.

csukuangfj commented 2 years ago

Sure, I will add that.

One thing to note: it has a hard constraint on the joint network, which consists of only an adder; it is not possible to add nn.Linear or any activation layers to it.
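
For concreteness, here is a minimal PyTorch sketch of what such an adder-only joiner amounts to (the function and variable names are illustrative, not taken from awni/transducer):

```python
import torch

def adder_only_joiner(am: torch.Tensor, lm: torch.Tensor) -> torch.Tensor:
    """Adder-only joiner: both inputs must already be projected to
    vocabulary size, since no nn.Linear or activation may follow.

    am: (B, T, V) encoder (acoustic) output
    lm: (B, U, V) prediction-network output
    returns: (B, T, U, V) lattice of logits, formed by broadcast addition
    """
    return am.unsqueeze(2) + lm.unsqueeze(1)
```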

Also, the authors don't provide any WERs for the models trained with it.

ronggong commented 2 years ago

@csukuangfj I see, interesting. The input logits in their implementation have shape (B, T, V), which looks like a CTC logit shape. I wonder how the prediction network output is used?

csukuangfj commented 2 years ago

@ronggong

Actually, what github.com/awni/transducer implements is just a special case of k2's RNN-T loss, called k2.rnnt_loss_simple; see https://github.com/k2-fsa/k2/blob/master/k2/python/k2/rnnt_loss.py#L196
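
For reference, a minimal sketch of how k2.rnnt_loss_simple is called (the argument names follow the linked file; the shapes, dimensions, and blank id below are illustrative assumptions):

```python
import torch
import k2

B, T, S, V = 2, 50, 10, 500            # batch, frames, target length, vocab (illustrative)
am = torch.randn(B, T, V)              # encoder output, already projected to vocab size
lm = torch.randn(B, S + 1, V)          # prediction-network output, vocab size
symbols = torch.randint(1, V, (B, S))  # target label sequences

# The "joiner" inside rnnt_loss_simple is just the addition am + lm,
# which is the same special case as awni/transducer.
loss = k2.rnnt_loss_simple(
    lm=lm,
    am=am,
    symbols=symbols,
    termination_symbol=0,  # blank id, assumed to be 0 here
)
```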

ronggong commented 2 years ago

How does the simple RNN-T loss perform compared with a normal joiner that has activations and a linear transform? Is it much worse?

csukuangfj commented 2 years ago

> How does the simple RNN-T loss perform compared with a normal joiner that has activations and a linear transform? Is it much worse?

I don't have such results available. You can either ask awni or run some experiments yourself.
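
For anyone who wants to run that comparison, here is a minimal sketch of a conventional joiner (linear projections plus a tanh, in the style of common RNN-T recipes; the module and dimension names are my own, not from this repo):

```python
import torch
import torch.nn as nn

class Joiner(nn.Module):
    """Conventional joiner: project both streams, add, apply tanh,
    then project to vocabulary size."""

    def __init__(self, encoder_dim: int, decoder_dim: int,
                 joiner_dim: int, vocab_size: int):
        super().__init__()
        self.enc_proj = nn.Linear(encoder_dim, joiner_dim)
        self.dec_proj = nn.Linear(decoder_dim, joiner_dim)
        self.out = nn.Linear(joiner_dim, vocab_size)

    def forward(self, am: torch.Tensor, lm: torch.Tensor) -> torch.Tensor:
        # am: (B, T, encoder_dim), lm: (B, U, decoder_dim)
        x = self.enc_proj(am).unsqueeze(2) + self.dec_proj(lm).unsqueeze(1)
        return self.out(torch.tanh(x))  # (B, T, U, vocab_size)
```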