SeanNaren / warp-ctc

Pytorch Bindings for warp-ctc
Apache License 2.0

Change "grads" tensor to be pre-allocated #112

Closed PCerles closed 5 years ago

PCerles commented 5 years ago

Currently, the _CTC.apply function allocates a "grads" tensor of the same size as "acts" on every call. In situations with a large label size (Eastern languages, gram/word-level labels), this causes a significant slowdown, since we are repeatedly allocating and copying data to the GPU on every call to CTC loss. Because this function runs in a training loop and we know the max batch size, max sequence length, and label size beforehand, we can allocate the gradient tensor once in the CTCLoss module. When forward is called, we just zero the tensor and slice into it for the current input's sequence length and batch size. This can yield up to a ~10x speedup in CTCLoss.
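
A minimal sketch of the idea, assuming a module that receives the maximum shapes up front; the class name `PreallocatedCTCLoss`, its constructor parameters, and the commented-out `_CTC.apply` call are illustrative, not the actual warp-ctc binding API:

```python
import torch
from torch import nn


class PreallocatedCTCLoss(nn.Module):
    """Sketch of pre-allocating the grads buffer once instead of per call."""

    def __init__(self, max_seq_len, max_batch_size, num_labels):
        super().__init__()
        # Allocate the gradient buffer a single time, at the largest size
        # we expect to see during training.
        self.register_buffer(
            "grads", torch.zeros(max_seq_len, max_batch_size, num_labels)
        )

    def forward(self, acts, labels, act_lens, label_lens):
        seq_len, batch_size, num_labels = acts.size()
        # Zero the buffer and take a view matching this batch's shape,
        # rather than allocating a fresh tensor on every call.
        self.grads.zero_()
        grads = self.grads[:seq_len, :batch_size, :num_labels]
        # A modified autograd function would then write its gradients into
        # `grads` instead of allocating its own tensor, e.g. (hypothetical):
        # return _CTC.apply(acts, labels, act_lens, label_lens, grads)
        raise NotImplementedError("illustrative sketch only")
```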

Possible complications: