feiwang3311 / Lantern


Implement cuDNN CTC loss. #45

Closed by dan-zheng 5 years ago

dan-zheng commented 5 years ago

Notes:

dan-zheng commented 5 years ago

Reference PyTorch program (requires PyTorch v1.0rc1):

import torch
import torch.nn as nn

cuda = torch.device('cuda')

input_length = 50
batch_size = 16
alphabet_size = 20

ctc_loss = nn.CTCLoss(reduction='none').to(cuda)

# (input_length, batch_size, alphabet_size) log-probabilities; a uniform input,
# so every class gets probability 1/20 = 0.05 after softmax.
# NB: these are CPU tensors, so the loss below runs PyTorch's native CTC
# implementation; cuDNN should reproduce the same values.
log_probs = torch.ones(input_length, batch_size, alphabet_size).log_softmax(2).detach().requires_grad_()
# Flattened targets: each example's target sequence is the single label 1.
targets = torch.ones((batch_size,), dtype=torch.long)

input_lengths = torch.full((batch_size,), input_length, dtype=torch.long)
target_lengths = torch.ones((batch_size,), dtype=torch.long)
loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)

# This is the "probs" argument passed to `cudnnCTCLoss`.
print(log_probs.softmax(2))
print(loss)

# `probs`:
# tensor([[[0.0500, 0.0500, 0.0500,  ..., 0.0500, 0.0500, 0.0500],
#          [0.0500, 0.0500, 0.0500,  ..., 0.0500, 0.0500, 0.0500],
#          [0.0500, 0.0500, 0.0500,  ..., 0.0500, 0.0500, 0.0500],
#          ...,
#          [0.0500, 0.0500, 0.0500,  ..., 0.0500, 0.0500, 0.0500],
#          [0.0500, 0.0500, 0.0500,  ..., 0.0500, 0.0500, 0.0500],
#          [0.0500, 0.0500, 0.0500,  ..., 0.0500, 0.0500, 0.0500]]],
#        grad_fn=<SoftmaxBackward>)

# `loss` (not reduced):
# tensor([142.6360, 142.6360, 142.6360, 142.6360, 142.6360, 142.6360, 142.6360,
#         142.6360, 142.6360, 142.6360, 142.6360, 142.6360, 142.6360, 142.6360,
#         142.6360, 142.6360], grad_fn=<CtcLossBackward>)
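
For reference, here is a minimal sketch of the cuDNN 7 call sequence this issue targets. It is hypothetical host code, not Lantern's actual generated code: the buffer names and the CHECK_CUDNN macro are invented, and cleanup is omitted. Per the cuDNN docs, `probs` must be the softmax output (hence the `log_probs.softmax(2)` print above), the label/length arrays live in host memory, and label 0 is reserved for the blank. With the same uniform 0.05 probabilities, each per-example cost should come out near 142.636 = -log(1275 * 0.05^50), since a single label admits 50*51/2 = 1275 alignments over 50 frames.

#include <cudnn.h>
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

#define CHECK_CUDNN(stmt) do {                                             \
        cudnnStatus_t s_ = (stmt);                                         \
        if (s_ != CUDNN_STATUS_SUCCESS) {                                  \
            fprintf(stderr, "cuDNN error: %s\n", cudnnGetErrorString(s_)); \
            exit(1);                                                       \
        }                                                                  \
    } while (0)

int main() {
    /* Mirror the PyTorch reference: T=50 time steps, N=16 batch,
       A=20 alphabet size (label 0 is reserved for blank by cuDNN). */
    const int T = 50, N = 16, A = 20;

    cudnnHandle_t handle;
    CHECK_CUDNN(cudnnCreate(&handle));

    /* probs and gradients are (T, N, A) float tensors in device memory. */
    int dims[3] = {T, N, A}, strides[3] = {N * A, A, 1};
    cudnnTensorDescriptor_t probsDesc, gradsDesc;
    CHECK_CUDNN(cudnnCreateTensorDescriptor(&probsDesc));
    CHECK_CUDNN(cudnnSetTensorNdDescriptor(probsDesc, CUDNN_DATA_FLOAT, 3, dims, strides));
    CHECK_CUDNN(cudnnCreateTensorDescriptor(&gradsDesc));
    CHECK_CUDNN(cudnnSetTensorNdDescriptor(gradsDesc, CUDNN_DATA_FLOAT, 3, dims, strides));

    cudnnCTCLossDescriptor_t ctcDesc;
    CHECK_CUDNN(cudnnCreateCTCLossDescriptor(&ctcDesc));
    CHECK_CUDNN(cudnnSetCTCLossDescriptor(ctcDesc, CUDNN_DATA_FLOAT));

    /* Uniform softmax output (1/A = 0.05 everywhere), as in the reference. */
    float *hostProbs = (float *)malloc(sizeof(float) * T * N * A);
    for (int i = 0; i < T * N * A; i++) hostProbs[i] = 1.0f / A;
    float *probs, *grads, *costs;
    cudaMalloc((void **)&probs, sizeof(float) * T * N * A);
    cudaMalloc((void **)&grads, sizeof(float) * T * N * A);
    cudaMalloc((void **)&costs, sizeof(float) * N);
    cudaMemcpy(probs, hostProbs, sizeof(float) * T * N * A, cudaMemcpyHostToDevice);

    /* Labels and lengths are host int arrays; labels is the flattened
       concatenation of all target sequences (one label `1` per example). */
    int labels[16], labelLengths[16], inputLengths[16];
    for (int i = 0; i < N; i++) { labels[i] = 1; labelLengths[i] = 1; inputLengths[i] = T; }

    size_t workspaceSize = 0;
    CHECK_CUDNN(cudnnGetCTCLossWorkspaceSize(
        handle, probsDesc, gradsDesc, labels, labelLengths, inputLengths,
        CUDNN_CTC_LOSS_ALGO_DETERMINISTIC, ctcDesc, &workspaceSize));
    void *workspace;
    cudaMalloc(&workspace, workspaceSize);

    /* Writes N per-example costs and the CTC gradients (same (T, N, A)
       shape as probs; see the cuDNN docs for their exact semantics). */
    CHECK_CUDNN(cudnnCTCLoss(
        handle, probsDesc, probs, labels, labelLengths, inputLengths,
        costs, gradsDesc, grads,
        CUDNN_CTC_LOSS_ALGO_DETERMINISTIC, ctcDesc, workspace, workspaceSize));

    float hostCosts[16];
    cudaMemcpy(hostCosts, costs, sizeof(float) * N, cudaMemcpyDeviceToHost);
    printf("cost[0] = %f\n", hostCosts[0]);  /* expect ~142.636, matching PyTorch */
    return 0;
}
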
dan-zheng commented 5 years ago

The PyTorch DeepSpeech model uses a different CTC loss implementation (warp-ctc). We should switch that model to the new built-in `torch.nn.CTCLoss` for parity. Note the interface difference: warp-ctc takes raw, unnormalized activations and applies softmax internally, whereas `nn.CTCLoss` expects log-probabilities (e.g. from `log_softmax`).