SeanNaren / warp-ctc

Pytorch Bindings for warp-ctc
Apache License 2.0

gpu_ctc calculates 0 loss #102

Open ckonniefan opened 5 years ago

ckonniefan commented 5 years ago

Both test_gpu and test_cpu pass. But when I use gpu_ctc to calculate the cost, the output is 0, while cpu_ctc outputs 2.46. My environment is PyTorch 1.0, Python 2.7, CUDA 10.0.

1. probs on CUDA

```python
import warpctc_pytorch as warpctc
from warpctc_pytorch import CTCLoss

import torch

ctc_loss = CTCLoss()
# expected shape of seqLength x batchSize x alphabet_size
probs = torch.FloatTensor([[[0.1, 0.6, 0.1, 0.1, 0.1], [0.1, 0.1, 0.6, 0.1, 0.1]]]).transpose(0, 1).contiguous().cuda()
labels = torch.IntTensor([1, 2])
label_sizes = torch.IntTensor([2])
probs_sizes = torch.IntTensor([2])
probs.requires_grad_(True)  # tells autograd to compute gradients for probs
cost = ctc_loss(probs, labels, probs_sizes, label_sizes)
```

output: cost = tensor([0.])


2. probs on CPU
```python
import warpctc_pytorch as warpctc
from warpctc_pytorch import CTCLoss

import torch

ctc_loss = CTCLoss()
# expected shape of seqLength x batchSize x alphabet_size
probs = torch.FloatTensor([[[0.1, 0.6, 0.1, 0.1, 0.1], [0.1, 0.1, 0.6, 0.1, 0.1]]]).transpose(0, 1).contiguous()
labels = torch.IntTensor([1, 2])
label_sizes = torch.IntTensor([2])
probs_sizes = torch.IntTensor([2])
probs.requires_grad_(True)  # tells autograd to compute gradients for probs
cost = ctc_loss(probs, labels, probs_sizes, label_sizes)
```

output: cost = tensor([2.4629])
wuliebucha commented 5 years ago

@ckonniefan I have seen the same problem. Have you solved it?

eastonYi commented 5 years ago

@ckonniefan @wuliebucha Has anyone fixed this?

khassanoff commented 5 years ago

I am having the same problem. However, in my case, it happens randomly :( @ckonniefan @wuliebucha did you solve this problem?

dzubke commented 4 years ago

I'm having the same issue. Has anyone been able to look into this? @SeanNaren @ckonniefan @wuliebucha

My environment is:

The output from tests/test_gpu.py is below:

```
libs/warp-ctc/pytorch_binding/tests/test_gpu.py
CPU_cost: 2.462858 GPU_cost: 0.000000
tensor([[0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.]], device='cuda:0')
.CPU_cost: 6.016518 GPU_cost: 0.000000
tensor([[0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.]], device='cuda:0')
.CPU_cost: 199.966187 GPU_cost: 0.000000
tensor([[0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.]], device='cuda:0')
.CPU_cost: 6.416517 GPU_cost: 0.000000
tensor([[0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.]], device='cuda:0')
```

dzubke commented 4 years ago

I found that I don't encounter this error when using an Nvidia Tesla K80 or P100 GPU. On those cards, test_gpu.py runs as expected, with the CPU and GPU costs nearly identical, and the warp-ctc pytorch bindings work within the model I'm using them in.

However, I get this error when using the GPUs below:

I have no idea why the behavior differs between GPUs. My setup is described in my previous comment.
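
One way to check whether a particular GPU is affected is to compare the CPU and GPU losses on a small batch before training. Below is a rough sketch of that check, reusing the toy tensors from the reproduction above; the helper name and tolerance are my own choices, not part of the warp-ctc tests.

```python
import torch
from warpctc_pytorch import CTCLoss


def gpu_ctc_matches_cpu(tol=1e-3):
    """Return True if gpu_ctc and cpu_ctc agree on a toy batch."""
    ctc_loss = CTCLoss()
    # seqLength x batchSize x alphabet_size, same toy inputs as in the issue
    probs = torch.FloatTensor(
        [[[0.1, 0.6, 0.1, 0.1, 0.1], [0.1, 0.1, 0.6, 0.1, 0.1]]]
    ).transpose(0, 1).contiguous()
    labels = torch.IntTensor([1, 2])
    label_sizes = torch.IntTensor([2])
    probs_sizes = torch.IntTensor([2])

    cpu_cost = ctc_loss(probs.clone().requires_grad_(True),
                        labels, probs_sizes, label_sizes).item()
    gpu_cost = ctc_loss(probs.clone().cuda().requires_grad_(True),
                        labels, probs_sizes, label_sizes).item()
    # on an affected GPU, gpu_cost comes back as 0.0 while cpu_cost is ~2.46
    return abs(cpu_cost - gpu_cost) < tol
```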

lqyiii commented 4 years ago

I also encountered this problem, using a V100 GPU. I used a small trick to avoid it: train the model on the GPU for everything except the loss calculation, and move probs to the CPU before computing the loss, so cpu_ctc is used instead of gpu_ctc.
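
For reference, here is a minimal sketch of that workaround on the same toy example, assuming probs is produced on the GPU by the model's forward pass: moving the activations to the CPU right before calling CTCLoss makes it dispatch to cpu_ctc, while the rest of the model stays on the GPU. The .cpu() call is differentiable, so gradients should still flow back to the GPU tensor.

```python
import torch
from warpctc_pytorch import CTCLoss

ctc_loss = CTCLoss()

# pretend these activations came from a model running on the GPU
# (seqLength x batchSize x alphabet_size)
probs = torch.FloatTensor(
    [[[0.1, 0.6, 0.1, 0.1, 0.1], [0.1, 0.1, 0.6, 0.1, 0.1]]]
).transpose(0, 1).contiguous().cuda()
probs.requires_grad_(True)

labels = torch.IntTensor([1, 2])
label_sizes = torch.IntTensor([2])
probs_sizes = torch.IntTensor([2])

# move the activations to the CPU so the CPU kernel is used;
# autograd copies the gradient back to the CUDA tensor on backward()
cost = ctc_loss(probs.cpu(), labels, probs_sizes, label_sizes)
cost.backward()

print(cost.item())        # ~2.46 instead of 0.0
print(probs.grad.device)  # cuda:0
```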