SeanNaren / warp-ctc

Pytorch Bindings for warp-ctc
Apache License 2.0

Loss is zero even though warp-ctc compiled successfully #105

Open wabluy opened 5 years ago

wabluy commented 5 years ago

env: PyTorch 0.4.0, Ubuntu 14.04, Miniconda3

Test results:
collected 4 items

test_gpu.py .... [100%]

================================= 4 passed in 3.98 seconds ==================================
(py36) zhtang@hpclgpu:~/.../warp-ctc/pytorch_binding/tests$ python test_cpu.py
==================================== test session starts ====================================
platform linux -- Python 3.6.7, pytest-4.0.2, py-1.7.0, pluggy-0.8.0
rootdir: /home/.../warp-ctc/pytorch_binding, inifile: setup.cfg
collected 4 items

test_cpu.py .... [100%]

================================= 4 passed in 0.09 seconds ==================================

But when I tried to run the training code on an4/libri, the loss is always 0 or close to 0:

python train.py --rnn-type gru --hidden-size 800 --hidden-layers 5 --checkpoint --visdom --train-manifest data/libri_train_manifest.csv --val-manifest data/libri_val_manifest.csv --epochs 15 --num-workers $(nproc) --cuda --checkpoint --batch-size 10 --learning-anneal 1.1

Epoch: [1][1/45] Time 1.496 (1.496) Data 1.425 (1.425) Loss 0.1942 (0.1942)
Epoch: [1][2/45] Time 0.028 (0.762) Data 0.004 (0.714) Loss 0.0000 (0.0971)
Epoch: [1][3/45] Time 1.308 (0.944) Data 1.277 (0.902) Loss 0.0000 (0.0647)
Epoch: [1][4/45] Time 0.021 (0.713) Data 0.002 (0.677) Loss 0.0000 (0.0485)
Epoch: [1][5/45] Time 1.367 (0.844) Data 1.335 (0.808) Loss 0.0000 (0.0388)
Epoch: [1][6/45] Time 0.026 (0.708) Data 0.003 (0.674) Loss 0.0000 (0.0324)
Epoch: [1][7/45] Time 1.342 (0.798) Data 1.312 (0.765) Loss 0.0000 (0.0277)
Epoch: [1][8/45] Time 0.070 (0.707) Data 0.044 (0.675) Loss 0.0076 (0.0252)
Epoch: [1][9/45] Time 1.292 (0.772) Data 1.263 (0.740) Loss 0.0000 (0.0224)
Epoch: [1][10/45] Time 0.093 (0.704) Data 0.057 (0.672) Loss 0.0000 (0.0202)
Epoch: [1][11/45] Time 1.205 (0.750) Data 1.179 (0.718) Loss 0.0000 (0.0183)
Epoch: [1][12/45] Time 0.078 (0.694) Data 0.050 (0.663) Loss 0.0000 (0.0168)
Epoch: [1][13/45] Time 1.382 (0.747) Data 1.354 (0.716) Loss 0.0075 (0.0161)
Epoch: [1][14/45] Time 0.116 (0.702) Data 0.086 (0.671) Loss 0.0075 (0.0155)
Epoch: [1][15/45] Time 1.218 (0.736) Data 1.194 (0.706) Loss 0.0075 (0.0150)
Epoch: [1][16/45] Time 0.139 (0.699) Data 0.112 (0.668) Loss 0.0076 (0.0145)
Epoch: [1][17/45] Time 1.112 (0.723) Data 1.086 (0.693) Loss 0.0076 (0.0141)
Epoch: [1][18/45] Time 0.123 (0.690) Data 0.093 (0.660) Loss 0.0075 (0.0137)
Epoch: [1][19/45] Time 1.128 (0.713) Data 1.098 (0.683) Loss 0.0075 (0.0134)
Epoch: [1][20/45] Time 0.111 (0.683) Data 0.088 (0.653) Loss 0.0075 (0.0131)
Epoch: [1][21/45] Time 1.150 (0.705) Data 1.128 (0.676) Loss 0.0075 (0.0128)
Epoch: [1][22/45] Time 0.144 (0.679) Data 0.113 (0.650) Loss 0.0075 (0.0126)
Epoch: [1][23/45] Time 1.361 (0.709) Data 1.335 (0.680) Loss 0.0076 (0.0124)
Epoch: [1][24/45] Time 0.157 (0.686) Data 0.133 (0.657) Loss 0.0075 (0.0122)
Epoch: [1][25/45] Time 1.290 (0.710) Data 1.267 (0.681) Loss 0.0075 (0.0120)
Epoch: [1][26/45] Time 0.132 (0.688) Data 0.114 (0.660) Loss 0.0075 (0.0118)
Epoch: [1][27/45] Time 1.296 (0.710) Data 1.269 (0.682) Loss 0.0075 (0.0117)
Epoch: [1][28/45] Time 0.130 (0.690) Data 0.111 (0.662) Loss 0.0075 (0.0115)
Epoch: [1][29/45] Time 1.360 (0.713) Data 1.336 (0.685) Loss 0.0076 (0.0114)
Epoch: [1][30/45] Time 0.090 (0.692) Data 0.065 (0.664) Loss 0.0076 (0.0112)

wuliebucha commented 5 years ago

I have seen the same problem. The GPU loss is zero while the CPU loss is normal, but I don't know why.

eastonYi commented 5 years ago

I have the same issue when running test_gpu.py:

CPU_cost: 2.462858
GPU_cost: 0.000000
tensor([[0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.]], device='cuda:0')

Does anyone know how to fix it? Same question as https://github.com/SeanNaren/warp-ctc/issues/102
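
For anyone who wants to check which of the two numbers above is right without relying on the compiled library, here is a minimal pure-Python sketch of the CTC forward recursion. The activations and labels are an assumption: they are not shown in this thread and are meant to mirror the small test case in the repository's tests, and blank index 0 is assumed as well. `ctc_nll` is a hypothetical helper name, not part of warp-ctc. Under those assumptions it gives about 2.4629, which agrees with the CPU_cost and suggests the zero GPU_cost is the wrong one.

```python
import math


def log_softmax(row):
    # Numerically stable log-softmax over one time step.
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def logsumexp(*xs):
    # log(sum(exp(x))) that tolerates -inf entries.
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_nll(activations, labels, blank=0):
    """Negative log-likelihood of `labels` for ONE sequence.

    activations: list of T rows of unnormalized scores (softmax is applied here).
    labels: target label indices without blanks.
    """
    log_probs = [log_softmax(row) for row in activations]
    # Extended target: blank, l1, blank, l2, ..., blank
    ext = [blank]
    for label in labels:
        ext += [label, blank]
    size = len(ext)

    alpha = [-math.inf] * size
    alpha[0] = log_probs[0][ext[0]]
    if size > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, len(log_probs)):
        new = [-math.inf] * size
        for s in range(size):
            cands = [alpha[s]]
            if s >= 1:
                cands.append(alpha[s - 1])
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(alpha[s - 2])
            new[s] = logsumexp(*cands) + log_probs[t][ext[s]]
        alpha = new

    total = logsumexp(alpha[-1], alpha[-2]) if size > 1 else alpha[-1]
    return -total


# ASSUMED inputs (not from this thread): 2 time steps, 5 classes, target [1, 2].
activations = [
    [0.1, 0.6, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.6, 0.1, 0.1],
]
cost = ctc_nll(activations, [1, 2])
print("reference cost: %.6f" % cost)
```

This only covers a single sequence and does not compute gradients, so it is a sanity check on the loss value rather than a replacement for the binding.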