Closed by xuzhm 9 years ago
When I train the MNIST siamese example on the GPU, the loss is nan.
xzm@xzm:~/caffe-master6.25$ ./examples/siamese/train_mnist_siamese.sh
......
I0625 10:30:28.496675 24471 data_layer.cpp:118] Prefetch batch: 6 ms.
I0625 10:30:28.496918 24471 data_layer.cpp:119] Read time: 1.01 ms.
I0625 10:30:28.497056 24471 data_layer.cpp:120] Transform time: 4.228 ms.
I0625 10:30:28.563247 24323 solver.cpp:343] Test net output #0: loss = nan (* 1 = nan loss)
I0625 10:30:28.577770 24472 data_layer.cpp:118] Prefetch batch: 13 ms.
I0625 10:30:28.581351 24472 data_layer.cpp:119] Read time: 0.789 ms.
I0625 10:30:28.581390 24472 data_layer.cpp:120] Transform time: 11.978 ms.
I0625 10:30:28.640379 24323 solver.cpp:214] Iteration 0, loss = nan
I0625 10:30:28.640473 24323 solver.cpp:229] Train net output #0: loss = nan (* 1 = nan loss)
I0625 10:30:28.640532 24323 solver.cpp:486] Iteration 0, lr = 0.01
I0625 10:30:28.744362 24473 data_layer.cpp:118] Prefetch batch: 3 ms.
I0625 10:30:28.746536 24473 data_layer.cpp:119] Read time: 0.718 ms.
I0625 10:30:28.746696 24473 data_layer.cpp:120] Transform time: 2.464 ms.
But when I run on the CPU (solver_mode: CPU), the results are normal.
......
I0625 10:50:12.933557 30224 base_data_layer.cpp:69] Prefetch copied
I0625 10:50:12.933603 30224 base_data_layer.cpp:78] CreatePrefetchThread
I0625 10:50:12.939599 30657 data_layer.cpp:118] Prefetch batch: 5 ms.
I0625 10:50:12.939817 30657 data_layer.cpp:119] Read time: 0.97 ms.
I0625 10:50:12.939954 30657 data_layer.cpp:120] Transform time: 3.855 ms.
I0625 10:50:13.659482 30224 solver.cpp:343] Test net output #0: loss = 0.19891 (* 1 = 0.19891 loss)
I0625 10:50:13.659589 30224 base_data_layer.cpp:63] Thread joined
I0625 10:50:13.659757 30224 base_data_layer.cpp:69] Prefetch copied
I0625 10:50:13.659798 30224 base_data_layer.cpp:78] CreatePrefetchThread
I0625 10:50:13.664013 30661 data_layer.cpp:118] Prefetch batch: 4 ms.
I0625 10:50:13.664225 30661 data_layer.cpp:119] Read time: 0.779 ms.
I0625 10:50:13.664360 30661 data_layer.cpp:120] Transform time: 2.497 ms.
I0625 10:50:14.823714 30224 solver.cpp:214] Iteration 0, loss = 0.209695
I0625 10:50:14.823812 30224 solver.cpp:229] Train net output #0: loss = 0.209695 (* 1 = 0.209695 loss)
Has anyone seen this problem? Or could this be a bug?
If you can reproduce this with the latest caffe-master, please file a bug report: https://github.com/BVLC/caffe/wiki/Reporting-Bugs-and-Other-Issues
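When checking whether a fresh build still reproduces the problem, it may help to scan the training log automatically instead of reading it by eye. The following is only a rough sketch: the function name `first_nan_iteration` is made up for illustration and is not part of Caffe, and it assumes the solver keeps printing lines of the form `Iteration N, loss = X` as in the logs quoted in this thread.

```python
import math
import re

# Matches solver lines such as "Iteration 0, loss = nan".
# Lines like "Iteration 0, lr = 0.01" do not match because "loss = " is required.
LOSS_RE = re.compile(r"Iteration (\d+), loss = (\S+)")


def first_nan_iteration(log_text):
    """Return the first iteration whose reported loss is NaN, or None.

    Hypothetical helper for illustration; assumes the log format shown above.
    """
    for match in LOSS_RE.finditer(log_text):
        try:
            loss = float(match.group(2))
        except ValueError:
            # Skip tokens that are not parseable as a number.
            continue
        if math.isnan(loss):
            return int(match.group(1))
    return None


if __name__ == "__main__":
    # Two lines copied from the GPU and CPU logs in this thread.
    gpu_line = "I0625 10:30:28.640379 24323 solver.cpp:214] Iteration 0, loss = nan"
    cpu_line = "I0625 10:50:14.823714 30224 solver.cpp:214] Iteration 0, loss = 0.209695"
    print(first_nan_iteration(gpu_line))  # 0
    print(first_nan_iteration(cpu_line))  # None
```

Running this over the full GPU and CPU logs gives a quick yes/no answer on whether the nan loss appears, which can be included in the bug report.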