Open Amanda-Barbara opened 5 years ago
Try changing the batch size in the test prototxt.
I ran into the same problem. Could you tell me what batch size you changed it to?
I think a batch size of 1 cannot be split across multiple GPUs in the test phase, so you can disable the test phase and start training.
I tried batch size = 1, but it still got stuck.
I found that your project is forked from caffe-ssd, which also gets stuck with multiple GPUs. But I tried the Caffe source code from BVLC, and it runs on multiple GPUs with NCCL. I also tried the Caffe written by yjxiong, which uses OpenMPI for the multi-GPU work.
I'll try to use the BVLC code to rewrite caffe-mobilenet-yolo. Could you help me if I run into problems?
Unfortunately, I don't have a multi-GPU computer or environment :(
So it is really hard for me. Maybe you can look at this issue: https://github.com/eric612/MobileNet-YOLO/issues/28
@solomon-ma @Amanda-Barbara, I also encountered the same problem. I changed the batch_size to 4 (the same as the number of my GPUs), but it still stopped at "Creating test net (#0) specified by test_net file". Have you solved this problem? If so, could you tell me how?
Hi guys, I also hit the same problem, even when I run the example: ./build/tools/caffe train --solver=examples/mnist/lenet_solver.prototxt --gpu 0,1
I think a batch size of 1 cannot be split across multiple GPUs in the test phase, so you can disable the test phase and start training.
Thanks for your great work! Yes, you are right: multi-GPU training works after disabling the test phase. But I am still confused about why it stops in the test phase, even when I set the test batch size to 4 (I am using 2 GPUs).
Refer to this issue: https://github.com/eric612/MobileNet-YOLO/issues/198
Hi, could you tell me the steps to disable the test phase? Thanks.
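For reference, one possible way to disable the test phase is to remove the test-related fields from the solver prototxt so that no test net is created at startup. The sketch below is only an illustration, not the project's documented procedure: it assumes the solver uses single-line fields named as in Caffe's SolverParameter (test_net, test_iter, test_interval, test_initialization, test_compute_loss), and the helper name strip_test_phase and the example file names are hypothetical. It does not handle multi-line blocks such as test_state.

```python
import re

# Solver fields that configure the test phase (names from Caffe's SolverParameter).
TEST_FIELDS = ("test_net", "test_iter", "test_interval",
               "test_initialization", "test_compute_loss")


def strip_test_phase(solver_text: str) -> str:
    """Return solver prototxt text with single-line test-phase fields removed."""
    pattern = re.compile(r"^\s*(%s)\s*:" % "|".join(TEST_FIELDS))
    kept = [line for line in solver_text.splitlines()
            if not pattern.match(line)]
    return "\n".join(kept) + "\n"


# Hypothetical solver contents, for illustration only.
example = """\
train_net: "train.prototxt"
test_net: "test.prototxt"
test_iter: 100
test_interval: 500
base_lr: 0.01
max_iter: 1000
"""

# Prints the solver text with only the train_net, base_lr and max_iter lines left.
print(strip_test_phase(example))
```

The same edit can of course be made by hand in a text editor. Keep a copy of the original solver file so the test phase can be restored later.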
Hi, I tried your newest version of MobileNet-YOLO to train with multiple GPUs, but the GPUs still hung at the following step: I0724 04:24:32.355298 8003 solver.cpp:203] Creating test net (#0) specified by test_net file: models/yolov3/head_mobilenet_yolov3_lite_test.prototxt Do you have any idea why? Thanks, @eric612