Open Royzon opened 6 years ago
I guess it is a prefetch problem.
After I made the change you suggested, it still did not work and got stuck at the following: "I1105 18:19:40.103994 17897 solver.cpp:208] Creating test net (#0) specified by test_net file: models/yolov3_mobilenetv1/yolov3_mobilenetv1_test.prototxt"
Sorry, I don't have an environment to test multi-GPU training right now.
I will keep it in the first issue.
You can install NCCL like this:
$ git clone https://github.com/NVIDIA/nccl.git
$ cd nccl
$ sudo make install -j8
Then modify the CMakeLists file so the option reads caffe_option(USE_NCCL "Build Caffe with NCCL library support" ON) and recompile the project with cmake.
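To confirm that a build directory really was configured with NCCL, a small check like the one below may help. It is only a sketch: the function name nccl_enabled is made up for illustration, and it assumes the option is stored in CMakeCache.txt as an entry of the form USE_NCCL:BOOL=ON.

```bash
#!/bin/bash
# Sketch: report whether a CMake build directory was configured with NCCL.
# Assumption: USE_NCCL appears in CMakeCache.txt as "USE_NCCL:<TYPE>=<VALUE>".

nccl_enabled() {
  local build_dir="$1"
  local cache="$build_dir/CMakeCache.txt"
  # No cache file means the directory has not been configured yet.
  [ -f "$cache" ] || return 2
  # Succeeds (status 0) only if the cached value is a "true" spelling.
  grep -Eqi '^USE_NCCL:[A-Z]+=(ON|1|TRUE|YES)$' "$cache"
}

# Usage (hypothetical build directory named "build"):
#   nccl_enabled build && echo "NCCL on" || echo "NCCL off or not configured"
```

The option can also be passed on the command line (cmake -DUSE_NCCL=ON ..) instead of editing the CMakeLists file; either way the value ends up in the cache that this check reads.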
@wzjiang, I have already changed the CMakeLists file and installed NCCL, and the build succeeded with the info: NCCL-ON. What puzzles me is that Caffe from BVLC can run on multiple GPUs, but this one cannot.
I have no idea about that. I now have the same problem: it compiles successfully and runs without any error, but it always stays at a certain step.
How can I run the system without a GPU? I want to run it with the CPU only. What command should I use?
@macqueen09 The simplest way is to set CPU_ONLY to ON. Remember to delete the CMake cache (CMakeCache.txt) and rebuild.
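The cache has to be deleted because a value already stored in CMakeCache.txt takes precedence over a changed default in the CMakeLists file. A minimal sketch of that step follows; the helper name clean_cmake_cache and the build directory name are assumptions for illustration.

```bash
#!/bin/bash
# Sketch: clear stale CMake state so a changed CPU_ONLY value takes effect.

clean_cmake_cache() {
  local build_dir="$1"
  # Remove the cached option values and CMake's generated bookkeeping files.
  rm -rf "$build_dir/CMakeCache.txt" "$build_dir/CMakeFiles"
}

# Usage (assumes an out-of-source build directory named "build"):
#   clean_cmake_cache build
#   cd build && cmake -DCPU_ONLY=ON .. && make -j8
#   # then train without the --gpu flag and with "solver_mode: CPU" in the solver
#   ./tools/caffe train --solver <your_solver.prototxt>
```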
And another way is
@eric612 I have the same problem. How can I fix it? I set PREFETCH_COUNT to 3 and built with NCCL without any error, but it stays as follows: my solver is
I found the same issue reported for Caffe SSD; it may be a pre-processing problem. Unfortunately, I don't have an environment to test it.
Thank you for your reply, I will try.
I have pushed a new version to solve the prefetch problems, please try again.
@eric612 Thanks very much, but I still hit the same problem as before.
train_yolov3_lite.sh:
#!/bin/bash
LOG=log/train-`date +%Y-%m-%d-%H-%M-%S`.log
../build/tools/caffe train --solver ./mobilenet_yolov3_lite_solver.prototxt --gpu=0,1 2>&1 | tee $LOG
mobilenet_yolov3_lite_solver.prototxt:
train_net: "mobilenet_yolov3_lite_train.prototxt"
test_net: "mobilenet_yolov3_lite_test.prototxt"
test_iter: 4952
test_interval: 1000
base_lr: 0.001
display: 10
max_iter: 50000
lr_policy: "multistep"
gamma: 0.5
weight_decay: 0.00005
snapshot: 1000
snapshot_prefix: "models/"
solver_mode: GPU
debug_info: false
snapshot_after_train: true
test_initialization: false
average_loss: 10
stepvalue: 10000
stepvalue: 20000
stepvalue: 30000
stepvalue: 40000
iter_size: 9
type: "RMSProp"
eval_type: "detection"
ap_version: "11point"
show_per_class_result: true
If I comment out the lines test_net: "mobilenet_yolov3_lite_test.prototxt", test_iter: 4952 and test_interval: 1000, it runs well.
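The workaround described above can be scripted so the original solver stays untouched. This is only a sketch: the function name make_train_only_solver and the output file name are hypothetical, and it skips evaluation rather than fixing the multi-GPU hang.

```bash
#!/bin/bash
# Sketch: write a train-only copy of a solver by commenting out test settings.

make_train_only_solver() {
  local src="$1" dst="$2"
  # Prefix the test_net, test_iter and test_interval lines with '#',
  # which starts a comment in prototxt. Other keys, including
  # test_initialization, do not match the pattern and are copied unchanged.
  sed -E 's/^([[:space:]]*)(test_net|test_iter|test_interval)([[:space:]]*:)/\1# \2\3/' \
    "$src" > "$dst"
}

# Usage (output file name is hypothetical):
#   make_train_only_solver mobilenet_yolov3_lite_solver.prototxt train_only_solver.prototxt
#   ../build/tools/caffe train --solver ./train_only_solver.prototxt --gpu=0,1
```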
I have the same issue. However, I cannot find PREFETCH_COUNT in the current version of the code. Can you tell me which earlier version has that code? @linquanxu
Multi-GPU stuck, can only be used in a single