Open abcdvzz opened 5 years ago
It can run on multiple GPUs.
export CUDA_VISIBLE_DEVICES=0,1,2,3
python -u train.py --batch_size=16 --pretrained_model=vgg_ilsvrc_16_fc_reduced
Yeah, I tried it. It just doesn't work. I saw this problem in another issue too. You didn't solve the problem; you just CLOSED it.
@abcdvzz Sorry, I closed this issue because I found in https://github.com/PaddlePaddle/models/issues/1512 that your environment is not set up correctly. We can solve it in #1512 or reopen this issue. I'll try my best to help you solve these problems.
OK, that's fine. Please reopen this issue. I've already solved the CUDA issue. I only have two problems now: one is this multi-GPU issue, and the other is the validation issue, #1514.
There is no problem with multi-GPU training on our machine. You need to give more detailed error info. And please help close the solved issue. Thanks!
When I use two GPUs, I get this, and that's all I get:
Aborted at 1544374351 (unix time) try "date -d @1544374351" if you are using GNU date
PC: @ 0x0 (unknown)
SIGSEGV (@0x0) received by PID 32735 (TID 0x7f5aa0848700) from PID 0; stack trace:
@ 0x7f5aa0440390 (unknown)
@ 0x0 (unknown)
When I use one GPU, it's OK.
Is there any more detailed info? Is your command as follows? Note that you need to set export CUDA_VISIBLE_DEVICES=0,1 because of the code here: https://github.com/PaddlePaddle/models/blob/develop/fluid/PaddleCV/face_detection/train.py#L113
export CUDA_VISIBLE_DEVICES=0,1
python -u train.py --batch_size=8 # adjust to the available GPU memory, maybe 6
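As a rough illustration of why the variable matters, here is a minimal Python sketch. It assumes that the linked line derives the GPU count from CUDA_VISIBLE_DEVICES, which this thread implies but does not show, and that --batch_size is a total to be split across devices, which is also an assumption. The helper names `visible_gpu_count` and `per_gpu_batch` are hypothetical and are not part of train.py.

```python
import os


def visible_gpu_count(env=None):
    """Count the GPU ids listed in CUDA_VISIBLE_DEVICES (0 if unset or empty)."""
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES", "")
    return len([d for d in raw.split(",") if d.strip()])


def per_gpu_batch(batch_size, gpu_count):
    """Split a total batch size evenly across GPUs, failing loudly otherwise."""
    if gpu_count < 1:
        raise ValueError("no GPUs listed in CUDA_VISIBLE_DEVICES")
    if batch_size % gpu_count != 0:
        raise ValueError(
            f"batch_size={batch_size} is not divisible by {gpu_count} GPUs")
    return batch_size // gpu_count


if __name__ == "__main__":
    n = visible_gpu_count()
    print(f"visible GPUs: {n}")
    if n:
        # Assumption: --batch_size is a total that gets split across devices.
        print(f"per-GPU batch for --batch_size=8: {per_gpu_batch(8, n)}")
```

Running a check like this before training can confirm that the shell actually exported the variable and that the chosen batch size divides evenly across the visible devices.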
That's all I got. I've already set the devices.
@abcdvzz Is your problem solved? Could you paste more info? I can't determine the problem from the error log above. It would also help to describe your environment. You can also test other examples on multi-GPU. You can also set
export GLOG_v=3
export GLOG_logtostderr=1
to try to find more detailed info and paste it here.
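For anyone who wants to capture that verbose output in a file for pasting, here is a minimal Python sketch that sets the two GLOG variables for a child process only and saves everything it prints. The helper names `glog_env` and `run_with_glog` and the log file name `train_debug.log` are hypothetical; the training command is the one quoted earlier in this thread.

```python
import os
import subprocess
import sys


def glog_env(verbosity=3, base=None):
    """Return a copy of the environment with the two GLOG variables set."""
    env = dict(os.environ if base is None else base)
    env["GLOG_v"] = str(verbosity)
    env["GLOG_logtostderr"] = "1"
    return env


def run_with_glog(cmd, log_path, verbosity=3):
    """Run cmd with verbose GLOG output, saving stdout and stderr to log_path."""
    with open(log_path, "w") as log:
        proc = subprocess.run(cmd, env=glog_env(verbosity),
                              stdout=log, stderr=subprocess.STDOUT)
    return proc.returncode


if __name__ == "__main__":
    # Command taken from the thread; train_debug.log is a hypothetical name.
    code = run_with_glog([sys.executable, "-u", "train.py", "--batch_size=8"],
                         "train_debug.log")
    print(f"exit code: {code}")
```

On POSIX systems, `subprocess` reports a process killed by a signal as a negative return code, so a segfault like the one above would show up as -11.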
Sorry, no. I'll try your suggestion next week because I'm busy with my paper these days. Thank you for your patience.
When I trained the face detection model, I found that if more than one GPU is visible, training fails with an error: