ZJULearning / pixel_link

Implementation of our paper 'PixelLink: Detecting Scene Text via Instance Segmentation' in AAAI2018
MIT License
767 stars, 254 forks

Why does the program run on the CPU? #138

Closed: Pro-flynn closed this issue 5 years ago

Pro-flynn commented 5 years ago

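When I run the program on my own data, why does it only run on the CPU? I see 550% CPU usage, while only 150M is used on the GPU. I hope someone who has hit this pitfall before can give me a hint. Thanks!!

Below are the settings in my /scripts/train.sh: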

set -x
set -e
export CUDA_VISIBLE_DEVICES=0
IMG_PER_GPU=32

TRAIN_DIR=$/pixel_link_info

# Count the GPUs listed in CUDA_VISIBLE_DEVICES (comma-separated).
OLD_IFS="$IFS"
IFS=","
gpus=($CUDA_VISIBLE_DEVICES)
IFS="$OLD_IFS"
NUM_GPUS=${#gpus[@]}

# batch_size = number of GPUs * images per GPU
BATCH_SIZE=$(expr $NUM_GPUS \* $IMG_PER_GPU)

DATASET=thaiid
DATASET_DIR=$/tmp

CUDA_VISIBLE_DEVICES=0 python train_pixel_link.py \
    --train_dir=${TRAIN_DIR} \
    --num_gpus=${NUM_GPUS} \
    --learning_rate=1e-3 \
    --gpu_memory_fraction=-1 \
    --train_image_width=512 \
    --train_image_height=512 \
    --batch_size=${BATCH_SIZE} \
    --dataset_dir=${DATASET_DIR} \
    --dataset_name=${DATASET} \
    --dataset_split_name=train \
    --max_number_of_steps=100 \
    --checkpoint_path=${CKPT_PATH} \
    --using_moving_average=1 2>&1 | tee -a ${TRAIN_DIR}/log.log
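For reference, the GPU count and batch size that the script derives can be reproduced with a minimal Python sketch. The helper name `batch_size_from_env` is hypothetical and not part of the repository; it only approximates the script's comma split and multiplication.

```python
def batch_size_from_env(cuda_visible_devices, img_per_gpu):
    """Approximate train.sh: count comma-separated GPU ids, multiply by images per GPU."""
    # Roughly the split the script performs with IFS=","
    gpus = [g for g in cuda_visible_devices.split(",") if g.strip()]
    num_gpus = len(gpus)
    return num_gpus, num_gpus * img_per_gpu


# Values taken from the script: one visible GPU, 32 images per GPU.
num_gpus, batch_size = batch_size_from_env("0", 32)
print(num_gpus, batch_size)  # 1 32
```

With these settings the script should pass --num_gpus=1 and --batch_size=32, so the batch-size arithmetic alone does not seem to explain the CPU-only behaviour.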

jisheng047 commented 5 years ago

How did you solve this problem? Please help me.

Pro-flynn commented 5 years ago

I have solved this issue. To solve it, check whether CUDA and cuDNN are set up in your bashrc.
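One way to run this check is a small Python sketch that lists which entries of `PATH` and `LD_LIBRARY_PATH` mention CUDA. The helper name `cuda_entries` is hypothetical, and matching on the substring "cuda" is an assumption about how the toolkit is installed (for example under a directory such as /usr/local/cuda). It does not prove that TensorFlow can actually load the libraries.

```python
import os


def cuda_entries(env, variables=("PATH", "LD_LIBRARY_PATH")):
    """Return, per variable, the path entries whose text contains 'cuda'."""
    found = {}
    for name in variables:
        entries = env.get(name, "").split(os.pathsep)
        # Substring match is a heuristic (assumption), not an authoritative test.
        found[name] = [e for e in entries if "cuda" in e.lower()]
    return found


if __name__ == "__main__":
    for name, entries in cuda_entries(os.environ).items():
        print(name, "->", entries if entries else "no CUDA entries found")
```

If both lists come back empty in the shell that launches train.sh, the environment is probably missing the CUDA directories, which would be consistent with the fix reported here.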

jisheng047 commented 5 years ago

@Pro-xiaowen Are you using a conda environment? In my case, the code fully used the 20 GB of GPU memory and also fully used the CPU, and I don't know why.

Pro-flynn commented 5 years ago

Sorry, that was a spelling mistake on my part. I suggest you check your CUDA and cuDNN environment. Since your experiments used the full GPU, your setup is probably fine.

jisheng047 commented 5 years ago

@Pro-xiaowen My problem is that it is also fully using my CPU. Have you faced this problem?

I have another question: after about 3000 iterations, my loss is around 0.5 to 0.6 (the pretrained model is PixelLink VGG 2s) and it is not dropping any further. Have you trained successfully? Could you give me some advice or ideas for getting out of this situation?

Pro-flynn commented 5 years ago

I do not know why the CPU was fully used in your experiments. As for the loss, you can try different learning rate settings, for example exponential decay of the learning rate. Different optimizers may also give different performance. After these adjustments, the loss converged to around 0.4 when I trained PixelLink on my own data for 40K iterations, and the recall on the test data was 97% under the ICDAR 2013 evaluation criteria. You can try those adjustments.
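The exponential decay mentioned here can be sketched in plain Python. The formula is the standard one, lr = initial_lr * decay_rate^(step / decay_steps). The initial rate 1e-3 comes from the training script in this thread, while decay_steps=10000 and decay_rate=0.5 are illustrative assumptions, not values reported by anyone here.

```python
def exponential_decay(initial_lr, step, decay_steps, decay_rate, staircase=False):
    """Learning rate after `step` updates under exponential decay."""
    exponent = step / decay_steps
    if staircase:
        # Decay in discrete jumps: only completed intervals count.
        exponent = step // decay_steps
    return initial_lr * decay_rate ** exponent


# Hypothetical schedule: halve the rate every 10,000 steps.
for step in (0, 10000, 20000, 40000):
    print(step, exponential_decay(1e-3, step, 10000, 0.5))
```

In TensorFlow 1.x, `tf.train.exponential_decay` takes the same arguments (learning rate, global step, decay steps, decay rate, staircase), so a schedule tuned with this sketch should carry over. How it is wired into train_pixel_link.py is not shown in this thread.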

jisheng047 commented 5 years ago

@Pro-xiaowen My loss is around 0.4, but it detects empty boxes and I don't know why.

jisheng047 commented 5 years ago

@Pro-xiaowen Can you share how you set up the dataset? I think the problem may come from my dataset.

Pro-flynn commented 5 years ago

I imitated synthtext_to_tfrecords.py to build my tfrecords. You can give it a try.
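As a rough illustration of the preprocessing a custom converter may need, the sketch below normalizes one quadrilateral's corner coordinates by the image size. It assumes, without confirmation in this thread, that the converter stores box coordinates as fractions of image width and height. The function name `normalize_quad` is hypothetical, so compare against synthtext_to_tfrecords.py before relying on it.

```python
def normalize_quad(quad, width, height):
    """Scale (x1, y1, ..., x4, y4) pixel coordinates into the [0, 1] range."""
    if len(quad) != 8:
        raise ValueError("expected 8 values: four (x, y) corners")
    normalized = []
    for i, value in enumerate(quad):
        # Even indices are x (divide by width), odd indices are y (divide by height).
        scale = width if i % 2 == 0 else height
        # Clamp so boxes that slightly overshoot the image stay in range.
        normalized.append(min(max(value / scale, 0.0), 1.0))
    return normalized


# Hypothetical 512x512 image with one text box.
print(normalize_quad([0, 0, 256, 0, 256, 128, 0, 128], 512, 512))
```

Printing a few converted boxes like this and checking that they land where the text is in the image is a cheap sanity check on a custom dataset.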

jisheng047 commented 5 years ago

@Pro-xiaowen Thank you for sharing. I will try it.

Pro-flynn commented 5 years ago

You are welcome. About the empty prediction results, I suggest training for more epochs. In my experiment, this model could output effective boxes after 10-20k epochs.
