NVIDIA / caffe

Caffe: a fast open framework for deep learning.
http://caffe.berkeleyvision.org/

CPU 100% when finetuning SE-Net with nvcaffe 0.17.3 #567

Closed whria78 closed 5 years ago

whria78 commented 5 years ago

Hello, thank you for maintaining NVCaffe. I have a problem when I use NVCaffe with SE-Net models.

CPU usage hits 100% when fine-tuning SE-Net (or any SE-ResNeXt) with NVCaffe 0.17.3.

There was no problem when I fine-tuned SE-Net with the old NVCaffe 0.17.2. However, if I train with the new NVCaffe 0.17.3, the CPU utilization of all cores jumps to 100% after the initial test iterations (0, 1, 2). I tested on two systems and got the same results (Intel Skylake, Ubuntu 16.04, CUDA 10.1, cuDNN 10.1, driver nvidia-driver-418).

Because of this problem, I could not train SE-Net, SE-ResNeXt-50, or SE-ResNeXt-100. However, there was no problem training a VGG model with the new NVCaffe 0.17.3.

drnikolaev commented 5 years ago

Hi @whria78, could you attach the complete log here, please?

whria78 commented 5 years ago

Thank you for the reply.

I have attached the log and trainval.prototxt.

senext50_FP16_train.log

senext50_FP16.zip

drnikolaev commented 5 years ago

@whria78 thanks.

I0618 10:02:14.258036 32105 net.cpp:1135] Ignoring source layer classifier
I0618 10:02:14.258038 32105 net.cpp:1135] Ignoring source layer classifier_classifier_0_split

Is this your custom layer? CPU-only? If so, you might need to consider adding a GPU implementation to prevent CPU overload.

whria78 commented 5 years ago

Thank you @drnikolaev

I tried to fine-tune SE-ResNeXt-50. I changed the last classifier layer as follows:

From (https://github.com/hujie-frank/SENet; 1000 classes)

layer {
  name: "classifier"
  type: "InnerProduct"
  bottom: "pool5/7x7_s1"
  top: "classifier"
  inner_product_param {
    num_output: 1000
  }
}
layer {
  name: "prob"
  type: "Softmax"
  bottom: "classifier"
  top: "prob"
}

To (178 classes)

layer {
  name: "whria_classifier"
  type: "InnerProduct"
  bottom: "pool5/7x7_s1"
  top: "whria_classifier"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0.0
  }
  inner_product_param {
    num_output: 178
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "whria_classifier"
  bottom: "label"
  top: "loss"
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "whria_classifier"
  bottom: "label"
  top: "accuracy"
  include {
    phase: TEST
  }
}

To my knowledge, this modification is commonly used for fine-tuning. (Renaming "classifier" to "whria_classifier" simply makes Caffe skip copying the pretrained classifier weights, which is what the "Ignoring source layer classifier" lines in the log refer to.)

I do not think this modification should drive all 8 CPU cores to 100%.

This phenomenon does not occur if I switch back to the old NVCaffe 0.17.2.

Another fact is that I had no such problem when I fine-tuned VGG-19 with the new NVCaffe 0.17.3.
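
(Not part of the original report, just an illustrative aside: when fine-tuning like this, it is also common to freeze the earlier layers so that only the new classifier learns, by setting lr_mult and decay_mult to 0 in the param blocks of the layers to freeze. A minimal sketch follows; the layer name and shape are purely hypothetical, not taken from the SE-ResNeXt prototxt.)

layer {
  name: "example_conv"      # hypothetical layer name, for illustration only
  type: "Convolution"
  bottom: "data"
  top: "example_conv"
  param {
    lr_mult: 0              # freeze the weight blob: no updates during fine-tuning
    decay_mult: 0
  }
  param {
    lr_mult: 0              # freeze the bias blob as well
    decay_mult: 0
  }
  convolution_param {
    num_output: 64          # illustrative shape, not taken from SE-ResNeXt-50
    kernel_size: 3
    pad: 1
  }
}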

whria78 commented 5 years ago

As far as I have observed, the CPU overload problem occurs around "batch_transformer.cpp:51] Started BatchTransformer thread 32123".

Before "batch_transformer.cpp:51] Started BatchTransformer thread 32123", there is a "Waiting for datum" message.

I guess the CPU overload has already started at that stage, which results in the "Waiting for datum" message.

The CPU overload problem occurs before Iteration 1.

I0618 10:02:14.725428 32119 internal_thread.cpp:78] Started internal thread 32119 on device 0, rank 0
I0618 10:02:14.725850 32121 internal_thread.cpp:78] Started internal thread 32121 on device 0, rank 0
I0618 10:02:14.726251 32120 internal_thread.cpp:78] Started internal thread 32120 on device 0, rank 0
I0618 10:02:14.731768 32122 internal_thread.cpp:78] Started internal thread 32122 on device 0, rank 0
I0618 10:02:14.733804 32121 blocking_queue.cpp:40] Waiting for datum
I0618 10:02:14.739902 32118 common.cpp:544] {0} NVML succeeded to set CPU affinity
I0618 10:02:14.741881 32123 common.cpp:544] {0} NVML succeeded to set CPU affinity
I0618 10:02:14.741904 32123 batch_transformer.cpp:51] Started BatchTransformer thread 32123
I0618 10:02:26.103688 32105 solver.cpp:342] [0.0] Iteration 1 (11.3895 s), loss = 5.17969
I0618 10:02:26.103915 32105 solver.cpp:358] [0.0] Train net output #0: loss = 5.17969 ( 1 = 5.17969 loss)
I0618 10:02:26.103986 32105 sgd_solver.cpp:180] [0.0] Iteration 0, lr = 0.001, m = 0.9, lrm = 0.01, wd = 1e-05, gs = 1

drnikolaev commented 5 years ago

"waiting for datum" means that data reader (CPU-based) delivers data not fast enough for the solver. Therefore, it uses CPU up to 100%