Inference time, after loading the weights, is slower than ./build/tools/caffe time

NVIDIA / caffe

Issue #573 (Open), opened by jazzseow 5 years ago

jazzseow commented 5 years ago

When I ran ./build/tools/caffe time, I got:

I0716 10:14:52.669873 18718 caffe.cpp:656] Average Forward pass: 11.4608 ms.

When I ran ./build/examples/ssd/ssd_detect.bin and timed the forward function, I got timings like these, roughly 22 to 24 ms per call (about twice the caffe time figure):

time: 23.809 ms
time: 22.517 ms
time: 23.631 ms
time: 22.49 ms
time: 23.481 ms
time: 21.887 ms
time: 23.696 ms
time: 22.322 ms
time: 23.026 ms
time: 23.716 ms
time: 22.506 ms
time: 22.152 ms
time: 23.222 ms
time: 21.964 ms
time: 23.871 ms
time: 22.715 ms
time: 23.888 ms
time: 22.232 ms
time: 23.315 ms

My code is here: https://drive.google.com/drive/folders/1cAhF9wBNjBpO9Ykoh80Sv5eBqBZJwDYQ?usp=sharing
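For a like-for-like comparison with the `Average Forward pass` figure that caffe time reports, the forward call in ssd_detect can be timed as an average over many calls, with the first few calls excluded so that one-time setup is not counted. The helper below is a minimal standard-library sketch; `average_ms` and its parameters are hypothetical names, not part of Caffe, and it assumes the timed step is synchronous (for a GPU forward pass, the step has to wait until the device work is done, otherwise only launch overhead is measured).

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper: average wall-clock time of `step` in milliseconds.
// The first `warmup` calls are run but not timed, so one-time costs
// (allocation, first-call setup) do not inflate the average.
// Assumption: `step` returns only after its work has completed.
double average_ms(const std::function<void()>& step, int warmup, int iterations) {
  assert(warmup >= 0 && iterations >= 0);
  for (int i = 0; i < warmup; ++i) step();  // untimed warm-up calls

  using clock = std::chrono::steady_clock;  // monotonic, suitable for intervals
  const auto start = clock::now();
  for (int i = 0; i < iterations; ++i) step();
  const auto stop = clock::now();

  const std::chrono::duration<double, std::milli> total = stop - start;
  return iterations > 0 ? total.count() / iterations : 0.0;
}
```

In ssd_detect, `step` would wrap the net's forward call. Averaging this way only removes measurement differences between the two tools; it does not change which convolution algorithms the net ends up using.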

drnikolaev commented 5 years ago

@jazzseow could you upload the commands, their outputs and prototxt files used?

jazzseow commented 5 years ago

@drnikolaev Thank you for your reply. I have uploaded the required files to https://drive.google.com/open?id=1cAhF9wBNjBpO9Ykoh80Sv5eBqBZJwDYQ

Also, I have uploaded the modifications required to run the RefineDet model under the include/ and src/ folders.

drnikolaev commented 5 years ago

A-ha, this looks like a bug: when you run caffe time, the convolution algorithms get optimized like this:

I0719 11:48:43.798629 11106 cudnn_conv_layer.cpp:857] [n0.d0.r0] Conv Algos (F,BD,BF): 'conv3_1' with space 0.08G 63/1 6 1 0    (avail 9.72G, req 0.08G)    t: 0 0 0.6
I0719 11:48:44.033376 11106 cudnn_conv_layer.cpp:857] [n0.d0.r0] Conv Algos (F,BD,BF): 'conv3_2' with space 0.09G 233/1 6 1 5   (avail 9.7G, req 0.09G) t: 0 0 1.11
I0719 11:48:44.282755 11106 cudnn_conv_layer.cpp:857] [n0.d0.r0] Conv Algos (F,BD,BF): 'conv3_3' with space 0.09G 233/1 6 1 5   (avail 9.68G, req 0.09G)    t: 0 0 1.15

But caffe test doesn't. Could you try commenting out lines https://github.com/NVIDIA/caffe/blob/caffe-0.17/src/caffe/layers/cudnn_conv_layer.cpp#L450 and https://github.com/NVIDIA/caffe/blob/caffe-0.17/src/caffe/layers/cudnn_conv_layer.cpp#L456 and retrying caffe test?
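The log lines above report, per layer, the convolution algorithms selected for the forward (F), backward-data (BD) and backward-filter (BF) passes. In the snippet later in the thread, the algorithm search sits inside an `if (this->phase_ == TRAIN)` check, so a net built only for testing, as in ssd_detect, presumably skips the search and keeps a default choice. The underlying idea can be sketched without cuDNN: benchmark each candidate once per layer, remember the fastest, and reuse it afterwards. `AlgoCache` is a hypothetical illustration, not Caffe's or cuDNN's implementation.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <limits>
#include <map>
#include <string>
#include <vector>

// Hypothetical benchmark-then-cache selector (illustration only).
class AlgoCache {
 public:
  // Returns the index of the fastest candidate for `layer`. Every candidate
  // is timed once on the first request for that layer; later requests
  // return the cached index without running any benchmark.
  std::size_t select(const std::string& layer,
                     const std::vector<std::function<void()>>& candidates) {
    assert(!candidates.empty());
    auto it = chosen_.find(layer);
    if (it != chosen_.end()) return it->second;  // already tuned

    std::size_t best = 0;
    double best_ms = std::numeric_limits<double>::infinity();
    for (std::size_t i = 0; i < candidates.size(); ++i) {
      const auto start = std::chrono::steady_clock::now();
      candidates[i]();  // run the candidate algorithm once
      const std::chrono::duration<double, std::milli> elapsed =
          std::chrono::steady_clock::now() - start;
      if (elapsed.count() < best_ms) {
        best_ms = elapsed.count();
        best = i;
      }
    }
    chosen_[layer] = best;
    ++benchmarks_run_;
    return best;
  }

  int benchmarks_run() const { return benchmarks_run_; }

 private:
  std::map<std::string, std::size_t> chosen_;  // layer name -> chosen index
  int benchmarks_run_ = 0;
};
```

Because the benchmark runs only on the first `select` call for a layer, its cost is paid once and every later forward pass reuses the cached choice; skipping the search entirely saves that one-time cost but may leave a slower algorithm in place for every call.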

jazzseow commented 5 years ago

@drnikolaev I tried this:

if (!use_modest_workspace()) {
    // if (this->phase_ == TRAIN) {
    // Now taking the rest for running FindEx calls
    // We'll release what's possible in BW pass
    LOG(INFO); // line 453
    AllocateFindExWorkspace();
    // Also used by Test Net but based on shared space taken by Train:
    LOG(INFO); // line 456
    FindExConvAlgo(bottom, top);
    LOG(INFO); // line 458
    // }
    use_algo_seeker_ = false;
}

caffe time still works fine, but ssd_detect now crashes with a segmentation fault in FindExConvAlgo(bottom, top):

I0719 16:18:22.243350 23616 cudnn_conv_layer.cpp:453] 
I0719 16:18:22.248224 23616 cudnn_conv_layer.cpp:456] 
Segmentation fault (core dumped)
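The empty `LOG(INFO)` calls act as trace markers: glog prefixes each message with the source file and line, so seeing line 456 but not 458 places the crash inside `FindExConvAlgo(bottom, top)`. The same technique can be sketched without glog; `trace`, `last_trace` and `TRACE_HERE` are hypothetical names, and the sketch relies only on each marker being flushed to `std::cerr` before execution continues.

```cpp
#include <cassert>
#include <iostream>
#include <string>

// Hypothetical trace helper mirroring the empty LOG(INFO) markers.
// Stores the most recent marker so it can also be inspected in a debugger.
inline std::string& last_trace() {
  static std::string marker;
  return marker;
}

inline void trace(const char* file, int line) {
  last_trace() = std::string(file) + ":" + std::to_string(line);
  // std::endl flushes, so the marker is visible even if the next call crashes.
  std::cerr << "TRACE " << last_trace() << std::endl;
}

// Records the file:line of the call site, like glog's message prefix.
#define TRACE_HERE() trace(__FILE__, __LINE__)
```

Placing `TRACE_HERE()` before and after a suspect call narrows a crash to that call in the same way as the log above: the last marker printed is the last point reached.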