NVIDIA / caffe

Caffe: a fast open framework for deep learning.
http://caffe.berkeleyvision.org/

fp32 ssd more than 20% slower than weiliu89 version #503

Closed: cateweb closed this issue 6 years ago.

cateweb commented 6 years ago

I'm running SSD on one GeForce GTX 1080 Ti under Linux 4.4.0-122-generic, on an AMD Ryzen 5 1600 Six-Core Processor with 8 GB RAM. CUDA 9.1.85, cuDNN 7.1.3. Added nvcc options: -ftz=true -prec-div=false -prec-sqrt=false -arch=sm_61

During detection I profiled both versions (NVIDIA and weiliu89) with nvvp and got the following kernel durations, in ms, for the layers named '1' (conv1_1, conv1_2, etc.):

| Kernel | weiliu89 (ms) | nvidia (ms) |
|---|---|---|
| maxwell 128x64 | 1.31 | 1.33 |
| add tensor | 3.07 | 4 |
| activation | 1.94 | 1.94 |
| maxwell winograd 128x128 | 9.88 | - |
| maxwell 128x64 | - | 11.77 |
| add tensor | 3.03 | 3.1 |
| activation | 2.3 | 1.9 |
| max pool | 1.5 | 1.3 |
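Totalling the kernel rows shows how much of the gap the conv1 block accounts for. This is a minimal Python sketch with the numbers transcribed by hand from the table; the tuple layout and the helper names `total_ms` and `slowdown_percent` are illustrative only and are not part of either Caffe fork.

```python
# nvvp kernel durations (ms) for the conv1_* layers, transcribed from the table.
# Each entry: (kernel, weiliu89_ms, nvidia_ms); None means the kernel did not
# appear in that run.
CONV1_KERNELS = [
    ("maxwell 128x64", 1.31, 1.33),
    ("add tensor", 3.07, 4.0),
    ("activation", 1.94, 1.94),
    ("maxwell winograd 128x128", 9.88, None),
    ("maxwell 128x64", None, 11.77),
    ("add tensor", 3.03, 3.1),
    ("activation", 2.3, 1.9),
    ("max pool", 1.5, 1.3),
]


def total_ms(rows, column):
    """Sum one column (1 = weiliu89, 2 = nvidia), counting missing kernels as 0 ms."""
    return sum(row[column] or 0.0 for row in rows)


def slowdown_percent(baseline_ms, candidate_ms):
    """How much slower the candidate is than the baseline, in percent."""
    return (candidate_ms / baseline_ms - 1.0) * 100.0


ref = total_ms(CONV1_KERNELS, 1)
nv = total_ms(CONV1_KERNELS, 2)
print(f"weiliu89: {ref:.2f} ms, nvidia: {nv:.2f} ms, "
      f"slowdown: {slowdown_percent(ref, nv):.1f}%")
```

With these numbers conv1 totals 23.03 ms versus 25.34 ms, about 10% slower. Reading the table, most of the difference comes from the NVIDIA run using a maxwell 128x64 kernel (11.77 ms) where the weiliu89 run used maxwell winograd 128x128 (9.88 ms), plus the slower first add tensor (4 ms versus 3.07 ms).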

And for the other convolutions (conv2_1, conv2_2, etc.):

| weiliu89 (ms) | nvidia (ms) |
|---|---|
| 5.28 | 5.2 |
| 6.61 | 10.68 |
| 3.44 | 5.03 |
| 6.06 | 10.92 |
| 6.04 | 10.61 |
| 3.7 | 5.06 |
| 10.05 | 10 |
| 10.15 | 10.9 |
| 3.1 | 3.06 |
| 3.1 | 3.06 |
| 3.1 | 3.06 |
| 6.4 | 8.07 |

Could you please help me make detection faster? Am I missing some configuration option or flag? Thank you so much for your help, Caterina

drnikolaev commented 6 years ago

Hi @cateweb, could you please attach both logs?

cateweb commented 6 years ago

Hi @drnikolaev, I'm sending the nvvp sessions, ssd (weiliu89) and cuda (nvidia), via a WeTransfer link. Batch size 16, 300x300 px images.

Exactly the same images were used for both runs.

Let me know if these are of any help or if you need different logs.

Thanks, Caterina

https://we.tl/KPlYxlBXz8

drnikolaev commented 6 years ago

@cateweb the link above is broken

cateweb commented 6 years ago

Hi, I re-uploaded the logs. The link will be valid for a week: https://we.tl/oy9AZvzHlR Thanks for your help, Caterina

drnikolaev commented 6 years ago

@cateweb could you verify the https://github.com/drnikolaev/caffe/tree/caffe-0.17 release candidate?

drnikolaev commented 6 years ago

@cateweb Please verify the https://github.com/NVIDIA/caffe/tree/v0.17.1 release and reopen the issue if needed.