Closed: cateweb closed this issue 6 years ago
Hi @cateweb could you please attach both logs?
Hi @drnikolaev, I'm sending the nvvp sessions for ssd (weiliu) and cuda (nvidia) with a WeTransfer link. Batch size 16, 300x300 px images.
Exactly the same images were used for both.
Let me know if these are of any help or if you need different logs.
Thanks, Caterina
@cateweb the link above is broken
Hi, I re-uploaded the logs. The link will be valid for a week: https://we.tl/oy9AZvzHlR Thanks for your help, Caterina
@cateweb could you verify https://github.com/drnikolaev/caffe/tree/caffe-0.17 release candidate?
@cateweb Please verify https://github.com/NVIDIA/caffe/tree/v0.17.1 release and reopen the issue if needed.
Running ssd on one GeForce GTX 1080 Ti under Linux 4.4.0-122-generic, on an AMD Ryzen 5 1600 Six-Core Processor with 8 GB RAM. CUDA 9.1.85, cuDNN 7.1.3. Added nvcc options: -ftz=true -prec-div=false -prec-sqrt=false -arch=sm_61
During detection I profiled both versions (nvidia and weiliu89) with nvvp and got the following durations in ms for the layers named '1' (conv1_1, conv1_2, etc.):
| kernel | weiliu89 | nvidia |
|---|---|---|
| maxwell 128x64 | 1.31 | 1.33 |
| add tensor | 3.07 | 4 |
| activation | 1.94 | 1.94 |
| maxwell winograd 128x128 | 9.88 | - |
| maxwell 128x64 | - | 11.77 |
| add tensor | 3.03 | 3.1 |
| activation | 2.3 | 1.9 |
| max pool | 1.5 | 1.3 |
And for the other convolutions (conv2_1, conv2_2, etc.), again in ms:
| weiliu89 | nvidia |
|---|---|
| 5.28 | 5.2 |
| 6.61 | 10.68 |
| 3.44 | 5.03 |
| 6.06 | 10.92 |
| 6.04 | 10.61 |
| 3.7 | 5.06 |
| 10.05 | 10 |
| 10.15 | 10.9 |
| 3.1 | 3.06 |
| 3.1 | 3.06 |
| 3.1 | 3.06 |
| 6.4 | 8.07 |
Could you please help me make detection faster? Am I missing some configuration option or flag? Thank you so much for your help, Caterina
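To make the gap easier to see, the timings listed above for the other convolutions (conv2_1 onward) can be summed and compared row by row. This is only a rough sketch: the numbers are copied from the post, the rows are unlabeled there, and the helper name `slowdowns` and the 1.2x threshold are arbitrary choices for illustration, not part of either Caffe fork.

```python
# Per-row durations in ms, copied from the list above (conv2_1 onward).
# The post does not name the kernels, so rows are identified by index only.
weiliu89 = [5.28, 6.61, 3.44, 6.06, 6.04, 3.7, 10.05, 10.15, 3.1, 3.1, 3.1, 6.4]
nvidia = [5.2, 10.68, 5.03, 10.92, 10.61, 5.06, 10, 10.9, 3.06, 3.06, 3.06, 8.07]


def slowdowns(base, other, threshold=1.2):
    """Return (row index, ratio) pairs where `other` exceeds `base` by more than `threshold` times."""
    return [
        (i, round(o / b, 2))
        for i, (b, o) in enumerate(zip(base, other))
        if o / b > threshold
    ]


total_weiliu89 = sum(weiliu89)
total_nvidia = sum(nvidia)

print(f"weiliu89 total: {total_weiliu89:.2f} ms")
print(f"nvidia total:   {total_nvidia:.2f} ms")
print(f"overall ratio:  {total_nvidia / total_weiliu89:.2f}x")

# Rows where the nvidia build is noticeably slower (hypothetical 1.2x cutoff).
for i, ratio in slowdowns(weiliu89, nvidia):
    print(f"row {i}: nvidia takes {ratio}x as long ({weiliu89[i]} ms vs {nvidia[i]} ms)")
```

With these numbers, six of the twelve rows cross the 1.2x cutoff, which suggests the difference is concentrated in a few kernels rather than spread evenly.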