Hello
I used the latest code to build Caffe as a static library and invoke a Caffe model from C++ in GPU mode.
It works well on a Tesla K80, where a forward pass takes only 20 ms.
But it is much slower on a Tesla P4: the whole call takes over 400 ms, and "Forward()" alone takes nearly 390 ms.
The environments of the two machines are identical, and PyTorch and MXNet work well on both.
Steps to reproduce
1. Modify Makefile.config:

   USE_CUDNN := 1
   USE_OPENCV := 1
   USE_LEVELDB := 1
   USE_LMDB := 1
   USE_HDF5 := 1
   OPENCV_VERSION := 3
   CUDA_DIR := /usr/local/cuda-9.2
   CUDA_ARCH := -gencode arch=compute_30,code=sm_30 \
                -gencode arch=compute_35,code=sm_35 \
                -gencode arch=compute_50,code=sm_50 \
                -gencode arch=compute_52,code=sm_52 \
                -gencode arch=compute_60,code=sm_60 \
                -gencode arch=compute_61,code=sm_61 \
                -gencode arch=compute_70,code=sm_70 \
                -gencode arch=compute_70,code=compute_70
   BLAS := open
   BLAS_INCLUDE := /usr/local/OpenBLAS/include
   BLAS_LIB := /usr/local/OpenBLAS/lib
   USE_NCCL := 1
2. Run make clean && make
Tried solutions
Changed CUDA_ARCH in Makefile.config from:

   CUDA_ARCH := -gencode arch=compute_30,code=sm_30 \
                -gencode arch=compute_35,code=sm_35 \
                -gencode arch=compute_50,code=sm_50 \
                -gencode arch=compute_52,code=sm_52 \
                -gencode arch=compute_60,code=sm_60 \
                -gencode arch=compute_61,code=compute_61

to:

   CUDA_ARCH := -gencode arch=compute_30,code=sm_30 \
                -gencode arch=compute_35,code=sm_35 \
                -gencode arch=compute_50,code=sm_50 \
                -gencode arch=compute_52,code=sm_52 \
                -gencode arch=compute_60,code=sm_60 \
                -gencode arch=compute_61,code=sm_61 \
                -gencode arch=compute_70,code=sm_70 \
                -gencode arch=compute_70,code=compute_70
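For reference, the Tesla P4 is a Pascal card with compute capability 6.1, so the entries that matter for it are the sm_61 SASS and, as a fallback, the compute_61 PTX. If the P4 is the only deployment target, a minimal CUDA_ARCH such as the following should suffice and also shortens the build (a sketch; verify against your own deployment targets):

```makefile
# Native code for the Tesla P4 (compute capability 6.1),
# plus embedded PTX for forward compatibility with newer GPUs.
CUDA_ARCH := -gencode arch=compute_61,code=sm_61 \
             -gencode arch=compute_61,code=compute_61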
System configuration