BVLC / caffe

Caffe: a fast open framework for deep learning.
http://caffe.berkeleyvision.org/

Caffe runs slowly on the Tesla P4 #6935

Closed: huyutao3550346 closed this issue 4 years ago

huyutao3550346 commented 4 years ago

Hello, I built the Caffe static library from the latest code and use it to run a Caffe model from C++. In GPU mode it works well on a Tesla K80, taking only 20 ms. On a Tesla P4, however, it is slow: a run takes more than 400 ms, of which "Forward()" accounts for nearly 390 ms. The environment on the two machines is the same, and PyTorch and MXNet work well on both of them.
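One way to narrow down a number like the 390 ms above is to time several calls and keep the first few separate, since the report does not say whether the measurement includes one-time setup cost. The following is a minimal sketch, not code from the issue: `TimingResult` and `time_calls` are hypothetical names, and the callable passed in is assumed to block until the GPU has finished.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Per-call wall-clock timings in milliseconds, split into warm-up and
// steady-state runs. (Hypothetical helper type, not part of Caffe.)
struct TimingResult {
  std::vector<double> warmup_ms;
  std::vector<double> steady_ms;
};

// Calls `fn` `warmup` times and then `iters` times, timing every call.
// For GPU work, `fn` has to wait for the device to finish (for example by
// reading the output back to the host); otherwise only the launch is timed.
inline TimingResult time_calls(const std::function<void()>& fn,
                               std::size_t warmup, std::size_t iters) {
  assert(static_cast<bool>(fn));  // an empty std::function cannot be called
  using clock = std::chrono::steady_clock;

  auto run_once = [&fn]() {
    const auto t0 = clock::now();
    fn();
    const auto t1 = clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
  };

  TimingResult result;
  for (std::size_t i = 0; i < warmup; ++i) {
    result.warmup_ms.push_back(run_once());
  }
  for (std::size_t i = 0; i < iters; ++i) {
    result.steady_ms.push_back(run_once());
  }
  return result;
}
```

With a Caffe network in a variable such as `net` (an assumed name), the call could look like `time_calls([&] { net.Forward(); }, 5, 50)`, followed by a comparison of `warmup_ms` and `steady_ms` on the K80 and the P4. If only the warm-up calls are slow on the P4, the cost is likely one-time initialization rather than the forward pass itself.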

Steps to reproduce

1. Modify Makefile.config:

    USE_CUDNN := 1
    USE_OPENCV := 1
    USE_LEVELDB := 1
    USE_LMDB := 1
    USE_HDF5 := 1
    OPENCV_VERSION := 3
    CUDA_DIR := /usr/local/cuda-9.2
    CUDA_ARCH := -gencode arch=compute_30,code=sm_30 \
        -gencode arch=compute_35,code=sm_35 \
        -gencode arch=compute_50,code=sm_50 \
        -gencode arch=compute_52,code=sm_52 \
        -gencode arch=compute_60,code=sm_60 \
        -gencode arch=compute_61,code=sm_61 \
        -gencode=arch=compute_70,code=sm_70 \
        -gencode=arch=compute_70,code=compute_70
    BLAS := open
    BLAS_INCLUDE := /usr/local/OpenBLAS/include
    BLAS_LIB := /usr/local/OpenBLAS/lib
    USE_NCCL := 1

2. Run make clean && make

Tried solutions

Modified CUDA_ARCH in Makefile.config from

    CUDA_ARCH := -gencode arch=compute_30,code=sm_30 \
        -gencode arch=compute_35,code=sm_35 \
        -gencode arch=compute_50,code=sm_50 \
        -gencode arch=compute_52,code=sm_52 \
        -gencode arch=compute_60,code=sm_60 \
        -gencode arch=compute_61,code=code=compute_61

to

    CUDA_ARCH := -gencode arch=compute_30,code=sm_30 \
        -gencode arch=compute_35,code=sm_35 \
        -gencode arch=compute_50,code=sm_50 \
        -gencode arch=compute_52,code=sm_52 \
        -gencode arch=compute_60,code=sm_60 \
        -gencode arch=compute_61,code=sm_61 \
        -gencode=arch=compute_70,code=sm_70 \
        -gencode=arch=compute_70,code=compute_70
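The Tesla P4 has compute capability 6.1, so the relevant question for the two CUDA_ARCH values above is whether each one requests native sm_61 code. The sketch below checks a flag string for a given `code=sm_XY` target. `has_sm_target` is a hypothetical helper written for illustration; it only inspects the text of the flags and does not tell you what nvcc actually built into the library.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Returns true if `flags` contains a native-code target of the form
// `code=sm_<major><minor>`, e.g. `code=sm_61` for compute capability 6.1.
// (Hypothetical helper; plain string matching, no nvcc involved.)
inline bool has_sm_target(const std::string& flags, int major, int minor) {
  assert(major >= 1 && minor >= 0 && minor <= 9);
  const std::string needle =
      "code=sm_" + std::to_string(major) + std::to_string(minor);

  std::size_t pos = flags.find(needle);
  while (pos != std::string::npos) {
    const std::size_t end = pos + needle.size();
    // Accept the match only if the number ends here, so that a search for
    // sm_6 followed by more digits does not count as a hit.
    if (end == flags.size() ||
        !std::isdigit(static_cast<unsigned char>(flags[end]))) {
      return true;
    }
    pos = flags.find(needle, pos + 1);
  }
  return false;
}
```

Applied to the two settings quoted above (with the line continuations joined into one string), `has_sm_target(flags, 6, 1)` is false for the first and true for the second. Since the slowdown is reported with the second setting in place, a missing sm_61 target alone may not explain it.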

System configuration