facebookarchive / caffe2

Caffe2 is a lightweight, modular, and scalable deep learning framework.
https://caffe2.ai
Apache License 2.0

Trouble with inference on GPU #2036

Open saliltambe opened 6 years ago

saliltambe commented 6 years ago

System information

Summary

Hi,

I am trying to run a simple inference test on the GPU using SqueezeNet. I managed to run it on the CPU, but on the GPU the network initialization step keeps failing. Here is the relevant part of the code (based on an online tutorial):

    #define WITH_CUDA

. . .

    #ifdef WITH_CUDA
    DeviceOption option;
    option.set_device_type(CUDA);
    option.clear_cuda_gpu_id();
    option.set_cuda_gpu_id(0);
    new CUDAContext(option);
    #endif

    // Load Squeezenet model
    NetDef init_net, predict_net;

    // >>> with open(path_to_INIT_NET) as f:
    CAFFE_ENFORCE(ReadProtoFromFile(FLAGS_init_net, &init_net));

    // >>> with open(path_to_PREDICT_NET) as f:
    CAFFE_ENFORCE(ReadProtoFromFile(FLAGS_predict_net, &predict_net));
    // >>> p = workspace.Predictor(init_net, predict_net)

    #ifdef WITH_CUDA
    init_net.mutable_device_option()->set_device_type(CUDA);
    predict_net.mutable_device_option()->set_device_type(CUDA);
    #else
    init_net.mutable_device_option()->set_device_type(CPU);
    predict_net.mutable_device_option()->set_device_type(CPU);
    #endif

    Workspace workspace("current");
    CAFFE_ENFORCE(workspace.RunNetOnce(init_net));  // FAILS HERE

I cannot get the last line to run. I keep getting the following error:
Unhandled exception at 0x00007FFC31853C58 in DeepMatting.exe: Microsoft C++ exception: caffe2::EnforceNotMet at memory location 0x0000007B5A2FE780. occurred

Any ideas?
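The debugger message above names only the exception type, not the reason for the failure. One way to surface the reason is to catch the exception and print its text. The following is a minimal sketch, assuming `caffe2::EnforceNotMet` derives from `std::exception`; `FakeEnforceNotMet` and `RunInitNet` are hypothetical stand-ins so the snippet compiles without Caffe2.

```cpp
#include <exception>
#include <string>
#include <utility>

// Hypothetical stand-in for caffe2::EnforceNotMet (assumed to derive from std::exception).
class FakeEnforceNotMet : public std::exception {
 public:
  explicit FakeEnforceNotMet(std::string msg) : msg_(std::move(msg)) {}
  const char* what() const noexcept override { return msg_.c_str(); }

 private:
  std::string msg_;
};

// Hypothetical stand-in for the failing step; in this sketch it always throws.
void RunInitNet() {
  throw FakeEnforceNotMet("enforce failed: (illustrative message)");
}

// Runs the step and returns the exception text, or an empty string on success.
std::string RunAndReport() {
  try {
    RunInitNet();
  } catch (const std::exception& e) {
    return e.what();
  }
  return "";
}
```

In the real program, the `try` block would wrap `CAFFE_ENFORCE(workspace.RunNetOnce(init_net));` and the returned text could be written to `std::cerr`, which may show which check failed.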

Some more information about the build: I did get errors during the build process, but I can see all the libraries being built (caffe.lib, caffe_gpu.lib, caffe2_pybind11_state.lib, caffe2_pybind11_state_gpu.lib, etc.). The test binaries are built too, and every GPU test that was built passes without error, e.g. reshape_op_gpu_test, operator_gpu_test, net_gpu_test, event_gpu_test, blob_gpu_test.

Note, though, that some targets did not get built, such as context_gpu_test and speed_benchmark.
Parts of the build log are included here:

**BUILD LOG**
caffe2_gpu.lib(caffe2_gpu_generated_math_gpu.cu.obj) : error LNK2019: unresolved external symbol cublasSgemm_v2 refer
enced in function "void __cdecl caffe2::math::Gemm<float,class caffe2::CUDAContext,class caffe2::DefaultEngine>(enum CB
LAS_TRANSPOSE,enum CBLAS_TRANSPOSE,int,int,int,float,float const *,float const *,float,float *,class caffe2::CUDAContex
t *,enum caffe2::TensorProto_DataType)" (??$Gemm@MVCUDAContext@caffe2@@VDefaultEngine@2@@math@caffe2@@YAXW4CBLAS_TRANSP
OSE@@0HHHMPEBM1MPEAMPEAVCUDAContext@1@W4TensorProto_DataType@1@@Z) [C:\Users\stambe\DL_Frameworks\Caffe2-Cuda-Unpool-te
st\build\caffe2\binaries\split_db.vcxproj]
  caffe2_gpu.lib(caffe2_gpu_generated_math_gpu.cu.obj) : error LNK2019: unresolved external symbol cublasHgemm referenc
ed in function "void __cdecl caffe2::math::Gemm<struct caffe2::__f16,class caffe2::CUDAContext,class caffe2::DefaultEng
ine>(enum CBLAS_TRANSPOSE,enum CBLAS_TRANSPOSE,int,int,int,float,struct caffe2::__f16 const *,struct caffe2::__f16 cons
t *,float,struct caffe2::__f16 *,class caffe2::CUDAContext *,enum caffe2::TensorProto_DataType)" (??$Gemm@U__f16@caffe2
@@VCUDAContext@2@VDefaultEngine@2@@math@caffe2@@YAXW4CBLAS_TRANSPOSE@@0HHHMPEBU__f16@1@1MPEAU31@PEAVCUDAContext@1@W4Ten
sorProto_DataType@1@@Z) [C:\Users\stambe\DL_Frameworks\Caffe2-Cuda-Unpool-test\build\caffe2\binaries\split_db.vcxproj]
  caffe2_gpu.lib(caffe2_gpu_generated_math_gpu.cu.obj) : error LNK2019: unresolved external symbol cublasSgemmEx refere
nced in function "void __cdecl caffe2::math::Gemm<struct caffe2::__f16,class caffe2::CUDAContext,class caffe2::DefaultE
ngine>(enum CBLAS_TRANSPOSE,enum CBLAS_TRANSPOSE,int,int,int,float,struct caffe2::__f16 const *,struct caffe2::__f16 co
nst *,float,struct caffe2::__f16 *,class caffe2::CUDAContext *,enum caffe2::TensorProto_DataType)" (??$Gemm@U__f16@caff
e2@@VCUDAContext@2@VDefaultEngine@2@@math@caffe2@@YAXW4CBLAS_TRANSPOSE@@0HHHMPEBU__f16@1@1MPEAU31@PEAVCUDAContext@1@W4T
ensorProto_DataType@1@@Z) [C:\Users\stambe\DL_Frameworks\Caffe2-Cuda-Unpool-test\build\caffe2\binaries\split_db.vcxproj
]
  caffe2_gpu.lib(elemenntwise_rtc_gpu.obj) : error LNK2001: unresolved external symbol nvrtcGetProgramLog [C:\Users\sta
mbe\DL_Frameworks\Caffe2-Cuda-Unpool-test\build\caffe2\binaries\tutorial_blob.vcxproj]
  caffe2_gpu.lib(common_gpu.obj) : error LNK2019: unresolved external symbol cudaGetDeviceCount referenced in function
"bool __cdecl caffe2::GetCudaPeerAccessPattern(class std::vector<class std::vector<bool,class std::allocator<bool> >,cl
ass std::allocator<class std::vector<bool,class std::allocator<bool> > > > *)" (?GetCudaPeerAccessPattern@caffe2@@YA_NP
EAV?$vector@V?$vector@_NV?$allocator@_N@std@@@std@@V?$allocator@V?$vector@_NV?$allocator@_N@std@@@std@@@2@@std@@@Z) [C:
\Users\stambe\DL_Frameworks\Caffe2-Cuda-Unpool-test\build\caffe2\binaries\tutorial_blob.vcxproj]
  caffe2_gpu.lib(common_gpu.obj) : error LNK2019: unresolved external symbol cudaPointerGetAttributes referenced in fun
ction "int __cdecl caffe2::GetGPUIDForPointer(void const *)" (?GetGPUIDForPointer@caffe2@@YAHPEBX@Z) [C:\Users\stambe\D
L_Frameworks\Caffe2-Cuda-Unpool-test\build\caffe2\binaries\tutorial_blob.vcxproj]
  C:\Users\stambe\DL_Frameworks\Caffe2-Cuda-Unpool-test\build\bin\Release\tutorial_blob.exe : fatal error LNK1120: 151
unresolved externals [C:\Users\stambe\DL_Frameworks\Caffe2-Cuda-Unpool-test\build\caffe2\binaries\tutorial_blob.vcxproj
]

    8536 Warning(s)
    23066 Error(s)
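The unresolved symbols in the log above all belong to CUDA toolkit libraries: `cublas*` functions come from cuBLAS, `nvrtc*` functions from NVRTC, and `cuda*` calls from the CUDA runtime. That suggests those import libraries were missing from the link line of the failing targets. Here is a small sketch that classifies symbol names by prefix. The helper `LibraryForSymbol` is hypothetical, and the `.lib` names are the usual Windows toolkit names, assumed rather than taken from this build.

```cpp
#include <string>

// Hypothetical helper: guesses which CUDA toolkit import library provides a
// symbol, based only on its name prefix. Library names are assumptions.
std::string LibraryForSymbol(const std::string& symbol) {
  auto starts_with = [&symbol](const std::string& prefix) {
    return symbol.compare(0, prefix.size(), prefix) == 0;
  };
  if (starts_with("cublas")) return "cublas.lib";  // e.g. cublasSgemm_v2, cublasHgemm
  if (starts_with("nvrtc")) return "nvrtc.lib";    // e.g. nvrtcGetProgramLog
  if (starts_with("cuda")) return "cudart.lib";    // e.g. cudaGetDeviceCount
  return "";                                       // not a recognized CUDA prefix
}
```

For example, `LibraryForSymbol("cudaPointerGetAttributes")` yields `cudart.lib`, so a target reporting that symbol as unresolved would need the CUDA runtime library among its linker inputs.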

Any ideas about what I am missing?
Please note that I have worked with Caffe before and have some code running on the GPU that I intend to run using Caffe2. I managed to add some of the missing operators (like unpooling) and could even successfully translate the model from Caffe to Caffe2, but I am stuck with this inference issue on the GPU.
Sorry for being so verbose, but I wanted to give all the information. Any help will be highly appreciated! :)

EvrenGursoy commented 6 years ago

Hi, you have probably resolved this by now, but in case you haven't: I think the problem may be due to the ordering of your code. The set_device_type(CUDA) statements should execute before the ReadProtoFromFile calls. It works for me without any issue if I structure the code as below.

    // Load Squeezenet model
    NetDef init_net, predict_net;

    #ifdef WITH_CUDA
    init_net.mutable_device_option()->set_device_type(CUDA);
    predict_net.mutable_device_option()->set_device_type(CUDA);
    #else
    init_net.mutable_device_option()->set_device_type(CPU);
    predict_net.mutable_device_option()->set_device_type(CPU);
    #endif

    // >>> with open(path_to_INIT_NET) as f:
    CAFFE_ENFORCE(ReadProtoFromFile(FLAGS_init_net, &init_net));

    // >>> with open(path_to_PREDICT_NET) as f:
    CAFFE_ENFORCE(ReadProtoFromFile(FLAGS_predict_net, &predict_net));
    // >>> p = workspace.Predictor(init_net, predict_net)

soldatjiang commented 6 years ago

Have you solved this problem? If so, could you tell me the way to run GPU inference?