ROCm / MIVisionX

MIVisionX toolkit is a set of comprehensive computer vision and machine intelligence libraries, utilities, and applications bundled into a single toolkit. AMD MIVisionX also delivers a highly optimized open-source implementation of the Khronos OpenVX™ and OpenVX™ Extensions.
https://rocm.docs.amd.com/projects/MIVisionX/en/latest/
MIT License

OpenVX Framework - selects Nvidia GPU but not AMD GPU #333

Closed luguang closed 3 years ago

luguang commented 4 years ago

CPU: AMD Ryzen 5 2600X
GPU1: AMD RX 580
GPU2: Nvidia GeForce GTX 1650
OS: Ubuntu 18.04
CUDA: 10.2
ROCm: 3.5 (ok on 3.3)

I previously installed CUDA 10.2 and ROCm 3.3 on Ubuntu 18.04, and everything worked fine with the MIVisionX sample cases.

Afterwards I upgraded ROCm to 3.5 and found that the sample no longer works. I reinstalled Ubuntu from scratch and it didn't help.

Earlier, on ROCm 3.3, OpenVX did not pick the Nvidia GPU as the OpenCL device.

$ ./classifier --mode 2 --image data/images/img_02.JPG --model_weights ../yolo/weights.bin --label data/sample_detection_labels.txt --model_input_dims 3,416,416 --model_output_dims 125,12,12 --model_name YoloV2_Caffe --multiply 0.003922,0.003922,0.003922
OK: loaded 32 kernels from libvx_nn.so
OK: OpenVX using GPU device#0 (GeForce GTX 1650) [OpenCL 1.2 CUDA] [SvmCaps 0 1]
clang-11: error: no such file or directory: 'GTX'
clang-11: error: no such file or directory: '1650'
clang-11: error: Unsupported CUDA gpu architecture: GeForce
MIOpen Error: /root/driver/MLOpen/src/tmp_dir.cpp:47: Can't execute cd /tmp/miopen-gridwise_convolution_implicit_gemm_v4r4_nchw_kcyx_nkhw.cpp-765a-5a90-2fe1-2274; /opt/rocm-3.5.0/llvm/bin/clang++ -std=c++14 -DCK_PARAM_PROBLEM_N=1 -DCK_PARAM_PROBLEM_K=32 -DCK_PARAM_PROBLEM_C=16 -DCK_PARAM_PROBLEM_HI=208 -DCK_PARAM_PROBLEM_WI=208 -DCK_PARAM_PROBLEM_HO=208 -DCK_PARAM_PROBLEM_WO=208 -DCK_PARAM_PROBLEM_Y=3 -DCK_PARAM_PROBLEM_X=3 -DCK_PARAM_PROBLEM_CONV_STRIDE_H=1 -DCK_PARAM_PROBLEM_CONV_STRIDE_W=1 -DCK_PARAM_PROBLEM_CONV_DILATION_H=1 -DCK_PARAM_PROBLEM_CONV_DILATION_W=1 -DCK_PARAM_PROBLEM_IN_LEFT_PAD_H=1 -DCK_PARAM_PROBLEM_IN_LEFT_PAD_W=1 -DCK_PARAM_PROBLEM_IN_RIGHT_PAD_H=1 -DCK_PARAM_PROBLEM_IN_RIGHT_PAD_W=1 -DCK_PARAM_PROBLEM_CONV_DIRECTION_FORWARD=1 -DCK_PARAM_PROBLEM_CONV_DIRECTION_BACKWARD_DATA=0 -DCK_PARAM_PROBLEM_CONV_DIRECTION_BACKWARD_WEIGHT=0 -DCK_PARAM_TUNABLE_BLOCK_SIZE=64 -DCK_PARAM_TUNABLE_GEMM_M_PER_BLOCK=32 -DCK_PARAM_TUNABLE_GEMM_N_PER_BLOCK=64 -DCK_PARAM_TUNABLE_GEMM_K_PER_BLOCK=16 -DCK_PARAM_TUNABLE_GEMM_M_PER_THREAD=2 -DCK_PARAM_TUNABLE_GEMM_N_PER_THREAD=4 -DCK_PARAM_TUNABLE_GEMM_M_LEVEL0_CLUSTER=4 -DCK_PARAM_TUNABLE_GEMM_N_LEVEL0_CLUSTER=4 -DCK_PARAM_TUNABLE_GEMM_M_LEVEL1_CLUSTER=2 -DCK_PARAM_TUNABLE_GEMM_N_LEVEL1_CLUSTER=2 -DCK_PARAM_TUNABLE_GEMM_A_BLOCK_COPY_CLUSTER_LENGTHS_GEMM_K=4 -DCK_PARAM_TUNABLE_GEMM_A_BLOCK_COPY_CLUSTER_LENGTHS_GEMM_M=16 -DCK_PARAM_TUNABLE_GEMM_A_BLOCK_COPY_SRC_DATA_PER_READ_GEMM_K=4 -DCK_PARAM_TUNABLE_GEMM_A_BLOCK_COPY_DST_DATA_PER_WRITE_GEMM_M=2 -DCK_PARAM_TUNABLE_GEMM_B_BLOCK_COPY_CLUSTER_LENGTHS_GEMM_K=1 -DCK_PARAM_TUNABLE_GEMM_B_BLOCK_COPY_CLUSTER_LENGTHS_GEMM_N=64 -DCK_PARAM_TUNABLE_GEMM_B_BLOCK_COPY_SRC_DATA_PER_READ_GEMM_N=1 -DCK_PARAM_TUNABLE_GEMM_B_BLOCK_COPY_DST_DATA_PER_WRITE_GEMM_N=1 -DCK_PARAM_TUNABLE_GEMM_C_THREAD_COPY_DST_DATA_PER_WRITE_GEMM_N1=4 -DCK_PARAM_DEPENDENT_GRID_SIZE=676 -DCK_THREADWISE_GEMM_USE_AMD_INLINE_ASM=1 -DCK_USE_AMD_INLINE_ASM=1 -DMIOPEN_USE_FP16=0 -DMIOPEN_USE_FP32=1 -DMIOPEN_USE_INT8=0 -DMIOPEN_USE_INT8x4=0 -DMIOPEN_USE_BFP16=0 -DMIOPEN_USE_INT32=0 -DMIOPEN_USE_RNE_BFLOAT16=1 --cuda-gpu-arch=GeForce GTX 1650 --cuda-device-only -c -O3 -Wno-unused-command-line-argument -I. -x hip --hip-device-lib-path=/opt/rocm/lib -mllvm -amdgpu-early-inline-all=true -mllvm -amdgpu-function-calls=false -DHIP_ROCclr=1 -isystem /opt/rocm-3.5.0/hip/../include -isystem /opt/rocm-3.5.0/llvm/lib/clang/11.0.0/include/.. -DHIP_PLATFORM_HCC=1 -DHIP_ROCclr=1 -isystem /opt/rocm-3.5.0/hip/include -isystem /opt/rocm/include --hip-device-lib-path=/opt/rocm/lib --hip-link -mllvm -amdgpu-enable-global-sgpr-addr -mllvm --amdgpu-spill-vgpr-to-agpr=0 gridwise_convolution_implicit_gemm_v4r4_nchw_kcyx_nkhw.cpp -o /tmp/miopen-gridwise_convolution_implicit_gemm_v4r4_nchw_kcyx_nkhw.cpp-765a-5a90-2fe1-2274/gridwise_convolution_implicit_gemm_v4r4_nchw_kcyx_nkhw.cpp.o
ERROR: fatal error occured at /root/driver/MIVisionX/amd_openvx_extensions/amd_nn/src/convolution_layer.cpp#378
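
A note on reading the three clang-11 errors in the log: they look consistent with the device name "GeForce GTX 1650" being placed unquoted after --cuda-gpu-arch=, so that the command line splits it at the spaces and clang sees 'GTX' and '1650' as separate input files. The sketch below only illustrates that word-splitting behaviour in a POSIX shell. The helper count_args is hypothetical, and whether MIOpen builds its command this way is an assumption drawn from the log, not something confirmed in this thread.

```shell
# Device name as printed in the "OpenVX using GPU device#0" log line.
device_name="GeForce GTX 1650"

# Hypothetical helper: prints how many arguments it received.
count_args() { echo "$#"; }

# Unquoted expansion: the value is split at spaces into three arguments
# (--cuda-gpu-arch=GeForce, GTX, 1650).
count_args --cuda-gpu-arch=$device_name

# Quoted expansion: the whole value stays in one argument.
count_args "--cuda-gpu-arch=$device_name"
```

Quoting would only remove the stray 'GTX' and '1650' arguments; the remaining error, "Unsupported CUDA gpu architecture", suggests the underlying problem is that the Nvidia device was selected at all.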

kiritigowda commented 4 years ago

@luguang can you use the latest setup script (V1.7.13) with TOT or V1.9.1 MIVisionX to see if your issue is fixed? Thanks!

kiritigowda commented 4 years ago

@luguang did you get a chance to look at the TOT MIVisionX? This issue should not occur with the latest ROCm & MIVisionX. Please let me know if it still persists.

kiritigowda commented 3 years ago

The developer can also choose the device by setting the environment variable AGO_OPENCL_PLATFORM:

AGO_OPENCL_PLATFORM=[DEVICE_ID]
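
As a minimal sketch of how that variable might be applied to a single run without changing the rest of the shell session: the helper name run_with_platform and the value 1 are assumptions for illustration, and the right value depends on how the OpenCL platforms and devices enumerate on the machine in question.

```shell
# Hypothetical helper, not part of MIVisionX: runs one command with
# AGO_OPENCL_PLATFORM set only for that command's environment.
run_with_platform() {
    platform_id="$1"   # value to place in AGO_OPENCL_PLATFORM
    shift              # the remaining arguments are the command to run
    env AGO_OPENCL_PLATFORM="$platform_id" "$@"
}

# Example call (commented out because it needs the built sample and an
# OpenCL runtime); the value 1 is an assumed placeholder:
# run_with_platform 1 ./classifier --mode 2 --image data/images/img_02.JPG
```

After such a run, the "OK: OpenVX using GPU device#..." line in the output shows which device was actually picked, which is a quick way to check that the setting took effect.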