sriharikarnam commented 6 years ago

Background: Porting the MXNet deep learning framework to the ROCm platform.

Issue: While generating 'libmxnet.so' for the HIP/ROCm platform, we hit the compilation error below.

Error: src/operator/./nn/./pool.cuh:339:7: error: no matching function for call to 'atomicAdd'
atomicAdd(&in_grad[in_offset+max_idx], out_grad[index]);

Full log: error_log.txt

Environment info:

Operating System: Ubuntu 16.04
Compiled the code with hipcc
ROCm Version: 1.7.60

Steps to reproduce the issue:

$git clone --recursive https://github.com/ROCmSoftwarePlatform/mxnet.git
$git clone --recursive https://github.com/ROCmSoftwarePlatform/Thrust.git
$cd mxnet

Remove the NVCC guard around the atomicAdd call at src/operator/nn/pool.cuh:339, i.e. delete the #if / #endif lines shown here:

#if defined(__HIP_PLATFORM_NVCC__)
atomicAdd(&in_grad[in_offset+max_idx], out_grad[index]); //TODO. Fix compilation issue for HCC
#endif

$export HIP_PLATFORM=hcc
$hipcc -c -o build/src/operator/pooling_gpu.o -std=c++11 -Xcompiler -D_FORCE_INLINES -g -O3 --amdgpu-target=gfx803 --amdgpu-target=gfx900 -Xcompiler -DMSHADOW_FORCE_STREAM -Wall -Wsign-compare -O3 -I../Thrust -I. -I. -I/opt/rocm/hipblas/include -I/opt/rocm/rocblas/include -I/opt/rocm/hiprand/include -I/opt/rocm/rocrand/include -I/opt/rocm/hcfft/include -I../mxnet/mshadow/ -I../mxnet/dmlc-core/include -fPIC -I../mxnet/nnvm/include -Iinclude -funroll-loops -Wno-unused-variable -Wno-unused-parameter -Wno-unknown-pragmas -Wno-unused-local-typedefs -msse3 -DMSHADOW_USE_CBLAS=1 -DMSHADOW_USE_MKL=0 -DMSHADOW_RABIT_PS=0 -DMSHADOW_DIST_PS=0 -DMSHADOW_USE_PASCAL=0 -DMXNET_USE_OPENCV=1 -I/usr/include/opencv -fopenmp -DMSHADOW_USE_CUDNN=1 -I../mxnet/cub-hip -DMXNET_USE_NVRTC=0 src/operator/pooling.cu

whchung commented 6 years ago

Please check whether the data type in use falls within the ones supported in HIP: https://github.com/ROCm-Developer-Tools/HIP/blob/master/include/hip/hcc_detail/hip_runtime.h#L177

sriharikarnam commented 6 years ago

@whchung We checked the atomicAdd overloads in hip_runtime.h at the link above. However, the overloads for the double and half data types are defined in src/common/cuda_utils.h, guarded so that they are visible only on the device compilation path. The code compiles successfully on the HIP/CUDA (NVCC) path but reports the error on the HIP/ROCm (HCC) path.

whchung commented 6 years ago

@sriharikarnam It appears the HIP implementation is incomplete. Please raise a ticket in the HIP repository.

As a temporary workaround, please try casting the double / half types to supported ones.

sriharikarnam commented 6 years ago

@whchung
1) The overloads for the double and half data types are implemented in src/common/cuda_utils.h in the MXNet source code. They are user-defined functions in MXNet, not part of HIP.
2) The hcc compiler is not able to find these overloads and reports "no matching function for call to 'atomicAdd'", whereas the nvcc compiler compiles the code successfully.

Prototypes of the overloaded functions:

// Overload atomicAdd to work for floats on all architectures
#if (__HIP_DEVICE_COMPILE__) && (__HIP_ARCH_HAS_GLOBAL_INT64_ATOMICS__)
static inline __device__ void atomicAdd(double *address, double val) {
}
#endif

// Overload atomicAdd for half precision
#if (__HIP_DEVICE_COMPILE__)
static inline __device__ void atomicAdd(mshadow::half::half_t *address, mshadow::half::half_t val) {
}
#endif

whchung commented 6 years ago

@sriharikarnam Thanks for the explanation. It now does seem to be a limitation in the API exposed by HCC.

Until new APIs can be added, please try to work around the issue by casting to supported data types.

whchung commented 6 years ago

@aaronenyeshi / @AlexVlx / @scchan for awareness

AlexVlx commented 6 years ago

@sriharikarnam Can you please try switching the guards so that the bodies of the functions are guarded rather than the whole signatures (i.e. move the #if to immediately after the opening curly brace and the #endif to immediately before the closing one)? Thanks.
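
The guard placement suggested here can be sketched as follows. This is a minimal, host-compilable illustration rather than the MXNet code: SKETCH_DEVICE_COMPILE is a hypothetical stand-in for the HIP device-compilation macro, and sketch_atomic_add and sketch_caller are made-up names. The point it tries to show is that, once the signature sits outside the guard, the call site resolves in every compilation pass, even when the body is compiled out.

```cpp
#include <cassert>

// Hypothetical stand-in for the device-compilation macro; 0 mimics a pass
// in which device code is not being generated.
#ifndef SKETCH_DEVICE_COMPILE
#define SKETCH_DEVICE_COMPILE 0
#endif

// The signature is outside the guard, so it is declared in every pass and a
// call to it always resolves. Only the body is conditional.
static inline void sketch_atomic_add(double *address, double val) {
  assert(address != nullptr);
#if SKETCH_DEVICE_COMPILE
  // Placeholder for a real device implementation (this line is NOT atomic).
  *address += val;
#else
  // Pass without device code: intentionally a no-op.
  (void)address;
  (void)val;
#endif
}

// Stand-in for the call site in pool.cuh.
static inline void sketch_caller(double *grad, double delta) {
  sketch_atomic_add(grad, delta);
}
```

With the whole function wrapped in the guard instead, sketch_caller would fail to compile whenever the macro is 0, which mirrors the "no matching function" error reported in this thread.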

sriharikarnam commented 6 years ago

@AlexVlx As you suggested, we guarded only the code between the curly braces. With this change the code compiles successfully on hcc, whereas on the nvcc path we see the following error: "cannot overload functions distinguished by return type alone".

Steps to reproduce

$export HIP_PLATFORM=nvcc
$hipcc -c -o build/src/operator/pooling_gpu.o -std=c++11 -Xcompiler -D_FORCE_INLINES -g -O3 -ccbin g++ -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_50,code=compute_50 -Xcompiler \"-DMSHADOW_FORCE_STREAM -Wall -Wsign-compare -O3 -I../../Thrust -I. -I. -I/opt/rocm/hipblas/include -I/opt/rocm/rocblas/include -I/opt/rocm/hiprand/include -I/opt/rocm/rocrand/include -I/opt/rocm/hcfft/include -I../mxnet/mshadow/ -I../mxnet/dmlc-core/include -fPIC -I../mxnet/nnvm/include -Iinclude -funroll-loops -Wno-unused-variable -Wno-unused-parameter -Wno-unknown-pragmas -Wno-unused-local-typedefs -msse3 -I/usr/local/cuda/include -DMSHADOW_USE_CBLAS=1 -DMSHADOW_USE_MKL=0 -DMSHADOW_RABIT_PS=0 -DMSHADOW_DIST_PS=0 -DMSHADOW_USE_PASCAL=0 -DMXNET_USE_OPENCV=1 -I/usr/include/opencv -fopenmp -DMSHADOW_USE_CUDNN=1 -I../mxnet/cub-hip -DMXNET_USE_NVRTC=0\" src/operator/pooling.cu

AlexVlx commented 6 years ago

@sriharikarnam That is because from CUDA 8 onward atomicAdd for doubles is provided by the CUDA runtime (CUDART) itself; see: http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#atomicadd. So you need an outer guard on the CUDA version and CUDA arch (see the example just above that point in the same document, which shows precisely atomicAdd for doubles). Fundamentally, I think you want to enable these overloads always on HCC (definitely not only on the device compilation path, as the original guard did), and on CUDA only for a particular CUDA version and CUDA arch (again, please see the example).
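
A rough sketch of that outer guard, together with the compare-and-swap loop that such a fallback is usually built on, might look like the following. Several things here are assumptions: SKETCH_NEED_DOUBLE_ATOMIC_ADD and every sketch_* function are hypothetical names, the threshold of 600 reflects that native double atomicAdd is available on compute capability 6.x and higher, and the loop is a host-side stand-in that uses std::atomic in place of the device-side atomicCAS so that it can be built with a plain C++ compiler. The sketch does not check the CUDA toolkit version, which a real guard may also need.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>

// Decide whether a user-defined double atomicAdd is needed at all.
// __HIP_PLATFORM_HCC__ and __CUDA_ARCH__ are compiler-provided macros;
// SKETCH_NEED_DOUBLE_ATOMIC_ADD is a hypothetical name used only here.
#if defined(__HIP_PLATFORM_HCC__)
#define SKETCH_NEED_DOUBLE_ATOMIC_ADD 1  // always enabled on HCC
#elif defined(__CUDA_ARCH__) && (__CUDA_ARCH__ < 600)
#define SKETCH_NEED_DOUBLE_ATOMIC_ADD 1  // older CUDA archs lack the built-in
#else
#define SKETCH_NEED_DOUBLE_ATOMIC_ADD 0  // built-in available, or host pass
#endif

#if SKETCH_NEED_DOUBLE_ATOMIC_ADD
// A device overload of atomicAdd(double*, double) would be defined here,
// built on the same compare-and-swap loop as sketch_cas_add below.
#endif

static_assert(sizeof(double) == sizeof(std::uint64_t), "64-bit double expected");

// Reinterpret a double as its 64-bit pattern and back, without aliasing UB.
inline std::uint64_t sketch_to_bits(double v) {
  std::uint64_t b;
  std::memcpy(&b, &v, sizeof b);
  return b;
}
inline double sketch_from_bits(std::uint64_t b) {
  double v;
  std::memcpy(&v, &b, sizeof v);
  return v;
}

// Host stand-in for the CAS loop: add val to the double stored in cell and
// return the previous value.
inline double sketch_cas_add(std::atomic<std::uint64_t> &cell, double val) {
  std::uint64_t old_bits = cell.load();
  for (;;) {
    const double old_val = sketch_from_bits(old_bits);
    const std::uint64_t new_bits = sketch_to_bits(old_val + val);
    // On failure, compare_exchange_weak reloads old_bits, so the loop retries
    // with the value another thread just wrote.
    if (cell.compare_exchange_weak(old_bits, new_bits)) return old_val;
  }
}

// Small usage example that doubles as a self-check.
inline void sketch_self_check() {
  std::atomic<std::uint64_t> cell(sketch_to_bits(1.5));
  const double prev = sketch_cas_add(cell, 2.25);
  assert(prev == 1.5);
  assert(sketch_from_bits(cell.load()) == 3.75);
}
```

Note that this stand-in returns the previous value as a double. A user-defined overload that returns void while the toolkit already declares one returning double with the same parameters is the kind of clash that produces "cannot overload functions distinguished by return type alone".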

ROCm / hcc

Error : no matching function for call to 'atomicAdd' #606
