ROCm / hcc

HCC is an Open Source, Optimizing C++ Compiler for Heterogeneous Compute currently for the ROCm GPU Computing Platform
https://github.com/RadeonOpenCompute/hcc/wiki

Error : no matching function for call to 'atomicAdd' #606

Open sriharikarnam opened 6 years ago

sriharikarnam commented 6 years ago

Background: Porting Mxnet Deep Learning framework to ROCm Platform

Issue: While building 'libmxnet.so' for the HIP/ROCm platform, we hit the compilation error below.

Error:

src/operator/./nn/./pool.cuh:339:7: error: no matching function for call to 'atomicAdd'
atomicAdd(&in_grad[in_offset+max_idx], out_grad[index]);

Attachment: error_log.txt

Environment info:

Steps to reproduce the issue:

whchung commented 6 years ago

Please check whether the data type in use falls within the ones supported in HIP: https://github.com/ROCm-Developer-Tools/HIP/blob/master/include/hip/hcc_detail/hip_runtime.h#L177

sriharikarnam commented 6 years ago

@whchung We checked the atomicAdd overloads in hip_runtime.h at the link above. However, the overloads for the double and half data types are defined in src/common/cuda_utils.h, guarded for the device compilation path. The code compiles successfully on the HIP/CUDA (NVCC) path but reports the error on the HIP/ROCm (HCC) path.

whchung commented 6 years ago

@sriharikarnam It appears the HIP implementation is incomplete. Please raise a ticket in the HIP repository.

As a temporary workaround, please try casting the double / half types to supported ones.

sriharikarnam commented 6 years ago

@whchung 1) The overloads for the double and half data types are implemented in src/common/cuda_utils.h in the mxnet source code. These are user-defined functions in mxnet, not part of HIP. 2) The hcc compiler is unable to find these overloads and reports "no matching function for call to 'atomicAdd'", whereas nvcc compiles the code successfully.

Overload function prototypes:

// Overload atomicAdd to work for floats on all architectures
#if (__HIP_DEVICE_COMPILE__) && (__HIP_ARCH_HAS_GLOBAL_INT64_ATOMICS__)
static inline __device__ void atomicAdd(double *address, double val) {
}
#endif

// Overload atomicAdd for half precision
#if (__HIP_DEVICE_COMPILE__)
static inline __device__ void atomicAdd(mshadow::half::half_t *address, mshadow::half::half_t val) {
}
#endif

whchung commented 6 years ago

@sriharikarnam thanks for the explanation. It now does seem to be a limitation in the API exposed by HCC.

Until new APIs can be added, please try to work around the issue by casting to supported data types.

whchung commented 6 years ago

AlexVlx commented 6 years ago

@sriharikarnam can you please try switching the guards so that the function bodies are guarded rather than the signatures (i.e. move the #if to immediately after the opening curly brace and the #endif to immediately before the closing one)? Thanks.
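
The suggestion above can be sketched roughly as follows. This is only an illustration, not the mxnet code: the names `sketch_atomic_add` and `SKETCH_DEVICE` are hypothetical, the device branch assumes the compare-and-swap pattern from the CUDA programming guide (the thread elides the real body), and the host branch is a non-atomic stand-in so the file also builds with a plain C++ compiler.

```cpp
#include <cassert>

#if defined(__HIPCC__)
#include <hip/hip_runtime.h>  // assumed to provide __device__ and atomicCAS under HIP
#endif

// SKETCH_DEVICE is a hypothetical helper macro so this file also builds
// with a plain host C++ compiler, where __device__ does not exist.
#if defined(__HIPCC__) || defined(__CUDACC__)
#define SKETCH_DEVICE __device__
#else
#define SKETCH_DEVICE
#endif

// The signature sits outside any guard, so it is declared in every
// compilation pass and overload resolution can always find it.
static inline SKETCH_DEVICE void sketch_atomic_add(double* address, double val) {
#if defined(__HIP_DEVICE_COMPILE__) && __HIP_DEVICE_COMPILE__
  // Device pass only: compare-and-swap loop on the 64-bit pattern of the
  // double (assumed body, following the CUDA programming guide pattern).
  unsigned long long* address_as_ull =
      reinterpret_cast<unsigned long long*>(address);
  unsigned long long old = *address_as_ull;
  unsigned long long assumed;
  do {
    assumed = old;
    old = atomicCAS(address_as_ull, assumed,
                    __double_as_longlong(val + __longlong_as_double(assumed)));
  } while (assumed != old);  // integer compare, so NaN cannot cause a hang
#else
  // Host pass (or plain host compiler): non-atomic stand-in.
  assert(address != nullptr);
  *address += val;
#endif
}
```

Only the body changes between passes; the declaration itself is unconditional, which appears to be what lets hcc resolve the call.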

sriharikarnam commented 6 years ago

@AlexVlx As you suggested, we moved the guards inside the curly braces. With this change the code compiles successfully on hcc, whereas on the nvcc path we see the following error: "error: cannot overload functions distinguished by return type alone".

Steps to reproduce

AlexVlx commented 6 years ago

@sriharikarnam that is because from CUDA 8 onward atomicAdd for doubles is provided by CUDART itself, see: http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#atomicadd. So you need an outer guard for the CUDA version and CUDA arch (see the example just above that section in the same NVIDIA document, which shows precisely atomicAdd for doubles). Fundamentally, I think you want to always enable this in HCC (definitely not only on the device compilation path, as the original guard did), and on the CUDA side only for a particular CUDA version and CUDA arch (again, please see the example).
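
One way to read the outer-guard advice is sketched below, under stated assumptions: `__HCC__` is taken to identify the HCC compiler, `__CUDACC_VER_MAJOR__` to be nvcc's major version, and `__CUDA_ARCH__` to be defined only during nvcc's device pass. The names `SKETCH_NEED_DOUBLE_ATOMIC_ADD`, `SKETCH_DEVICE`, `sketch_add_fallback` and `sketch_demo` are hypothetical, and the function body is a non-atomic placeholder. The point is the guard, not the arithmetic.

```cpp
#include <cassert>

#if defined(__HIPCC__)
#include <hip/hip_runtime.h>  // assumed to provide __device__ under HIP
#endif

#if defined(__HIPCC__) || defined(__CUDACC__)
#define SKETCH_DEVICE __device__
#else
#define SKETCH_DEVICE
#endif

// Outer guard, decided once per compilation pass.
#if defined(__HCC__)
// HCC: always provide the fallback, on host and device passes alike.
#define SKETCH_NEED_DOUBLE_ATOMIC_ADD 1
#elif defined(__CUDACC__)
// nvcc: only for toolkits older than CUDA 8, or device passes below sm_60.
#if (__CUDACC_VER_MAJOR__ < 8) || \
    (defined(__CUDA_ARCH__) && (__CUDA_ARCH__ < 600))
#define SKETCH_NEED_DOUBLE_ATOMIC_ADD 1
#else
#define SKETCH_NEED_DOUBLE_ATOMIC_ADD 0
#endif
#else
// Plain host compiler: enabled so the sketch can be exercised on the CPU.
#define SKETCH_NEED_DOUBLE_ATOMIC_ADD 1
#endif

#if SKETCH_NEED_DOUBLE_ATOMIC_ADD
// Returns the previous value, mirroring a double-returning signature.
static inline SKETCH_DEVICE double sketch_add_fallback(double* address,
                                                       double val) {
  double previous = *address;  // placeholder: a real fallback must be atomic
  *address = previous + val;
  return previous;
}
#endif

#if SKETCH_NEED_DOUBLE_ATOMIC_ADD && !defined(__HIPCC__) && !defined(__CUDACC__)
// Host-only usage check for the sketch.
static inline void sketch_demo() {
  double acc = 1.0;
  double previous = sketch_add_fallback(&acc, 2.5);
  assert(previous == 1.0);
  assert(acc == 3.5);
}
#endif
```

The nvcc error quoted earlier in the thread suggests that the toolkit's own overload returns a value while the mxnet one returns void. A real overload named atomicAdd would therefore have to be excluded wherever the toolkit already declares one, which is the job of the outer guard.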