ROCm / hcc

HCC is an Open Source, Optimizing C++ Compiler for Heterogeneous Compute currently for the ROCm GPU Computing Platform
https://github.com/RadeonOpenCompute/hcc/wiki
Other
433 stars 108 forks source link

Runtime issue for thrust::copy API #1081

Open sriharikarnam opened 5 years ago

sriharikarnam commented 5 years ago

Background: Porting Thrust library to ROCm Platform

Error Description: The unit test case fails when copying stl vector to device vector for mixed data types (int to float) and giving runtime issue "Missing metadata for global function" for HIP/ROCm. The same test case is running successfully when it is exercises in HIP/CUDA.

The code snippet for causing the issue is { ... std::vector v(5); v[0] = 0; v[1] = 1; v[2] = 2; v[3] = 3; v[4] = 4; thrust::device_vector d(5, (float) 10); typename thrust::device_vector::iterator d_result = thrust::copy(v.begin(), v.end(), d.begin()); ... }

Error: The error Generated after executing the test case: terminate called after throwing an instance of 'std::runtime_error' what(): Missing metadata for global function: _ZN6thrust6system4cuda6detail5bulk_6detail15launch_by_valueILj0ENS4_9cuda_taskINS3_14parallel_groupINS3_16concurrent_groupINS3_5agentILm1EEELm0EEELm0EEENS4_7closureINS2_17for_each_n_detail15for_each_kernelENS_5tupleINS4_6cursorILj0EEENS_12zip_iteratorINSG_INS_6detail15normal_iteratorINS_7pointerIsNS2_3tagENS_11use_defaultESO_EEEENSL_INS_10device_ptrIfEEEENS_9null_typeESU_SU_SU_SU_SU_SU_SU_EEEENSK_16wrapped_functionINSK_23unary_transform_functorINS_8identityIsEEEEvEEjSU_SU_SU_SU_SU_SUEEEEEEEEvT0 Aborted

Environment info:

Also attached LLVM and ISA dump for the issue. LLVM and ISA dumps.zip

Steps to Reproduce the issue $git clone --recursive https://github.com/ROCmSoftwarePlatform/Thrust.git $cd Thrust $cd testing $cp copy.cu copy.cpp $export HIP_PLATFORM=hcc $hipcc copy.cpp testframework.cpp -I. -I../ -o copy.out $./copy.out

whchung commented 5 years ago

Please review your test case first as there is no clear indication it's a bug in the compiler or runtime.

Reviewing copy_log.txt we can see this kernel works fine:

_ZN6thrust6system4cuda6detail5bulk_6detail15launch_by_valueILj0ENS4_9cuda_taskINS3_14parallel_groupINS3_16concurrent_groupINS3_5agentILm1EEELm0EEELm0EEENS4_7closureINS2_17for_each_n_detail15for_each_kernelENS_5tupleINS4_6cursorILj0EEENS_12zip_iteratorINSG_INS_6detail15normal_iteratorINS_10device_ptrIiEEEENSL_INSM_IfEEEENS_9null_typeESR_SR_SR_SR_SR_SR_SR_EEEENSK_16wrapped_functionINSK_23unary_transform_functorINS_8identityIiEEEEvEEjSR_SR_SR_SR_SR_SR_EEEEEEEEvT0_

but not this one:

_ZN6thrust6system4cuda6detail5bulk_6detail15launch_by_valueILj0ENS4_9cuda_taskINS3_14parallel_groupINS3_16concurrent_groupINS3_5agentILm1EEELm0EEELm0EEENS4_7closureINS2_17for_each_n_detail15for_each_kernelENS_5tupleINS4_6cursorILj0EEENS_12zip_iteratorINSG_INS_6detail15normal_iteratorINS_7pointerIsNS2_3tagENS_11use_defaultESO_EEEENSL_INS_10device_ptrIfEEEENS_9null_typeESU_SU_SU_SU_SU_SU_SU_EEEENSK_16wrapped_functionINSK_23unary_transform_functorINS_8identityIsEEEEvEEjSU_SU_SU_SU_SU_SU_EEEEEEEEvT0_

If you use c++filt -n to demangle the good one you can find traces of thrust::detail::unary_transform_functor<thrust::identity<int>, while in the bad one you can see thrust::detail::unary_transform_functor<thrust::identity<short> . Did you really intend to use short type in you kernel call site?

My theory is that the kernel may be expecting something like int but somehow at kernel call site short is used and thus the kernel launch logic got an incorrectly mangled signature and thus couldn't find the right kernel at runtime. Without looking into the test case that's the best guess I have now.

sriharikarnam commented 5 years ago

The sample code having the same execution path for both nvcc and hcc but nvcc executes successful and hcc fails.

code_sample.cpp --> hipcc --> hcc --> code generation issue

code_sample.cpp --> hipcc --> nvcc --> code generation works

We are suspecting that code generation at compilation causing the run time error in hcc.

Attached the error generating code snippet below. copy_sample.cpp.txt

Please find attached log, captured after exporting HIP_TRACE_API=2 copy_sample_Trace.txt

Also attached LLVM and ISA dump for the issue. copy_sample_dump.zip

Steps to reproduce the issue: $git clone --recursive https://github.com/ROCmSoftwarePlatform/Thrust.git $cd Thrust $cd examples copy the given sample (copy_sample.cpp) to examples $hipcc copy_sample.cpp -I../ -o copy_sample.out $./copy_sample.out

whchung commented 5 years ago

@david-salinas for apparent reasons I can't really be assigned to this ticket and drive it to conclusion J

scchan commented 5 years ago

For the first copy test that failed, I could see that the host is trying to launch a kernel with name: _ZN6thrust6system4cuda6detail5bulk_6detail15launch_by_valueILj0ENS4_9cuda_taskINS3_14parallel_groupINS3_16concurrent_groupINS3_5agentILm1EEELm0EEELm0EEENS4_7closureINS2_17for_each_n_detail15for_each_kernelENS_5tupleINS4_6cursorILj0EEENS_12zip_iteratorINSG_INS_6detail15normal_iteratorINS_7pointerIiNS2_3tagENS_11use_defaultESO_EEEENSL_INS_10device_ptrIfEEEENS_9null_typeESU_SU_SU_SU_SU_SU_SU_EEEENSK_16wrapped_functionINSK_23unary_transform_functorINS_8identityIiEEEEvEEjSU_SU_SU_SU_SU_SU_EEEEEEEEvT0_

However, only kernel function that is present in the GPU code has a different name: _ZN6thrust6system4cuda6detail5bulk_6detail15launch_by_valueILj0ENS4_9cuda_taskINS3_14parallel_groupINS3_16concurrent_groupINS3_5agentILm1EEELm0EEELm0EEENS4_7closureINS2_17for_each_n_detail15for_each_kernelENS_5tupleINS4_6cursorILj0EEENS_12zip_iteratorINSG_INS_10device_ptrIfEEPfNS_9null_typeESN_SN_SN_SN_SN_SN_SN_EEEENS_6detail16wrapped_functionINSQ_23unary_transform_functorINS_8identityIfEEEEvEElSN_SN_SN_SN_SN_SN_EEEEEEEEvT0_

If you demangle them with c++filt, you'll see that the signatures start to diverge at the first thrust::zip_iterator template:

Host: thrust::zip_iterator<thrust::tuple<thrust::detail::normal_iterator<thrust::pointer<int,...

GPU: thrust::zip_iterator<thrust::tuple<thrust::device_ptr<float>, float*, thrust::null_type, ...

This seems to indicate that the host and gpu code path aren't observing the same data types and I suspect that is a code porting issue. (e.g. special code paths for the device guarded by a macro).

@sriharikarnam could you take a look into that?

sriharikarnam commented 5 years ago

As per your inputs we have identified the signatures of the required kernel. The signatures of kernel are same for HIP/ROCm and HIP/CUDA. Please find the attached documents of the function signatures. Are there any chances of changing the signatures before compilation or at the compilation time. function_signature_cuda.txt function_signature_rocm.txt

Please find the document that contain full over view of the copy api issue.