Open sriharikarnam opened 5 years ago
Please review your test case first as there is no clear indication it's a bug in the compiler or runtime.
Reviewing copy_log.txt we can see this kernel works fine:
_ZN6thrust6system4cuda6detail5bulk_6detail15launch_by_valueILj0ENS4_9cuda_taskINS3_14parallel_groupINS3_16concurrent_groupINS3_5agentILm1EEELm0EEELm0EEENS4_7closureINS2_17for_each_n_detail15for_each_kernelENS_5tupleINS4_6cursorILj0EEENS_12zip_iteratorINSG_INS_6detail15normal_iteratorINS_10device_ptrIiEEEENSL_INSM_IfEEEENS_9null_typeESR_SR_SR_SR_SR_SR_SR_EEEENSK_16wrapped_functionINSK_23unary_transform_functorINS_8identityIiEEEEvEEjSR_SR_SR_SR_SR_SR_EEEEEEEEvT0_
but not this one:
_ZN6thrust6system4cuda6detail5bulk_6detail15launch_by_valueILj0ENS4_9cuda_taskINS3_14parallel_groupINS3_16concurrent_groupINS3_5agentILm1EEELm0EEELm0EEENS4_7closureINS2_17for_each_n_detail15for_each_kernelENS_5tupleINS4_6cursorILj0EEENS_12zip_iteratorINSG_INS_6detail15normal_iteratorINS_7pointerIsNS2_3tagENS_11use_defaultESO_EEEENSL_INS_10device_ptrIfEEEENS_9null_typeESU_SU_SU_SU_SU_SU_SU_EEEENSK_16wrapped_functionINSK_23unary_transform_functorINS_8identityIsEEEEvEEjSU_SU_SU_SU_SU_SU_EEEEEEEEvT0_
If you use c++filt -n
to demangle the good one you can find traces of thrust::detail::unary_transform_functor<thrust::identity<int>
, while in the bad one you can see thrust::detail::unary_transform_functor<thrust::identity<short>
. Did you really intend to use short
type in you kernel call site?
My theory is that the kernel may be expecting something like int
but somehow at kernel call site short
is used and thus the kernel launch logic got an incorrectly mangled signature and thus couldn't find the right kernel at runtime. Without looking into the test case that's the best guess I have now.
The sample code having the same execution path for both nvcc and hcc but nvcc executes successful and hcc fails.
code_sample.cpp --> hipcc --> hcc --> code generation issue
code_sample.cpp --> hipcc --> nvcc --> code generation works
We are suspecting that code generation at compilation causing the run time error in hcc.
Attached the error generating code snippet below. copy_sample.cpp.txt
Please find attached log, captured after exporting HIP_TRACE_API=2 copy_sample_Trace.txt
Also attached LLVM and ISA dump for the issue. copy_sample_dump.zip
Steps to reproduce the issue: $git clone --recursive https://github.com/ROCmSoftwarePlatform/Thrust.git $cd Thrust $cd examples copy the given sample (copy_sample.cpp) to examples $hipcc copy_sample.cpp -I../ -o copy_sample.out $./copy_sample.out
@david-salinas for apparent reasons I can't really be assigned to this ticket and drive it to conclusion J
For the first copy test that failed, I could see that the host is trying to launch a kernel with name:
_ZN6thrust6system4cuda6detail5bulk_6detail15launch_by_valueILj0ENS4_9cuda_taskINS3_14parallel_groupINS3_16concurrent_groupINS3_5agentILm1EEELm0EEELm0EEENS4_7closureINS2_17for_each_n_detail15for_each_kernelENS_5tupleINS4_6cursorILj0EEENS_12zip_iteratorINSG_INS_6detail15normal_iteratorINS_7pointerIiNS2_3tagENS_11use_defaultESO_EEEENSL_INS_10device_ptrIfEEEENS_9null_typeESU_SU_SU_SU_SU_SU_SU_EEEENSK_16wrapped_functionINSK_23unary_transform_functorINS_8identityIiEEEEvEEjSU_SU_SU_SU_SU_SU_EEEEEEEEvT0_
However, only kernel function that is present in the GPU code has a different name:
_ZN6thrust6system4cuda6detail5bulk_6detail15launch_by_valueILj0ENS4_9cuda_taskINS3_14parallel_groupINS3_16concurrent_groupINS3_5agentILm1EEELm0EEELm0EEENS4_7closureINS2_17for_each_n_detail15for_each_kernelENS_5tupleINS4_6cursorILj0EEENS_12zip_iteratorINSG_INS_10device_ptrIfEEPfNS_9null_typeESN_SN_SN_SN_SN_SN_SN_EEEENS_6detail16wrapped_functionINSQ_23unary_transform_functorINS_8identityIfEEEEvEElSN_SN_SN_SN_SN_SN_EEEEEEEEvT0_
If you demangle them with c++filt, you'll see that the signatures start to diverge at the first thrust::zip_iterator template:
Host:
thrust::zip_iterator<thrust::tuple<thrust::detail::normal_iterator<thrust::pointer<int,...
GPU:
thrust::zip_iterator<thrust::tuple<thrust::device_ptr<float>, float*, thrust::null_type, ...
This seems to indicate that the host and gpu code path aren't observing the same data types and I suspect that is a code porting issue. (e.g. special code paths for the device guarded by a macro).
@sriharikarnam could you take a look into that?
As per your inputs we have identified the signatures of the required kernel. The signatures of kernel are same for HIP/ROCm and HIP/CUDA. Please find the attached documents of the function signatures. Are there any chances of changing the signatures before compilation or at the compilation time. function_signature_cuda.txt function_signature_rocm.txt
Please find the document that contain full over view of the copy api issue.
Background: Porting Thrust library to ROCm Platform
Error Description: The unit test case fails when copying stl vector to device vector for mixed data types (int to float) and giving runtime issue "Missing metadata for global function" for HIP/ROCm. The same test case is running successfully when it is exercises in HIP/CUDA.
The code snippet for causing the issue is { ... std::vector v(5);
v[0] = 0; v[1] = 1; v[2] = 2; v[3] = 3; v[4] = 4;
thrust::device_vector d(5, (float) 10);
typename thrust::device_vector::iterator d_result = thrust::copy(v.begin(), v.end(), d.begin());
...
}
Error: The error Generated after executing the test case: terminate called after throwing an instance of 'std::runtime_error' what(): Missing metadata for global function: _ZN6thrust6system4cuda6detail5bulk_6detail15launch_by_valueILj0ENS4_9cuda_taskINS3_14parallel_groupINS3_16concurrent_groupINS3_5agentILm1EEELm0EEELm0EEENS4_7closureINS2_17for_each_n_detail15for_each_kernelENS_5tupleINS4_6cursorILj0EEENS_12zip_iteratorINSG_INS_6detail15normal_iteratorINS_7pointerIsNS2_3tagENS_11use_defaultESO_EEEENSL_INS_10device_ptrIfEEEENS_9null_typeESU_SU_SU_SU_SU_SU_SU_EEEENSK_16wrapped_functionINSK_23unary_transform_functorINS_8identityIsEEEEvEEjSU_SU_SU_SU_SU_SUEEEEEEEEvT0 Aborted
Environment info:
ROCm version dpkg -s rocm-dkms Package: rocm-dkms Status: install ok installed Priority: optional Section: devel Installed-Size: 13 Maintainer: Advanced Micro Devices Inc. Architecture: amd64 Version: 2.1.96 Depends: rocm-dev, rock-dkms Description: Radeon Open Compute (ROCm) Runtime software stack Homepage: https://github.com/RadeonOpenCompute/ROCm
Please find attached log copy_log.txt, captured after exporting HIP_TRACE_API=2
copy_log.txt
Also attached LLVM and ISA dump for the issue. LLVM and ISA dumps.zip
Steps to Reproduce the issue $git clone --recursive https://github.com/ROCmSoftwarePlatform/Thrust.git $cd Thrust $cd testing $cp copy.cu copy.cpp $export HIP_PLATFORM=hcc $hipcc copy.cpp testframework.cpp -I. -I../ -o copy.out $./copy.out