GridTools / gridtools

Libraries and utilities to develop performance portable applications for weather and climate.
https://gridtools.github.io/gridtools

gridtools v2.0.0 compiler errors and some ctest failures for rocm3.8.0 (Clang 11) #1576

Closed yaomingamd closed 4 months ago

yaomingamd commented 4 years ago

We get compiler errors and some ctest failures with v2.0.0 using rocm3.8.0 (Clang 11). Compiler errors:

/home/projects/GridTools/gridtools2.0/tests/include/test_environment.hpp:40:5: note: expanded from macro 'GT_REGRESSION_TEST'
TYPED_TEST(name, test)
^
/home/projects/GridTools/gridtools2.0/build/_deps/googletest-src/googletest/include/gtest/gtest-typed-test.h:213:27: note: expanded from macro 'TYPED_TEST'
CaseName)>::Register("", \
^
/home/projects/GridTools/gridtools2.0/include/gridtools/stencil/cpu_ifirst/execinfo.hpp:58:33: note: candidate template ignored: substitution failure [with ThreadPool = gridtools::thread_pool::omp, Grid = gridtools::stencil::core::grid<gridtools::stencil::core::interval<gridtools::stencil::core::level<0, 1, 2>, gridtools::stencil::core::level<1, -1, 2>>>]
GT_FORCE_INLINE execinfo(ThreadPool, const Grid &grid)
^
/home/projects/GridTools/gridtools2.0/include/gridtools/stencil/cpu_ifirst/execinfo.hpp:46:19: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 2 were provided
class execinfo {
^
/home/projects/GridTools/gridtools2.0/include/gridtools/stencil/cpu_ifirst/execinfo.hpp:46:19: note: candidate constructor (the implicit move constructor) not viable: requires 1 argument, but 2 were provided
6 errors generated when compiling for gfx906.

yaomingamd commented 4 years ago

ctest on MI50 (gfx906): 80% tests passed, 60 tests failed out of 305

Label Time Summary:
cartesian = 17.03 sec*proc (100 tests)
cmake = 0.87 sec*proc (1 test)
cpu = 239.48 sec*proc (6 tests)
cpu_ifirst = 0.33 sec*proc (39 tests)
cpu_kfirst = 0.09 sec*proc (39 tests)
cuda = 2.54 sec*proc (6 tests)
gcl = 249.38 sec*proc (7 tests)
getting_started = 3.79 sec*proc (10 tests)
gpu = 32.03 sec*proc (51 tests)
gpu_horizontal = 14.37 sec*proc (34 tests)
icosahedral = 16.28 sec*proc (70 tests)
mpi = 249.38 sec*proc (7 tests)
naive = 2.61 sec*proc (34 tests)
regression = 31.91 sec*proc (149 tests)
storage = 0.20 sec*proc (11 tests)
unit_test = 10.30 sec*proc (94 tests)

Total Test time (real) = 124.39 sec

The following tests FAILED:
44 - copy_stencil_parallel_cpu (Failed)
49 - horizontal_diffusion_cpu_kfirst (Not Run)
50 - horizontal_diffusion_cpu_ifirst (Not Run)
54 - horizontal_diffusion_fused_cpu_kfirst (Not Run)
55 - horizontal_diffusion_fused_cpu_ifirst (Not Run)
59 - simple_hori_diff_cpu_kfirst (Not Run)
60 - simple_hori_diff_cpu_ifirst (Not Run)
64 - copy_stencil_cpu_kfirst (Not Run)
65 - copy_stencil_cpu_ifirst (Not Run)
69 - vertical_advection_dycore_cpu_kfirst (Not Run)
70 - vertical_advection_dycore_cpu_ifirst (Not Run)
74 - advection_pdbott_prepare_tracers_cpu_kfirst (Not Run)
75 - advection_pdbott_prepare_tracers_cpu_ifirst (Not Run)
79 - laplacian_cpu_kfirst (Not Run)
80 - laplacian_cpu_ifirst (Not Run)
84 - positional_stencil_cpu_kfirst (Not Run)
85 - positional_stencil_cpu_ifirst (Not Run)
89 - tridiagonal_cpu_kfirst (Not Run)
90 - tridiagonal_cpu_ifirst (Not Run)
94 - alignment_cpu_kfirst (Not Run)
95 - alignment_cpu_ifirst (Not Run)
99 - extended_4D_cpu_kfirst (Not Run)
100 - extended_4D_cpu_ifirst (Not Run)
104 - expandable_parameters_cpu_kfirst (Not Run)
105 - expandable_parameters_cpu_ifirst (Not Run)
109 - expandable_parameters_single_kernel_cpu_kfirst (Not Run)
110 - expandable_parameters_single_kernel_cpu_ifirst (Not Run)
112 - horizontal_diffusion_functions_gpu (Child aborted)
114 - horizontal_diffusion_functions_cpu_kfirst (Not Run)
115 - horizontal_diffusion_functions_cpu_ifirst (Not Run)
119 - whole_axis_access_cpu_kfirst (Not Run)
120 - whole_axis_access_cpu_ifirst (Not Run)
129 - stencil_on_edges_multiplefields_cpu_kfirst (Not Run)
134 - stencil_on_cells_cpu_kfirst (Not Run)
139 - stencil_on_neighcell_of_edges_cpu_kfirst (Not Run)
144 - stencil_manual_fold_cpu_kfirst (Not Run)
149 - copy_stencil_icosahedral_cpu_kfirst (Not Run)
154 - expandable_parameters_icosahedral_cpu_kfirst (Not Run)
159 - stencil_on_cells_color_cpu_kfirst (Not Run)
164 - stencil_on_edges_cpu_kfirst (Not Run)
169 - stencil_fused_cpu_kfirst (Not Run)
174 - stencil_on_neighedge_of_cells_cpu_kfirst (Not Run)
179 - stencil_on_vertices_cpu_kfirst (Not Run)
184 - curl_cpu_kfirst (Not Run)
189 - div_cpu_kfirst (Not Run)
192 - lap_gpu (Child aborted)
194 - lap_cpu_kfirst (Not Run)
198 - test_halo_exchange_3D_gpu (Failed)
250 - test_kcache_fill_cpu_kfirst (Not Run)
251 - test_kcache_fill_cpu_ifirst (Not Run)
255 - test_kcache_fill_and_flush_cpu_kfirst (Not Run)
256 - test_kcache_fill_and_flush_cpu_ifirst (Not Run)
260 - test_kcache_flush_cpu_kfirst (Not Run)
261 - test_kcache_flush_cpu_ifirst (Not Run)
265 - test_kcache_local_cpu_kfirst (Not Run)
266 - test_kcache_local_cpu_ifirst (Not Run)
270 - test_kparallel_cpu_kfirst (Not Run)
271 - test_kparallel_cpu_ifirst (Not Run)
280 - test_tmp_storage_sid_cpu_ifirst (Not Run)
295 - cmaketest_storage_gpu (Failed)
Errors while running CTest

havogt commented 4 years ago

Thanks for reporting these issues.

After a quick analysis, I see the following categories of failing tests:

fthaler commented 4 years ago

The mentioned compilation errors come from the fact that _OPENMP is not defined when the HIP device code is compiled. We tested internally using an older HIP compiler that was built without OpenMP support, so we did not see this error. We will upgrade the internal testing environment to the official HIP 3.8.0 release in #1578, which also includes the OpenMP fix. @havogt was right with his conclusions: I can confirm that all tests pass on our side apart from horizontal_diffusion_functions_gpu and lap_gpu, which fail with an unknown error in some of the AMD libraries:

…
:3:hip_platform.cpp         :198 : 802921524616 us: 60051: [7ffff7fc6880] __hipPushCallConfiguration ( {1,5,61}, {64,10,1}, 8768, stream:<null> )
:3:hip_platform.cpp         :202 : 802921524618 us: 60051: [7ffff7fc6880] __hipPushCallConfiguration: Returned hipSuccess :
:3:hip_platform.cpp         :209 : 802921524623 us: 60051: [7ffff7fc6880] __hipPopCallConfiguration ( {2,2,61}, {4294906752,32767,7046968}, 0x7fffffff1298, 0x7fffffff1290 )
:3:hip_platform.cpp         :218 : 802921524625 us: 60051: [7ffff7fc6880] __hipPopCallConfiguration: Returned hipSuccess :
:3:hip_module.cpp           :395 : 802921524631 us: 60051: [7ffff7fc6880] hipLaunchKernel ( 0x4144e0, {1,5,61}, {64,10,1}, 0x7fffffff1300, 8768, stream:<null> )
:3:hip_module.cpp           :206 : 802921524648 us: 60051: [7ffff7fc6880] ihipModuleLaunchKernel ( 0x0x6ebb30, 64, 50, 61, 64, 10, 1, 8768, stream:<null>, 0x7fffffff1300, char array:<null>, event:0, event:0, 0, 0 )
:4:command.cpp              :259 : 802921524651 us: command is enqueued: 0x905f00
:3:hip_platform.cpp         :645 : 802921524655 us: 60051: [7ffff7fc6880] ihipLaunchKernel: Returned hipSuccess :
:3:hip_module.cpp           :396 : 802921524657 us: 60051: [7ffff7fc6880] hipLaunchKernel: Returned hipSuccess :
:3:hip_error.cpp            :27  : 802921524658 us: 60051: [7ffff7fc6880] hipGetLastError (  )
:4:commandqueue.cpp         :165 : 802921524661 us: command is submitted: 0x905f00
:3:rocvirtual.cpp           :2182: 802921524669 us: [7fffee7b8700]!     ShaderName : _ZN9gridtools7stencil11gpu_backend19launch_kernel_impl_7wrapperILm640ELi64ELi8ENS0_6extentILin1ELi0ELin1ELi0ELi0ELi0EEENS1_21make_kernel_fun_impl_8kernel_fINS_3sid9composite4keysIJNS_4meta4listIJNS0_9cartesian7tmp_argILm0EdEENS0_10cache_type2ijEEEENSC_IJNS0_14frontend_impl_3argILm1EEEEEENSC_IJNSE_ILm1EdEESH_EEENSC_IJNSK_ILm2EEEEEENSC_IJNSK_ILm0EEEEEEEE6valuesIJNS8_22shift_sid_origin_impl_11shifted_sidINS8_15synthetic_impl_9syntheticIJNSX_12unique_mixinILNS8_8propertyE4ENS_5hymap4keysIJNS_17integral_constantIiLi3EEENS13_IiLi0EEENS13_IiLi1EEEEE6valuesIJS15_S15_S15_EEEEENSZ_ILS10_5ENS18_IJS16_NS13_IiLi65EEENS13_IiLi8EEEEEEEENSZ_ILS10_3ENS18_IJS16_S16_S1B_EEEEENSZ_ILS10_2EiEENSZ_ILS10_1ES1F_EENSZ_ILS10_0ENS1_16shared_allocator10lazy_allocIdEEEEEEENS18_IJS15_NS13_IiLin1EEES15_EEENS18_IJS16_NS13_IiLi64EEES1C_EEEEENS8_14as_const_impl_13const_adapterIRNS8_11block_impl_11blocked_sidINSW_IRSt10shared_ptrINS_7storage16data_store_impl_10data_storeINS1Y_3gpuEdNS1Y_10info_impl_4infoINS_5tupleIJiiiEEENS24_IJS16_iiEEESt16integer_sequenceImJLm0ELm1ELm2EEEEEvLb0ELb0EEEENS12_IJS15_S16_NS13_IiLi2EEEEE6valuesIJiiS15_EEENS2F_IJiiiEEEEENS12_IJS15_S16_EE6valuesIJS1Q_S1C_EEEEEEENSW_INSY_IJS1A_NSZ_ILS10_5ENS18_IJS16_S1Q_NS13_IiLi9EEEEEEEENSZ_ILS10_3ENS18_IJS16_S16_S1Q_EEEEES1H_NSZ_ILS10_1ES2S_EES1M_EEENS18_IJS15_S15_S1O_EEES1R_EES2M_S2O_EEENS6_8k_loop_fINS1_7deref_fINS24_IJSM_SS_EEEEENS0_6be_api15fused_view_itemIJNS33_13interval_infoIJNS33_4cellINSC_IJNSD_11stage_impl_5stageIN12_GLOBAL__N_112flx_functionILNS39_9variationE1EEENS24_IJSI_SM_EEEEEEEENS0_4core8intervalINS3G_5levelILj0ELi1ELi2EEENS3I_ILj1ELin1ELi2EEEEENS24_IJNS33_8plh_infoISI_St17integral_constantIbLb1EEdS16_S3N_IbLb0EENS4_ILin1ELi0ELi0ELi0ELi0ELi0EEENSC_IJEEEEENS3M_ISM_S3P_dS1O_S3O_NS4_ILin2ELi2ELin1ELi1ELi0ELi0EEES3R_EEEEES3Q_NS3G_8parallelENS_11disjunctionIJNSB_11st_containsINSC_IJSH_EEESH_EEEEEEENS36_INSC_IJNS38_INS39_12fly_functionILS3B_1EEENS24_IJSO_SM_EEEEEEEES3L_NS24_IJNS3M_ISO_S3O_dS16_S3P_NS4_ILi0ELi0ELin1ELi0ELi0ELi0EEES3R_EENS3M_ISM_S3P_dS1O_S3O_NS4_ILin1ELi1ELin2ELi2ELi0ELi0EEES3R_EEEEES48_S3W_NS3X_IJS3P_EEEEENS36_INSC_IJNS38_INS39_12out_functionENS24_IJSQ_SM_SI_SO_SS_EEEEEEEES3L_NS24_IJNS3M_ISQ_S3P_dS1O_S3P_NS4_ILi0ELi0ELi0ELi0ELi0ELi0EEES3R_EENS3M_ISM_S3P_dS1O_S3O_S4J_S3R_EENS3M_ISI_S3O_dS16_S3O_S3Q_S3R_EENS3M_ISO_S3O_dS16_S3O_S48_S3R_EENS3M_ISS_S3P_dS1O_S3O_S4J_S3R_EEEEES4J_S3W_NS3X_IJS3O_EEEEEEEEEEENS24_IJiEEELi1ES3P_EEEEEEvT3_ii

:4:rocvirtual.cpp           :522 : 802921524677 us: [7fffee7b8700] HWq=0x7ffff7ea2000, Dispatch Header = 0x502 (type=2, barrier=1, acquire=2, release=0), setup=3, grid=[64, 50, 61], workgroup=[64, 10, 1], private_seg_size=0, group_seg_size=8768, kernel_obj=0x7fffee2f87c0, kernarg_address=0x7fffee480000, completion_signal=0x0
:3:hip_memory.cpp           :282 : 802921524662 us: 60051: [7ffff7fc6880] hipFree ( 0x7fffee28e000 )
:4:command.cpp              :221 : 802921524682 us: queue marker to command queue: 0x906290
:4:command.cpp              :259 : 802921524683 us: command is enqueued: 0x90c980
:4:command.cpp              :192 : 802921524687 us: waiting for event 0x905f00 to complete, current status 2
:4:commandqueue.cpp         :165 : 802921524693 us: command is submitted: 0x90c980
:4:rocvirtual.cpp           :605 : 802921524697 us: [7fffee7b8700] HWq=0x7ffff7ea2000, BarrierAND Header = 0x1503 (type=3, barrier=1, acquire=2, release=2), dep_signal=[0x0, 0x0, 0x0, 0x0, 0x0], completion_signal=0x7fffee75eb00
:0:rocdevice.cpp            :2158: 802921526439 us: Device::callbackQueue aborting with status: 0x29

Currently we do not test the MPI implementation on AMD GPUs because we have no GPU-aware MPI available on the testing machine.
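The `_OPENMP` explanation in the comment above can be checked on a given toolchain with a small compile-time probe. This is a minimal sketch, assuming HIP-Clang, which defines `__HIP_DEVICE_COMPILE__` only while compiling device code; `openmp_version` and `openmp_enabled` are hypothetical helpers, not part of GridTools.

```cpp
#include <cassert>

// Hypothetical helper: the OpenMP version date (e.g. 201511) if this
// compilation pass sees the _OPENMP macro, 0 otherwise.
constexpr long openmp_version() {
#ifdef _OPENMP
    return _OPENMP;
#else
    return 0;
#endif
}

// Hypothetical helper: true if _OPENMP is visible in this pass.
constexpr bool openmp_enabled() { return openmp_version() != 0; }

// HIP-Clang compiles each source once for the host and once per device
// target; __HIP_DEVICE_COMPILE__ is defined only in the device pass.
// If OpenMP is enabled for the host but _OPENMP is missing here, every
// `#ifdef _OPENMP` branch resolves differently on host and device.
#if defined(__HIP_DEVICE_COMPILE__) && !defined(_OPENMP)
#warning "_OPENMP is not defined in this HIP device compilation pass"
#endif
```

On a toolchain that behaves as described above, compiling this with hipcc and `-fopenmp` would be expected to emit the warning only during the device pass, while a plain host compiler never triggers it.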

yaomingamd commented 4 years ago

Felix, thank you for your quick response! Yes, I am using GPU-aware OpenMPI. I will look into the error message you provided and will file a ticket with our compiler and driver teams if it turns out to be a bug on the AMD side.

To build GPU-aware OpenMPI for AMD GPUs, please refer to this link, and let me know if you have any problems with it: https://github.com/openucx/ucx/wiki/Build-and-run-ROCM-UCX-OpenMPI Thank you and best regards!

Yaoming


fthaler commented 4 years ago

Hi Yaoming,

thanks for the information about building GPU-aware OpenMPI, that’s helpful. Do you still face problems with the MPI tests? I have not yet found the time to try them on our side. I was also not able to debug the problem with the two failing tests, as rocgdb currently does not run: it asks for a firmware update (which should be available soon; the sysadmin is working on it).

On the other hand, the fix for the OpenMP problem (#1578) was merged into master. I hope this works for you, too.

Further, we discovered another problem during compilation in debug mode. HIPCC never terminates during the compilation of one test, namely test_boundary_conditions_gpu. There are no issues in release mode with this test, neither at compile time nor at run time.

Best regards, Felix

yaomingamd commented 3 years ago

Hi Felix, I tested the newest master and the compilation errors are gone, but compilation hangs for two or three ctest targets even when building for release; I suspect an issue with the Clang compiler. 173 tests failed and 3 were not run. I have not had time to build GridTools with MPI yet and will update you.

Yaoming

fthaler commented 3 years ago

Hi Yaoming,

interesting, as in our Jenkins setup all non-MPI tests run fine except horizontal_diffusion_functions_gpu and lap_gpu. I assume we have the official release of ROCm 3.8, but this is the output of hipcc --version anyway:

HIP version: 3.8.20371-d1886b0b
clang version 11.0.0 (/data/jenkins_workspace/centos_pipeline_job_3.8/rocm-rel-3.8/rocm-3.8-30-20200915/7.7/external/llvm-project/clang b98349b12ffa706d0e863a3f1176b20d2a6c438b)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm-3.8.0/llvm/bin

What are the reported reasons for the test failures on your system? We also got the firmware update and rocgdb is working again, but it did not really help for debugging the issue on the two failing tests.

Cheers, Felix

yaomingamd commented 3 years ago

Yes, I am using the official ROCm 3.8 release and tested inside a Docker container. The container was built from ubuntu:18.04; I then installed rocm-dkms and built Boost 1.74. The compiler hangs when building boundary_conditions.cpp and test_boundary_conditions.cpp. My CMake configuration is as follows:

BUILD_GMOCK ON
BUILD_TESTING ON
Boost_INCLUDE_DIR /usr/local/boost_1_74_0/include
CMAKE_BUILD_TYPE
CMAKE_INSTALL_PREFIX /usr/local
CUDAToolkit_NVCC_EXECUTABLE CUDAToolkit_NVCC_EXECUTABLE-NOTFOUND
FETCHCONTENT_BASE_DIR /home/projects/GridTools/gridtools/build/_deps
FETCHCONTENT_FULLY_DISCONNECTE OFF
FETCHCONTENT_QUIET ON
FETCHCONTENT_SOURCE_DIR_CPP_BI
FETCHCONTENT_SOURCE_DIR_GOOGLE
FETCHCONTENT_UPDATES_DISCONNEC OFF
FETCHCONTENT_UPDATES_DISCONNEC OFF
FETCHCONTENT_UPDATES_DISCONNEC OFF
GT_CLANG_CUDA_MODE AUTO
GT_CUDA_ARCH
GT_ENABLE_BINDINGS_GENERATION OFF
GT_INSTALL_EXAMPLES OFF
GT_TESTS_CXX_STANDARD DEFAULT
GT_TESTS_REQUIRE_C_COMPILER OFF
GT_TESTS_REQUIRE_FORTRAN_COMPI OFF
GT_TESTS_REQUIRE_GPU ON
GT_TESTS_REQUIRE_OpenMP OFF
HPX_DIR HPX_DIR-NOTFOUND
nlohmann_json_DIR nlohmann_json_DIR-NOTFOUND

Output of CTest: 45% tests passed, 168 tests failed out of 303

Label Time Summary:
cartesian = 2.75 sec*proc (100 tests)
cmake = 130.57 sec*proc (1 test)
cpu = 0.00 sec*proc (2 tests)
cpu_ifirst = 0.13 sec*proc (39 tests)
cpu_kfirst = 0.13 sec*proc (39 tests)
cuda = 2.51 sec*proc (6 tests)
getting_started = 2.57 sec*proc (11 tests)
gpu = 9.44 sec*proc (55 tests)
gpu_horizontal = 0.51 sec*proc (34 tests)
icosahedral = 0.07 sec*proc (70 tests)
naive = 0.11 sec*proc (34 tests)
regression = 0.16 sec*proc (149 tests)
storage = 2.77 sec*proc (17 tests)
unit_test = 11.57 sec*proc (98 tests)

Total Test time (real) = 159.99 sec

The following tests FAILED:
44 - horizontal_diffusion_naive (Failed)
45 - horizontal_diffusion_gpu (Failed)
46 - horizontal_diffusion_gpu_horizontal (Failed)
47 - horizontal_diffusion_cpu_kfirst (Failed)
48 - horizontal_diffusion_cpu_ifirst (Failed)
49 - horizontal_diffusion_fused_naive (Failed)
50 - horizontal_diffusion_fused_gpu (Failed)
51 - horizontal_diffusion_fused_gpu_horizontal (Failed)
52 - horizontal_diffusion_fused_cpu_kfirst (Failed)
53 - horizontal_diffusion_fused_cpu_ifirst (Failed)
54 - simple_hori_diff_naive (Failed)
55 - simple_hori_diff_gpu (Failed)
56 - simple_hori_diff_gpu_horizontal (Failed)
57 - simple_hori_diff_cpu_kfirst (Failed)
58 - simple_hori_diff_cpu_ifirst (Failed)
59 - copy_stencil_naive (Failed)
60 - copy_stencil_gpu (Failed)
61 - copy_stencil_gpu_horizontal (Failed)
62 - copy_stencil_cpu_kfirst (Failed)
63 - copy_stencil_cpu_ifirst (Failed)
64 - vertical_advection_dycore_naive (Failed)
65 - vertical_advection_dycore_gpu (Failed)
66 - vertical_advection_dycore_gpu_horizontal (Failed)
67 - vertical_advection_dycore_cpu_kfirst (Failed)
68 - vertical_advection_dycore_cpu_ifirst (Failed)
69 - advection_pdbott_prepare_tracers_naive (Failed)
70 - advection_pdbott_prepare_tracers_gpu (Failed)
71 - advection_pdbott_prepare_tracers_gpu_horizontal (Failed)
72 - advection_pdbott_prepare_tracers_cpu_kfirst (Failed)
73 - advection_pdbott_prepare_tracers_cpu_ifirst (Failed)
74 - laplacian_naive (Failed)
75 - laplacian_gpu (Failed)
76 - laplacian_gpu_horizontal (Failed)
77 - laplacian_cpu_kfirst (Failed)
78 - laplacian_cpu_ifirst (Failed)
79 - positional_stencil_naive (Failed)
80 - positional_stencil_gpu (Failed)
81 - positional_stencil_gpu_horizontal (Failed)
82 - positional_stencil_cpu_kfirst (Failed)
83 - positional_stencil_cpu_ifirst (Failed)
84 - tridiagonal_naive (Failed)
85 - tridiagonal_gpu (Failed)
86 - tridiagonal_gpu_horizontal (Failed)
87 - tridiagonal_cpu_kfirst (Failed)
88 - tridiagonal_cpu_ifirst (Failed)
89 - alignment_naive (Failed)
90 - alignment_gpu (Failed)
91 - alignment_gpu_horizontal (Failed)
92 - alignment_cpu_kfirst (Failed)
93 - alignment_cpu_ifirst (Failed)
94 - extended_4D_naive (Failed)
95 - extended_4D_gpu (Failed)
96 - extended_4D_gpu_horizontal (Failed)
97 - extended_4D_cpu_kfirst (Failed)
98 - extended_4D_cpu_ifirst (Failed)
99 - expandable_parameters_naive (Failed)
100 - expandable_parameters_gpu (Failed)
101 - expandable_parameters_gpu_horizontal (Failed)
102 - expandable_parameters_cpu_kfirst (Failed)
103 - expandable_parameters_cpu_ifirst (Failed)
104 - expandable_parameters_single_kernel_naive (Failed)
105 - expandable_parameters_single_kernel_gpu (Failed)
106 - expandable_parameters_single_kernel_gpu_horizontal (Failed)
107 - expandable_parameters_single_kernel_cpu_kfirst (Failed)
108 - expandable_parameters_single_kernel_cpu_ifirst (Failed)
109 - horizontal_diffusion_functions_naive (Failed)
110 - horizontal_diffusion_functions_gpu (Failed)
111 - horizontal_diffusion_functions_gpu_horizontal (Failed)
112 - horizontal_diffusion_functions_cpu_kfirst (Failed)
113 - horizontal_diffusion_functions_cpu_ifirst (Failed)
114 - whole_axis_access_naive (Failed)
115 - whole_axis_access_gpu (Failed)
116 - whole_axis_access_gpu_horizontal (Failed)
117 - whole_axis_access_cpu_kfirst (Failed)
118 - whole_axis_access_cpu_ifirst (Failed)
119 - layout_transformation_test_gpu (Failed)
120 - layout_transformation_test_cpu (Failed)
121 - boundary_conditions_gpu (Not Run)
122 - boundary_conditions_cpu (Failed)
124 - stencil_on_edges_multiplefields_naive (Failed)
125 - stencil_on_edges_multiplefields_gpu (Failed)
126 - stencil_on_edges_multiplefields_gpu_horizontal (Failed)
127 - stencil_on_edges_multiplefields_cpu_kfirst (Failed)
128 - stencil_on_edges_multiplefields_cpu_ifirst (Failed)
129 - stencil_on_cells_naive (Failed)
130 - stencil_on_cells_gpu (Failed)
131 - stencil_on_cells_gpu_horizontal (Failed)
132 - stencil_on_cells_cpu_kfirst (Failed)
133 - stencil_on_cells_cpu_ifirst (Failed)
134 - stencil_on_neighcell_of_edges_naive (Failed)
135 - stencil_on_neighcell_of_edges_gpu (Failed)
136 - stencil_on_neighcell_of_edges_gpu_horizontal (Failed)
137 - stencil_on_neighcell_of_edges_cpu_kfirst (Failed)
138 - stencil_on_neighcell_of_edges_cpu_ifirst (Failed)
139 - stencil_manual_fold_naive (Failed)
140 - stencil_manual_fold_gpu (Failed)
141 - stencil_manual_fold_gpu_horizontal (Failed)
142 - stencil_manual_fold_cpu_kfirst (Failed)
143 - stencil_manual_fold_cpu_ifirst (Failed)
144 - copy_stencil_icosahedral_naive (Failed)
145 - copy_stencil_icosahedral_gpu (Failed)
146 - copy_stencil_icosahedral_gpu_horizontal (Failed)
147 - copy_stencil_icosahedral_cpu_kfirst (Failed)
148 - copy_stencil_icosahedral_cpu_ifirst (Failed)
149 - expandable_parameters_icosahedral_naive (Failed)
150 - expandable_parameters_icosahedral_gpu (Failed)
151 - expandable_parameters_icosahedral_gpu_horizontal (Failed)
152 - expandable_parameters_icosahedral_cpu_kfirst (Failed)
153 - expandable_parameters_icosahedral_cpu_ifirst (Failed)
154 - stencil_on_cells_color_naive (Failed)
155 - stencil_on_cells_color_gpu (Failed)
156 - stencil_on_cells_color_gpu_horizontal (Failed)
157 - stencil_on_cells_color_cpu_kfirst (Failed)
158 - stencil_on_cells_color_cpu_ifirst (Failed)
159 - stencil_on_edges_naive (Failed)
160 - stencil_on_edges_gpu (Failed)
161 - stencil_on_edges_gpu_horizontal (Failed)
162 - stencil_on_edges_cpu_kfirst (Failed)
163 - stencil_on_edges_cpu_ifirst (Failed)
164 - stencil_fused_naive (Failed)
165 - stencil_fused_gpu (Failed)
166 - stencil_fused_gpu_horizontal (Failed)
167 - stencil_fused_cpu_kfirst (Failed)
168 - stencil_fused_cpu_ifirst (Failed)
169 - stencil_on_neighedge_of_cells_naive (Failed)
170 - stencil_on_neighedge_of_cells_gpu (Failed)
171 - stencil_on_neighedge_of_cells_gpu_horizontal (Failed)
172 - stencil_on_neighedge_of_cells_cpu_kfirst (Failed)
173 - stencil_on_neighedge_of_cells_cpu_ifirst (Failed)
174 - stencil_on_vertices_naive (Failed)
175 - stencil_on_vertices_gpu (Failed)
176 - stencil_on_vertices_gpu_horizontal (Failed)
177 - stencil_on_vertices_cpu_kfirst (Failed)
178 - stencil_on_vertices_cpu_ifirst (Failed)
179 - curl_naive (Failed)
180 - curl_gpu (Failed)
181 - curl_gpu_horizontal (Failed)
182 - curl_cpu_kfirst (Failed)
183 - curl_cpu_ifirst (Failed)
184 - div_naive (Failed)
185 - div_gpu (Failed)
186 - div_gpu_horizontal (Failed)
187 - div_cpu_kfirst (Failed)
188 - div_cpu_ifirst (Failed)
189 - lap_naive (Failed)
190 - lap_gpu (Failed)
191 - lap_gpu_horizontal (Failed)
192 - lap_cpu_kfirst (Failed)
193 - lap_cpu_ifirst (Failed)
224 - test_boundary_conditions_gpu (Not Run)
225 - test_boundary_conditions_cpu (Failed)
241 - test_kcache_fill_cpu_kfirst (Failed)
242 - test_kcache_fill_cpu_ifirst (Failed)
246 - test_kcache_fill_and_flush_cpu_kfirst (Failed)
247 - test_kcache_fill_and_flush_cpu_ifirst (Failed)
251 - test_kcache_flush_cpu_kfirst (Failed)
252 - test_kcache_flush_cpu_ifirst (Failed)
256 - test_kcache_local_cpu_kfirst (Failed)
257 - test_kcache_local_cpu_ifirst (Failed)
261 - test_kparallel_cpu_kfirst (Failed)
262 - test_kparallel_cpu_ifirst (Failed)
271 - test_tmp_storage_sid_cpu_ifirst (Failed)
291 - test_layout_transformation (Failed)
292 - cmaketest_storage_gpu (Failed)
296 - getting_started_test_gt_laplacian (Failed)
297 - getting_started_test_gt_smoothing_variant1 (Failed)
298 - getting_started_test_gt_smoothing_variant2 (Failed)
299 - getting_started_test_gt_smoothing_variant3 (Failed)

Is there an issue with my CMake configuration for building the GridTools tests?

fthaler commented 3 years ago

Hmm, those are the same hangs we get in debug mode. CMAKE_BUILD_TYPE seems to be unset in your config, maybe try to set it explicitly using cmake -DCMAKE_BUILD_TYPE=Release and see if the compiler hangs disappear. I am surprised that the CPU tests also fail. Could you please check the output of ctest --output-on-failure or of a single test (e.g., run the ./tests/regression/copy_stencil_* executables in the build directory)?
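To confirm from inside a build which configuration was actually used, a small probe like the one below may help. It is a sketch that rests on two assumptions worth double-checking for your toolchain: CMake's default Release, RelWithDebInfo and MinSizeRel flags for GCC/Clang add `-DNDEBUG` together with an optimization level, while an empty `CMAKE_BUILD_TYPE` adds neither; and GCC/Clang define `__OPTIMIZE__` whenever optimization is enabled. `build_kind` and its helpers are hypothetical, not part of GridTools.

```cpp
#include <cassert>
#include <string>

// True if assertions are compiled out (CMake's release-type configs pass -DNDEBUG).
constexpr bool ndebug_defined() {
#ifdef NDEBUG
    return true;
#else
    return false;
#endif
}

// True if the compiler ran with optimization enabled
// (GCC/Clang define __OPTIMIZE__ in that case).
constexpr bool optimized() {
#ifdef __OPTIMIZE__
    return true;
#else
    return false;
#endif
}

// Hypothetical helper classifying the flags this translation unit was built with.
inline std::string build_kind() {
    if (ndebug_defined() && optimized())
        return "release-like (NDEBUG + optimization)";
    if (!ndebug_defined() && !optimized())
        return "debug-like (e.g. Debug or an empty CMAKE_BUILD_TYPE)";
    return "mixed flags";
}
```

Printing `build_kind()` from one of the test executables would show "debug-like" for the configuration above, where `CMAKE_BUILD_TYPE` is empty, and "release-like" after reconfiguring with `-DCMAKE_BUILD_TYPE=Release`.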

yaomingamd commented 3 years ago

Thanks!
1. Yes, the compiler hangs disappear after switching to release mode.
2. Most of the test failures were due to an LD_LIBRARY_PATH issue. Now CTest only reports three failures:
110 - horizontal_diffusion_functions_gpu (Child aborted)
190 - lap_gpu (Child aborted)
292 - cmaketest_storage_gpu (Failed)
I will try to help find the cause.

yaomingamd commented 3 years ago

The following failures are due to bugs in ROCm 3.8 and are fixed in the next ROCm release:
110 - horizontal_diffusion_functions_gpu (Child aborted)
190 - lap_gpu (Child aborted)

fthaler commented 3 years ago

That’s great to hear, many thanks for your investigations! Have you also found the reasons for the compiler hangs during compilation in debug mode?

yaomingamd commented 3 years ago

ROCm 3.9.0 was released today, and the failures of the two tests have been fixed. However, the compilation hang in debug mode has not been fixed yet; I will let you know when it is. The next step will be GPU-aware OpenMPI. Is all the source code implementing GCL under the include/GCL subfolder?

yaomingamd commented 3 years ago

I get the following error when building the tests with GPU-aware OpenMPI:

[ 31%] Building CXX object tests/regression/CMakeFiles/copy_stencil_parallel_cpu.dir/copy_stencil_parallel.cpp.o
In file included from /home/projects/GridTools/gridtools2_1/tests/regression/copy_stencil_parallel.cpp:28:
In file included from /home/projects/GridTools/gridtools2_1/tests/include/stencil_select.hpp:24:
In file included from /home/projects/GridTools/gridtools2_1/include/gridtools/stencil/cpu_kfirst.hpp:31:
In file included from /home/projects/GridTools/gridtools2_1/include/gridtools/stencil/../thread_pool/omp.hpp:14:
/opt/rocm-3.9.0/llvm/lib/clang/12.0.0/include/omp.h:67:42: error: declaration of 'omp_get_max_threads' has a different language linkage
extern int __KAI_KMPC_CONVENTION omp_get_max_threads (void);
^
/home/projects/GridTools/gridtools2_1/include/gridtools/common/timer/../omp.hpp:16:12: note: previous definition is here
inline int omp_get_max_threads() { return 1; }
^
In file included from /home/projects/GridTools/gridtools2_1/tests/regression/copy_stencil_parallel.cpp:28:
In file included from /home/projects/GridTools/gridtools2_1/tests/include/stencil_select.hpp:24:
In file included from /home/projects/GridTools/gridtools2_1/include/gridtools/stencil/cpu_kfirst.hpp:31:
In file included from /home/projects/GridTools/gridtools2_1/include/gridtools/stencil/../thread_pool/omp.hpp:14:
/opt/rocm-3.9.0/llvm/lib/clang/12.0.0/include/omp.h:68:42: error: declaration of 'omp_get_thread_num' has a different language linkage
extern int __KAI_KMPC_CONVENTION omp_get_thread_num (void);
^
/home/projects/GridTools/gridtools2_1/include/gridtools/common/timer/../omp.hpp:15:12: note: previous definition is here
inline int omp_get_thread_num() { return 0; }
^
In file included from /home/projects/GridTools/gridtools2_1/tests/regression/copy_stencil_parallel.cpp:28:
In file included from /home/projects/GridTools/gridtools2_1/tests/include/stencil_select.hpp:24:
In file included from /home/projects/GridTools/gridtools2_1/include/gridtools/stencil/cpu_kfirst.hpp:31:
In file included from /home/projects/GridTools/gridtools2_1/include/gridtools/stencil/../thread_pool/omp.hpp:14:
/opt/rocm-3.9.0/llvm/lib/clang/12.0.0/include/omp.h:128:42: error: declaration of 'omp_get_wtime' has a different language linkage
extern double __KAI_KMPC_CONVENTION omp_get_wtime (void);
^
/home/projects/GridTools/gridtools2_1/include/gridtools/common/timer/../omp.hpp:17:15: note: previous definition is here
inline double omp_get_wtime() { return 0; }
^
3 errors generated when compiling for gfx906.
make[2]: *** [tests/regression/CMakeFiles/copy_stencil_parallel_cpu.dir/build.make:82: tests/regression/CMakeFiles/copy_stencil_parallel_cpu.dir/copy_stencil_parallel.cpp.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:6211: tests/regression/CMakeFiles/copy_stencil_parallel_cpu.dir/all] Error 2
make: *** [Makefile:160: all] Error 2
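The errors above arise because `<omp.h>` declares `omp_get_max_threads`, `omp_get_thread_num` and `omp_get_wtime` with C linkage, while `gridtools/common/omp.hpp` defines serial fallbacks with the same names as C++ inline functions. Presumably the fallbacks are active because `_OPENMP` is not defined in the device pass, even though `<omp.h>` is still included via `thread_pool/omp.hpp`. One way to sidestep this kind of clash, sketched below, is to never redeclare the `omp_*` names and route calls through wrappers in a separate namespace. The namespace `omp_compat` and its functions are hypothetical; this is not the GridTools API or its actual fix, and the `std::chrono` fallback for `wtime` is a design choice that differs from the `return 0` stub in the log.

```cpp
#include <cassert>
#include <chrono>

#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical wrapper namespace. None of these functions (re)declares an
// omp_* name, so they cannot conflict with the extern "C" declarations in
// <omp.h>, even if another header includes <omp.h> in a pass where
// _OPENMP is not defined.
namespace omp_compat {

inline int thread_num() {
#ifdef _OPENMP
    return omp_get_thread_num();
#else
    return 0;  // serial fallback: only the initial thread exists
#endif
}

inline int max_threads() {
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;  // serial fallback
#endif
}

// Wall-clock time in seconds.
inline double wtime() {
#ifdef _OPENMP
    return omp_get_wtime();
#else
    // Fallback on a monotonic clock instead of a constant 0.
    using clock = std::chrono::steady_clock;
    return std::chrono::duration<double>(clock::now().time_since_epoch()).count();
#endif
}

}  // namespace omp_compat
```

Call sites would then use `omp_compat::max_threads()` instead of `omp_get_max_threads()`. Because the `#ifdef _OPENMP` decision is made per compilation pass, a pass without `_OPENMP` simply takes the serial branch, whether or not `<omp.h>` was included elsewhere.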

havogt commented 4 months ago

If there are still issues, please re-open with a newer compiler version as reference.