CHIP-SPV / chipStar

chipStar is a tool for compiling and running HIP/CUDA on SPIR-V via OpenCL or Level Zero APIs.

Rebase HIP 6.x + Update hip-tests #796

Closed pvelesko closed 3 months ago

pvelesko commented 4 months ago

Update HIP version from 5.1 to 6.x

uint32_t __device__ __ockl_multi_grid_num_grids() { return 0; }
uint32_t __device__ __ockl_multi_grid_grid_rank() { return 0; }
uint32_t __device__ __ockl_multi_grid_size() { return 0; }
uint32_t __device__ __ockl_multi_grid_thread_rank() { return 0; }
uint32_t __device__ __ockl_multi_grid_is_valid() { return 0; }
uint32_t __device__ __ockl_multi_grid_sync() { return 0; }
uint32_t __device__ __ockl_grid_sync() { return 0; }
uint32_t __device__ __ockl_grid_is_valid() { return 0; }
void __device__ __builtin_amdgcn_fence(int, const char *) {}
unsigned int __device__ __builtin_amdgcn_mbcnt_lo(unsigned int, unsigned int) { return 0; }
unsigned int __device__ __builtin_amdgcn_read_exec(void) { return 0; }

Update hip-tests submodule

Update HIPCC

The following individual tests are still excluded, mostly due to failures in LLVM-SPIRV-Translator:

  coalesced_group.cc
  coalesced_groups_shfl_down_old.cc
  coalesced_groups_shfl_up_old.cc
  coalesced_group_tiled_partition.cc

The following test categories are excluded due to unsupported features:

Occupancy - failure in template argument resolution

Surface - not implemented:

surf1DLayeredread
surf1Dread
surf2DLayeredread
surf2Dread
surf3Dread
surfCubemapread

Texture - not implemented:

tex1DGrad
tex1DLayered
tex1DLayeredGrad
tex1DLayeredLod
tex1DLod
tex3D

One final caveat: test discovery happens at runtime, every single time. This means that to run a single test, we need to discover all the tests first, which adds 70 seconds of overhead for launching a single test. The total number of tests went from 975 to 1898.

pvelesko commented 4 months ago

Seeing a lot of these, as well as a lot more test failures, even in some basic tests such as Unit_hipMemsetSync:

863: CHIP warning [TID 3650729] [1710422994.385196816] : A device function is already registered and mapped to a different module.

@linehill

linehill commented 3 months ago

863: CHIP warning [TID 3650729] [1710422994.385196816] : A device function is already registered and mapped to a different module.

Probably just a false positive. I wrote the warning at a time when I didn’t fully understand what was happening (__hipRegisterFunction() called multiple times with the same host pointer). I made a patch to remove it: https://github.com/CHIP-SPV/chipStar/pull/809.
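
For illustration only, the situation described above (the same host pointer being registered more than once) can be modelled with a toy registry. This is a sketch, not chipStar's code: the class name ToyFunctionRegistry, its methods, and the choice to keep the first mapping on a repeat registration are all assumptions made for the example.

```cpp
#include <string>
#include <unordered_map>

// Toy model, NOT chipStar's implementation: maps a host function pointer to
// the name of the module that first registered it. All names are hypothetical.
class ToyFunctionRegistry {
public:
  // Returns true when hostPtr is seen for the first time and false when it
  // was already registered. A repeat registration is treated as benign and
  // the first mapping is kept (an assumption of this sketch).
  bool registerFunction(const void *hostPtr, const std::string &moduleName) {
    return table_.emplace(hostPtr, moduleName).second;
  }

  // Returns the module recorded for hostPtr, or an empty string if unknown.
  std::string moduleOf(const void *hostPtr) const {
    auto it = table_.find(hostPtr);
    return it == table_.end() ? std::string() : it->second;
  }

private:
  std::unordered_map<const void *, std::string> table_;
};
```

In this model a second call with the same pointer simply reports that the entry already exists instead of warning, which is one way a duplicate registration could be tolerated.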

linehill commented 3 months ago

FYI, building this branch from scratch with multiple jobs may end up in a situation where hipcc is not built before building hip-tests:

...
cd /mnt/md1/linehill/ws-chipstar-2/builds/chipstar/catch/catch_tests/unit/module && ../../../../bin/hipcc --genco --std=c++17 /mnt/md1/linehill/ws-chipstar-2/chipstar/hip-tests/catch/unit/module/get_function_module.cc -o get_function_module.code -I/mnt/md1/linehill/ws-chipstar-2/builds/chipstar/include/ --rocm-path=/opt/rocm
/bin/sh: 1: ../../../../bin/hipcc: not found
catch/catch_tests/unit/module/CMakeFiles/get_function_module.dir/build.make:72: recipe for target 'catch/catch_tests/unit/module/get_function_module.code' failed
make[2]: *** [catch/catch_tests/unit/module/get_function_module.code] Error 127
...

linehill commented 3 months ago

/mnt/md1/linehill/ws-chipstar-2/chipstar/include/hip/devicelib/type_casting_intrinsics.hh:32:36: warning: inline function '__double2hiint' is not defined [-Wundefined-inline]
extern "C++" inline __device__ int __double2hiint(double x);

Removing the inline qualifier from the declaration suppresses the warning. The qualifier is only meaningful for function definitions.
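
A host-only sketch of the suggested change, assuming the declaration and its definition live in separate places as in the header above. The name double2hiint_sketch is hypothetical, __device__ is dropped so the snippet builds with a plain C++ compiler, and the body is an assumption based on the documented meaning of __double2hiint (reinterpret the high 32 bits of a double as a signed integer), not the chipStar implementation.

```cpp
#include <cstdint>
#include <cstring>

// Declaration without `inline`: a translation unit that sees only this line
// has nothing for -Wundefined-inline to complain about.
extern "C++" int double2hiint_sketch(double x);

// Hypothetical definition, included only to keep the sketch self-contained.
// It returns the upper 32 bits of the double's bit pattern as a signed int.
int double2hiint_sketch(double x) {
  std::uint64_t bits;
  std::memcpy(&bits, &x, sizeof bits);
  const std::uint32_t hi = static_cast<std::uint32_t>(bits >> 32);
  std::int32_t out;
  std::memcpy(&out, &hi, sizeof out);
  return out;
}
```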

pvelesko commented 3 months ago

Please run clang-format for the whole patch set. E.g. with git clang-format main.

done

Did you check if there are new hipDeviceProp_t and hipDeviceAttribute_t entries and updated hipGetDeviceProperties() and hipDeviceGetAttribute() to respond to them?

Yes, there are changes to these, but since we decided to stick with the old tests for the time being, it doesn't make sense to update them and end up with two sets of broken unit tests.