Closed terU3760 closed 2 years ago
I have also hacked on two files, hip_cooperative_groups.h and hip_cooperative_groups_helper.h, in the directory "DeepSpeed/csrc/includes/patch/hip/hcc_detail", but the build still reports errors such as:
/******/DeepSpeed/csrc/transformer/normalize_kernels.hip:34:38: error: no member named 'cg_coalesced_tile' in namespace 'cooperative_groups::internal'
cg::thread_group g(cg::internal::cg_coalesced_tile, WARP_SIZE);
~~~~~~~~~~~~~~^
/******/DeepSpeed/csrc/transformer/normalize_kernels.hip:35:7: error: no member named 'tiled_partition' in 'cooperative_groups::thread_group'
g.tiled_partition(b, WARP_SIZE);
~ ^
/******/DeepSpeed/csrc/transformer/normalize_kernels.hip:60:48: error: no member named 'shfl_down' in 'cooperative_groups::thread_group'
for (int i = 1; i < 32; i *= 2) { sum += g.shfl_down(sum, i); }
~ ^
/******/DeepSpeed/csrc/transformer/normalize_kernels.hip:72:69: error: no member named 'shfl_down' in 'cooperative_groups::thread_group'
for (int i = 1; i < (iteration_stride >> 5); i *= 2) { sum += g.shfl_down(sum, i); }
~ ^
/******/DeepSpeed/csrc/transformer/normalize_kernels.hip:74:13: error: no member named 'shfl' in 'cooperative_groups::thread_group'
sum = g.shfl(sum, 0);
~ ^
/******/DeepSpeed/csrc/transformer/normalize_kernels.hip:84:53: error: no member named 'shfl_down' in 'cooperative_groups::thread_group'
for (int i = 1; i < 32; i *= 2) { variance += g.shfl_down(variance, i); }
~ ^
/******/DeepSpeed/csrc/transformer/normalize_kernels.hip:96:74: error: no member named 'shfl_down' in 'cooperative_groups::thread_group'
for (int i = 1; i < (iteration_stride >> 5); i *= 2) { variance += g.shfl_down(variance, i); }
~ ^
/******/DeepSpeed/csrc/transformer/normalize_kernels.hip:97:18: error: no member named 'shfl' in 'cooperative_groups::thread_group'
variance = g.shfl(variance, 0);
~ ^
/******/DeepSpeed/csrc/transformer/normalize_kernels.hip:324:38: error: no member named 'cg_coalesced_tile' in namespace 'cooperative_groups::internal'
cg::thread_group g(cg::internal::cg_coalesced_tile, 32);
~~~~~~~~~~~~~~^
/******/DeepSpeed/csrc/transformer/normalize_kernels.hip:325:7: error: no member named 'tiled_partition' in 'cooperative_groups::thread_group'
g.tiled_partition(b, 32);
~ ^
/******/DeepSpeed/csrc/transformer/normalize_kernels.hip:350:48: error: no member named 'shfl_down' in 'cooperative_groups::thread_group'
for (int i = 1; i < 32; i *= 2) { sum += g.shfl_down(sum, i); }
~ ^
/******/DeepSpeed/csrc/transformer/normalize_kernels.hip:362:69: error: no member named 'shfl_down' in 'cooperative_groups::thread_group'
for (int i = 1; i < (iteration_stride >> 5); i *= 2) { sum += g.shfl_down(sum, i); }
~ ^
/******/DeepSpeed/csrc/transformer/normalize_kernels.hip:364:13: error: no member named 'shfl' in 'cooperative_groups::thread_group'
sum = g.shfl(sum, 0);
~ ^
/******/DeepSpeed/csrc/transformer/normalize_kernels.hip:372:53: error: no member named 'shfl_down' in 'cooperative_groups::thread_group'
for (int i = 1; i < 32; i *= 2) { variance += g.shfl_down(variance, i); }
~ ^
/******/DeepSpeed/csrc/transformer/normalize_kernels.hip:384:74: error: no member named 'shfl_down' in 'cooperative_groups::thread_group'
for (int i = 1; i < (iteration_stride >> 5); i *= 2) { variance += g.shfl_down(variance, i); }
~ ^
/******/DeepSpeed/csrc/transformer/normalize_kernels.hip:385:18: error: no member named 'shfl' in 'cooperative_groups::thread_group'
variance = g.shfl(variance, 0);
~ ^
/******/DeepSpeed/csrc/transformer/normalize_kernels.hip:632:38: error: no member named 'cg_coalesced_tile' in namespace 'cooperative_groups::internal'
cg::thread_group g(cg::internal::cg_coalesced_tile, TILE_DIM);
~~~~~~~~~~~~~~^
/******/DeepSpeed/csrc/transformer/normalize_kernels.hip:633:7: error: no member named 'tiled_partition' in 'cooperative_groups::thread_group'
g.tiled_partition(b, TILE_DIM);
~ ^
/******/DeepSpeed/csrc/transformer/normalize_kernels.hip:669:17: error: no member named 'shfl_down' in 'cooperative_groups::thread_group'
s1 += g.shfl_down(s1, i);
~ ^
fatal error: too many errors emitted, stopping now [-ferror-limit=]
20 errors generated when compiling for gfx803.
error: command '/opt/rocm-4.2.0/bin/hipcc' failed with exit status 1
Error on line 155
Fail to install deepspeed
Hi, sorry we missed this. You are correct that the cooperative_groups headers need some hacks to work, so we'd recommend using the following Dockerfile to build DeepSpeed: https://github.com/ROCmSoftwarePlatform/DeepSpeed/blob/master/docker/Dockerfile.rocm
On my platform, when I run the command:
/opt/rocm/bin/rocminfo
it outputs:
When I enter the command:
/opt/rocm/opencl/bin/clinfo
it outputs:
After I ran the following commands:
the build started and reported the following error:
What is the cause, and how can I fix it?