sunpian1 commented 11 months ago

susie.sun@yz-amd1:~$ docker run -it rocm/deepspeed:rocm5.7_ubuntu20.04_py3.9_pytorch_2.0.1_DeepSpeed /bin/bash
root@c50e90963e1a:/var/lib/jenkins# deepspeed --num_gpus 1 deploy.py
[2023-12-14 01:52:04,385] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-12-14 01:52:05,180] [WARNING] [runner.py:203:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
Traceback (most recent call last):
  File "/opt/conda/envs/py_3.9/bin/deepspeed", line 6, in <module>
    main()
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/launcher/runner.py", line 422, in main
    raise RuntimeError("Unable to proceed, no GPU resources available")
RuntimeError: Unable to proceed, no GPU resources available

Our GPU is an AMD Radeon™ RX 7900 XTX.
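
For readers who hit the same RuntimeError: the launcher raises it when it finds no GPU, so a quick check of what the container can actually see may help. The sketch below is an illustration and not part of the original report; the helper name gpu_visibility_report is hypothetical. It assumes that a ROCm container normally needs the /dev/kfd and /dev/dri device nodes (the docker run command above passes no --device flags, which may or may not be the cause here) and that ROCm builds of PyTorch report HIP devices through the torch.cuda API.

```python
import importlib.util
import os

# Device nodes that ROCm user space normally needs inside a container
# (assumption: standard ROCm setup; adjust for other runtimes).
ROCM_DEVICE_NODES = ("/dev/kfd", "/dev/dri")


def gpu_visibility_report():
    """Report which ROCm device nodes exist and how many GPUs PyTorch sees."""
    report = {node: os.path.exists(node) for node in ROCM_DEVICE_NODES}
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed in this environment.
        report["torch_device_count"] = None
    else:
        import torch

        # On ROCm builds, HIP devices are exposed through torch.cuda.
        report["torch_device_count"] = torch.cuda.device_count()
    return report


if __name__ == "__main__":
    for key, value in gpu_visibility_report().items():
        print(f"{key}: {value}")
```

If both device nodes are missing, restarting the container with the devices passed through is a reasonable first thing to try; if they are present but the device count is 0, the problem is more likely in the driver or the PyTorch build.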

rraminen commented 11 months ago

Hi @sunpian1, could you please share some information about the contents of deploy.py?

sunpian1 commented 11 months ago

I solved that issue. When I continued, I got another error:

root@c8be2f1a010b:/var/lib/jenkins# python test.py
[2023-12-25 06:42:04,395] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
model.safetensors: 100%|████████████████| 1.12G/1.12G [03:08<00:00, 5.93MB/s]
tokenizer_config.json: 100%|████████████████| 222/222 [00:00<00:00, 24.7kB/s]
tokenizer.json: 100%|████████████████| 14.5M/14.5M [00:02<00:00, 5.54MB/s]
special_tokens_map.json: 100%|████████████████| 85.0/85.0 [00:00<00:00, 13.0kB/s]
[2023-12-25 06:45:33,972] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.11.2+4199dc25, git-hash=4199dc25, git-branch=master
[2023-12-25 06:45:33,973] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter mp_size is deprecated use tensor_parallel.tp_size instead
[2023-12-25 06:45:33,973] [INFO] [logging.py:96:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
Using /root/.cache/torch_extensions/py39_cpu as PyTorch extensions root...
Creating extension directory /root/.cache/torch_extensions/py39_cpu/transformer_inference...
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_cublas_wrappers.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_cublas_wrappers_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/ds_kernel_utils.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/ds_kernel_utils_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_cuda_layers.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_hip_layers.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/conversion_utils.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/conversion_utils_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/memory_access_utils.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/memory_access_utils_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/gelu.cu -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/gelu.hip [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/relu.cu -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/relu.hip [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/layer_norm.cu -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/layer_norm.hip [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/rms_norm.cu -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/rms_norm.hip [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/softmax.cu -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/softmax.hip [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/dequantize.cu -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/dequantize.hip [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.cu -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/transform.cu -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/transform.hip [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pointwise_ops.cu -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pointwise_ops.hip [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_cublas_wrappers.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_cublas_wrappers_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_cuda_layers.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_hip_layers.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/dropout.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/dropout_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/StopWatch.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/StopWatch.h [skipped, no changes]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/cublas_wrappers.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/cublas_wrappers_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/gemm_test.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/gemm_test_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/context.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/context_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/custom_cuda_layers.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/custom_hip_layers.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/feed_forward.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/feed_forward_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/gelu.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/gelu_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/general_kernels.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/general_kernels_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/normalize_layer.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/normalize_layer_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/softmax.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/softmax_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/strided_batch_gemm.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/strided_batch_gemm_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/ds_transformer_cuda.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/ds_transformer_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/quantization.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/quantization_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/quantization_utils.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/quantization_utils_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/dequantization_utils.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/dequantization_utils_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/quantizer.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/quantizer_hip.h [ok]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/dropout.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/dropout_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/quantization.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/quantization_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/simd.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/simd.h [skipped, no changes]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/custom_cuda_layers.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/custom_hip_layers.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/softmax.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/softmax_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/conversion_utils.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/conversion_utils_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/StopWatch.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/StopWatch.h [skipped, no changes]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/context.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/context_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/general_kernels.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/general_kernels_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/compat.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/compat.h [skipped, no changes]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/gemm_test.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/gemm_test_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/cpu_lion.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/cpu_lion_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/Timer.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/Timer_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/cublas_wrappers.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/cublas_wrappers_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/strided_batch_gemm.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/strided_batch_gemm_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/ds_kernel_utils.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/ds_kernel_utils_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/feed_forward.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/feed_forward_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/normalize_layer.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/normalize_layer_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/cpu_adagrad.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/cpu_adagrad_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/cpu_adam.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/cpu_adam_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/gelu.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/gelu_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/memory_access_utils.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/memory_access_utils_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/quantization_utils.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/quantization_utils_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/type_shim.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/type_shim_hip.h [skipped, already hipified]
Successfully preprocessed all matching files.
Total number of unsupported CUDA function calls: 0

Total number of replaced kernel launches: 25
Detected CUDA files, patching ldflags
Emitting ninja build file /root/.cache/torch_extensions/py39_cpu/transformer_inference/build.ninja...
Building extension module transformer_inference...
Using envvar MAX_JOBS (32) as the number of workers...
[1/11] /opt/rocm/bin/hipcc -DWITH_HIP -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/TH -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THC -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THH -isystem /opt/rocm/include -isystem /opt/conda/envs/py_3.9/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=1 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -D__HIP_PLATFORM_AMD__=1 -fPIC -D__HIP_PLATFORM_HCC__=1 -DUSE_ROCM=1 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++17 -U__HIP_NO_HALF_OPERATORS__ -U__HIP_NO_HALF_CONVERSIONS__ -U__HIP_NO_HALF2_OPERATORS__ -DROCM_VERSION_MAJOR=5 -DROCM_VERSION_MINOR=7 --offload-arch=gfx900 --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 --offload-arch=gfx1101 -fno-gpu-rdc -c /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/rms_norm.hip -o rms_norm.cuda.o
FAILED: rms_norm.cuda.o
/opt/rocm/bin/hipcc -DWITH_HIP -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/TH -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THC -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THH -isystem /opt/rocm/include -isystem /opt/conda/envs/py_3.9/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=1 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -D__HIP_PLATFORM_AMD__=1 -fPIC -D__HIP_PLATFORM_HCC__=1 -DUSE_ROCM=1 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++17 -U__HIP_NO_HALF_OPERATORS__ -U__HIP_NO_HALF_CONVERSIONS__ -U__HIP_NO_HALF2_OPERATORS__ -DROCM_VERSION_MAJOR=5 -DROCM_VERSION_MINOR=7 --offload-arch=gfx900 --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 --offload-arch=gfx1101 -fno-gpu-rdc -c /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/rms_norm.hip -o rms_norm.cuda.o
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/rms_norm.hip:8:
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/conversion_utils_hip.h:9:
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/ds_kernel_utils_hip.h:24:
In file included from /opt/rocm-5.7.0/include/hip/hip_cooperative_groups.h:38:
/opt/rocm-5.7.0/include/hip/amd_detail/amd_hip_cooperative_groups.h:638:3: error: static assertion failed due to requirement 'integral_constant<bool, false>::value': Tile size is either not a power of 2 or greater than the wavefront size
  static_assert(is_valid_tile_size::value,
  ^ ~~~~~~~
/opt/rocm-5.7.0/include/hip/amd_detail/amd_hip_cooperative_groups.h:704:55: note: in instantiation of template class 'cooperative_groups::thread_block_tile_base<64>' requested here
  class thread_block_tile_type<tileSize, void> : public thread_block_tile_base,
  ^
/opt/rocm-5.7.0/include/hip/amd_detail/amd_hip_cooperative_groups.h:781:43: note: in instantiation of template class 'cooperative_groups::thread_block_tile_type<64, void>' requested here
  class thread_block_tile_internal : public thread_block_tile_type<size, ParentCGTy> {
  ^
/opt/rocm-5.7.0/include/hip/amd_detail/amd_hip_cooperative_groups.h:807:46: note: in instantiation of template class 'cooperative_groups::impl::thread_block_tile_internal<64, void>' requested here
  class thread_block_tile<size, void> : public impl::thread_block_tile_internal<size, void> {
  ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils_hip.h:362:44: note: in instantiation of template class 'cooperative_groups::thread_block_tile<64>' requested here
  data[0] = element(data[0], warp.shfl_xor(data[0], i));
  ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils_hip.h:371:46: error: no member named 'shfl_xor' in 'cooperative_groups::thread_block_tile<64>'
  data[0] = element(data[0], warp.shfl_xor(data[0], i));


/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils_hip.h:372:46: error: no member named 'shfl_xor' in 'cooperative_groups::thread_block_tile<64>'
        data[1] = element<Op2>(data[1], warp.shfl_xor(data[1], i));
                                        ~~~~ ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils_hip.h:381:46: error: no member named 'shfl_xor' in 'cooperative_groups::thread_block_tile<64>'
        data[0] = element<Op1>(data[0], warp.shfl_xor(data[0], i));
                                        ~~~~ ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils_hip.h:382:46: error: no member named 'shfl_xor' in 'cooperative_groups::thread_block_tile<64>'
        data[1] = element<Op2>(data[1], warp.shfl_xor(data[1], i));
                                        ~~~~ ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils_hip.h:383:46: error: no member named 'shfl_xor' in 'cooperative_groups::thread_block_tile<64>'
        data[2] = element<Op3>(data[2], warp.shfl_xor(data[2], i));
                                        ~~~~ ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils_hip.h:392:46: error: no member named 'shfl_xor' in 'cooperative_groups::thread_block_tile<64>'
        data[0] = element<Op1>(data[0], warp.shfl_xor(data[0], i));
                                        ~~~~ ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils_hip.h:393:46: error: no member named 'shfl_xor' in 'cooperative_groups::thread_block_tile<64>'
        data[1] = element<Op2>(data[1], warp.shfl_xor(data[1], i));
                                        ~~~~ ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils_hip.h:394:46: error: no member named 'shfl_xor' in 'cooperative_groups::thread_block_tile<64>'
        data[2] = element<Op3>(data[2], warp.shfl_xor(data[2], i));
                                        ~~~~ ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils_hip.h:395:46: error: no member named 'shfl_xor' in 'cooperative_groups::thread_block_tile<64>'
        data[3] = element<Op4>(data[3], warp.shfl_xor(data[3], i));
                                        ~~~~ ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils_hip.h:431:22: error: no member named 'thread_rank' in 'cooperative_groups::thread_block_tile<64>'
        if (warp_arg.thread_rank() == 0) {
            ~~~~~~~~ ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils_hip.h:442:26: error: no member named 'thread_rank' in 'cooperative_groups::thread_block_tile<64>'
            if (warp_arg.thread_rank() < running_warps) {
                ~~~~~~~~ ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils_hip.h:446:68: error: no member named 'thread_rank' in 'cooperative_groups::thread_block_tile<64>'
                        data + i, reduce_buffer + elems * warp_arg.thread_rank() + i);
                                                          ~~~~~~~~ ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils_hip.h:456:82: error: no member named 'thread_rank' in 'cooperative_groups::thread_block_tile<64>'
                mem_access::store_shared<bytes>(reduce_buffer + elems * warp_arg.thread_rank() + i,
                                                                        ~~~~~~~~ ^
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/rms_norm.hip:8:
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/conversion_utils_hip.h:9:
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/ds_kernel_utils_hip.h:24:
In file included from /opt/rocm-5.7.0/include/hip/hip_cooperative_groups.h:38:
/opt/rocm-5.7.0/include/hip/amd_detail/amd_hip_cooperative_groups.h:837:3: error: static assertion failed due to requirement 'integral_constant<bool, false>::value': Tiled partition with size > wavefront size. Currently not supported
  static_assert(is_valid_tile_size<size>::value,
  ^             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/rms_norm.hip:98:52: note: in instantiation of function template specialization 'cooperative_groups::tiled_partition<64U, cooperative_groups::thread_block>' requested here
    cg::thread_block_tile<hw_warp_size> warp = cg::tiled_partition<hw_warp_size>(tb);
                                                   ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/rms_norm.hip:224:13: note: in instantiation of function template specialization 'pre_rms_norm<float, 1, 1, 256>' requested here
            LAUNCH_ALL_RMS_NORM(1, 1, maxThreads);
            ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/rms_norm.hip:179:9: note: expanded from macro 'LAUNCH_ALL_RMS_NORM'
        LAUNCH_PRE_RMS_NORM(UNROLL, threadsPerGroup, maxThreads) \
        ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/rms_norm.hip:174:25: note: expanded from macro 'LAUNCH_PRE_RMS_NORM'
   hipLaunchKernelGGL(( pre_rms_norm<T, UNROLL, threadsPerGroup, maxThreads>), dim3(grid), dim3(block), 0, stream,  \
                        ^
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/rms_norm.hip:8:
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/conversion_utils_hip.h:9:
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/ds_kernel_utils_hip.h:24:
In file included from /opt/rocm-5.7.0/include/hip/hip_cooperative_groups.h:38:
/opt/rocm-5.7.0/include/hip/amd_detail/amd_hip_cooperative_groups.h:814:9: error: type 'impl::thread_block_tile_internal<64U, void>' is not a direct or virtual base of 'cooperative_groups::thread_block_tile<64>'
      : impl::thread_block_tile_internal<size, void>(g) {}
        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/opt/rocm-5.7.0/include/hip/amd_detail/amd_hip_cooperative_groups.h:801:12: note: in instantiation of function template specialization 'cooperative_groups::thread_block_tile<64>::thread_block_tile<cooperative_groups::thread_block>' requested here
    return thread_block_tile<size, void>(*this);
           ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/rms_norm.hip:98:48: note: in instantiation of member function 'cooperative_groups::thread_block_tile<64, cooperative_groups::thread_block>::operator thread_block_tile' requested here
    cg::thread_block_tile<hw_warp_size> warp = cg::tiled_partition<hw_warp_size>(tb);
                                               ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/rms_norm.hip:224:13: note: in instantiation of function template specialization 'pre_rms_norm<float, 1, 1, 256>' requested here
            LAUNCH_ALL_RMS_NORM(1, 1, maxThreads);
            ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/rms_norm.hip:179:9: note: expanded from macro 'LAUNCH_ALL_RMS_NORM'
        LAUNCH_PRE_RMS_NORM(UNROLL, threadsPerGroup, maxThreads) \
        ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/rms_norm.hip:174:25: note: expanded from macro 'LAUNCH_PRE_RMS_NORM'
   hipLaunchKernelGGL(( pre_rms_norm<T, UNROLL, threadsPerGroup, maxThreads>), dim3(grid), dim3(block), 0, stream,  \
                        ^
16 errors generated when compiling for gfx1030.
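
The two static assertions above reject a tile of 64 threads because 64 exceeds the wavefront size of the target being compiled. The sketch below restates that rule in Python for the --offload-arch list in the hipcc command. It is an illustration, not DeepSpeed or ROCm code: the names WAVEFRONT_SIZE, wavefront_size and is_valid_tile_size are hypothetical, and the table assumes the default wavefront widths (64 on gfx9 parts, 32 on gfx10 and gfx11 parts).

```python
# Assumed default wavefront width per architecture family (hypothetical table).
WAVEFRONT_SIZE = {"gfx9": 64, "gfx10": 32, "gfx11": 32}


def wavefront_size(arch):
    """Look up the assumed wavefront width for an --offload-arch value."""
    for family, width in WAVEFRONT_SIZE.items():
        if arch.startswith(family):
            return width
    raise ValueError(f"unknown architecture: {arch}")


def is_valid_tile_size(tile, wavefront):
    """Mirror the assertion text: a power of two, no larger than the wavefront."""
    is_power_of_two = tile > 0 and (tile & (tile - 1)) == 0
    return is_power_of_two and tile <= wavefront


# Architectures taken from the hipcc command line in the log.
archs = ["gfx900", "gfx906", "gfx908", "gfx90a", "gfx1030", "gfx1100", "gfx1101"]

# The failing code instantiates thread_block_tile<64>.
rejected = [a for a in archs if not is_valid_tile_size(64, wavefront_size(a))]
print(rejected)  # ['gfx1030', 'gfx1100', 'gfx1101']
```

Under these assumptions the first rejected target is gfx1030, which matches the "16 errors generated when compiling for gfx1030" line. The RX 7900 XTX itself is a gfx1100 part, so it falls in the rejected group as well.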
[2/11] /opt/rocm/bin/hipcc  -DWITH_HIP -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/TH -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THC -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THH -isystem /opt/rocm/include -isystem /opt/conda/envs/py_3.9/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=1 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -D__HIP_PLATFORM_AMD__=1 -fPIC -D__HIP_PLATFORM_HCC__=1 -DUSE_ROCM=1 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++17 -U__HIP_NO_HALF_OPERATORS__ -U__HIP_NO_HALF_CONVERSIONS__ -U__HIP_NO_HALF2_OPERATORS__ -DROCM_VERSION_MAJOR=5 -DROCM_VERSION_MINOR=7 --offload-arch=gfx900 --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 --offload-arch=gfx1101 -fno-gpu-rdc -c /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip -o apply_rotary_pos_emb.cuda.o
FAILED: apply_rotary_pos_emb.cuda.o
/opt/rocm/bin/hipcc  -DWITH_HIP -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/TH -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THC -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THH -isystem /opt/rocm/include -isystem /opt/conda/envs/py_3.9/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=1 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -D__HIP_PLATFORM_AMD__=1 -fPIC -D__HIP_PLATFORM_HCC__=1 -DUSE_ROCM=1 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++17 -U__HIP_NO_HALF_OPERATORS__ -U__HIP_NO_HALF_CONVERSIONS__ -U__HIP_NO_HALF2_OPERATORS__ -DROCM_VERSION_MAJOR=5 -DROCM_VERSION_MINOR=7 --offload-arch=gfx900 --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 --offload-arch=gfx1101 -fno-gpu-rdc -c /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip -o apply_rotary_pos_emb.cuda.o
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:8:
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/conversion_utils_hip.h:9:
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/ds_kernel_utils_hip.h:24:
In file included from /opt/rocm-5.7.0/include/hip/hip_cooperative_groups.h:38:
/opt/rocm-5.7.0/include/hip/amd_detail/amd_hip_cooperative_groups.h:638:3: error: static assertion failed due to requirement 'integral_constant<bool, false>::value': Tile size is either not a power of 2 or greater than the wavefront size
  static_assert(is_valid_tile_size<size>::value,
  ^             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/opt/rocm-5.7.0/include/hip/amd_detail/amd_hip_cooperative_groups.h:691:39: note: in instantiation of template class 'cooperative_groups::thread_block_tile_base<64>' requested here
class thread_block_tile_type : public thread_block_tile_base<tileSize>,
                                      ^
/opt/rocm-5.7.0/include/hip/amd_detail/amd_hip_cooperative_groups.h:781:43: note: in instantiation of template class 'cooperative_groups::thread_block_tile_type<64, cooperative_groups::thread_block>' requested here
class thread_block_tile_internal : public thread_block_tile_type<size, ParentCGTy> {
                                          ^
/opt/rocm-5.7.0/include/hip/amd_detail/amd_hip_cooperative_groups.h:794:34: note: in instantiation of template class 'cooperative_groups::impl::thread_block_tile_internal<64, cooperative_groups::thread_block>' requested here
class thread_block_tile : public impl::thread_block_tile_internal<size, ParentCGTy> {
                                 ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:44:56: note: in instantiation of template class 'cooperative_groups::thread_block_tile<64, cooperative_groups::thread_block>' requested here
    cg::thread_block_tile<threadsPerHead> head_group = cg::tiled_partition<threadsPerHead>(tb);
                                                       ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:174:9: note: in instantiation of function template specialization 'apply_rotary_pos_half<float, 64, 4>' requested here
        LAUNCH_FOR_ALIGNMENT(4);
        ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:115:9: note: expanded from macro 'LAUNCH_FOR_ALIGNMENT'
        LAUNCH_ROT_POS_EMB_HALF(64, ALIGNMENT); \
        ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:93:25: note: expanded from macro 'LAUNCH_ROT_POS_EMB_HALF'
   hipLaunchKernelGGL(( apply_rotary_pos_half<T, HEAD_THREADS, ALIGNMENT>), dim3(grid), dim3(block), 0, stream, mixed_query, \
                        ^
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:8:
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/conversion_utils_hip.h:9:
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/ds_kernel_utils_hip.h:24:
In file included from /opt/rocm-5.7.0/include/hip/hip_cooperative_groups.h:38:
/opt/rocm-5.7.0/include/hip/amd_detail/amd_hip_cooperative_groups.h:837:3: error: static assertion failed due to requirement 'integral_constant<bool, false>::value': Tiled partition with size > wavefront size. Currently not supported
  static_assert(is_valid_tile_size<size>::value,
  ^             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:44:60: note: in instantiation of function template specialization 'cooperative_groups::tiled_partition<64U, cooperative_groups::thread_block>' requested here
    cg::thread_block_tile<threadsPerHead> head_group = cg::tiled_partition<threadsPerHead>(tb);
                                                           ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:174:9: note: in instantiation of function template specialization 'apply_rotary_pos_half<float, 64, 4>' requested here
        LAUNCH_FOR_ALIGNMENT(4);
        ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:115:9: note: expanded from macro 'LAUNCH_FOR_ALIGNMENT'
        LAUNCH_ROT_POS_EMB_HALF(64, ALIGNMENT); \
        ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:93:25: note: expanded from macro 'LAUNCH_ROT_POS_EMB_HALF'
   hipLaunchKernelGGL(( apply_rotary_pos_half<T, HEAD_THREADS, ALIGNMENT>), dim3(grid), dim3(block), 0, stream, mixed_query, \
                        ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:44:56: error: no matching function for call to 'tiled_partition'
    cg::thread_block_tile<threadsPerHead> head_group = cg::tiled_partition<threadsPerHead>(tb);
                                                       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:176:9: note: in instantiation of function template specialization 'apply_rotary_pos_half<float, 64, 8>' requested here
        LAUNCH_FOR_ALIGNMENT(8);
        ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:115:9: note: expanded from macro 'LAUNCH_FOR_ALIGNMENT'
        LAUNCH_ROT_POS_EMB_HALF(64, ALIGNMENT); \
        ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:93:25: note: expanded from macro 'LAUNCH_ROT_POS_EMB_HALF'
   hipLaunchKernelGGL(( apply_rotary_pos_half<T, HEAD_THREADS, ALIGNMENT>), dim3(grid), dim3(block), 0, stream, mixed_query, \
                        ^
/opt/rocm-5.7.0/include/hip/amd_detail/amd_hip_cooperative_groups.h:836:54: note: candidate template ignored: substitution failure [with size = 64, ParentCGTy = cg::thread_block]
__CG_QUALIFIER__ thread_block_tile<size, ParentCGTy> tiled_partition(const ParentCGTy& g) {
                                                     ^
/opt/rocm-5.7.0/include/hip/amd_detail/amd_hip_cooperative_groups.h:723:18: error: no member named 'sync' in 'cooperative_groups::thread_block_tile_base<64>'
  using tbtBase::sync;
        ~~~~~~~~~^
/opt/rocm-5.7.0/include/hip/amd_detail/amd_hip_cooperative_groups.h:781:43: note: in instantiation of template class 'cooperative_groups::thread_block_tile_type<64, void>' requested here
class thread_block_tile_internal : public thread_block_tile_type<size, ParentCGTy> {
                                          ^
/opt/rocm-5.7.0/include/hip/amd_detail/amd_hip_cooperative_groups.h:807:46: note: in instantiation of template class 'cooperative_groups::impl::thread_block_tile_internal<64, void>' requested here
class thread_block_tile<size, void> : public impl::thread_block_tile_internal<size, void> {
                                             ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:44:43: note: in instantiation of template class 'cooperative_groups::thread_block_tile<64>' requested here
    cg::thread_block_tile<threadsPerHead> head_group = cg::tiled_partition<threadsPerHead>(tb);
                                          ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:176:9: note: in instantiation of function template specialization 'apply_rotary_pos_half<float, 64, 8>' requested here
        LAUNCH_FOR_ALIGNMENT(8);
        ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:115:9: note: expanded from macro 'LAUNCH_FOR_ALIGNMENT'
        LAUNCH_ROT_POS_EMB_HALF(64, ALIGNMENT); \
        ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:93:25: note: expanded from macro 'LAUNCH_ROT_POS_EMB_HALF'
   hipLaunchKernelGGL(( apply_rotary_pos_half<T, HEAD_THREADS, ALIGNMENT>), dim3(grid), dim3(block), 0, stream, mixed_query, \
                        ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:44:56: error: no matching function for call to 'tiled_partition'
    cg::thread_block_tile<threadsPerHead> head_group = cg::tiled_partition<threadsPerHead>(tb);
                                                       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:178:9: note: in instantiation of function template specialization 'apply_rotary_pos_half<float, 64, 16>' requested here
        LAUNCH_FOR_ALIGNMENT(16);
        ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:115:9: note: expanded from macro 'LAUNCH_FOR_ALIGNMENT'
        LAUNCH_ROT_POS_EMB_HALF(64, ALIGNMENT); \
        ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:93:25: note: expanded from macro 'LAUNCH_ROT_POS_EMB_HALF'
   hipLaunchKernelGGL(( apply_rotary_pos_half<T, HEAD_THREADS, ALIGNMENT>), dim3(grid), dim3(block), 0, stream, mixed_query, \
                        ^
/opt/rocm-5.7.0/include/hip/amd_detail/amd_hip_cooperative_groups.h:836:54: note: candidate template ignored: substitution failure [with size = 64, ParentCGTy = cg::thread_block]
__CG_QUALIFIER__ thread_block_tile<size, ParentCGTy> tiled_partition(const ParentCGTy& g) {
                                                     ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:56:48: error: no member named 'thread_rank' in 'cooperative_groups::thread_block_tile<64>'
        const int base_neuron_idx = head_group.thread_rank() * T_per_thread;
                                    ~~~~~~~~~~ ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:74:58: error: no member named 'thread_rank' in 'cooperative_groups::thread_block_tile<64>'
                                            ? head_group.thread_rank() + half_dim_threads
                                              ~~~~~~~~~~ ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:77:53: error: no member named 'shfl' in 'cooperative_groups::thread_block_tile<64>'
                const float q_rot_temp = head_group.shfl(q_rot, target_lane);
                                         ~~~~~~~~~~ ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:78:53: error: no member named 'shfl' in 'cooperative_groups::thread_block_tile<64>'
                const float k_rot_temp = head_group.shfl(k_rot, target_lane);
                                         ~~~~~~~~~~ ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:44:56: error: no matching function for call to 'tiled_partition'
    cg::thread_block_tile<threadsPerHead> head_group = cg::tiled_partition<threadsPerHead>(tb);
                                                       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:174:9: note: in instantiation of function template specialization 'apply_rotary_pos_half<__half, 64, 4>' requested here
        LAUNCH_FOR_ALIGNMENT(4);
        ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:115:9: note: expanded from macro 'LAUNCH_FOR_ALIGNMENT'
        LAUNCH_ROT_POS_EMB_HALF(64, ALIGNMENT); \
        ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:93:25: note: expanded from macro 'LAUNCH_ROT_POS_EMB_HALF'
   hipLaunchKernelGGL(( apply_rotary_pos_half<T, HEAD_THREADS, ALIGNMENT>), dim3(grid), dim3(block), 0, stream, mixed_query, \
                        ^
/opt/rocm-5.7.0/include/hip/amd_detail/amd_hip_cooperative_groups.h:836:54: note: candidate template ignored: substitution failure [with size = 64, ParentCGTy = cg::thread_block]
__CG_QUALIFIER__ thread_block_tile<size, ParentCGTy> tiled_partition(const ParentCGTy& g) {
                                                     ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:56:48: error: no member named 'thread_rank' in 'cooperative_groups::thread_block_tile<64>'
        const int base_neuron_idx = head_group.thread_rank() * T_per_thread;
                                    ~~~~~~~~~~ ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:74:58: error: no member named 'thread_rank' in 'cooperative_groups::thread_block_tile<64>'
                                            ? head_group.thread_rank() + half_dim_threads
                                              ~~~~~~~~~~ ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:77:53: error: no member named 'shfl' in 'cooperative_groups::thread_block_tile<64>'
                const float q_rot_temp = head_group.shfl(q_rot, target_lane);
                                         ~~~~~~~~~~ ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:78:53: error: no member named 'shfl' in 'cooperative_groups::thread_block_tile<64>'
                const float k_rot_temp = head_group.shfl(k_rot, target_lane);
                                         ~~~~~~~~~~ ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:44:56: error: no matching function for call to 'tiled_partition'
    cg::thread_block_tile<threadsPerHead> head_group = cg::tiled_partition<threadsPerHead>(tb);
                                                       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:176:9: note: in instantiation of function template specialization 'apply_rotary_pos_half<__half, 64, 8>' requested here
        LAUNCH_FOR_ALIGNMENT(8);
        ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:115:9: note: expanded from macro 'LAUNCH_FOR_ALIGNMENT'
        LAUNCH_ROT_POS_EMB_HALF(64, ALIGNMENT); \
        ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:93:25: note: expanded from macro 'LAUNCH_ROT_POS_EMB_HALF'
   hipLaunchKernelGGL(( apply_rotary_pos_half<T, HEAD_THREADS, ALIGNMENT>), dim3(grid), dim3(block), 0, stream, mixed_query, \
                        ^
/opt/rocm-5.7.0/include/hip/amd_detail/amd_hip_cooperative_groups.h:836:54: note: candidate template ignored: substitution failure [with size = 64, ParentCGTy = cg::thread_block]
__CG_QUALIFIER__ thread_block_tile<size, ParentCGTy> tiled_partition(const ParentCGTy& g) {
                                                     ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:56:48: error: no member named 'thread_rank' in 'cooperative_groups::thread_block_tile<64>'
        const int base_neuron_idx = head_group.thread_rank() * T_per_thread;
                                    ~~~~~~~~~~ ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:74:58: error: no member named 'thread_rank' in 'cooperative_groups::thread_block_tile<64>'
                                            ? head_group.thread_rank() + half_dim_threads
                                              ~~~~~~~~~~ ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:77:53: error: no member named 'shfl' in 'cooperative_groups::thread_block_tile<64>'
                const float q_rot_temp = head_group.shfl(q_rot, target_lane);
                                         ~~~~~~~~~~ ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:78:53: error: no member named 'shfl' in 'cooperative_groups::thread_block_tile<64>'
                const float k_rot_temp = head_group.shfl(k_rot, target_lane);
                                         ~~~~~~~~~~ ^
fatal error: too many errors emitted, stopping now [-ferror-limit=]
20 errors generated when compiling for gfx1030.
[3/11] /opt/rocm/bin/hipcc  -DWITH_HIP -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/TH -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THC -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THH -isystem /opt/rocm/include -isystem /opt/conda/envs/py_3.9/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=1 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -D__HIP_PLATFORM_AMD__=1 -fPIC -D__HIP_PLATFORM_HCC__=1 -DUSE_ROCM=1 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++17 -U__HIP_NO_HALF_OPERATORS__ -U__HIP_NO_HALF_CONVERSIONS__ -U__HIP_NO_HALF2_OPERATORS__ -DROCM_VERSION_MAJOR=5 -DROCM_VERSION_MINOR=7 --offload-arch=gfx900 --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 --offload-arch=gfx1101 -fno-gpu-rdc -c /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/layer_norm.hip -o layer_norm.cuda.o
FAILED: layer_norm.cuda.o
/opt/rocm/bin/hipcc  -DWITH_HIP -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/TH -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THC -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THH -isystem /opt/rocm/include -isystem /opt/conda/envs/py_3.9/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=1 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -D__HIP_PLATFORM_AMD__=1 -fPIC -D__HIP_PLATFORM_HCC__=1 -DUSE_ROCM=1 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++17 -U__HIP_NO_HALF_OPERATORS__ -U__HIP_NO_HALF_CONVERSIONS__ -U__HIP_NO_HALF2_OPERATORS__ -DROCM_VERSION_MAJOR=5 -DROCM_VERSION_MINOR=7 --offload-arch=gfx900 --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 --offload-arch=gfx1101 -fno-gpu-rdc -c /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/layer_norm.hip -o layer_norm.cuda.o
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/layer_norm.hip:8:
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/conversion_utils_hip.h:9:
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/ds_kernel_utils_hip.h:24:
In file included from /opt/rocm-5.7.0/include/hip/hip_cooperative_groups.h:38:
/opt/rocm-5.7.0/include/hip/amd_detail/amd_hip_cooperative_groups.h:638:3: error: static assertion failed due to requirement 'integral_constant<bool, false>::value': Tile size is either not a power of 2 or greater than the wavefront size
  static_assert(is_valid_tile_size<size>::value,
  ^             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/opt/rocm-5.7.0/include/hip/amd_detail/amd_hip_cooperative_groups.h:704:55: note: in instantiation of template class 'cooperative_groups::thread_block_tile_base<64>' requested here
class thread_block_tile_type<tileSize, void> : public thread_block_tile_base<tileSize>,
                                                      ^
/opt/rocm-5.7.0/include/hip/amd_detail/amd_hip_cooperative_groups.h:781:43: note: in instantiation of template class 'cooperative_groups::thread_block_tile_type<64, void>' requested here
class thread_block_tile_internal : public thread_block_tile_type<size, ParentCGTy> {
                                          ^
/opt/rocm-5.7.0/include/hip/amd_detail/amd_hip_cooperative_groups.h:807:46: note: in instantiation of template class 'cooperative_groups::impl::thread_block_tile_internal<64, void>' requested here
class thread_block_tile<size, void> : public impl::thread_block_tile_internal<size, void> {
                                             ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils_hip.h:362:44: note: in instantiation of template class 'cooperative_groups::thread_block_tile<64>' requested here
        data[0] = element<Op>(data[0], warp.shfl_xor(data[0], i));
                                           ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils_hip.h:371:46: error: no member named 'shfl_xor' in 'cooperative_groups::thread_block_tile<64>'
        data[0] = element<Op1>(data[0], warp.shfl_xor(data[0], i));
                                        ~~~~ ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils_hip.h:372:46: error: no member named 'shfl_xor' in 'cooperative_groups::thread_block_tile<64>'
        data[1] = element<Op2>(data[1], warp.shfl_xor(data[1], i));
                                        ~~~~ ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils_hip.h:381:46: error: no member named 'shfl_xor' in 'cooperative_groups::thread_block_tile<64>'
        data[0] = element<Op1>(data[0], warp.shfl_xor(data[0], i));
                                        ~~~~ ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils_hip.h:382:46: error: no member named 'shfl_xor' in 'cooperative_groups::thread_block_tile<64>'
        data[1] = element<Op2>(data[1], warp.shfl_xor(data[1], i));
                                        ~~~~ ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils_hip.h:383:46: error: no member named 'shfl_xor' in 'cooperative_groups::thread_block_tile<64>'
        data[2] = element<Op3>(data[2], warp.shfl_xor(data[2], i));
                                        ~~~~ ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils_hip.h:392:46: error: no member named 'shfl_xor' in 'cooperative_groups::thread_block_tile<64>'
        data[0] = element<Op1>(data[0], warp.shfl_xor(data[0], i));
                                        ~~~~ ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils_hip.h:393:46: error: no member named 'shfl_xor' in 'cooperative_groups::thread_block_tile<64>'
        data[1] = element<Op2>(data[1], warp.shfl_xor(data[1], i));
                                        ~~~~ ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils_hip.h:394:46: error: no member named 'shfl_xor' in 'cooperative_groups::thread_block_tile<64>'
        data[2] = element<Op3>(data[2], warp.shfl_xor(data[2], i));
                                        ~~~~ ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils_hip.h:395:46: error: no member named 'shfl_xor' in 'cooperative_groups::thread_block_tile<64>'
        data[3] = element<Op4>(data[3], warp.shfl_xor(data[3], i));
                                        ~~~~ ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils_hip.h:431:22: error: no member named 'thread_rank' in 'cooperative_groups::thread_block_tile<64>'
        if (warp_arg.thread_rank() == 0) {
            ~~~~~~~~ ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils_hip.h:442:26: error: no member named 'thread_rank' in 'cooperative_groups::thread_block_tile<64>'
            if (warp_arg.thread_rank() < running_warps) {
                ~~~~~~~~ ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils_hip.h:446:68: error: no member named 'thread_rank' in 'cooperative_groups::thread_block_tile<64>'
                        data + i, reduce_buffer + elems * warp_arg.thread_rank() + i);
                                                          ~~~~~~~~ ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils_hip.h:456:82: error: no member named 'thread_rank' in 'cooperative_groups::thread_block_tile<64>'
                mem_access::store_shared<bytes>(reduce_buffer + elems * warp_arg.thread_rank() + i,
                                                                        ~~~~~~~~ ^
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/layer_norm.hip:8:
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/conversion_utils_hip.h:9:
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/ds_kernel_utils_hip.h:24:
In file included from /opt/rocm-5.7.0/include/hip/hip_cooperative_groups.h:38:
/opt/rocm-5.7.0/include/hip/amd_detail/amd_hip_cooperative_groups.h:837:3: error: static assertion failed due to requirement 'integral_constant<bool, false>::value': Tiled partition with size > wavefront size. Currently not supported
  static_assert(is_valid_tile_size<size>::value,
  ^             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/layer_norm.hip:239:52: note: in instantiation of function template specialization 'cooperative_groups::tiled_partition<64U, cooperative_groups::thread_block>' requested here
    cg::thread_block_tile<hw_warp_size> warp = cg::tiled_partition<hw_warp_size>(tb);
                                                   ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/layer_norm.hip:387:13: note: in instantiation of function template specialization 'fused_residual_ln<__half, 1, 1, 256, false>' requested here
            LAUNCH_FUSED_RES_LN(1, 1, maxThreads);
            ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/layer_norm.hip:341:25: note: expanded from macro 'LAUNCH_FUSED_RES_LN'
   hipLaunchKernelGGL(( fused_residual_ln<T, unRollFactor, threadsPerGroup, maxThreads, false>) \
                        ^
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/layer_norm.hip:8:
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/conversion_utils_hip.h:9:
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/ds_kernel_utils_hip.h:24:
In file included from /opt/rocm-5.7.0/include/hip/hip_cooperative_groups.h:38:
/opt/rocm-5.7.0/include/hip/amd_detail/amd_hip_cooperative_groups.h:814:9: error: type 'impl::thread_block_tile_internal<64U, void>' is not a direct or virtual base of 'cooperative_groups::thread_block_tile<64>'
      : impl::thread_block_tile_internal<size, void>(g) {}
        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/opt/rocm-5.7.0/include/hip/amd_detail/amd_hip_cooperative_groups.h:801:12: note: in instantiation of function template specialization 'cooperative_groups::thread_block_tile<64>::thread_block_tile<cooperative_groups::thread_block>' requested here
    return thread_block_tile<size, void>(*this);
           ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/layer_norm.hip:239:48: note: in instantiation of member function 'cooperative_groups::thread_block_tile<64, cooperative_groups::thread_block>::operator thread_block_tile' requested here
    cg::thread_block_tile<hw_warp_size> warp = cg::tiled_partition<hw_warp_size>(tb);
                                               ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/layer_norm.hip:387:13: note: in instantiation of function template specialization 'fused_residual_ln<__half, 1, 1, 256, false>' requested here
            LAUNCH_FUSED_RES_LN(1, 1, maxThreads);
            ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/layer_norm.hip:341:25: note: expanded from macro 'LAUNCH_FUSED_RES_LN'
   hipLaunchKernelGGL(( fused_residual_ln<T, unRollFactor, threadsPerGroup, maxThreads, false>) \
                        ^
16 errors generated when compiling for gfx1030.
[4/11] /opt/rocm/bin/hipcc  -DWITH_HIP -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/TH -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THC -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THH -isystem /opt/rocm/include -isystem /opt/conda/envs/py_3.9/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=1 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -D__HIP_PLATFORM_AMD__=1 -fPIC -D__HIP_PLATFORM_HCC__=1 -DUSE_ROCM=1 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++17 -U__HIP_NO_HALF_OPERATORS__ -U__HIP_NO_HALF_CONVERSIONS__ -U__HIP_NO_HALF2_OPERATORS__ -DROCM_VERSION_MAJOR=5 -DROCM_VERSION_MINOR=7 --offload-arch=gfx900 --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 --offload-arch=gfx1101 -fno-gpu-rdc -c /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/dequantize.hip -o dequantize.cuda.o
[5/11] /opt/rocm/bin/hipcc  -DWITH_HIP -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/TH -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THC -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THH -isystem /opt/rocm/include -isystem /opt/conda/envs/py_3.9/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=1 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -D__HIP_PLATFORM_AMD__=1 -fPIC -D__HIP_PLATFORM_HCC__=1 -DUSE_ROCM=1 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++17 -U__HIP_NO_HALF_OPERATORS__ -U__HIP_NO_HALF_CONVERSIONS__ -U__HIP_NO_HALF2_OPERATORS__ -DROCM_VERSION_MAJOR=5 -DROCM_VERSION_MINOR=7 --offload-arch=gfx900 --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 --offload-arch=gfx1101 -fno-gpu-rdc -c /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/relu.hip -o relu.cuda.o
[6/11] /opt/rocm/bin/hipcc  -DWITH_HIP -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/TH -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THC -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THH -isystem /opt/rocm/include -isystem /opt/conda/envs/py_3.9/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=1 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -D__HIP_PLATFORM_AMD__=1 -fPIC -D__HIP_PLATFORM_HCC__=1 -DUSE_ROCM=1 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++17 -U__HIP_NO_HALF_OPERATORS__ -U__HIP_NO_HALF_CONVERSIONS__ -U__HIP_NO_HALF2_OPERATORS__ -DROCM_VERSION_MAJOR=5 -DROCM_VERSION_MINOR=7 --offload-arch=gfx900 --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 --offload-arch=gfx1101 -fno-gpu-rdc -c /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pointwise_ops.hip -o pointwise_ops.cuda.o
[7/11] /opt/rocm/bin/hipcc  -DWITH_HIP -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/TH -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THC -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THH -isystem /opt/rocm/include -isystem /opt/conda/envs/py_3.9/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=1 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -D__HIP_PLATFORM_AMD__=1 -fPIC -D__HIP_PLATFORM_HCC__=1 -DUSE_ROCM=1 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++17 -U__HIP_NO_HALF_OPERATORS__ -U__HIP_NO_HALF_CONVERSIONS__ -U__HIP_NO_HALF2_OPERATORS__ -DROCM_VERSION_MAJOR=5 -DROCM_VERSION_MINOR=7 --offload-arch=gfx900 --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 --offload-arch=gfx1101 -fno-gpu-rdc -c /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/transform.hip -o transform.cuda.o
[8/11] /opt/rocm/bin/hipcc  -DWITH_HIP -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/TH -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THC -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THH -isystem /opt/rocm/include -isystem /opt/conda/envs/py_3.9/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=1 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -D__HIP_PLATFORM_AMD__=1 -fPIC -D__HIP_PLATFORM_HCC__=1 -DUSE_ROCM=1 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++17 -U__HIP_NO_HALF_OPERATORS__ -U__HIP_NO_HALF_CONVERSIONS__ -U__HIP_NO_HALF2_OPERATORS__ -DROCM_VERSION_MAJOR=5 -DROCM_VERSION_MINOR=7 --offload-arch=gfx900 --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 --offload-arch=gfx1101 -fno-gpu-rdc -c /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/gelu.hip -o gelu.cuda.o
[9/11] /opt/rocm/bin/hipcc  -DWITH_HIP -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/TH -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THC -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THH -isystem /opt/rocm/include -isystem /opt/conda/envs/py_3.9/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=1 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -D__HIP_PLATFORM_AMD__=1 -fPIC -D__HIP_PLATFORM_HCC__=1 -DUSE_ROCM=1 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++17 -U__HIP_NO_HALF_OPERATORS__ -U__HIP_NO_HALF_CONVERSIONS__ -U__HIP_NO_HALF2_OPERATORS__ -DROCM_VERSION_MAJOR=5 -DROCM_VERSION_MINOR=7 --offload-arch=gfx900 --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 --offload-arch=gfx1101 -fno-gpu-rdc -c /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/softmax.hip -o softmax.cuda.o
[10/11] c++ -MMD -MF pt_binding_hip.o.d -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/TH -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THC -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THH -isystem /opt/rocm/include -isystem /opt/conda/envs/py_3.9/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=1 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -D__HIP_PLATFORM_AMD__=1 -c /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp -o pt_binding_hip.o -fPIC -D__HIP_PLATFORM_HCC__=1 -DUSE_ROCM=1
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h: In constructor ‘InferenceContext::InferenceContext()’:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:77:23: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result]
   77 |         hipEventCreate(&_comp1_event);
      |         ~~~~~~~~~~~~~~^~~~~~~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:2472:12: note: in call to ‘hipError_t hipEventCreate(ihipEvent_t**)’, declared here
 2472 | hipError_t hipEventCreate(hipEvent_t* event);
      |            ^~~~~~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here
  332 | } hipError_t;
      |   ^~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:78:23: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result]
   78 |         hipEventCreate(&_comp2_event);
      |         ~~~~~~~~~~~~~~^~~~~~~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:2472:12: note: in call to ‘hipError_t hipEventCreate(ihipEvent_t**)’, declared here
 2472 | hipError_t hipEventCreate(hipEvent_t* event);
      |            ^~~~~~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here
  332 | } hipError_t;
      |   ^~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:79:23: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result]
   79 |         hipEventCreate(&_comp_event);
      |         ~~~~~~~~~~~~~~^~~~~~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:2472:12: note: in call to ‘hipError_t hipEventCreate(ihipEvent_t**)’, declared here
 2472 | hipError_t hipEventCreate(hipEvent_t* event);
      |            ^~~~~~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here
  332 | } hipError_t;
      |   ^~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:80:23: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result]
   80 |         hipEventCreate(&_comm_event);
      |         ~~~~~~~~~~~~~~^~~~~~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:2472:12: note: in call to ‘hipError_t hipEventCreate(ihipEvent_t**)’, declared here
 2472 | hipError_t hipEventCreate(hipEvent_t* event);
      |            ^~~~~~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here
  332 | } hipError_t;
      |   ^~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h: In destructor ‘virtual InferenceContext::~InferenceContext()’:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:86:16: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result]
   86 |         hipFree(_workspace);
      |         ~~~~~~~^~~~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:3562:12: note: in call to ‘hipError_t hipFree(void*)’, declared here
 3562 | hipError_t hipFree(void* ptr);
      |            ^~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here
  332 | } hipError_t;
      |   ^~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:87:24: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result]
   87 |         hipEventDestroy(_comp1_event);
      |         ~~~~~~~~~~~~~~~^~~~~~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:2521:12: note: in call to ‘hipError_t hipEventDestroy(hipEvent_t)’, declared here
 2521 | hipError_t hipEventDestroy(hipEvent_t event);
      |            ^~~~~~~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here
  332 | } hipError_t;
      |   ^~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:88:24: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result]
   88 |         hipEventDestroy(_comp2_event);
      |         ~~~~~~~~~~~~~~~^~~~~~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:2521:12: note: in call to ‘hipError_t hipEventDestroy(hipEvent_t)’, declared here
 2521 | hipError_t hipEventDestroy(hipEvent_t event);
      |            ^~~~~~~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here
  332 | } hipError_t;
      |   ^~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:89:24: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result]
   89 |         hipEventDestroy(_comp_event);
      |         ~~~~~~~~~~~~~~~^~~~~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:2521:12: note: in call to ‘hipError_t hipEventDestroy(hipEvent_t)’, declared here
 2521 | hipError_t hipEventDestroy(hipEvent_t event);
      |            ^~~~~~~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here
  332 | } hipError_t;
      |   ^~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:90:24: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result]
   90 |         hipEventDestroy(_comm_event);
      |         ~~~~~~~~~~~~~~~^~~~~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:2521:12: note: in call to ‘hipError_t hipEventDestroy(hipEvent_t)’, declared here
 2521 | hipError_t hipEventDestroy(hipEvent_t event);
      |            ^~~~~~~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here
  332 | } hipError_t;
      |   ^~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h: In member function ‘void InferenceContext::GenWorkSpace(const unsigned int&, const unsigned int&, const size_t&, const size_t&, const size_t&, const unsigned int&, const bool&, const size_t&, const unsigned int&, unsigned int, unsigned int)’:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:112:48: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result]
  112 |         if (!_free_memory_size) { hipMemGetInfo(&_free_memory_size, &total_size); }
      |                                   ~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:4048:12: note: in call to ‘hipError_t hipMemGetInfo(size_t*, size_t*)’, declared here
 4048 | hipError_t hipMemGetInfo(size_t* free, size_t* total);
      |            ^~~~~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here
  332 | } hipError_t;
      |   ^~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:154:22: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result]
  154 |             hipMalloc(&_workspace, workSpaceSize);
      |             ~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:2786:12: note: in call to ‘hipError_t hipMalloc(void**, size_t)’, declared here
 2786 | hipError_t hipMalloc(void** ptr, size_t size);
      |            ^~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here
  332 | } hipError_t;
      |   ^~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:156:20: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result]
  156 |             hipFree(_workspace);
      |             ~~~~~~~^~~~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:3562:12: note: in call to ‘hipError_t hipFree(void*)’, declared here
 3562 | hipError_t hipFree(void* ptr);
      |            ^~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here
  332 | } hipError_t;
      |   ^~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:157:22: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result]
  157 |             hipMalloc(&_workspace, workSpaceSize);
      |             ~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:2786:12: note: in call to ‘hipError_t hipMalloc(void**, size_t)’, declared here
 2786 | hipError_t hipMalloc(void** ptr, size_t size);
      |            ^~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here
  332 | } hipError_t;
      |   ^~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h: In member function ‘void InferenceContext::release_workspace()’:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:230:16: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result]
  230 |         hipFree(_workspace);
      |         ~~~~~~~^~~~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:3562:12: note: in call to ‘hipError_t hipFree(void*)’, declared here
 3562 | hipError_t hipFree(void* ptr);
      |            ^~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here
  332 | } hipError_t;
      |   ^~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h: In member function ‘bool InferenceContext::retake_workspace()’:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:236:18: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result]
  236 |         hipMalloc(&_workspace, _workSpaceSize);
      |         ~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:2786:12: note: in call to ‘hipError_t hipMalloc(void**, size_t)’, declared here
 2786 | hipError_t hipMalloc(void** ptr, size_t size);
      |            ^~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here
  332 | } hipError_t;
      |   ^~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h: In member function ‘void InferenceContext::SynchComp()’:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:254:23: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result]
  254 |         hipEventRecord(_comp_event, _comp_stream);
      |         ~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:2501:12: note: in call to ‘hipError_t hipEventRecord(hipEvent_t, hipStream_t)’, declared here
 2501 | hipError_t hipEventRecord(hipEvent_t event, hipStream_t stream = NULL);
      |            ^~~~~~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here
  332 | } hipError_t;
      |   ^~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:255:27: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result]
  255 |         hipStreamWaitEvent(_comm_stream, _comp_event, 0);
      |         ~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:2211:12: note: in call to ‘hipError_t hipStreamWaitEvent(hipStream_t, hipEvent_t, unsigned int)’, declared here
 2211 | hipError_t hipStreamWaitEvent(hipStream_t stream, hipEvent_t event, unsigned int flags);
      |            ^~~~~~~~~~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here
  332 | } hipError_t;
      |   ^~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h: In member function ‘void InferenceContext::SynchComm()’:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:259:23: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result]
  259 |         hipEventRecord(_comm_event, _comm_stream);
      |         ~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:2501:12: note: in call to ‘hipError_t hipEventRecord(hipEvent_t, hipStream_t)’, declared here
 2501 | hipError_t hipEventRecord(hipEvent_t event, hipStream_t stream = NULL);
      |            ^~~~~~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here
  332 | } hipError_t;
      |   ^~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:260:27: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result]
  260 |         hipStreamWaitEvent(_comp_stream, _comm_event, 0);
      |         ~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:2211:12: note: in call to ‘hipError_t hipStreamWaitEvent(hipStream_t, hipEvent_t, unsigned int)’, declared here
 2211 | hipError_t hipStreamWaitEvent(hipStream_t stream, hipEvent_t event, unsigned int flags);
      |            ^~~~~~~~~~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
                 from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here
  332 | } hipError_t;
      |   ^~~~~~~~~~
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp: In instantiation of ‘std::vector<at::Tensor> ds_softmax_context(at::Tensor&, at::Tensor&, int, bool, bool, int, int, float, bool, bool, int, bool, unsigned int, unsigned int, at::Tensor&, float) [with T = float]’:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:2016:5:   required from here
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:542:50: warning: narrowing conversion of ‘(((size_t)hidden_dim) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
  542 |                                      {hidden_dim * InferenceContext::Instance().GetMaxTokenLength(),
      |                                       ~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:542:50: warning: narrowing conversion of ‘(((size_t)hidden_dim) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:543:41: warning: narrowing conversion of ‘(((size_t)k) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
  543 |                                       k * InferenceContext::Instance().GetMaxTokenLength(),
      |                                       ~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:543:41: warning: narrowing conversion of ‘(((size_t)k) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:551:38: warning: narrowing conversion of ‘(((size_t)hidden_dim) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
  551 |                          {hidden_dim * InferenceContext::Instance().GetMaxTokenLength(),
      |                           ~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:551:38: warning: narrowing conversion of ‘(((size_t)hidden_dim) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:552:29: warning: narrowing conversion of ‘(((size_t)k) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
  552 |                           k * InferenceContext::Instance().GetMaxTokenLength(),
      |                           ~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:552:29: warning: narrowing conversion of ‘(((size_t)k) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp: In instantiation of ‘std::vector<at::Tensor> ds_rms_mlp_gemm(at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, float, at::Tensor&, at::Tensor&, bool, int, bool) [with T = float]’:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:2016:5:   required from here
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:1582:72: warning: narrowing conversion of ‘(size_t)mlp_1_out_neurons’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
 1582 |         at::from_blob(intermediate_ptr, {input.size(0), input.size(1), mlp_1_out_neurons}, options);
      |                                                                        ^~~~~~~~~~~~~~~~~
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:1582:72: warning: narrowing conversion of ‘mlp_1_out_neurons’ from ‘const size_t’ {aka ‘const long unsigned int’} to ‘long int’ [-Wnarrowing]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp: In instantiation of ‘std::vector<at::Tensor> ds_softmax_context(at::Tensor&, at::Tensor&, int, bool, bool, int, int, float, bool, bool, int, bool, unsigned int, unsigned int, at::Tensor&, float) [with T = __half]’:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:2017:5:   required from here
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:542:50: warning: narrowing conversion of ‘(((size_t)hidden_dim) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
  542 |                                      {hidden_dim * InferenceContext::Instance().GetMaxTokenLength(),
      |                                       ~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:542:50: warning: narrowing conversion of ‘(((size_t)hidden_dim) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:543:41: warning: narrowing conversion of ‘(((size_t)k) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
  543 |                                       k * InferenceContext::Instance().GetMaxTokenLength(),
      |                                       ~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:543:41: warning: narrowing conversion of ‘(((size_t)k) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:551:38: warning: narrowing conversion of ‘(((size_t)hidden_dim) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
  551 |                          {hidden_dim * InferenceContext::Instance().GetMaxTokenLength(),
      |                           ~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:551:38: warning: narrowing conversion of ‘(((size_t)hidden_dim) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:552:29: warning: narrowing conversion of ‘(((size_t)k) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
  552 |                           k * InferenceContext::Instance().GetMaxTokenLength(),
      |                           ~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:552:29: warning: narrowing conversion of ‘(((size_t)k) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp: In instantiation of ‘std::vector<at::Tensor> ds_rms_mlp_gemm(at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, float, at::Tensor&, at::Tensor&, bool, int, bool) [with T = __half]’:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:2017:5:   required from here
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:1582:72: warning: narrowing conversion of ‘(size_t)mlp_1_out_neurons’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
 1582 |         at::from_blob(intermediate_ptr, {input.size(0), input.size(1), mlp_1_out_neurons}, options);
      |                                                                        ^~~~~~~~~~~~~~~~~
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:1582:72: warning: narrowing conversion of ‘mlp_1_out_neurons’ from ‘const size_t’ {aka ‘const long unsigned int’} to ‘long int’ [-Wnarrowing]
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1888, in _run_ninja_build
    subprocess.run(
  File "/opt/conda/envs/py_3.9/lib/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v', '-j', '32']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/var/lib/jenkins/test.py", line 14, in <module>
    generator.model = deepspeed.init_inference(generator.model,
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/__init__.py", line 342, in init_inference
    engine = InferenceEngine(model, config=ds_inference_config)
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/inference/engine.py", line 160, in __init__
    self._apply_injection_policy(config)
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/inference/engine.py", line 411, in _apply_injection_policy
    replace_transformer_layer(client_module, self.module, checkpoint, config, self.config)
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/module_inject/replace_module.py", line 339, in replace_transformer_layer
    replaced_module = replace_module(model=model,
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/module_inject/replace_module.py", line 583, in replace_module
    replaced_module, _ = _replace_module(model, policy, state_dict=sd)
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/module_inject/replace_module.py", line 643, in _replace_module
    _, layer_id = _replace_module(child,
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/module_inject/replace_module.py", line 643, in _replace_module
    _, layer_id = _replace_module(child,
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/module_inject/replace_module.py", line 619, in _replace_module
    replaced_module = policies[child.__class__][0](child,
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/module_inject/replace_module.py", line 298, in replace_fn
    new_module = replace_with_policy(child,
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/module_inject/replace_module.py", line 247, in replace_with_policy
    _container.create_module()
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/module_inject/containers/bloom.py", line 30, in create_module
    self.module = DeepSpeedBloomInference(_config, mp_group=self.mp_group)
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/model_implementations/transformers/ds_bloom.py", line 20, in __init__
    super().__init__(config, mp_group, quantize_scales, quantize_groups, merge_count, mlp_extra_grouping)
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/model_implementations/transformers/ds_transformer.py", line 58, in __init__
    inference_module = builder.load()
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/op_builder/builder.py", line 446, in load
    return self.jit_load(verbose)
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/op_builder/builder.py", line 492, in jit_load
    op_module = load(name=self.name,
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1279, in load
    return _jit_compile(
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1504, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1619, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1904, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'transformer_inference'
root@c8be2f1a010b:/var/lib/jenkins#
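
The part of the build log shown above consists of `warning:` and `note:` diagnostics, while `ninja: build stopped: subcommand failed.` means at least one compile step failed. A minimal sketch for pulling the real `error:` lines out of such a log is shown below. It assumes GCC/Clang-style `file:line:col: severity: message` diagnostics, and `summarize_build_log` is a hypothetical helper name, not part of DeepSpeed or PyTorch.

```python
import re
from collections import Counter

# Matches GCC/Clang-style diagnostics: "<file>:<line>:<col>: <severity>: <message>".
# "note:" lines and "In file included from ..." lines deliberately do not match.
DIAG_RE = re.compile(
    r"^(?P<file>[^:\s][^:]*):(?P<line>\d+):(?P<col>\d+): "
    r"(?P<severity>warning|error|fatal error): (?P<message>.*)$"
)


def summarize_build_log(log_text):
    """Return (severity counts, list of non-warning diagnostics) for a compiler log."""
    counts = Counter()
    errors = []
    for raw in log_text.splitlines():
        match = DIAG_RE.match(raw.strip())
        if match is None:
            continue
        counts[match["severity"]] += 1
        if match["severity"] != "warning":
            errors.append((match["file"], int(match["line"]), match["message"]))
    return counts, errors
```

Saving the full output to a file and passing its contents to `summarize_build_log` should leave only the diagnostics that actually stopped the build, which are easier to report than the whole log.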

root@3c5db63db5ac:/var/lib/jenkins# cat test.py

import os
import deepspeed
import torch
from transformers import pipeline

local_rank = int(os.getenv('LOCAL_RANK', '0'))
world_size = int(os.getenv('WORLD_SIZE', '1'))
generator = pipeline('text-generation', model='bigscience/bloom-560m',
                     device=local_rank)

generator.model = deepspeed.init_inference(generator.model,
                                           mp_size=world_size,
                                           dtype=torch.float,
                                           replace_with_kernel_inject=True)

string = generator("DeepSpeed is", do_sample=True, min_length=50)
if not torch.distributed.is_initialized() or torch.distributed.get_rank() == 0:
    print(string)
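
The log above also warns that `mp_size` is deprecated in favour of `tensor_parallel.tp_size`. This is unrelated to the build failure, but a sketch of the equivalent call without the deprecated parameter could look like this. `build_init_inference_kwargs` is a hypothetical helper introduced only for illustration; the `tensor_parallel={"tp_size": ...}` form is the one named in the deprecation warning.

```python
import os


def build_init_inference_kwargs(world_size, dtype):
    """Keyword arguments for deepspeed.init_inference without the deprecated mp_size."""
    return {
        # Replacement for mp_size, as suggested by the deprecation warning.
        "tensor_parallel": {"tp_size": world_size},
        "dtype": dtype,
        "replace_with_kernel_inject": True,
    }


world_size = int(os.getenv("WORLD_SIZE", "1"))

# Usage inside test.py (needs torch and deepspeed, so it is left as a comment here):
#   kwargs = build_init_inference_kwargs(world_size, torch.float)
#   generator.model = deepspeed.init_inference(generator.model, **kwargs)
```

This only silences the deprecation warning; the `transformer_inference` extension still has to compile for `replace_with_kernel_inject=True` to work.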

rraminen commented 10 months ago

The transformer_inference extension is not fully enabled on AMD GPUs yet, but we have a workaround for the error you are running into.

Open /opt/rocm-5.7.0/include/hip/amd_detail/amd_hip_cooperative_groups.h (for example with vi) and comment out the two static_asserts shown below to proceed.

637 template <unsigned int size> class thread_block_tile_base : public tile_base<size> {
638   //static_assert(is_valid_tile_size<size>::value,
639   //              "Tile size is either not a power of 2 or greater than the wavefront size");
640   using tile_base<size>::numThreads;

836 __CG_QUALIFIER__ thread_block_tile<size, ParentCGTy> tiled_partition(const ParentCGTy& g) {
837   //static_assert(is_valid_tile_size<size>::value,
838   //              "Tiled partition with size > wavefront size. Currently not supported ");
839   return impl::tiled_partition_internal<size, ParentCGTy>(g);
840 }
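
For anyone who prefers to script the edit above instead of doing it by hand in vi, here is a minimal sketch. It assumes the two `static_assert(is_valid_tile_size<size>::value, ...)` statements look like the excerpt above; `comment_out_tile_asserts` and `patch_header` are hypothetical helper names, and the header path is the one given in this thread.

```python
import re
import shutil

# Matches a whole static_assert statement on is_valid_tile_size, including a
# message that continues on the following line(s), up to the closing ");".
# Statements that are already commented out ("//static_assert...") do not match.
ASSERT_RE = re.compile(
    r"^[ \t]*static_assert\(is_valid_tile_size<size>::value,.*?\);[ \t]*$",
    re.MULTILINE | re.DOTALL,
)


def comment_out_tile_asserts(header_text):
    """Return (patched text, number of static_assert statements commented out)."""
    def _comment(match):
        # Prefix every physical line of the statement with "//".
        return "\n".join("//" + line for line in match.group(0).split("\n"))
    return ASSERT_RE.subn(_comment, header_text)


def patch_header(path):
    """Patch the header in place, keeping a .bak copy of the original."""
    with open(path, encoding="utf-8") as fh:
        original = fh.read()
    patched, count = comment_out_tile_asserts(original)
    if count:
        shutil.copyfile(path, path + ".bak")
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(patched)
    return count
```

Calling `patch_header("/opt/rocm-5.7.0/include/hip/amd_detail/amd_hip_cooperative_groups.h")` inside the container would be expected to report two patched statements. Restoring the `.bak` file undoes the change.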

sunpian1 commented 10 months ago

Hi @rraminen, after I commented out the two asserts I could proceed, but now I run into new errors.

root@5bacb2f1ed69:/var/lib/jenkins# python test.py
[2024-01-03 02:54:44,962] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
config.json: 100% 693/693 [00:00<00:00, 74.2kB/s]
model.safetensors: 82% 920M/1.12G [02:33<00:33, 5.99MB/s]
model.safetensors: 100% 1.12G/1.12G [03:08<00:00, 5.93MB/s]
tokenizer_config.json: 100% 222/222 [00:00<00:00, 37.4kB/s]
tokenizer.json: 100% 14.5M/14.5M [00:02<00:00, 5.26MB/s]
special_tokens_map.json: 100% 85.0/85.0 [00:00<00:00, 12.9kB/s]
[2024-01-03 03:00:50,149] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.11.2+4199dc25, git-hash=4199dc25, git-branch=master
[2024-01-03 03:00:50,151] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter mp_size is deprecated use tensor_parallel.tp_size instead
[2024-01-03 03:00:50,151] [INFO] [logging.py:96:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
Using /root/.cache/torch_extensions/py39_cpu as PyTorch extensions root...
Creating extension directory /root/.cache/torch_extensions/py39_cpu/transformer_inference...
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_cublas_wrappers.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_cublas_wrappers_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/ds_kernel_utils.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/ds_kernel_utils_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_cuda_layers.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_hip_layers.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/conversion_utils.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/conversion_utils_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/memory_access_utils.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/memory_access_utils_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/gelu.cu -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/gelu.hip [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/relu.cu -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/relu.hip [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/layer_norm.cu -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/layer_norm.hip [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/rms_norm.cu -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/rms_norm.hip [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/softmax.cu -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/softmax.hip [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/dequantize.cu -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/dequantize.hip [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.cu -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/transform.cu -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/transform.hip [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pointwise_ops.cu -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pointwise_ops.hip [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_cublas_wrappers.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_cublas_wrappers_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_cuda_layers.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_hip_layers.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/dropout.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/dropout_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/StopWatch.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/StopWatch.h [skipped, no changes]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/cublas_wrappers.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/cublas_wrappers_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/gemm_test.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/gemm_test_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/context.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/context_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/custom_cuda_layers.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/custom_hip_layers.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/feed_forward.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/feed_forward_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/gelu.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/gelu_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/general_kernels.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/general_kernels_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/normalize_layer.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/normalize_layer_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/softmax.h -> 
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/softmax_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/strided_batch_gemm.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/strided_batch_gemm_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/ds_transformer_cuda.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/ds_transformer_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/quantization.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/quantization_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/quantization_utils.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/quantization_utils_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/dequantization_utils.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/dequantization_utils_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/quantizer.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/quantizer_hip.h [ok] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/dropout.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/dropout_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/quantization.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/quantization_hip.h [skipped, already hipified] 
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/simd.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/simd.h [skipped, no changes] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/custom_cuda_layers.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/custom_hip_layers.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/softmax.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/softmax_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/conversion_utils.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/conversion_utils_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/StopWatch.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/StopWatch.h [skipped, no changes] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/context.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/context_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/general_kernels.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/general_kernels_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/compat.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/compat.h [skipped, no changes] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/gemm_test.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/gemm_test_hip.h [skipped, already hipified] 
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/cpu_lion.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/cpu_lion_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/Timer.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/Timer_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/cublas_wrappers.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/cublas_wrappers_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/strided_batch_gemm.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/strided_batch_gemm_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/ds_kernel_utils.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/ds_kernel_utils_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/feed_forward.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/feed_forward_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/normalize_layer.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/normalize_layer_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/cpu_adagrad.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/cpu_adagrad_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/cpu_adam.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/cpu_adam_hip.h [skipped, already 
hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/gelu.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/gelu_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/memory_access_utils.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/memory_access_utils_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/quantization_utils.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/quantization_utils_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/type_shim.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/type_shim_hip.h [skipped, already hipified] Successfully preprocessed all matching files. Total number of unsupported CUDA function calls: 0

Total number of replaced kernel launches: 25 Detected CUDA files, patching ldflags Emitting ninja build file /root/.cache/torch_extensions/py39_cpu/transformer_inference/build.ninja... Building extension module transformer_inference... Using envvar MAX_JOBS (32) as the number of workers... [1/11] /opt/rocm/bin/hipcc -DWITH_HIP -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/TH -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THC -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THH -isystem /opt/rocm/include -isystem /opt/conda/envs/py_3.9/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=1 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -D__HIP_PLATFORM_AMD__=1 -fPIC -D__HIP_PLATFORM_HCC__=1 -DUSE_ROCM=1 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++17 -U__HIP_NO_HALF_OPERATORS__ -U__HIP_NO_HALF_CONVERSIONS__ -U__HIP_NO_HALF2_OPERATORS__ -DROCM_VERSION_MAJOR=5 -DROCM_VERSION_MINOR=7 --offload-arch=gfx900 --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 --offload-arch=gfx1101 -fno-gpu-rdc -c /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/relu.hip -o relu.cuda.o [2/11] /opt/rocm/bin/hipcc -DWITH_HIP -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\"
-DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/TH -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THC -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THH -isystem /opt/rocm/include -isystem /opt/conda/envs/py_3.9/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=1 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -D__HIP_PLATFORM_AMD__=1 -fPIC -D__HIP_PLATFORM_HCC__=1 -DUSE_ROCM=1 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++17 -U__HIP_NO_HALF_OPERATORS__ -U__HIP_NO_HALF_CONVERSIONS__ -U__HIP_NO_HALF2_OPERATORS__ -DROCM_VERSION_MAJOR=5 -DROCM_VERSION_MINOR=7 --offload-arch=gfx900 --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 --offload-arch=gfx1101 -fno-gpu-rdc -c /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/dequantize.hip -o dequantize.cuda.o [3/11] /opt/rocm/bin/hipcc -DWITH_HIP -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/TH -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THC -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THH -isystem /opt/rocm/include -isystem /opt/conda/envs/py_3.9/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=1 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -D__HIP_PLATFORM_AMD__=1 -fPIC -D__HIP_PLATFORM_HCC__=1 -DUSE_ROCM=1 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++17 -U__HIP_NO_HALF_OPERATORS__ -U__HIP_NO_HALF_CONVERSIONS__ -U__HIP_NO_HALF2_OPERATORS__ -DROCM_VERSION_MAJOR=5 -DROCM_VERSION_MINOR=7 --offload-arch=gfx900 --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 --offload-arch=gfx1101 -fno-gpu-rdc -c /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pointwise_ops.hip -o pointwise_ops.cuda.o [4/11] /opt/rocm/bin/hipcc -DWITH_HIP -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/TH -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THC -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THH -isystem /opt/rocm/include -isystem /opt/conda/envs/py_3.9/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=1 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -D__HIP_PLATFORM_AMD__=1 -fPIC -D__HIP_PLATFORM_HCC__=1 -DUSE_ROCM=1 -DCUDA_HAS_FP16=1
-D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++17 -U__HIP_NO_HALF_OPERATORS__ -U__HIP_NO_HALF_CONVERSIONS__ -U__HIP_NO_HALF2_OPERATORS__ -DROCM_VERSION_MAJOR=5 -DROCM_VERSION_MINOR=7 --offload-arch=gfx900 --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 --offload-arch=gfx1101 -fno-gpu-rdc -c /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/gelu.hip -o gelu.cuda.o [5/11] /opt/rocm/bin/hipcc -DWITH_HIP -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/TH -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THC -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THH -isystem /opt/rocm/include -isystem /opt/conda/envs/py_3.9/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=1 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -D__HIP_PLATFORM_AMD__=1 -fPIC -D__HIP_PLATFORM_HCC__=1 -DUSE_ROCM=1 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++17 -U__HIP_NO_HALF_OPERATORS__ -U__HIP_NO_HALF_CONVERSIONS__ -U__HIP_NO_HALF2_OPERATORS__ -DROCM_VERSION_MAJOR=5 -DROCM_VERSION_MINOR=7 --offload-arch=gfx900 --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 --offload-arch=gfx1101 -fno-gpu-rdc -c /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/transform.hip
-o transform.cuda.o [6/11] /opt/rocm/bin/hipcc -DWITH_HIP -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/TH -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THC -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THH -isystem /opt/rocm/include -isystem /opt/conda/envs/py_3.9/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=1 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -D__HIP_PLATFORM_AMD__=1 -fPIC -D__HIP_PLATFORM_HCC__=1 -DUSE_ROCM=1 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++17 -U__HIP_NO_HALF_OPERATORS__ -U__HIP_NO_HALF_CONVERSIONS__ -U__HIP_NO_HALF2_OPERATORS__ -DROCM_VERSION_MAJOR=5 -DROCM_VERSION_MINOR=7 --offload-arch=gfx900 --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 --offload-arch=gfx1101 -fno-gpu-rdc -c /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/rms_norm.hip -o rms_norm.cuda.o [7/11] /opt/rocm/bin/hipcc -DWITH_HIP -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -isystem
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/TH -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THC -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THH -isystem /opt/rocm/include -isystem /opt/conda/envs/py_3.9/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=1 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -D__HIP_PLATFORM_AMD__=1 -fPIC -D__HIP_PLATFORM_HCC__=1 -DUSE_ROCM=1 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++17 -U__HIP_NO_HALF_OPERATORS__ -U__HIP_NO_HALF_CONVERSIONS__ -U__HIP_NO_HALF2_OPERATORS__ -DROCM_VERSION_MAJOR=5 -DROCM_VERSION_MINOR=7 --offload-arch=gfx900 --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 --offload-arch=gfx1101 -fno-gpu-rdc -c /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/softmax.hip -o softmax.cuda.o [8/11] /opt/rocm/bin/hipcc -DWITH_HIP -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/TH -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THC -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THH -isystem /opt/rocm/include -isystem
/opt/conda/envs/py_3.9/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=1 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -D__HIP_PLATFORM_AMD__=1 -fPIC -D__HIP_PLATFORM_HCC__=1 -DUSE_ROCM=1 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++17 -U__HIP_NO_HALF_OPERATORS__ -U__HIP_NO_HALF_CONVERSIONS__ -U__HIP_NO_HALF2_OPERATORS__ -DROCM_VERSION_MAJOR=5 -DROCM_VERSION_MINOR=7 --offload-arch=gfx900 --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 --offload-arch=gfx1101 -fno-gpu-rdc -c /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/layer_norm.hip -o layer_norm.cuda.o [9/11] c++ -MMD -MF pt_binding_hip.o.d -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/TH -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THC -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THH -isystem /opt/rocm/include -isystem /opt/conda/envs/py_3.9/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=1 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -D__HIP_PLATFORM_AMD__=1 -c /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp -o pt_binding_hip.o -fPIC -D__HIP_PLATFORM_HCC__=1 -DUSE_ROCM=1 In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h: In constructor ‘InferenceContext::InferenceContext()’: /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:77:23: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result] 77 | hipEventCreate(&_comp1_event); | ~~~~~~^~~~~~~ In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7, from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3, from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7: /opt/rocm/include/hip/hip_runtime_api.h:2472:12: note: in call to ‘hipError_t hipEventCreate(ihipEvent_t**)’, declared here 2472 | hipError_t hipEventCreate(hipEvent_t* event); | ^~~~~~ In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7, from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3, from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7: /opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here 332 | } hipError_t; | ^~~~~~ In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11: /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:78:23: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result] 78 | hipEventCreate(&_comp2_event); | ~~~~~~^~~~~~~ In file included from
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7, from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3, from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7: /opt/rocm/include/hip/hip_runtime_api.h:2472:12: note: in call to ‘hipError_t hipEventCreate(ihipEvent_t**)’, declared here 2472 | hipError_t hipEventCreate(hipEvent_t* event); | ^~~~~~ In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7, from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3, from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7: /opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here 332 | } hipError_t; | ^~~~~~ In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11: /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:79:23: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result] 79 | hipEventCreate(&_comp_event); | ~~~~~~^~~~~~ In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7, from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3, from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7: /opt/rocm/include/hip/hip_runtime_api.h:2472:12: note: in call to ‘hipError_t hipEventCreate(ihipEvent_t**)’, declared here 2472 | hipError_t hipEventCreate(hipEvent_t* event); | ^~~~~~ In file included from
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7, from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3, from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7: /opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here 332 | } hipError_t; | ^~~~~~ In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11: /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:80:23: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result] 80 | hipEventCreate(&_comm_event); | ~~~~~~^~~~~~ In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7, from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3, from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7: /opt/rocm/include/hip/hip_runtime_api.h:2472:12: note: in call to ‘hipError_t hipEventCreate(ihipEvent_t**)’, declared here 2472 | hipError_t hipEventCreate(hipEvent_t* event); | ^~~~~~ In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7, from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3, from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7: /opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here 332 | } hipError_t; | ^~~~~~ In file included from
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11: /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h: In destructor ‘virtual InferenceContext::~InferenceContext()’: /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:86:16: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result] 86 | hipFree(_workspace); | ~^~~~~~ In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7, from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3, from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7: /opt/rocm/include/hip/hip_runtime_api.h:3562:12: note: in call to ‘hipError_t hipFree(void*)’, declared here 3562 | hipError_t hipFree(void* ptr); | ^~~ In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7, from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3, from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7: /opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here 332 | } hipError_t; | ^~~~~~ In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11: /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:87:24: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result] 87 | hipEventDestroy(_comp1_event); | ~~~^~~~~~ In file included from
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7, from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3, from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7: /opt/rocm/include/hip/hip_runtime_api.h:2521:12: note: in call to ‘hipError_t hipEventDestroy(hipEvent_t)’, declared here 2521 | hipError_t hipEventDestroy(hipEvent_t event); | ^~~~~~~ In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7, from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3, from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7: /opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here 332 | } hipError_t; | ^~~~~~ In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11: /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:88:24: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result] 88 | hipEventDestroy(_comp2_event); | ~~~^~~~~~ In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7, from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3, from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7: /opt/rocm/include/hip/hip_runtime_api.h:2521:12: note: in call to ‘hipError_t hipEventDestroy(hipEvent_t)’, declared here 2521 | hipError_t hipEventDestroy(hipEvent_t event); | ^~~~~~~ In file included from 
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7, from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3, from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7: /opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here 332 | } hipError_t; | ^~~~~~ In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11: /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:89:24: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result] 89 | hipEventDestroy(_comp_event); | ~~~^~~~~ In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7, from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3, from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7: /opt/rocm/include/hip/hip_runtime_api.h:2521:12: note: in call to ‘hipError_t hipEventDestroy(hipEvent_t)’, declared here 2521 | hipError_t hipEventDestroy(hipEvent_t event); | ^~~~~~~ In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7, from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3, from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7: /opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here 332 | } hipError_t; | ^~~~~~ In file included from 
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11: /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:90:24: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result] 90 | hipEventDestroy(_comm_event); | ~~~^~~~~ In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7, from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3, from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7: /opt/rocm/include/hip/hip_runtime_api.h:2521:12: note: in call to ‘hipError_t hipEventDestroy(hipEvent_t)’, declared here 2521 | hipError_t hipEventDestroy(hipEvent_t event); | ^~~~~~~ In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7, from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3, from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7: /opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here 332 | } hipError_t; | ^~~~~~ In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11: /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h: In member function ‘void InferenceContext::GenWorkSpace(const unsigned int&, const unsigned int&, const size_t&, const size_t&, const size_t&, const unsigned int&, const bool&, const size_t&, const unsigned int&, unsigned int, unsigned int)’: 
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:112:48: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result] 112 | if (!_free_memory_size) { hipMemGetInfo(&_free_memory_size, &total_size); } | ~~~^~~~~~~ In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7, from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3, from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7: /opt/rocm/include/hip/hip_runtime_api.h:4048:12: note: in call to ‘hipError_t hipMemGetInfo(size_t, size_t)’, declared here 4048 | hipError_t hipMemGetInfo(size_t free, size_t total); | ^~~~~ In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7, from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3, from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7: /opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here 332 | } hipError_t; | ^~~~~~ In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11: /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:154:22: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result] 154 | hipMalloc(&_workspace, workSpaceSize); | ~~~~~^~~~~~~~ In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7, from 
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3, from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7: /opt/rocm/include/hip/hip_runtime_api.h:2786:12: note: in call to ‘hipError_t hipMalloc(void, size_t)’, declared here 2786 | hipError_t hipMalloc(void ptr, size_t size); | ^~~~~ In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7, from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3, from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7: /opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here 332 | } hipError_t; | ^~~~~~ In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11: /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:156:20: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result] 156 | hipFree(_workspace); | ~^~~~~~ In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7, from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3, from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7: /opt/rocm/include/hip/hip_runtime_api.h:3562:12: note: in call to ‘hipError_t hipFree(void)’, declared here 3562 | hipError_t hipFree(void ptr); | ^~~ In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7, from 
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3, from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7: /opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here 332 | } hipError_t; | ^~~~~~ In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11: /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:157:22: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result] 157 | hipMalloc(&_workspace, workSpaceSize); | ~~~~~^~~~~~~~ In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7, from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3, from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7: /opt/rocm/include/hip/hip_runtime_api.h:2786:12: note: in call to ‘hipError_t hipMalloc(void, size_t)’, declared here 2786 | hipError_t hipMalloc(void ptr, size_t size); | ^~~~~ In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7, from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3, from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7: /opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here 332 | } hipError_t; | ^~~~~~ In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11: 
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h: In member function ‘void InferenceContext::release_workspace()’: /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:230:16: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result] 230 | hipFree(_workspace); | ~^~~~~~ In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7, from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3, from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7: /opt/rocm/include/hip/hip_runtime_api.h:3562:12: note: in call to ‘hipError_t hipFree(void)’, declared here 3562 | hipError_t hipFree(void* ptr); | ^~~ In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7, from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3, from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7: /opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here 332 | } hipError_t; | ^~~~~~ In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11: /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h: In member function ‘bool InferenceContext::retake_workspace()’: /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:236:18: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard 
[-Wunused-result] 236 | hipMalloc(&_workspace, _workSpaceSize); | ~~~~~^~~~~~~~~ In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7, from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3, from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7: /opt/rocm/include/hip/hip_runtime_api.h:2786:12: note: in call to ‘hipError_t hipMalloc(void, size_t)’, declared here 2786 | hipError_t hipMalloc(void ptr, size_t size); | ^~~~~ In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7, from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3, from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7: /opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here 332 | } hipError_t; | ^~~~~~ In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11: /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h: In member function ‘void InferenceContext::SynchComp()’: /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:254:23: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result] 254 | hipEventRecord(_comp_event, _comp_stream); | ~~~~~~^~~~~~~~~~~ In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7, from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3, from 
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7: /opt/rocm/include/hip/hip_runtime_api.h:2501:12: note: in call to ‘hipError_t hipEventRecord(hipEvent_t, hipStream_t)’, declared here 2501 | hipError_t hipEventRecord(hipEvent_t event, hipStream_t stream = NULL); | ^~~~~~ In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7, from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3, from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7: /opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here 332 | } hipError_t; | ^~~~~~ In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11: /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:255:27: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result] 255 | hipStreamWaitEvent(_comm_stream, _comp_event, 0); | ~~~~^~~~~~~~ In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7, from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3, from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7: /opt/rocm/include/hip/hip_runtime_api.h:2211:12: note: in call to ‘hipError_t hipStreamWaitEvent(hipStream_t, hipEvent_t, unsigned int)’, declared here 2211 | hipError_t hipStreamWaitEvent(hipStream_t stream, hipEvent_t event, unsigned int flags); | ^~~~~~ In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7, from 
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3, from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7: /opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here 332 | } hipError_t; | ^~~~~~ In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11: /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h: In member function ‘void InferenceContext::SynchComm()’: /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:259:23: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result] 259 | hipEventRecord(_comm_event, _comm_stream); | ~~~~~~^~~~~~~~~~~ In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7, from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3, from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7: /opt/rocm/include/hip/hip_runtime_api.h:2501:12: note: in call to ‘hipError_t hipEventRecord(hipEvent_t, hipStream_t)’, declared here 2501 | hipError_t hipEventRecord(hipEvent_t event, hipStream_t stream = NULL); | ^~~~~~ In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7, from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3, from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7: /opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here 332 | } 
hipError_t; | ^~~~~~ In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11: /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:260:27: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result] 260 | hipStreamWaitEvent(_comp_stream, _comm_event, 0); | ~~~~^~~~~~~~ In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7, from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3, from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7: /opt/rocm/include/hip/hip_runtime_api.h:2211:12: note: in call to ‘hipError_t hipStreamWaitEvent(hipStream_t, hipEvent_t, unsigned int)’, declared here 2211 | hipError_t hipStreamWaitEvent(hipStream_t stream, hipEvent_t event, unsigned int flags); | ^~~~~~ In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7, from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3, from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7: /opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here 332 | } hipError_t; | ^~~~~~ /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp: In instantiation of ‘std::vector ds_softmax_context(at::Tensor&, at::Tensor&, int, bool, bool, int, int, float, bool, bool, int, bool, unsigned int, unsigned int, at::Tensor&, float) [with T = float]’: /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:2016:5: required from 
here /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:542:50: warning: narrowing conversion of ‘(((size_t)hidden_dim) (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing] 542 | {hidden_dim InferenceContext::Instance().GetMaxTokenLength(), | ~~~^~~~~~~~~~~~~~ /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:542:50: warning: narrowing conversion of ‘(((size_t)hidden_dim) (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:543:41: warning: narrowing conversion of ‘(((size_t)k) (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing] 543 | k InferenceContext::Instance().GetMaxTokenLength(), | ^~~~~~~~~~~~ /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:543:41: warning: narrowing conversion of ‘(((size_t)k) (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:551:38: warning: narrowing conversion of ‘(((size_t)hidden_dim) (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing] 551 | {hidden_dim InferenceContext::Instance().GetMaxTokenLength(), | ~~~^~~~~~~~~~~~~~ /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:551:38: warning: narrowing conversion 
of ‘(((size_t)hidden_dim) (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:552:29: warning: narrowing conversion of ‘(((size_t)k) (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing] 552 | k InferenceContext::Instance().GetMaxTokenLength(), | ^~~~~~~~~~~~ /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:552:29: warning: narrowing conversion of ‘(((size_t)k) (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp: In instantiation of ‘std::vector ds_rms_mlp_gemm(at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, float, at::Tensor&, at::Tensor&, bool, int, bool) [with T = float]’: /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:2016:5: required from here /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:1582:72: warning: narrowing conversion of ‘(size_t)mlp_1_out_neurons’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing] 1582 | at::from_blob(intermediate_ptr, {input.size(0), input.size(1), mlp_1_out_neurons}, options); | ^~~~~ /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:1582:72: warning: narrowing conversion of ‘mlp_1_out_neurons’ from ‘const size_t’ {aka ‘const long unsigned int’} to ‘long int’ [-Wnarrowing] 
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp: In instantiation of ‘std::vector ds_softmax_context(at::Tensor&, at::Tensor&, int, bool, bool, int, int, float, bool, bool, int, bool, unsigned int, unsigned int, at::Tensor&, float) [with T = half]’: /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:2017:5: required from here /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:542:50: warning: narrowing conversion of ‘(((size_t)hidden_dim) (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing] 542 | {hidden_dim InferenceContext::Instance().GetMaxTokenLength(), | ~~~^~~~~~~~~~~~~~ /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:542:50: warning: narrowing conversion of ‘(((size_t)hidden_dim) (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:543:41: warning: narrowing conversion of ‘(((size_t)k) (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing] 543 | k InferenceContext::Instance().GetMaxTokenLength(), | ^~~~~~~~~~~~ /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:543:41: warning: narrowing conversion of ‘(((size_t)k) (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing] 
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:551:38: warning: narrowing conversion of ‘(((size_t)hidden_dim) (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing] 551 | {hidden_dim InferenceContext::Instance().GetMaxTokenLength(), | ~~~^~~~~~~~~~~~~~ /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:551:38: warning: narrowing conversion of ‘(((size_t)hidden_dim) (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:552:29: warning: narrowing conversion of ‘(((size_t)k) (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing] 552 | k InferenceContext::Instance().GetMaxTokenLength(), | ^~~~~~~~~~~~ /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:552:29: warning: narrowing conversion of ‘(((size_t)k) (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp: In instantiation of ‘std::vector ds_rms_mlp_gemm(at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, float, at::Tensor&, at::Tensor&, bool, int, bool) [with T = __half]’: /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:2017:5: required from here /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:1582:72: 
warning: narrowing conversion of ‘(size_t)mlp_1_out_neurons’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing] 1582 | at::from_blob(intermediate_ptr, {input.size(0), input.size(1), mlp_1_out_neurons}, options); | ^~~~~ /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:1582:72: warning: narrowing conversion of ‘mlp_1_out_neurons’ from ‘const size_t’ {aka ‘const long unsigned int’} to ‘long int’ [-Wnarrowing] [10/11] /opt/rocm/bin/hipcc -DWITH_HIP -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/TH -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THC -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THH -isystem /opt/rocm/include -isystem /opt/conda/envs/py_3.9/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=1 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -DHIP_PLATFORM_AMD=1 -fPIC -DHIP_PLATFORM_HCC=1 -DUSE_ROCM=1 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS=1 -DHIP_NO_HALF_CONVERSIONS=1 -O3 -std=c++17 -UHIP_NO_HALF_OPERATORS -UHIP_NO_HALF_CONVERSIONS -UHIP_NO_HALF2_OPERATORS -DROCM_VERSION_MAJOR=5 -DROCM_VERSION_MINOR=7 --offload-arch=gfx900 --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 --offload-arch=gfx1101 -fno-gpu-rdc -c 
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip -o apply_rotary_pos_emb.cuda.o
[11/11] c++ relu.cuda.o apply_rotary_pos_emb.cuda.o gelu.cuda.o softmax.cuda.o pt_binding_hip.o dequantize.cuda.o pointwise_ops.cuda.o layer_norm.cuda.o rms_norm.cuda.o transform.cuda.o -shared -L/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib -lc10 -lc10_hip -ltorch_cpu -ltorch_hip -ltorch -ltorch_python -L/opt/rocm/lib -lamdhip64 -o transformer_inference.so
Loading extension module transformer_inference...
Time to load transformer_inference op: 52.15808844566345 seconds
[2024-01-03 03:01:42,879] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed-Inference config: {'layer_id': 0, 'hidden_size': 1024, 'intermediate_size': 4096, 'heads': 16, 'num_hidden_layers': -1, 'dtype': torch.float32, 'pre_layer_norm': True, 'norm_type': <NormType.LayerNorm: 1>, 'local_rank': -1, 'stochastic_mode': False, 'epsilon': 1e-05, 'mp_size': 1, 'scale_attention': True, 'triangular_masking': True, 'local_attention': False, 'window_size': 1, 'rotary_dim': -1, 'rotate_half': False, 'rotate_every_two': True, 'return_tuple': True, 'mlp_after_attn': True, 'mlp_act_func_type': <ActivationFuncType.GELU: 1>, 'specialized_mode': False, 'training_mp_size': 1, 'bigscience_bloom': True, 'max_out_tokens': 1024, 'min_out_tokens': 1, 'scale_attn_by_inverse_layer_idx': False, 'enable_qkv_quantization': False, 'use_mup': False, 'return_single_tuple': False, 'set_empty_params': False, 'transposed_mode': False, 'use_triton': False, 'triton_autotune': False, 'num_kv': -1, 'rope_theta': 10000}
Using /root/.cache/torch_extensions/py39_cpu as PyTorch extensions root...
No modifications detected for re-loaded extension module transformer_inference, skipping build step...
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.0015342235565185547 seconds Traceback (most recent call last): File "/var/lib/jenkins/test.py", line 18, in string = generator("DeepSpeed is", do_sample=True, min_length=50) File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/transformers/pipelines/text_generation.py", line 208, in call return super().call(text_inputs, kwargs) File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/transformers/pipelines/base.py", line 1140, in call return self.run_single(inputs, preprocess_params, forward_params, postprocess_params) File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/transformers/pipelines/base.py", line 1147, in run_single model_outputs = self.forward(model_inputs, forward_params) File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/transformers/pipelines/base.py", line 1046, in forward model_outputs = self._forward(model_inputs, forward_params) File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/transformers/pipelines/text_generation.py", line 271, in _forward generated_sequence = self.model.generate(input_ids=input_ids, attention_mask=attention_mask, generate_kwargs) File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/inference/engine.py", line 629, in _generate return self.module.generate(*inputs, kwargs) File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context return func(*args, **kwargs) File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/transformers/generation/utils.py", line 1486, in generate and torch.sum(inputs_tensor[:, -1] == generation_config.pad_token_id) > 0 RuntimeError: HIP error: the operation cannot be performed in the present state HIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing HIP_LAUNCH_BLOCKING=1. Compile with TORCH_USE_HIP_DSA to enable device-side assertions.

rraminen commented 10 months ago

Please try with this updated image:

rocm/deepspeed:rocm6.0_ubuntu20.04_py3.9_pytorch_2.0.1_DeepSpeed

sunpian1 commented 10 months ago

Hi, I tried the updated image, but I still get the same error.

root@ecd8dd23e891:/var/lib/jenkins# python test.py [2024-01-05 03:06:26,705] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2024-01-05 03:06:58,949] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.12.7+83427253, git-hash=83427253, git-branch=master [2024-01-05 03:06:58,952] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter mp_size is deprecated use tensor_parallel.tp_size instead [2024-01-05 03:06:58,954] [INFO] [logging.py:96:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1 Using /root/.cache/torch_extensions/py39_cpu as PyTorch extensions root... /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_cublas_wrappers.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_cublas_wrappers_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/ds_kernel_utils.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/ds_kernel_utils_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_cuda_layers.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_hip_layers.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp 
[skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/conversion_utils.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/conversion_utils_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/memory_access_utils.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/memory_access_utils_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/gelu.cu -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/gelu.hip [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/relu.cu -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/relu.hip [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/layer_norm.cu -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/layer_norm.hip [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/rms_norm.cu -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/rms_norm.hip [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/softmax.cu -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/softmax.hip [skipped, already hipified] 
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/dequantize.cu -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/dequantize.hip [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.cu -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/transform.cu -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/transform.hip [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pointwise_ops.cu -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pointwise_ops.hip [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_cublas_wrappers.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_cublas_wrappers_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_cuda_layers.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_hip_layers.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/dropout.h -> 
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/dropout_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/StopWatch.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/StopWatch.h [skipped, no changes] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/cublas_wrappers.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/cublas_wrappers_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/gemm_test.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/gemm_test_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/context.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/context_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/custom_cuda_layers.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/custom_hip_layers.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/feed_forward.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/feed_forward_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/gelu.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/gelu_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/general_kernels.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/general_kernels_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/normalize_layer.h -> 
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/normalize_layer_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/softmax.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/softmax_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/strided_batch_gemm.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/strided_batch_gemm_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/ds_transformer_cuda.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/ds_transformer_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/quantization.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/quantization_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/quantization_utils.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/quantization_utils_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/dequantization_utils.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/dequantization_utils_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/quantizer.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/quantizer_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/dropout.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/dropout_hip.h [skipped, already hipified] 
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/quantization.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/quantization_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/simd.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/simd.h [skipped, no changes] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/custom_cuda_layers.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/custom_hip_layers.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/softmax.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/softmax_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/conversion_utils.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/conversion_utils_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/StopWatch.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/StopWatch.h [skipped, no changes] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/context.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/context_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/general_kernels.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/general_kernels_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/compat.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/compat.h [skipped, no changes] 
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/gemm_test.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/gemm_test_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/cpu_lion.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/cpu_lion_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/Timer.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/Timer_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/cublas_wrappers.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/cublas_wrappers_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/strided_batch_gemm.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/strided_batch_gemm_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/ds_kernel_utils.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/ds_kernel_utils_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/feed_forward.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/feed_forward_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/normalize_layer.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/normalize_layer_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/cpu_adagrad.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/cpu_adagrad_hip.h [skipped, already 
hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/activation_type.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/activation_type.h [skipped, no changes] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/cpu_adam.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/cpu_adam_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/gelu.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/gelu_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/memory_access_utils.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/memory_access_utils_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/quantization_utils.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/quantization_utils_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/type_shim.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/type_shim_hip.h [skipped, already hipified] Successfully preprocessed all matching files. Total number of unsupported CUDA function calls: 0

Total number of replaced kernel launches: 25 Detected CUDA files, patching ldflags Emitting ninja build file /root/.cache/torch_extensions/py39_cpu/transformer_inference/build.ninja... Building extension module transformer_inference... Using envvar MAX_JOBS (32) as the number of workers... [1/1] c++ pointwise_ops.cuda.o softmax.cuda.o relu.cuda.o layer_norm.cuda.o transform.cuda.o rms_norm.cuda.o dequantize.cuda.o gelu.cuda.o pt_binding_hip.o apply_rotary_pos_emb.cuda.o -shared -L/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib -lc10 -lc10_hip -ltorch_cpu -ltorch_hip -ltorch -ltorch_python -L/opt/rocm/lib -lamdhip64 -o transformer_inference.so Loading extension module transformer_inference... Time to load transformer_inference op: 1.1228477954864502 seconds [2024-01-05 03:07:00,794] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed-Inference config: {'layer_id': 0, 'hidden_size': 1024, 'intermediate_size': 4096, 'heads': 16, 'num_hidden_layers': -1, 'dtype': torch.float32, 'pre_layer_norm': True, 'norm_type': <NormType.LayerNorm: 1>, 'local_rank': -1, 'stochastic_mode': False, 'epsilon': 1e-05, 'mp_size': 1, 'scale_attention': True, 'triangular_masking': True, 'local_attention': False, 'window_size': 1, 'rotary_dim': -1, 'rotate_half': False, 'rotate_every_two': True, 'return_tuple': True, 'mlp_after_attn': True, 'mlp_act_func_type': <ActivationFuncType.GELU: 1>, 'specialized_mode': False, 'training_mp_size': 1, 'bigscience_bloom': True, 'max_out_tokens': 1024, 'min_out_tokens': 1, 'scale_attn_by_inverse_layer_idx': False, 'enable_qkv_quantization': False, 'use_mup': False, 'return_single_tuple': False, 'set_empty_params': False, 'transposed_mode': False, 'use_triton': False, 'triton_autotune': False, 'num_kv': -1, 'rope_theta': 10000} Traceback (most recent call last): File "/var/lib/jenkins/test.py", line 18, in string = generator("DeepSpeed is", do_sample=True, min_length=50) File 
"/opt/conda/envs/py_3.9/lib/python3.9/site-packages/transformers/pipelines/text_generation.py", line 208, in call return super().call(text_inputs, kwargs) File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/transformers/pipelines/base.py", line 1140, in call return self.run_single(inputs, preprocess_params, forward_params, postprocess_params) File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/transformers/pipelines/base.py", line 1147, in run_single model_outputs = self.forward(model_inputs, forward_params) File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/transformers/pipelines/base.py", line 1046, in forward model_outputs = self._forward(model_inputs, forward_params) File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/transformers/pipelines/text_generation.py", line 271, in _forward generated_sequence = self.model.generate(input_ids=input_ids, attention_mask=attention_mask, generate_kwargs) File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/inference/engine.py", line 636, in _generate return self.module.generate(*inputs, *kwargs) File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context return func(args, **kwargs) File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/transformers/generation/utils.py", line 1583, in generate and torch.sum(inputs_tensor[:, -1] == generation_config.pad_token_id) > 0 RuntimeError: HIP error: the operation cannot be performed in the present state HIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing HIP_LAUNCH_BLOCKING=1. Compile with TORCH_USE_HIP_DSA to enable device-side assertions.

rraminen commented 10 months ago

Could you please post the output of rocminfo | grep gfx, and attach the output log of AMD_LOG_LEVEL=3 python test.py? Thanks.
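A minimal sketch of one way to capture both outputs into files that can be attached to the issue. The helper name `run_with_env_log` and the log file names are hypothetical; only `rocminfo`, `AMD_LOG_LEVEL=3`, and `test.py` come from the request above.

```python
import os
import subprocess
import sys


def run_with_env_log(cmd, log_path, extra_env):
    """Run cmd with extra environment variables set and write
    stdout and stderr (interleaved) to log_path. Returns the exit code."""
    env = dict(os.environ)
    env.update(extra_env)
    with open(log_path, "wb") as log:
        proc = subprocess.run(
            cmd, env=env, stdout=log, stderr=subprocess.STDOUT, check=False
        )
    return proc.returncode


# Intended use inside the container, next to test.py (file names are assumptions):
#   run_with_env_log(["rocminfo"], "rocminfo.log", {})
#   run_with_env_log([sys.executable, "test.py"], "amd_log_level3.log",
#                    {"AMD_LOG_LEVEL": "3"})
```

Writing the log in binary mode keeps the raw bytes intact, so the attachment can be opened in any plain-text editor. The same helper could also pass HIP_LAUNCH_BLOCKING=1, which the error message itself suggests for debugging.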

sunpian1 commented 10 months ago

root@0b110045b111:/var/lib/jenkins# rocminfo|grep gfx
Name: gfx1100
Name: amdgcn-amd-amdhsa--gfx1100

MobaXterm_10.12.70.47susie.sun_20240108_094818.txt
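As a quick cross-check, the gfx name reported by rocminfo can be compared with the --offload-arch targets that appear in the hipcc command of the extension build log earlier in this thread. The sketch below does only that string comparison; the function name `gfx_targets` is hypothetical, and the two input strings are copied from the logs posted here.

```python
import re


def gfx_targets(text):
    """Return the unique gfx architecture names found in text,
    in order of first appearance."""
    seen = []
    for name in re.findall(r"gfx[0-9a-f]+", text):
        if name not in seen:
            seen.append(name)
    return seen


# Output of `rocminfo | grep gfx` as posted above.
rocminfo_output = """
Name: gfx1100
Name: amdgcn-amd-amdhsa--gfx1100
"""

# --offload-arch flags copied from the hipcc command in the build log.
hipcc_flags = (
    "--offload-arch=gfx900 --offload-arch=gfx906 --offload-arch=gfx908 "
    "--offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 "
    "--offload-arch=gfx1101"
)

device_archs = gfx_targets(rocminfo_output)
built_archs = gfx_targets(hipcc_flags)
missing = [arch for arch in device_archs if arch not in built_archs]

print("device archs:", device_archs)
print("missing from build:", missing)
```

An empty "missing" list only shows that the transformer_inference kernels were compiled for the device architecture; it does not by itself explain the HIP error.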

jithunnair-amd commented 10 months ago

root@0b110045b111:/var/lib/jenkins# rocminfo|grep gfx
Name: gfx1100
Name: amdgcn-amd-amdhsa--gfx1100

MobaXterm_10.12.70.47susie.sun_20240108_094818.txt

@sunpian1 Please check your text file; it doesn't seem to contain readable characters.

sunpian1 commented 10 months ago

Sorry about that. Please check this text file. MobaXterm_10.12.70.47susie.sun_20240108_094818_Unencrypted.txt

hongxiayang commented 10 months ago

import torch
torch.cuda.is_available()
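Note that torch.cuda.is_available() only reports whether PyTorch can see a device; it does not launch a kernel. A slightly fuller check, sketched below under the assumption that PyTorch is installed in the container, also runs a tiny comparison-and-sum on the GPU, similar in spirit to the operation where the traceback above fails. The function name `gpu_smoke_test` and the report keys are hypothetical.

```python
def gpu_smoke_test():
    """Report whether PyTorch sees a GPU and whether a tiny kernel runs on it."""
    report = {
        "torch": None,       # PyTorch version, or None if not importable
        "hip": None,         # HIP version string on ROCm builds, else None
        "available": False,  # result of torch.cuda.is_available()
        "device": None,      # device name, if one is visible
        "kernel_ok": False,  # True if the tiny GPU computation succeeded
        "error": None,       # error text if the GPU computation raised
    }
    try:
        import torch
    except ImportError:
        return report

    report["torch"] = torch.__version__
    report["hip"] = getattr(torch.version, "hip", None)
    report["available"] = torch.cuda.is_available()
    if not report["available"]:
        return report

    try:
        report["device"] = torch.cuda.get_device_name(0)
        x = torch.ones(4, device="cuda")
        # .item() copies the result to the host, which forces synchronization,
        # so an asynchronous kernel error would surface here.
        total = (x == 1).sum().item()
        report["kernel_ok"] = total == 4
    except RuntimeError as exc:
        report["error"] = str(exc)
    return report


print(gpu_smoke_test())
```

If "available" is True but "kernel_ok" is False, the problem lies below DeepSpeed, in the PyTorch/ROCm stack or the driver, rather than in the inference kernels.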

hongxiayang commented 10 months ago

rocminfo

sunpian1 commented 9 months ago

root@bb34d3a5c58f:/var/lib/jenkins# python3
Python 3.9.18 | packaged by conda-forge | (main, Aug 30 2023, 03:49:32) [GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.cuda.is_available())
True
>>> exit()

root@bb34d3a5c58f:/var/lib/jenkins# rocminfo ROCk module is loaded

HSA System Attributes

Runtime Version: 1.1
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE
Mwaitx: DISABLED
DMAbuf Support: YES

========== HSA Agents

Agent 1

Name: Intel(R) Xeon(R) Silver 4210R CPU @ 2.40GHz Uuid: CPU-XX Marketing Name: Intel(R) Xeon(R) Silver 4210R CPU @ 2.40GHz Vendor Name: CPU Feature: None specified Profile: FULL_PROFILE Float Round Mode: NEAR Max Queue Number: 0(0x0) Queue Min Size: 0(0x0) Queue Max Size: 0(0x0) Queue Type: MULTI Node: 0 Device Type: CPU Cache Info: L1: 32768(0x8000) KB Chip ID: 0(0x0) ASIC Revision: 0(0x0) Cacheline Size: 64(0x40) Max Clock Freq. (MHz): 0 BDFID: 0 Internal Node ID: 0 Compute Unit: 24 SIMDs per CU: 0 Shader Engines: 0 Shader Arrs. per Eng.: 0 WatchPts on Addr. Ranges:1 Features: None Pool Info: Pool 1 Segment: GLOBAL; FLAGS: FINE GRAINED Size: 65834496(0x3ec8e00) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: TRUE Pool 2 Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED Size: 65834496(0x3ec8e00) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: TRUE Pool 3 Segment: GLOBAL; FLAGS: COARSE GRAINED Size: 65834496(0x3ec8e00) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: TRUE ISA Info:

Agent 2

Name: gfx1100 Uuid: GPU-afa7be7439782f1c Marketing Name: Radeon RX 7900 XTX Vendor Name: AMD Feature: KERNEL_DISPATCH Profile: BASE_PROFILE Float Round Mode: NEAR Max Queue Number: 128(0x80) Queue Min Size: 64(0x40) Queue Max Size: 131072(0x20000) Queue Type: MULTI Node: 1 Device Type: GPU Cache Info: L1: 32(0x20) KB L2: 6144(0x1800) KB L3: 98304(0x18000) KB Chip ID: 29772(0x744c) ASIC Revision: 0(0x0) Cacheline Size: 64(0x40) Max Clock Freq. (MHz): 2371 BDFID: 256 Internal Node ID: 1 Compute Unit: 96 SIMDs per CU: 2 Shader Engines: 6 Shader Arrs. per Eng.: 2 WatchPts on Addr. Ranges:4 Coherent Host Access: FALSE Features: KERNEL_DISPATCH Fast F16 Operation: TRUE Wavefront Size: 32(0x20) Workgroup Max Size: 1024(0x400) Workgroup Max Size per Dimension: x 1024(0x400) y 1024(0x400) z 1024(0x400) Max Waves Per CU: 32(0x20) Max Work-item Per CU: 1024(0x400) Grid Max Size: 4294967295(0xffffffff) Grid Max Size per Dimension: x 4294967295(0xffffffff) y 4294967295(0xffffffff) z 4294967295(0xffffffff) Max fbarriers/Workgrp: 32 Packet Processor uCode:: 550 SDMA engine uCode:: 19 IOMMU Support:: None Pool Info: Pool 1 Segment: GLOBAL; FLAGS: COARSE GRAINED Size: 25149440(0x17fc000) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: FALSE Pool 2 Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED Size: 25149440(0x17fc000) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: FALSE Pool 3 Segment: GLOBAL; FLAGS: FINE GRAINED Size: 25149440(0x17fc000) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: FALSE Pool 4 Segment: GROUP Size: 64(0x40) KB Allocatable: FALSE Alloc Granule: 0KB Alloc Alignment: 0KB Accessible by all: FALSE ISA Info: ISA 1 Name: amdgcn-amd-amdhsa--gfx1100 Machine Models: HSA_MACHINE_MODEL_LARGE Profiles: HSA_PROFILE_BASE Default Rounding Mode: NEAR Default Rounding Mode: NEAR Fast f16: TRUE Workgroup Max Size: 1024(0x400) Workgroup Max Size per Dimension: x 1024(0x400) y 
1024(0x400) z 1024(0x400) Grid Max Size: 4294967295(0xffffffff) Grid Max Size per Dimension: x 4294967295(0xffffffff) y 4294967295(0xffffffff) z 4294967295(0xffffffff) FBarrier Max Size: 32 Done root@bb34d3a5c58f:/var/lib/jenkins#

rraminen commented 9 months ago

Hi @sunpian1, could you please try with this image rocm/deepspeed:rocm5.7_ubuntu20.04_py3.9_pytorch_2.0.1_DeepSpeed_Inference

sunpian1 commented 9 months ago

Hi, I tried the image rocm/deepspeed:rocm5.7_ubuntu20.04_py3.9_pytorch_2.0.1_DeepSpeed_Inference. It works.

sunpian1 commented 9 months ago

But when I tried that image with the Llama-2 model, I got errors.

Free memory : 19.685547 (GigaBytes)
Total memory: 23.984375 (GigaBytes)
Requested memory: 0.312500 (GigaBytes)
Setting maximum total tokens (input + output) to 1024
WorkSpace: 0x7f5cbbe00000

Memory access fault by GPU node-1 (Agent handle: 0x564cd7ba91d0) on address 0x7f5ccfe2c000. Reason: Page not present or supervisor privilege.
[2024-02-19 08:36:43,155] [INFO] [launch.py:316:sigkill_handler] Killing subprocess 3349
[2024-02-19 08:36:43,156] [ERROR] [launch.py:322:sigkill_handler] ['/opt/conda/envs/py_3.9/bin/python', '-u', 'test.py', '--local_rank=0'] exits with return code = -6

sunpian1 commented 6 months ago

Which Docker image should I use for inference?

ROCm / DeepSpeed

[BUG] I have pulled the docker images,but when I run it ,I got errors. The errors suggest the images does not support AMD gpu. #68
