Open sunpian1 opened 11 months ago
Hi @sunpian1, could you please share the contents of deploy.py?
I solved that issue. When I continue, I get another error:
root@c8be2f1a010b:/var/lib/jenkins# python test.py
[2023-12-25 06:42:04,395] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
model.safetensors: 100%|██████████| 1.12G/1.12G [03:08<00:00, 5.93MB/s]
tokenizer_config.json: 100%|██████████| 222/222 [00:00<00:00, 24.7kB/s]
tokenizer.json: 100%|██████████| 14.5M/14.5M [00:02<00:00, 5.54MB/s]
special_tokens_map.json: 100%|██████████| 85.0/85.0 [00:00<00:00, 13.0kB/s]
[2023-12-25 06:45:33,972] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.11.2+4199dc25, git-hash=4199dc25, git-branch=master
[2023-12-25 06:45:33,973] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter mp_size is deprecated use tensor_parallel.tp_size instead
[2023-12-25 06:45:33,973] [INFO] [logging.py:96:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
Using /root/.cache/torch_extensions/py39_cpu as PyTorch extensions root...
Creating extension directory /root/.cache/torch_extensions/py39_cpu/transformer_inference...
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_cublas_wrappers.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_cublas_wrappers_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/ds_kernel_utils.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/ds_kernel_utils_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_cuda_layers.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_hip_layers.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/conversion_utils.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/conversion_utils_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/memory_access_utils.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/memory_access_utils_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/gelu.cu -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/gelu.hip [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/relu.cu -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/relu.hip [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/layer_norm.cu -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/layer_norm.hip [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/rms_norm.cu -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/rms_norm.hip [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/softmax.cu -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/softmax.hip [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/dequantize.cu -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/dequantize.hip [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.cu -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/transform.cu -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/transform.hip [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pointwise_ops.cu -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pointwise_ops.hip [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_cublas_wrappers.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_cublas_wrappers_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_cuda_layers.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_hip_layers.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/dropout.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/dropout_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/StopWatch.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/StopWatch.h [skipped, no changes]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/cublas_wrappers.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/cublas_wrappers_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/gemm_test.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/gemm_test_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/context.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/context_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/custom_cuda_layers.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/custom_hip_layers.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/feed_forward.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/feed_forward_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/gelu.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/gelu_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/general_kernels.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/general_kernels_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/normalize_layer.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/normalize_layer_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/softmax.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/softmax_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/strided_batch_gemm.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/strided_batch_gemm_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/ds_transformer_cuda.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/ds_transformer_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/quantization.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/quantization_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/quantization_utils.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/quantization_utils_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/dequantization_utils.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/dequantization_utils_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/quantizer.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/quantizer_hip.h [ok]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/dropout.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/dropout_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/quantization.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/quantization_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/simd.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/simd.h [skipped, no changes]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/custom_cuda_layers.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/custom_hip_layers.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/softmax.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/softmax_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/conversion_utils.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/conversion_utils_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/StopWatch.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/StopWatch.h [skipped, no changes]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/context.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/context_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/general_kernels.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/general_kernels_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/compat.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/compat.h [skipped, no changes]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/gemm_test.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/gemm_test_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/cpu_lion.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/cpu_lion_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/Timer.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/Timer_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/cublas_wrappers.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/cublas_wrappers_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/strided_batch_gemm.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/strided_batch_gemm_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/ds_kernel_utils.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/ds_kernel_utils_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/feed_forward.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/feed_forward_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/normalize_layer.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/normalize_layer_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/cpu_adagrad.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/cpu_adagrad_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/cpu_adam.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/cpu_adam_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/gelu.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/gelu_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/memory_access_utils.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/memory_access_utils_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/quantization_utils.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/quantization_utils_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/type_shim.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/type_shim_hip.h [skipped, already hipified]
Successfully preprocessed all matching files.
Total number of unsupported CUDA function calls: 0
Total number of replaced kernel launches: 25
Detected CUDA files, patching ldflags
Emitting ninja build file /root/.cache/torch_extensions/py39_cpu/transformer_inference/build.ninja...
Building extension module transformer_inference...
Using envvar MAX_JOBS (32) as the number of workers...
[1/11] /opt/rocm/bin/hipcc -DWITH_HIP -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/TH -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THC -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THH -isystem /opt/rocm/include -isystem /opt/conda/envs/py_3.9/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=1 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -D__HIP_PLATFORM_AMD__=1 -fPIC -D__HIP_PLATFORM_HCC__=1 -DUSE_ROCM=1 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++17 -U__HIP_NO_HALF_OPERATORS__ -U__HIP_NO_HALF_CONVERSIONS__ -U__HIP_NO_HALF2_OPERATORS__ -DROCM_VERSION_MAJOR=5 -DROCM_VERSION_MINOR=7 --offload-arch=gfx900 --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 --offload-arch=gfx1101 -fno-gpu-rdc -c /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/rms_norm.hip -o rms_norm.cuda.o
FAILED: rms_norm.cuda.o
/opt/rocm/bin/hipcc -DWITH_HIP -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/TH -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THC -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THH -isystem /opt/rocm/include -isystem /opt/conda/envs/py_3.9/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=1 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -D__HIP_PLATFORM_AMD__=1 -fPIC -D__HIP_PLATFORM_HCC__=1 -DUSE_ROCM=1 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++17 -U__HIP_NO_HALF_OPERATORS__ -U__HIP_NO_HALF_CONVERSIONS__ -U__HIP_NO_HALF2_OPERATORS__ -DROCM_VERSION_MAJOR=5 -DROCM_VERSION_MINOR=7 --offload-arch=gfx900 --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 --offload-arch=gfx1101 -fno-gpu-rdc -c /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/rms_norm.hip -o rms_norm.cuda.o
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/rms_norm.hip:8:
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/conversion_utils_hip.h:9:
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/ds_kernel_utils_hip.h:24:
In file included from /opt/rocm-5.7.0/include/hip/hip_cooperative_groups.h:38:
/opt/rocm-5.7.0/include/hip/amd_detail/amd_hip_cooperative_groups.h:638:3: error: static assertion failed due to requirement 'integral_constant<bool, false>::value': Tile size is either not a power of 2 or greater than the wavefront size
static_assert(is_valid_tile_size<size>::value,
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/opt/rocm-5.7.0/include/hip/amd_detail/amd_hip_cooperative_groups.h:704:55: note: in instantiation of template class 'cooperative_groups::thread_block_tile_base<64>' requested here
class thread_block_tile_type<tileSize, void> : public thread_block_tile_base<tileSize>
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils_hip.h:372:46: error: no member named 'shfl_xor' in 'cooperative_groups::thread_block_tile<64>'
data[1] = element<Op2>(data[1], warp.shfl_xor(data[1], i));
~~~~ ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils_hip.h:381:46: error: no member named 'shfl_xor' in 'cooperative_groups::thread_block_tile<64>'
data[0] = element<Op1>(data[0], warp.shfl_xor(data[0], i));
~~~~ ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils_hip.h:382:46: error: no member named 'shfl_xor' in 'cooperative_groups::thread_block_tile<64>'
data[1] = element<Op2>(data[1], warp.shfl_xor(data[1], i));
~~~~ ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils_hip.h:383:46: error: no member named 'shfl_xor' in 'cooperative_groups::thread_block_tile<64>'
data[2] = element<Op3>(data[2], warp.shfl_xor(data[2], i));
~~~~ ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils_hip.h:392:46: error: no member named 'shfl_xor' in 'cooperative_groups::thread_block_tile<64>'
data[0] = element<Op1>(data[0], warp.shfl_xor(data[0], i));
~~~~ ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils_hip.h:393:46: error: no member named 'shfl_xor' in 'cooperative_groups::thread_block_tile<64>'
data[1] = element<Op2>(data[1], warp.shfl_xor(data[1], i));
~~~~ ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils_hip.h:394:46: error: no member named 'shfl_xor' in 'cooperative_groups::thread_block_tile<64>'
data[2] = element<Op3>(data[2], warp.shfl_xor(data[2], i));
~~~~ ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils_hip.h:395:46: error: no member named 'shfl_xor' in 'cooperative_groups::thread_block_tile<64>'
data[3] = element<Op4>(data[3], warp.shfl_xor(data[3], i));
~~~~ ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils_hip.h:431:22: error: no member named 'thread_rank' in 'cooperative_groups::thread_block_tile<64>'
if (warp_arg.thread_rank() == 0) {
~~~~~~~~ ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils_hip.h:442:26: error: no member named 'thread_rank' in 'cooperative_groups::thread_block_tile<64>'
if (warp_arg.thread_rank() < running_warps) {
~~~~~~~~ ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils_hip.h:446:68: error: no member named 'thread_rank' in 'cooperative_groups::thread_block_tile<64>'
data + i, reduce_buffer + elems * warp_arg.thread_rank() + i);
~~~~~~~~ ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils_hip.h:456:82: error: no member named 'thread_rank' in 'cooperative_groups::thread_block_tile<64>'
mem_access::store_shared<bytes>(reduce_buffer + elems * warp_arg.thread_rank() + i,
~~~~~~~~ ^
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/rms_norm.hip:8:
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/conversion_utils_hip.h:9:
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/ds_kernel_utils_hip.h:24:
In file included from /opt/rocm-5.7.0/include/hip/hip_cooperative_groups.h:38:
/opt/rocm-5.7.0/include/hip/amd_detail/amd_hip_cooperative_groups.h:837:3: error: static assertion failed due to requirement 'integral_constant<bool, false>::value': Tiled partition with size > wavefront size. Currently not supported
static_assert(is_valid_tile_size<size>::value,
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/rms_norm.hip:98:52: note: in instantiation of function template specialization 'cooperative_groups::tiled_partition<64U, cooperative_groups::thread_block>' requested here
cg::thread_block_tile<hw_warp_size> warp = cg::tiled_partition<hw_warp_size>(tb);
^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/rms_norm.hip:224:13: note: in instantiation of function template specialization 'pre_rms_norm<float, 1, 1, 256>' requested here
LAUNCH_ALL_RMS_NORM(1, 1, maxThreads);
^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/rms_norm.hip:179:9: note: expanded from macro 'LAUNCH_ALL_RMS_NORM'
LAUNCH_PRE_RMS_NORM(UNROLL, threadsPerGroup, maxThreads) \
^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/rms_norm.hip:174:25: note: expanded from macro 'LAUNCH_PRE_RMS_NORM'
hipLaunchKernelGGL(( pre_rms_norm<T, UNROLL, threadsPerGroup, maxThreads>), dim3(grid), dim3(block), 0, stream, \
^
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/rms_norm.hip:8:
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/conversion_utils_hip.h:9:
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/ds_kernel_utils_hip.h:24:
In file included from /opt/rocm-5.7.0/include/hip/hip_cooperative_groups.h:38:
/opt/rocm-5.7.0/include/hip/amd_detail/amd_hip_cooperative_groups.h:814:9: error: type 'impl::thread_block_tile_internal<64U, void>' is not a direct or virtual base of 'cooperative_groups::thread_block_tile<64>'
: impl::thread_block_tile_internal<size, void>(g) {}
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/opt/rocm-5.7.0/include/hip/amd_detail/amd_hip_cooperative_groups.h:801:12: note: in instantiation of function template specialization 'cooperative_groups::thread_block_tile<64>::thread_block_tile<cooperative_groups::thread_block>' requested here
return thread_block_tile<size, void>(*this);
^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/rms_norm.hip:98:48: note: in instantiation of member function 'cooperative_groups::thread_block_tile<64, cooperative_groups::thread_block>::operator thread_block_tile' requested here
cg::thread_block_tile<hw_warp_size> warp = cg::tiled_partition<hw_warp_size>(tb);
^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/rms_norm.hip:224:13: note: in instantiation of function template specialization 'pre_rms_norm<float, 1, 1, 256>' requested here
LAUNCH_ALL_RMS_NORM(1, 1, maxThreads);
^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/rms_norm.hip:179:9: note: expanded from macro 'LAUNCH_ALL_RMS_NORM'
LAUNCH_PRE_RMS_NORM(UNROLL, threadsPerGroup, maxThreads) \
^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/rms_norm.hip:174:25: note: expanded from macro 'LAUNCH_PRE_RMS_NORM'
hipLaunchKernelGGL(( pre_rms_norm<T, UNROLL, threadsPerGroup, maxThreads>), dim3(grid), dim3(block), 0, stream, \
^
16 errors generated when compiling for gfx1030.
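The errors reported for gfx1030 appear to trace back to one condition: the kernels request a 64-wide tile (`thread_block_tile<64>`), and the first assertion message says a tile size must be a power of 2 and no larger than the wavefront size. The sketch below mirrors that rule in Python for illustration only. The helper borrows its name from the trait in the log, but it is a hypothetical stand-in, and the wavefront widths in the table (64 for gfx90a, 32 for gfx1030) are assumptions that the log itself does not state.

```python
def is_valid_tile_size(tile_size: int, wavefront_size: int) -> bool:
    """Hypothetical mirror of the rule quoted in the static_assert message:
    the tile size must be a power of 2 and not exceed the wavefront size."""
    is_power_of_two = tile_size > 0 and (tile_size & (tile_size - 1)) == 0
    return is_power_of_two and tile_size <= wavefront_size


# Assumed default wavefront widths per target (not taken from the log).
ASSUMED_WAVEFRONT_SIZE = {"gfx90a": 64, "gfx1030": 32}

# 64 is the tile size that appears in the diagnostics (thread_block_tile<64>).
REQUESTED_TILE_SIZE = 64

for arch, wavefront_size in ASSUMED_WAVEFRONT_SIZE.items():
    ok = is_valid_tile_size(REQUESTED_TILE_SIZE, wavefront_size)
    print(f"{arch}: tile {REQUESTED_TILE_SIZE} valid = {ok}")
```

If those widths hold, the same 64-wide tile passes the check for gfx90a and fails it for gfx1030, which is consistent with the errors being reported while compiling for gfx1030.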
[2/11] /opt/rocm/bin/hipcc -DWITH_HIP -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/TH -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THC -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THH -isystem /opt/rocm/include -isystem /opt/conda/envs/py_3.9/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=1 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -D__HIP_PLATFORM_AMD__=1 -fPIC -D__HIP_PLATFORM_HCC__=1 -DUSE_ROCM=1 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++17 -U__HIP_NO_HALF_OPERATORS__ -U__HIP_NO_HALF_CONVERSIONS__ -U__HIP_NO_HALF2_OPERATORS__ -DROCM_VERSION_MAJOR=5 -DROCM_VERSION_MINOR=7 --offload-arch=gfx900 --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 --offload-arch=gfx1101 -fno-gpu-rdc -c /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip -o apply_rotary_pos_emb.cuda.o
FAILED: apply_rotary_pos_emb.cuda.o
/opt/rocm/bin/hipcc -DWITH_HIP -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/TH -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THC -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THH -isystem /opt/rocm/include -isystem /opt/conda/envs/py_3.9/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=1 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -D__HIP_PLATFORM_AMD__=1 -fPIC -D__HIP_PLATFORM_HCC__=1 -DUSE_ROCM=1 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++17 -U__HIP_NO_HALF_OPERATORS__ -U__HIP_NO_HALF_CONVERSIONS__ -U__HIP_NO_HALF2_OPERATORS__ -DROCM_VERSION_MAJOR=5 -DROCM_VERSION_MINOR=7 --offload-arch=gfx900 --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 --offload-arch=gfx1101 -fno-gpu-rdc -c /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip -o apply_rotary_pos_emb.cuda.o
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:8:
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/conversion_utils_hip.h:9:
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/ds_kernel_utils_hip.h:24:
In file included from /opt/rocm-5.7.0/include/hip/hip_cooperative_groups.h:38:
/opt/rocm-5.7.0/include/hip/amd_detail/amd_hip_cooperative_groups.h:638:3: error: static assertion failed due to requirement 'integral_constant<bool, false>::value': Tile size is either not a power of 2 or greater than the wavefront size
static_assert(is_valid_tile_size<size>::value,
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/opt/rocm-5.7.0/include/hip/amd_detail/amd_hip_cooperative_groups.h:691:39: note: in instantiation of template class 'cooperative_groups::thread_block_tile_base<64>' requested here
class thread_block_tile_type : public thread_block_tile_base<tileSize>,
^
/opt/rocm-5.7.0/include/hip/amd_detail/amd_hip_cooperative_groups.h:781:43: note: in instantiation of template class 'cooperative_groups::thread_block_tile_type<64, cooperative_groups::thread_block>' requested here
class thread_block_tile_internal : public thread_block_tile_type<size, ParentCGTy> {
^
/opt/rocm-5.7.0/include/hip/amd_detail/amd_hip_cooperative_groups.h:794:34: note: in instantiation of template class 'cooperative_groups::impl::thread_block_tile_internal<64, cooperative_groups::thread_block>' requested here
class thread_block_tile : public impl::thread_block_tile_internal<size, ParentCGTy> {
^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:44:56: note: in instantiation of template class 'cooperative_groups::thread_block_tile<64, cooperative_groups::thread_block>' requested here
cg::thread_block_tile<threadsPerHead> head_group = cg::tiled_partition<threadsPerHead>(tb);
^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:174:9: note: in instantiation of function template specialization 'apply_rotary_pos_half<float, 64, 4>' requested here
LAUNCH_FOR_ALIGNMENT(4);
^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:115:9: note: expanded from macro 'LAUNCH_FOR_ALIGNMENT'
LAUNCH_ROT_POS_EMB_HALF(64, ALIGNMENT); \
^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:93:25: note: expanded from macro 'LAUNCH_ROT_POS_EMB_HALF'
hipLaunchKernelGGL(( apply_rotary_pos_half<T, HEAD_THREADS, ALIGNMENT>), dim3(grid), dim3(block), 0, stream, mixed_query, \
^
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:8:
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/conversion_utils_hip.h:9:
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/ds_kernel_utils_hip.h:24:
In file included from /opt/rocm-5.7.0/include/hip/hip_cooperative_groups.h:38:
/opt/rocm-5.7.0/include/hip/amd_detail/amd_hip_cooperative_groups.h:837:3: error: static assertion failed due to requirement 'integral_constant<bool, false>::value': Tiled partition with size > wavefront size. Currently not supported
static_assert(is_valid_tile_size<size>::value,
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:44:60: note: in instantiation of function template specialization 'cooperative_groups::tiled_partition<64U, cooperative_groups::thread_block>' requested here
cg::thread_block_tile<threadsPerHead> head_group = cg::tiled_partition<threadsPerHead>(tb);
^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:174:9: note: in instantiation of function template specialization 'apply_rotary_pos_half<float, 64, 4>' requested here
LAUNCH_FOR_ALIGNMENT(4);
^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:115:9: note: expanded from macro 'LAUNCH_FOR_ALIGNMENT'
LAUNCH_ROT_POS_EMB_HALF(64, ALIGNMENT); \
^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:93:25: note: expanded from macro 'LAUNCH_ROT_POS_EMB_HALF'
hipLaunchKernelGGL(( apply_rotary_pos_half<T, HEAD_THREADS, ALIGNMENT>), dim3(grid), dim3(block), 0, stream, mixed_query, \
^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:44:56: error: no matching function for call to 'tiled_partition'
cg::thread_block_tile<threadsPerHead> head_group = cg::tiled_partition<threadsPerHead>(tb);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:176:9: note: in instantiation of function template specialization 'apply_rotary_pos_half<float, 64, 8>' requested here
LAUNCH_FOR_ALIGNMENT(8);
^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:115:9: note: expanded from macro 'LAUNCH_FOR_ALIGNMENT'
LAUNCH_ROT_POS_EMB_HALF(64, ALIGNMENT); \
^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:93:25: note: expanded from macro 'LAUNCH_ROT_POS_EMB_HALF'
hipLaunchKernelGGL(( apply_rotary_pos_half<T, HEAD_THREADS, ALIGNMENT>), dim3(grid), dim3(block), 0, stream, mixed_query, \
^
/opt/rocm-5.7.0/include/hip/amd_detail/amd_hip_cooperative_groups.h:836:54: note: candidate template ignored: substitution failure [with size = 64, ParentCGTy = cg::thread_block]
__CG_QUALIFIER__ thread_block_tile<size, ParentCGTy> tiled_partition(const ParentCGTy& g) {
^
/opt/rocm-5.7.0/include/hip/amd_detail/amd_hip_cooperative_groups.h:723:18: error: no member named 'sync' in 'cooperative_groups::thread_block_tile_base<64>'
using tbtBase::sync;
~~~~~~~~~^
/opt/rocm-5.7.0/include/hip/amd_detail/amd_hip_cooperative_groups.h:781:43: note: in instantiation of template class 'cooperative_groups::thread_block_tile_type<64, void>' requested here
class thread_block_tile_internal : public thread_block_tile_type<size, ParentCGTy> {
^
/opt/rocm-5.7.0/include/hip/amd_detail/amd_hip_cooperative_groups.h:807:46: note: in instantiation of template class 'cooperative_groups::impl::thread_block_tile_internal<64, void>' requested here
class thread_block_tile<size, void> : public impl::thread_block_tile_internal<size, void> {
^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:44:43: note: in instantiation of template class 'cooperative_groups::thread_block_tile<64>' requested here
cg::thread_block_tile<threadsPerHead> head_group = cg::tiled_partition<threadsPerHead>(tb);
^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:176:9: note: in instantiation of function template specialization 'apply_rotary_pos_half<float, 64, 8>' requested here
LAUNCH_FOR_ALIGNMENT(8);
^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:115:9: note: expanded from macro 'LAUNCH_FOR_ALIGNMENT'
LAUNCH_ROT_POS_EMB_HALF(64, ALIGNMENT); \
^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:93:25: note: expanded from macro 'LAUNCH_ROT_POS_EMB_HALF'
hipLaunchKernelGGL(( apply_rotary_pos_half<T, HEAD_THREADS, ALIGNMENT>), dim3(grid), dim3(block), 0, stream, mixed_query, \
^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:44:56: error: no matching function for call to 'tiled_partition'
cg::thread_block_tile<threadsPerHead> head_group = cg::tiled_partition<threadsPerHead>(tb);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:178:9: note: in instantiation of function template specialization 'apply_rotary_pos_half<float, 64, 16>' requested here
LAUNCH_FOR_ALIGNMENT(16);
^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:115:9: note: expanded from macro 'LAUNCH_FOR_ALIGNMENT'
LAUNCH_ROT_POS_EMB_HALF(64, ALIGNMENT); \
^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:93:25: note: expanded from macro 'LAUNCH_ROT_POS_EMB_HALF'
hipLaunchKernelGGL(( apply_rotary_pos_half<T, HEAD_THREADS, ALIGNMENT>), dim3(grid), dim3(block), 0, stream, mixed_query, \
^
/opt/rocm-5.7.0/include/hip/amd_detail/amd_hip_cooperative_groups.h:836:54: note: candidate template ignored: substitution failure [with size = 64, ParentCGTy = cg::thread_block]
__CG_QUALIFIER__ thread_block_tile<size, ParentCGTy> tiled_partition(const ParentCGTy& g) {
^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:56:48: error: no member named 'thread_rank' in 'cooperative_groups::thread_block_tile<64>'
const int base_neuron_idx = head_group.thread_rank() * T_per_thread;
~~~~~~~~~~ ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:74:58: error: no member named 'thread_rank' in 'cooperative_groups::thread_block_tile<64>'
? head_group.thread_rank() + half_dim_threads
~~~~~~~~~~ ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:77:53: error: no member named 'shfl' in 'cooperative_groups::thread_block_tile<64>'
const float q_rot_temp = head_group.shfl(q_rot, target_lane);
~~~~~~~~~~ ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:78:53: error: no member named 'shfl' in 'cooperative_groups::thread_block_tile<64>'
const float k_rot_temp = head_group.shfl(k_rot, target_lane);
~~~~~~~~~~ ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:44:56: error: no matching function for call to 'tiled_partition'
cg::thread_block_tile<threadsPerHead> head_group = cg::tiled_partition<threadsPerHead>(tb);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:174:9: note: in instantiation of function template specialization 'apply_rotary_pos_half<__half, 64, 4>' requested here
LAUNCH_FOR_ALIGNMENT(4);
^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:115:9: note: expanded from macro 'LAUNCH_FOR_ALIGNMENT'
LAUNCH_ROT_POS_EMB_HALF(64, ALIGNMENT); \
^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:93:25: note: expanded from macro 'LAUNCH_ROT_POS_EMB_HALF'
hipLaunchKernelGGL(( apply_rotary_pos_half<T, HEAD_THREADS, ALIGNMENT>), dim3(grid), dim3(block), 0, stream, mixed_query, \
^
/opt/rocm-5.7.0/include/hip/amd_detail/amd_hip_cooperative_groups.h:836:54: note: candidate template ignored: substitution failure [with size = 64, ParentCGTy = cg::thread_block]
__CG_QUALIFIER__ thread_block_tile<size, ParentCGTy> tiled_partition(const ParentCGTy& g) {
^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:56:48: error: no member named 'thread_rank' in 'cooperative_groups::thread_block_tile<64>'
const int base_neuron_idx = head_group.thread_rank() * T_per_thread;
~~~~~~~~~~ ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:74:58: error: no member named 'thread_rank' in 'cooperative_groups::thread_block_tile<64>'
? head_group.thread_rank() + half_dim_threads
~~~~~~~~~~ ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:77:53: error: no member named 'shfl' in 'cooperative_groups::thread_block_tile<64>'
const float q_rot_temp = head_group.shfl(q_rot, target_lane);
~~~~~~~~~~ ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:78:53: error: no member named 'shfl' in 'cooperative_groups::thread_block_tile<64>'
const float k_rot_temp = head_group.shfl(k_rot, target_lane);
~~~~~~~~~~ ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:44:56: error: no matching function for call to 'tiled_partition'
cg::thread_block_tile<threadsPerHead> head_group = cg::tiled_partition<threadsPerHead>(tb);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:176:9: note: in instantiation of function template specialization 'apply_rotary_pos_half<__half, 64, 8>' requested here
LAUNCH_FOR_ALIGNMENT(8);
^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:115:9: note: expanded from macro 'LAUNCH_FOR_ALIGNMENT'
LAUNCH_ROT_POS_EMB_HALF(64, ALIGNMENT); \
^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:93:25: note: expanded from macro 'LAUNCH_ROT_POS_EMB_HALF'
hipLaunchKernelGGL(( apply_rotary_pos_half<T, HEAD_THREADS, ALIGNMENT>), dim3(grid), dim3(block), 0, stream, mixed_query, \
^
/opt/rocm-5.7.0/include/hip/amd_detail/amd_hip_cooperative_groups.h:836:54: note: candidate template ignored: substitution failure [with size = 64, ParentCGTy = cg::thread_block]
__CG_QUALIFIER__ thread_block_tile<size, ParentCGTy> tiled_partition(const ParentCGTy& g) {
^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:56:48: error: no member named 'thread_rank' in 'cooperative_groups::thread_block_tile<64>'
const int base_neuron_idx = head_group.thread_rank() * T_per_thread;
~~~~~~~~~~ ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:74:58: error: no member named 'thread_rank' in 'cooperative_groups::thread_block_tile<64>'
? head_group.thread_rank() + half_dim_threads
~~~~~~~~~~ ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:77:53: error: no member named 'shfl' in 'cooperative_groups::thread_block_tile<64>'
const float q_rot_temp = head_group.shfl(q_rot, target_lane);
~~~~~~~~~~ ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:78:53: error: no member named 'shfl' in 'cooperative_groups::thread_block_tile<64>'
const float k_rot_temp = head_group.shfl(k_rot, target_lane);
~~~~~~~~~~ ^
fatal error: too many errors emitted, stopping now [-ferror-limit=]
20 errors generated when compiling for gfx1030.
[3/11] /opt/rocm/bin/hipcc -DWITH_HIP -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/TH -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THC -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THH -isystem /opt/rocm/include -isystem /opt/conda/envs/py_3.9/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=1 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -D__HIP_PLATFORM_AMD__=1 -fPIC -D__HIP_PLATFORM_HCC__=1 -DUSE_ROCM=1 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++17 -U__HIP_NO_HALF_OPERATORS__ -U__HIP_NO_HALF_CONVERSIONS__ -U__HIP_NO_HALF2_OPERATORS__ -DROCM_VERSION_MAJOR=5 -DROCM_VERSION_MINOR=7 --offload-arch=gfx900 --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 --offload-arch=gfx1101 -fno-gpu-rdc -c /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/layer_norm.hip -o layer_norm.cuda.o
FAILED: layer_norm.cuda.o
/opt/rocm/bin/hipcc -DWITH_HIP -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/TH -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THC -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THH -isystem /opt/rocm/include -isystem /opt/conda/envs/py_3.9/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=1 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -D__HIP_PLATFORM_AMD__=1 -fPIC -D__HIP_PLATFORM_HCC__=1 -DUSE_ROCM=1 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++17 -U__HIP_NO_HALF_OPERATORS__ -U__HIP_NO_HALF_CONVERSIONS__ -U__HIP_NO_HALF2_OPERATORS__ -DROCM_VERSION_MAJOR=5 -DROCM_VERSION_MINOR=7 --offload-arch=gfx900 --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 --offload-arch=gfx1101 -fno-gpu-rdc -c /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/layer_norm.hip -o layer_norm.cuda.o
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/layer_norm.hip:8:
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/conversion_utils_hip.h:9:
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/ds_kernel_utils_hip.h:24:
In file included from /opt/rocm-5.7.0/include/hip/hip_cooperative_groups.h:38:
/opt/rocm-5.7.0/include/hip/amd_detail/amd_hip_cooperative_groups.h:638:3: error: static assertion failed due to requirement 'integral_constant<bool, false>::value': Tile size is either not a power of 2 or greater than the wavefront size
static_assert(is_valid_tile_size<size>::value,
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/opt/rocm-5.7.0/include/hip/amd_detail/amd_hip_cooperative_groups.h:704:55: note: in instantiation of template class 'cooperative_groups::thread_block_tile_base<64>' requested here
class thread_block_tile_type<tileSize, void> : public thread_block_tile_base<tileSize>,
^
/opt/rocm-5.7.0/include/hip/amd_detail/amd_hip_cooperative_groups.h:781:43: note: in instantiation of template class 'cooperative_groups::thread_block_tile_type<64, void>' requested here
class thread_block_tile_internal : public thread_block_tile_type<size, ParentCGTy> {
^
/opt/rocm-5.7.0/include/hip/amd_detail/amd_hip_cooperative_groups.h:807:46: note: in instantiation of template class 'cooperative_groups::impl::thread_block_tile_internal<64, void>' requested here
class thread_block_tile<size, void> : public impl::thread_block_tile_internal<size, void> {
^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils_hip.h:362:44: note: in instantiation of template class 'cooperative_groups::thread_block_tile<64>' requested here
data[0] = element<Op>(data[0], warp.shfl_xor(data[0], i));
^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils_hip.h:371:46: error: no member named 'shfl_xor' in 'cooperative_groups::thread_block_tile<64>'
data[0] = element<Op1>(data[0], warp.shfl_xor(data[0], i));
~~~~ ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils_hip.h:372:46: error: no member named 'shfl_xor' in 'cooperative_groups::thread_block_tile<64>'
data[1] = element<Op2>(data[1], warp.shfl_xor(data[1], i));
~~~~ ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils_hip.h:381:46: error: no member named 'shfl_xor' in 'cooperative_groups::thread_block_tile<64>'
data[0] = element<Op1>(data[0], warp.shfl_xor(data[0], i));
~~~~ ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils_hip.h:382:46: error: no member named 'shfl_xor' in 'cooperative_groups::thread_block_tile<64>'
data[1] = element<Op2>(data[1], warp.shfl_xor(data[1], i));
~~~~ ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils_hip.h:383:46: error: no member named 'shfl_xor' in 'cooperative_groups::thread_block_tile<64>'
data[2] = element<Op3>(data[2], warp.shfl_xor(data[2], i));
~~~~ ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils_hip.h:392:46: error: no member named 'shfl_xor' in 'cooperative_groups::thread_block_tile<64>'
data[0] = element<Op1>(data[0], warp.shfl_xor(data[0], i));
~~~~ ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils_hip.h:393:46: error: no member named 'shfl_xor' in 'cooperative_groups::thread_block_tile<64>'
data[1] = element<Op2>(data[1], warp.shfl_xor(data[1], i));
~~~~ ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils_hip.h:394:46: error: no member named 'shfl_xor' in 'cooperative_groups::thread_block_tile<64>'
data[2] = element<Op3>(data[2], warp.shfl_xor(data[2], i));
~~~~ ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils_hip.h:395:46: error: no member named 'shfl_xor' in 'cooperative_groups::thread_block_tile<64>'
data[3] = element<Op4>(data[3], warp.shfl_xor(data[3], i));
~~~~ ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils_hip.h:431:22: error: no member named 'thread_rank' in 'cooperative_groups::thread_block_tile<64>'
if (warp_arg.thread_rank() == 0) {
~~~~~~~~ ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils_hip.h:442:26: error: no member named 'thread_rank' in 'cooperative_groups::thread_block_tile<64>'
if (warp_arg.thread_rank() < running_warps) {
~~~~~~~~ ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils_hip.h:446:68: error: no member named 'thread_rank' in 'cooperative_groups::thread_block_tile<64>'
data + i, reduce_buffer + elems * warp_arg.thread_rank() + i);
~~~~~~~~ ^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils_hip.h:456:82: error: no member named 'thread_rank' in 'cooperative_groups::thread_block_tile<64>'
mem_access::store_shared<bytes>(reduce_buffer + elems * warp_arg.thread_rank() + i,
~~~~~~~~ ^
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/layer_norm.hip:8:
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/conversion_utils_hip.h:9:
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/ds_kernel_utils_hip.h:24:
In file included from /opt/rocm-5.7.0/include/hip/hip_cooperative_groups.h:38:
/opt/rocm-5.7.0/include/hip/amd_detail/amd_hip_cooperative_groups.h:837:3: error: static assertion failed due to requirement 'integral_constant<bool, false>::value': Tiled partition with size > wavefront size. Currently not supported
static_assert(is_valid_tile_size<size>::value,
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/layer_norm.hip:239:52: note: in instantiation of function template specialization 'cooperative_groups::tiled_partition<64U, cooperative_groups::thread_block>' requested here
cg::thread_block_tile<hw_warp_size> warp = cg::tiled_partition<hw_warp_size>(tb);
^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/layer_norm.hip:387:13: note: in instantiation of function template specialization 'fused_residual_ln<__half, 1, 1, 256, false>' requested here
LAUNCH_FUSED_RES_LN(1, 1, maxThreads);
^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/layer_norm.hip:341:25: note: expanded from macro 'LAUNCH_FUSED_RES_LN'
hipLaunchKernelGGL(( fused_residual_ln<T, unRollFactor, threadsPerGroup, maxThreads, false>) \
^
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/layer_norm.hip:8:
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/conversion_utils_hip.h:9:
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/ds_kernel_utils_hip.h:24:
In file included from /opt/rocm-5.7.0/include/hip/hip_cooperative_groups.h:38:
/opt/rocm-5.7.0/include/hip/amd_detail/amd_hip_cooperative_groups.h:814:9: error: type 'impl::thread_block_tile_internal<64U, void>' is not a direct or virtual base of 'cooperative_groups::thread_block_tile<64>'
: impl::thread_block_tile_internal<size, void>(g) {}
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/opt/rocm-5.7.0/include/hip/amd_detail/amd_hip_cooperative_groups.h:801:12: note: in instantiation of function template specialization 'cooperative_groups::thread_block_tile<64>::thread_block_tile<cooperative_groups::thread_block>' requested here
return thread_block_tile<size, void>(*this);
^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/layer_norm.hip:239:48: note: in instantiation of member function 'cooperative_groups::thread_block_tile<64, cooperative_groups::thread_block>::operator thread_block_tile' requested here
cg::thread_block_tile<hw_warp_size> warp = cg::tiled_partition<hw_warp_size>(tb);
^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/layer_norm.hip:387:13: note: in instantiation of function template specialization 'fused_residual_ln<__half, 1, 1, 256, false>' requested here
LAUNCH_FUSED_RES_LN(1, 1, maxThreads);
^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/layer_norm.hip:341:25: note: expanded from macro 'LAUNCH_FUSED_RES_LN'
hipLaunchKernelGGL(( fused_residual_ln<T, unRollFactor, threadsPerGroup, maxThreads, false>) \
^
16 errors generated when compiling for gfx1030.
[4/11] /opt/rocm/bin/hipcc -DWITH_HIP -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/TH -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THC -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THH -isystem /opt/rocm/include -isystem /opt/conda/envs/py_3.9/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=1 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -D__HIP_PLATFORM_AMD__=1 -fPIC -D__HIP_PLATFORM_HCC__=1 -DUSE_ROCM=1 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++17 -U__HIP_NO_HALF_OPERATORS__ -U__HIP_NO_HALF_CONVERSIONS__ -U__HIP_NO_HALF2_OPERATORS__ -DROCM_VERSION_MAJOR=5 -DROCM_VERSION_MINOR=7 --offload-arch=gfx900 --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 --offload-arch=gfx1101 -fno-gpu-rdc -c /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/dequantize.hip -o dequantize.cuda.o
[5/11] /opt/rocm/bin/hipcc -DWITH_HIP -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/TH -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THC -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THH -isystem /opt/rocm/include -isystem /opt/conda/envs/py_3.9/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=1 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -D__HIP_PLATFORM_AMD__=1 -fPIC -D__HIP_PLATFORM_HCC__=1 -DUSE_ROCM=1 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++17 -U__HIP_NO_HALF_OPERATORS__ -U__HIP_NO_HALF_CONVERSIONS__ -U__HIP_NO_HALF2_OPERATORS__ -DROCM_VERSION_MAJOR=5 -DROCM_VERSION_MINOR=7 --offload-arch=gfx900 --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 --offload-arch=gfx1101 -fno-gpu-rdc -c /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/relu.hip -o relu.cuda.o
[6/11] /opt/rocm/bin/hipcc -DWITH_HIP -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/TH -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THC -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THH -isystem /opt/rocm/include -isystem /opt/conda/envs/py_3.9/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=1 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -D__HIP_PLATFORM_AMD__=1 -fPIC -D__HIP_PLATFORM_HCC__=1 -DUSE_ROCM=1 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++17 -U__HIP_NO_HALF_OPERATORS__ -U__HIP_NO_HALF_CONVERSIONS__ -U__HIP_NO_HALF2_OPERATORS__ -DROCM_VERSION_MAJOR=5 -DROCM_VERSION_MINOR=7 --offload-arch=gfx900 --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 --offload-arch=gfx1101 -fno-gpu-rdc -c /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pointwise_ops.hip -o pointwise_ops.cuda.o
[7/11] /opt/rocm/bin/hipcc -DWITH_HIP -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/TH -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THC -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THH -isystem /opt/rocm/include -isystem /opt/conda/envs/py_3.9/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=1 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -D__HIP_PLATFORM_AMD__=1 -fPIC -D__HIP_PLATFORM_HCC__=1 -DUSE_ROCM=1 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++17 -U__HIP_NO_HALF_OPERATORS__ -U__HIP_NO_HALF_CONVERSIONS__ -U__HIP_NO_HALF2_OPERATORS__ -DROCM_VERSION_MAJOR=5 -DROCM_VERSION_MINOR=7 --offload-arch=gfx900 --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 --offload-arch=gfx1101 -fno-gpu-rdc -c /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/transform.hip -o transform.cuda.o
[8/11] /opt/rocm/bin/hipcc -DWITH_HIP -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/TH -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THC -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THH -isystem /opt/rocm/include -isystem /opt/conda/envs/py_3.9/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=1 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -D__HIP_PLATFORM_AMD__=1 -fPIC -D__HIP_PLATFORM_HCC__=1 -DUSE_ROCM=1 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++17 -U__HIP_NO_HALF_OPERATORS__ -U__HIP_NO_HALF_CONVERSIONS__ -U__HIP_NO_HALF2_OPERATORS__ -DROCM_VERSION_MAJOR=5 -DROCM_VERSION_MINOR=7 --offload-arch=gfx900 --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 --offload-arch=gfx1101 -fno-gpu-rdc -c /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/gelu.hip -o gelu.cuda.o
[9/11] /opt/rocm/bin/hipcc -DWITH_HIP -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/TH -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THC -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THH -isystem /opt/rocm/include -isystem /opt/conda/envs/py_3.9/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=1 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -D__HIP_PLATFORM_AMD__=1 -fPIC -D__HIP_PLATFORM_HCC__=1 -DUSE_ROCM=1 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++17 -U__HIP_NO_HALF_OPERATORS__ -U__HIP_NO_HALF_CONVERSIONS__ -U__HIP_NO_HALF2_OPERATORS__ -DROCM_VERSION_MAJOR=5 -DROCM_VERSION_MINOR=7 --offload-arch=gfx900 --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 --offload-arch=gfx1101 -fno-gpu-rdc -c /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/softmax.hip -o softmax.cuda.o
[10/11] c++ -MMD -MF pt_binding_hip.o.d -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/TH -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THC -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THH -isystem /opt/rocm/include -isystem /opt/conda/envs/py_3.9/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=1 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -D__HIP_PLATFORM_AMD__=1 -c /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp -o pt_binding_hip.o -fPIC -D__HIP_PLATFORM_HCC__=1 -DUSE_ROCM=1
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h: In constructor ‘InferenceContext::InferenceContext()’:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:77:23: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result]
77 | hipEventCreate(&_comp1_event);
| ~~~~~~~~~~~~~~^~~~~~~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:2472:12: note: in call to ‘hipError_t hipEventCreate(ihipEvent_t**)’, declared here
2472 | hipError_t hipEventCreate(hipEvent_t* event);
| ^~~~~~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here
332 | } hipError_t;
| ^~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:78:23: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result]
78 | hipEventCreate(&_comp2_event);
| ~~~~~~~~~~~~~~^~~~~~~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:2472:12: note: in call to ‘hipError_t hipEventCreate(ihipEvent_t**)’, declared here
2472 | hipError_t hipEventCreate(hipEvent_t* event);
| ^~~~~~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here
332 | } hipError_t;
| ^~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:79:23: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result]
79 | hipEventCreate(&_comp_event);
| ~~~~~~~~~~~~~~^~~~~~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:2472:12: note: in call to ‘hipError_t hipEventCreate(ihipEvent_t**)’, declared here
2472 | hipError_t hipEventCreate(hipEvent_t* event);
| ^~~~~~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here
332 | } hipError_t;
| ^~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:80:23: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result]
80 | hipEventCreate(&_comm_event);
| ~~~~~~~~~~~~~~^~~~~~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:2472:12: note: in call to ‘hipError_t hipEventCreate(ihipEvent_t**)’, declared here
2472 | hipError_t hipEventCreate(hipEvent_t* event);
| ^~~~~~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here
332 | } hipError_t;
| ^~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h: In destructor ‘virtual InferenceContext::~InferenceContext()’:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:86:16: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result]
86 | hipFree(_workspace);
| ~~~~~~~^~~~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:3562:12: note: in call to ‘hipError_t hipFree(void*)’, declared here
3562 | hipError_t hipFree(void* ptr);
| ^~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here
332 | } hipError_t;
| ^~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:87:24: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result]
87 | hipEventDestroy(_comp1_event);
| ~~~~~~~~~~~~~~~^~~~~~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:2521:12: note: in call to ‘hipError_t hipEventDestroy(hipEvent_t)’, declared here
2521 | hipError_t hipEventDestroy(hipEvent_t event);
| ^~~~~~~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here
332 | } hipError_t;
| ^~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:88:24: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result]
88 | hipEventDestroy(_comp2_event);
| ~~~~~~~~~~~~~~~^~~~~~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:2521:12: note: in call to ‘hipError_t hipEventDestroy(hipEvent_t)’, declared here
2521 | hipError_t hipEventDestroy(hipEvent_t event);
| ^~~~~~~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here
332 | } hipError_t;
| ^~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:89:24: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result]
89 | hipEventDestroy(_comp_event);
| ~~~~~~~~~~~~~~~^~~~~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:2521:12: note: in call to ‘hipError_t hipEventDestroy(hipEvent_t)’, declared here
2521 | hipError_t hipEventDestroy(hipEvent_t event);
| ^~~~~~~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here
332 | } hipError_t;
| ^~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:90:24: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result]
90 | hipEventDestroy(_comm_event);
| ~~~~~~~~~~~~~~~^~~~~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:2521:12: note: in call to ‘hipError_t hipEventDestroy(hipEvent_t)’, declared here
2521 | hipError_t hipEventDestroy(hipEvent_t event);
| ^~~~~~~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here
332 | } hipError_t;
| ^~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h: In member function ‘void InferenceContext::GenWorkSpace(const unsigned int&, const unsigned int&, const size_t&, const size_t&, const size_t&, const unsigned int&, const bool&, const size_t&, const unsigned int&, unsigned int, unsigned int)’:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:112:48: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result]
112 | if (!_free_memory_size) { hipMemGetInfo(&_free_memory_size, &total_size); }
| ~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:4048:12: note: in call to ‘hipError_t hipMemGetInfo(size_t*, size_t*)’, declared here
4048 | hipError_t hipMemGetInfo(size_t* free, size_t* total);
| ^~~~~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here
332 | } hipError_t;
| ^~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:154:22: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result]
154 | hipMalloc(&_workspace, workSpaceSize);
| ~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:2786:12: note: in call to ‘hipError_t hipMalloc(void**, size_t)’, declared here
2786 | hipError_t hipMalloc(void** ptr, size_t size);
| ^~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here
332 | } hipError_t;
| ^~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:156:20: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result]
156 | hipFree(_workspace);
| ~~~~~~~^~~~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:3562:12: note: in call to ‘hipError_t hipFree(void*)’, declared here
3562 | hipError_t hipFree(void* ptr);
| ^~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here
332 | } hipError_t;
| ^~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:157:22: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result]
157 | hipMalloc(&_workspace, workSpaceSize);
| ~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:2786:12: note: in call to ‘hipError_t hipMalloc(void**, size_t)’, declared here
2786 | hipError_t hipMalloc(void** ptr, size_t size);
| ^~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here
332 | } hipError_t;
| ^~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h: In member function ‘void InferenceContext::release_workspace()’:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:230:16: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result]
230 | hipFree(_workspace);
| ~~~~~~~^~~~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:3562:12: note: in call to ‘hipError_t hipFree(void*)’, declared here
3562 | hipError_t hipFree(void* ptr);
| ^~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here
332 | } hipError_t;
| ^~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h: In member function ‘bool InferenceContext::retake_workspace()’:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:236:18: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result]
236 | hipMalloc(&_workspace, _workSpaceSize);
| ~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:2786:12: note: in call to ‘hipError_t hipMalloc(void**, size_t)’, declared here
2786 | hipError_t hipMalloc(void** ptr, size_t size);
| ^~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here
332 | } hipError_t;
| ^~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h: In member function ‘void InferenceContext::SynchComp()’:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:254:23: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result]
254 | hipEventRecord(_comp_event, _comp_stream);
| ~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:2501:12: note: in call to ‘hipError_t hipEventRecord(hipEvent_t, hipStream_t)’, declared here
2501 | hipError_t hipEventRecord(hipEvent_t event, hipStream_t stream = NULL);
| ^~~~~~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here
332 | } hipError_t;
| ^~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:255:27: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result]
255 | hipStreamWaitEvent(_comm_stream, _comp_event, 0);
| ~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:2211:12: note: in call to ‘hipError_t hipStreamWaitEvent(hipStream_t, hipEvent_t, unsigned int)’, declared here
2211 | hipError_t hipStreamWaitEvent(hipStream_t stream, hipEvent_t event, unsigned int flags);
| ^~~~~~~~~~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here
332 | } hipError_t;
| ^~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h: In member function ‘void InferenceContext::SynchComm()’:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:259:23: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result]
259 | hipEventRecord(_comm_event, _comm_stream);
| ~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:2501:12: note: in call to ‘hipError_t hipEventRecord(hipEvent_t, hipStream_t)’, declared here
2501 | hipError_t hipEventRecord(hipEvent_t event, hipStream_t stream = NULL);
| ^~~~~~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here
332 | } hipError_t;
| ^~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:260:27: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result]
260 | hipStreamWaitEvent(_comp_stream, _comm_event, 0);
| ~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:2211:12: note: in call to ‘hipError_t hipStreamWaitEvent(hipStream_t, hipEvent_t, unsigned int)’, declared here
2211 | hipError_t hipStreamWaitEvent(hipStream_t stream, hipEvent_t event, unsigned int flags);
| ^~~~~~~~~~~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here
332 | } hipError_t;
| ^~~~~~~~~~
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp: In instantiation of ‘std::vector<at::Tensor> ds_softmax_context(at::Tensor&, at::Tensor&, int, bool, bool, int, int, float, bool, bool, int, bool, unsigned int, unsigned int, at::Tensor&, float) [with T = float]’:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:2016:5: required from here
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:542:50: warning: narrowing conversion of ‘(((size_t)hidden_dim) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
542 | {hidden_dim * InferenceContext::Instance().GetMaxTokenLength(),
| ~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:542:50: warning: narrowing conversion of ‘(((size_t)hidden_dim) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:543:41: warning: narrowing conversion of ‘(((size_t)k) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
543 | k * InferenceContext::Instance().GetMaxTokenLength(),
| ~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:543:41: warning: narrowing conversion of ‘(((size_t)k) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:551:38: warning: narrowing conversion of ‘(((size_t)hidden_dim) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
551 | {hidden_dim * InferenceContext::Instance().GetMaxTokenLength(),
| ~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:551:38: warning: narrowing conversion of ‘(((size_t)hidden_dim) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:552:29: warning: narrowing conversion of ‘(((size_t)k) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
552 | k * InferenceContext::Instance().GetMaxTokenLength(),
| ~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:552:29: warning: narrowing conversion of ‘(((size_t)k) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp: In instantiation of ‘std::vector<at::Tensor> ds_rms_mlp_gemm(at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, float, at::Tensor&, at::Tensor&, bool, int, bool) [with T = float]’:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:2016:5: required from here
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:1582:72: warning: narrowing conversion of ‘(size_t)mlp_1_out_neurons’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
1582 | at::from_blob(intermediate_ptr, {input.size(0), input.size(1), mlp_1_out_neurons}, options);
| ^~~~~~~~~~~~~~~~~
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:1582:72: warning: narrowing conversion of ‘mlp_1_out_neurons’ from ‘const size_t’ {aka ‘const long unsigned int’} to ‘long int’ [-Wnarrowing]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp: In instantiation of ‘std::vector<at::Tensor> ds_softmax_context(at::Tensor&, at::Tensor&, int, bool, bool, int, int, float, bool, bool, int, bool, unsigned int, unsigned int, at::Tensor&, float) [with T = __half]’:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:2017:5: required from here
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:542:50: warning: narrowing conversion of ‘(((size_t)hidden_dim) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
542 | {hidden_dim * InferenceContext::Instance().GetMaxTokenLength(),
| ~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:542:50: warning: narrowing conversion of ‘(((size_t)hidden_dim) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:543:41: warning: narrowing conversion of ‘(((size_t)k) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
543 | k * InferenceContext::Instance().GetMaxTokenLength(),
| ~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:543:41: warning: narrowing conversion of ‘(((size_t)k) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:551:38: warning: narrowing conversion of ‘(((size_t)hidden_dim) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
551 | {hidden_dim * InferenceContext::Instance().GetMaxTokenLength(),
| ~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:551:38: warning: narrowing conversion of ‘(((size_t)hidden_dim) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:552:29: warning: narrowing conversion of ‘(((size_t)k) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
552 | k * InferenceContext::Instance().GetMaxTokenLength(),
| ~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:552:29: warning: narrowing conversion of ‘(((size_t)k) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp: In instantiation of ‘std::vector<at::Tensor> ds_rms_mlp_gemm(at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, float, at::Tensor&, at::Tensor&, bool, int, bool) [with T = __half]’:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:2017:5: required from here
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:1582:72: warning: narrowing conversion of ‘(size_t)mlp_1_out_neurons’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
1582 | at::from_blob(intermediate_ptr, {input.size(0), input.size(1), mlp_1_out_neurons}, options);
| ^~~~~~~~~~~~~~~~~
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:1582:72: warning: narrowing conversion of ‘mlp_1_out_neurons’ from ‘const size_t’ {aka ‘const long unsigned int’} to ‘long int’ [-Wnarrowing]
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1888, in _run_ninja_build
subprocess.run(
File "/opt/conda/envs/py_3.9/lib/python3.9/subprocess.py", line 528, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v', '-j', '32']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/var/lib/jenkins/test.py", line 14, in <module>
generator.model = deepspeed.init_inference(generator.model,
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/__init__.py", line 342, in init_inference
engine = InferenceEngine(model, config=ds_inference_config)
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/inference/engine.py", line 160, in __init__
self._apply_injection_policy(config)
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/inference/engine.py", line 411, in _apply_injection_policy
replace_transformer_layer(client_module, self.module, checkpoint, config, self.config)
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/module_inject/replace_module.py", line 339, in replace_transformer_layer
replaced_module = replace_module(model=model,
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/module_inject/replace_module.py", line 583, in replace_module
replaced_module, _ = _replace_module(model, policy, state_dict=sd)
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/module_inject/replace_module.py", line 643, in _replace_module
_, layer_id = _replace_module(child,
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/module_inject/replace_module.py", line 643, in _replace_module
_, layer_id = _replace_module(child,
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/module_inject/replace_module.py", line 619, in _replace_module
replaced_module = policies[child.__class__][0](child,
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/module_inject/replace_module.py", line 298, in replace_fn
new_module = replace_with_policy(child,
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/module_inject/replace_module.py", line 247, in replace_with_policy
_container.create_module()
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/module_inject/containers/bloom.py", line 30, in create_module
self.module = DeepSpeedBloomInference(_config, mp_group=self.mp_group)
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/model_implementations/transformers/ds_bloom.py", line 20, in __init__
super().__init__(config, mp_group, quantize_scales, quantize_groups, merge_count, mlp_extra_grouping)
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/model_implementations/transformers/ds_transformer.py", line 58, in __init__
inference_module = builder.load()
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/op_builder/builder.py", line 446, in load
return self.jit_load(verbose)
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/op_builder/builder.py", line 492, in jit_load
op_module = load(name=self.name,
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1279, in load
return _jit_compile(
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1504, in _jit_compile
_write_ninja_file_and_build_library(
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1619, in _write_ninja_file_and_build_library
_run_ninja_build(
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1904, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error building extension 'transformer_inference'
root@c8be2f1a010b:/var/lib/jenkins#
root@3c5db63db5ac:/var/lib/jenkins# cat test.py
import os
import deepspeed
import torch
from transformers import pipeline
local_rank = int(os.getenv('LOCAL_RANK', '0'))
world_size = int(os.getenv('WORLD_SIZE', '1'))
generator = pipeline('text-generation', model='bigscience/bloom-560m',
                     device=local_rank)
generator.model = deepspeed.init_inference(generator.model,
                                           mp_size=world_size,
                                           dtype=torch.float,
                                           replace_with_kernel_inject=True)
string = generator("DeepSpeed is", do_sample=True, min_length=50)
if not torch.distributed.is_initialized() or torch.distributed.get_rank() == 0:
    print(string)
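Side note on the script: the log warns that mp_size is deprecated in favor of tensor_parallel.tp_size. This is unrelated to the build failure, but a minimal sketch of the same call with the newer keyword could look like the following. It assumes the DeepSpeed 0.11.x init_inference keyword arguments; the helper names build_inference_kwargs and run are hypothetical and only used for illustration.

```python
import os


def build_inference_kwargs(world_size, kernel_inject=True):
    """Keyword arguments for deepspeed.init_inference (hypothetical helper).

    Uses tensor_parallel.tp_size, which the deprecation warning names as the
    replacement for mp_size.
    """
    return {
        "tensor_parallel": {"tp_size": world_size},
        "replace_with_kernel_inject": kernel_inject,
    }


def run():
    """Same flow as test.py, with the deprecated mp_size keyword replaced."""
    # Heavy imports stay inside the function so the helper above can be
    # imported and checked on a machine without the GPU stack installed.
    import deepspeed
    import torch
    from transformers import pipeline

    local_rank = int(os.getenv("LOCAL_RANK", "0"))
    world_size = int(os.getenv("WORLD_SIZE", "1"))
    generator = pipeline("text-generation", model="bigscience/bloom-560m",
                         device=local_rank)
    generator.model = deepspeed.init_inference(
        generator.model,
        dtype=torch.float,
        **build_inference_kwargs(world_size),
    )
    result = generator("DeepSpeed is", do_sample=True, min_length=50)
    if not torch.distributed.is_initialized() or torch.distributed.get_rank() == 0:
        print(result)
```

Calling run() reproduces the original script. Passing kernel_inject=False to build_inference_kwargs is one way to check whether the rest of the pipeline works, on the assumption that the transformer_inference kernels are only built when kernel injection is requested.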
The transformer_inference extension is not fully enabled on AMD GPUs yet, but we have a workaround for the error you are running into.
Run vi /opt/rocm-5.7.0/include/hip/amd_detail/amd_hip_cooperative_groups.h and comment out the two asserts below to proceed.
637 template <unsigned int size> class thread_block_tile_base : public tile_base<size> {
638 //static_assert(is_valid_tile_size<size>::value,
639 // "Tile size is either not a power of 2 or greater than the wavefront size");
640 using tile_base<size>::numThreads;
836 __CG_QUALIFIER__ thread_block_tile<size, ParentCGTy> tiled_partition(const ParentCGTy& g) {
837 //static_assert(is_valid_tile_size<size>::value,
838 // "Tiled partition with size > wavefront size. Currently not supported ");
839 return impl::tiled_partition_internal<size, ParentCGTy>(g);
840 }
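For anyone who prefers to script the edit rather than do it by hand in vi, here is a minimal sketch in Python. It is not an official ROCm or DeepSpeed tool; the function names are hypothetical, and it assumes the asserts look like the two excerpts above (a static_assert on is_valid_tile_size that ends with ");" on a later line). It keeps a .bak copy of the header before writing.

```python
from pathlib import Path


def comment_out_tile_size_asserts(source: str) -> str:
    """Prefix every line of each static_assert(is_valid_tile_size...) with //."""
    out = []
    in_assert = False
    for line in source.splitlines():
        stripped = line.lstrip()
        if not in_assert and stripped.startswith("static_assert(is_valid_tile_size"):
            in_assert = True
        if in_assert:
            indent = line[: len(line) - len(stripped)]
            out.append(f"{indent}//{stripped}")
            # The statement ends on the line that closes the call.
            if stripped.rstrip().endswith(");"):
                in_assert = False
        else:
            out.append(line)
    return "\n".join(out) + ("\n" if source.endswith("\n") else "")


def patch_header(path: str) -> None:
    """Back up the header next to itself, then write the patched text."""
    header = Path(path)
    original = header.read_text()
    header.with_name(header.name + ".bak").write_text(original)
    header.write_text(comment_out_tile_size_asserts(original))
```

Usage would be patch_header("/opt/rocm-5.7.0/include/hip/amd_detail/amd_hip_cooperative_groups.h"), run as a user that can write to /opt/rocm-5.7.0. Lines that are already commented out are left alone, so running it twice is harmless. Note that this only removes the compile-time check on tile size; it does not make larger tiles supported.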
Hi @rraminen, after I commented out the two asserts, the build got further, but now I run into new errors.
root@5bacb2f1ed69:/var/lib/jenkins# python test.py
[2024-01-03 02:54:44,962] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
config.json: 100%|██████████| 693/693 [00:00<00:00, 74.2kB/s]
model.safetensors: 82%|████████  | 920M/1.12G [02:33<00:33, 5.99MB/s]
model.safetensors: 100%|██████████| 1.12G/1.12G [03:08<00:00, 5.93MB/s]
tokenizer_config.json: 100%|██████████| 222/222 [00:00<00:00, 37.4kB/s]
tokenizer.json: 100%|██████████| 14.5M/14.5M [00:02<00:00, 5.26MB/s]
special_tokens_map.json: 100%|██████████| 85.0/85.0 [00:00<00:00, 12.9kB/s]
[2024-01-03 03:00:50,149] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.11.2+4199dc25, git-hash=4199dc25, git-branch=master
[2024-01-03 03:00:50,151] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter mp_size is deprecated use tensor_parallel.tp_size instead
[2024-01-03 03:00:50,151] [INFO] [logging.py:96:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
Using /root/.cache/torch_extensions/py39_cpu as PyTorch extensions root...
Creating extension directory /root/.cache/torch_extensions/py39_cpu/transformer_inference...
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_cublas_wrappers.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_cublas_wrappers_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/ds_kernel_utils.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/ds_kernel_utils_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_cuda_layers.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_hip_layers.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/conversion_utils.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/conversion_utils_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/memory_access_utils.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/memory_access_utils_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/gelu.cu -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/gelu.hip [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/relu.cu -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/relu.hip [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/layer_norm.cu -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/layer_norm.hip [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/rms_norm.cu -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/rms_norm.hip [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/softmax.cu -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/softmax.hip [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/dequantize.cu -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/dequantize.hip [skipped, already hipified] 
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.cu -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/transform.cu -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/transform.hip [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pointwise_ops.cu -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pointwise_ops.hip [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_cublas_wrappers.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_cublas_wrappers_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_cuda_layers.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_hip_layers.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/dropout.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/dropout_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/StopWatch.h -> 
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/StopWatch.h [skipped, no changes] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/cublas_wrappers.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/cublas_wrappers_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/gemm_test.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/gemm_test_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/context.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/context_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/custom_cuda_layers.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/custom_hip_layers.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/feed_forward.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/feed_forward_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/gelu.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/gelu_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/general_kernels.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/general_kernels_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/normalize_layer.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/normalize_layer_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/softmax.h -> 
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/softmax_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/strided_batch_gemm.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/strided_batch_gemm_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/ds_transformer_cuda.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/ds_transformer_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/quantization.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/quantization_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/quantization_utils.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/quantization_utils_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/dequantization_utils.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/dequantization_utils_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/quantizer.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/quantizer_hip.h [ok] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/dropout.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/dropout_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/quantization.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/quantization_hip.h [skipped, already hipified] 
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/simd.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/simd.h [skipped, no changes]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/custom_cuda_layers.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/custom_hip_layers.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/softmax.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/softmax_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/conversion_utils.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/conversion_utils_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/StopWatch.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/StopWatch.h [skipped, no changes]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/context.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/context_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/general_kernels.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/general_kernels_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/compat.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/compat.h [skipped, no changes]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/gemm_test.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/gemm_test_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/cpu_lion.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/cpu_lion_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/Timer.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/Timer_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/cublas_wrappers.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/cublas_wrappers_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/strided_batch_gemm.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/strided_batch_gemm_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/ds_kernel_utils.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/ds_kernel_utils_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/feed_forward.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/feed_forward_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/normalize_layer.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/normalize_layer_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/cpu_adagrad.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/cpu_adagrad_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/cpu_adam.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/cpu_adam_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/gelu.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/gelu_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/memory_access_utils.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/memory_access_utils_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/quantization_utils.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/quantization_utils_hip.h [skipped, already hipified]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/type_shim.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/type_shim_hip.h [skipped, already hipified]
Successfully preprocessed all matching files.
Total number of unsupported CUDA function calls: 0
Total number of replaced kernel launches: 25
Detected CUDA files, patching ldflags
Emitting ninja build file /root/.cache/torch_extensions/py39_cpu/transformer_inference/build.ninja...
Building extension module transformer_inference...
Using envvar MAX_JOBS (32) as the number of workers...
[1/11] /opt/rocm/bin/hipcc -DWITH_HIP -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/TH -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THC -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THH -isystem /opt/rocm/include -isystem /opt/conda/envs/py_3.9/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=1 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -D__HIP_PLATFORM_AMD__=1 -fPIC -D__HIP_PLATFORM_HCC__=1 -DUSE_ROCM=1 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++17 -U__HIP_NO_HALF_OPERATORS__ -U__HIP_NO_HALF_CONVERSIONS__ -U__HIP_NO_HALF2_OPERATORS__ -DROCM_VERSION_MAJOR=5 -DROCM_VERSION_MINOR=7 --offload-arch=gfx900 --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 --offload-arch=gfx1101 -fno-gpu-rdc -c /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/relu.hip -o relu.cuda.o
[2/11] /opt/rocm/bin/hipcc -DWITH_HIP -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/TH -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THC -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THH -isystem /opt/rocm/include -isystem /opt/conda/envs/py_3.9/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=1 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -D__HIP_PLATFORM_AMD__=1 -fPIC -D__HIP_PLATFORM_HCC__=1 -DUSE_ROCM=1 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++17 -U__HIP_NO_HALF_OPERATORS__ -U__HIP_NO_HALF_CONVERSIONS__ -U__HIP_NO_HALF2_OPERATORS__ -DROCM_VERSION_MAJOR=5 -DROCM_VERSION_MINOR=7 --offload-arch=gfx900 --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 --offload-arch=gfx1101 -fno-gpu-rdc -c /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/dequantize.hip -o dequantize.cuda.o
[3/11] /opt/rocm/bin/hipcc -DWITH_HIP -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/TH -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THC -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THH -isystem /opt/rocm/include -isystem /opt/conda/envs/py_3.9/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=1 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -D__HIP_PLATFORM_AMD__=1 -fPIC -D__HIP_PLATFORM_HCC__=1 -DUSE_ROCM=1 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++17 -U__HIP_NO_HALF_OPERATORS__ -U__HIP_NO_HALF_CONVERSIONS__ -U__HIP_NO_HALF2_OPERATORS__ -DROCM_VERSION_MAJOR=5 -DROCM_VERSION_MINOR=7 --offload-arch=gfx900 --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 --offload-arch=gfx1101 -fno-gpu-rdc -c /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pointwise_ops.hip -o pointwise_ops.cuda.o
[4/11] /opt/rocm/bin/hipcc -DWITH_HIP -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/TH -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THC -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THH -isystem /opt/rocm/include -isystem /opt/conda/envs/py_3.9/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=1 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -D__HIP_PLATFORM_AMD__=1 -fPIC -D__HIP_PLATFORM_HCC__=1 -DUSE_ROCM=1 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++17 -U__HIP_NO_HALF_OPERATORS__ -U__HIP_NO_HALF_CONVERSIONS__ -U__HIP_NO_HALF2_OPERATORS__ -DROCM_VERSION_MAJOR=5 -DROCM_VERSION_MINOR=7 --offload-arch=gfx900 --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 --offload-arch=gfx1101 -fno-gpu-rdc -c /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/gelu.hip -o gelu.cuda.o
[5/11] /opt/rocm/bin/hipcc -DWITH_HIP -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/TH -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THC -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THH -isystem /opt/rocm/include -isystem /opt/conda/envs/py_3.9/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=1 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -D__HIP_PLATFORM_AMD__=1 -fPIC -D__HIP_PLATFORM_HCC__=1 -DUSE_ROCM=1 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++17 -U__HIP_NO_HALF_OPERATORS__ -U__HIP_NO_HALF_CONVERSIONS__ -U__HIP_NO_HALF2_OPERATORS__ -DROCM_VERSION_MAJOR=5 -DROCM_VERSION_MINOR=7 --offload-arch=gfx900 --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 --offload-arch=gfx1101 -fno-gpu-rdc -c /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/transform.hip -o transform.cuda.o
[6/11] /opt/rocm/bin/hipcc -DWITH_HIP -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/TH -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THC -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THH -isystem /opt/rocm/include -isystem /opt/conda/envs/py_3.9/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=1 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -D__HIP_PLATFORM_AMD__=1 -fPIC -D__HIP_PLATFORM_HCC__=1 -DUSE_ROCM=1 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++17 -U__HIP_NO_HALF_OPERATORS__ -U__HIP_NO_HALF_CONVERSIONS__ -U__HIP_NO_HALF2_OPERATORS__ -DROCM_VERSION_MAJOR=5 -DROCM_VERSION_MINOR=7 --offload-arch=gfx900 --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 --offload-arch=gfx1101 -fno-gpu-rdc -c /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/rms_norm.hip -o rms_norm.cuda.o
[7/11] /opt/rocm/bin/hipcc -DWITH_HIP -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/TH -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THC -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THH -isystem /opt/rocm/include -isystem /opt/conda/envs/py_3.9/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=1 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -D__HIP_PLATFORM_AMD__=1 -fPIC -D__HIP_PLATFORM_HCC__=1 -DUSE_ROCM=1 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++17 -U__HIP_NO_HALF_OPERATORS__ -U__HIP_NO_HALF_CONVERSIONS__ -U__HIP_NO_HALF2_OPERATORS__ -DROCM_VERSION_MAJOR=5 -DROCM_VERSION_MINOR=7 --offload-arch=gfx900 --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 --offload-arch=gfx1101 -fno-gpu-rdc -c /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/softmax.hip -o softmax.cuda.o
[8/11] /opt/rocm/bin/hipcc -DWITH_HIP -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/TH -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THC -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THH -isystem /opt/rocm/include -isystem /opt/conda/envs/py_3.9/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=1 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -D__HIP_PLATFORM_AMD__=1 -fPIC -D__HIP_PLATFORM_HCC__=1 -DUSE_ROCM=1 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++17 -U__HIP_NO_HALF_OPERATORS__ -U__HIP_NO_HALF_CONVERSIONS__ -U__HIP_NO_HALF2_OPERATORS__ -DROCM_VERSION_MAJOR=5 -DROCM_VERSION_MINOR=7 --offload-arch=gfx900 --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 --offload-arch=gfx1101 -fno-gpu-rdc -c /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/layer_norm.hip -o layer_norm.cuda.o
[9/11] c++ -MMD -MF pt_binding_hip.o.d -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/TH -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THC -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THH -isystem /opt/rocm/include -isystem /opt/conda/envs/py_3.9/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=1 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -D__HIP_PLATFORM_AMD__=1 -c /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp -o pt_binding_hip.o -fPIC -D__HIP_PLATFORM_HCC__=1 -DUSE_ROCM=1
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h: In constructor ‘InferenceContext::InferenceContext()’:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:77:23: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result]
77 | hipEventCreate(&_comp1_event);
| ~~~~^~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:2472:12: note: in call to ‘hipError_t hipEventCreate(ihipEvent_t**)’, declared here
2472 | hipError_t hipEventCreate(hipEvent_t* event);
| ^~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here
332 | } hipError_t;
| ^~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:78:23: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result]
78 | hipEventCreate(&_comp2_event);
| ~~~~^~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:2472:12: note: in call to ‘hipError_t hipEventCreate(ihipEvent_t**)’, declared here
2472 | hipError_t hipEventCreate(hipEvent_t* event);
| ^~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here
332 | } hipError_t;
| ^~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:79:23: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result]
79 | hipEventCreate(&_comp_event);
| ~~~~^~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:2472:12: note: in call to ‘hipError_t hipEventCreate(ihipEvent_t**)’, declared here
2472 | hipError_t hipEventCreate(hipEvent_t* event);
| ^~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here
332 | } hipError_t;
| ^~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:80:23: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result]
80 | hipEventCreate(&_comm_event);
| ~~~~^~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:2472:12: note: in call to ‘hipError_t hipEventCreate(ihipEvent_t**)’, declared here
2472 | hipError_t hipEventCreate(hipEvent_t* event);
| ^~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here
332 | } hipError_t;
| ^~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h: In destructor ‘virtual InferenceContext::~InferenceContext()’:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:86:16: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result]
86 | hipFree(_workspace);
| ~^~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:3562:12: note: in call to ‘hipError_t hipFree(void*)’, declared here
3562 | hipError_t hipFree(void* ptr);
| ^~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here
332 | } hipError_t;
| ^~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:87:24: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result]
87 | hipEventDestroy(_comp1_event);
| ~~~^~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:2521:12: note: in call to ‘hipError_t hipEventDestroy(hipEvent_t)’, declared here
2521 | hipError_t hipEventDestroy(hipEvent_t event);
| ^~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here
332 | } hipError_t;
| ^~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:88:24: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result]
88 | hipEventDestroy(_comp2_event);
| ~~~^~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:2521:12: note: in call to ‘hipError_t hipEventDestroy(hipEvent_t)’, declared here
2521 | hipError_t hipEventDestroy(hipEvent_t event);
| ^~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here
332 | } hipError_t;
| ^~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:89:24: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result]
89 | hipEventDestroy(_comp_event);
| ~~~^~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:2521:12: note: in call to ‘hipError_t hipEventDestroy(hipEvent_t)’, declared here
2521 | hipError_t hipEventDestroy(hipEvent_t event);
| ^~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here
332 | } hipError_t;
| ^~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:90:24: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result]
90 | hipEventDestroy(_comm_event);
| ~~~^~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:2521:12: note: in call to ‘hipError_t hipEventDestroy(hipEvent_t)’, declared here
2521 | hipError_t hipEventDestroy(hipEvent_t event);
| ^~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here
332 | } hipError_t;
| ^~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h: In member function ‘void InferenceContext::GenWorkSpace(const unsigned int&, const unsigned int&, const size_t&, const size_t&, const size_t&, const unsigned int&, const bool&, const size_t&, const unsigned int&, unsigned int, unsigned int)’:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:112:48: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result]
112 | if (!_free_memory_size) { hipMemGetInfo(&_free_memory_size, &total_size); }
| ~~~^~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:4048:12: note: in call to ‘hipError_t hipMemGetInfo(size_t*, size_t*)’, declared here
4048 | hipError_t hipMemGetInfo(size_t* free, size_t* total);
| ^~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here
332 | } hipError_t;
| ^~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:154:22: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result]
154 | hipMalloc(&_workspace, workSpaceSize);
| ~~~^~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:2786:12: note: in call to ‘hipError_t hipMalloc(void**, size_t)’, declared here
2786 | hipError_t hipMalloc(void** ptr, size_t size);
| ^~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here
332 | } hipError_t;
| ^~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:156:20: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result]
156 | hipFree(_workspace);
| ~^~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:3562:12: note: in call to ‘hipError_t hipFree(void*)’, declared here
3562 | hipError_t hipFree(void* ptr);
| ^~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here
332 | } hipError_t;
| ^~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:157:22: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result]
157 | hipMalloc(&_workspace, workSpaceSize);
| ~~~^~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:2786:12: note: in call to ‘hipError_t hipMalloc(void**, size_t)’, declared here
2786 | hipError_t hipMalloc(void** ptr, size_t size);
| ^~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here
332 | } hipError_t;
| ^~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h: In member function ‘void InferenceContext::release_workspace()’:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:230:16: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result]
230 | hipFree(_workspace);
| ~^~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:3562:12: note: in call to ‘hipError_t hipFree(void*)’, declared here
3562 | hipError_t hipFree(void* ptr);
| ^~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here
332 | } hipError_t;
| ^~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h: In member function ‘bool InferenceContext::retake_workspace()’:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:236:18: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result]
236 | hipMalloc(&_workspace, _workSpaceSize);
| ~~~^~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:2786:12: note: in call to ‘hipError_t hipMalloc(void**, size_t)’, declared here
2786 | hipError_t hipMalloc(void** ptr, size_t size);
| ^~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here
332 | } hipError_t;
| ^~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h: In member function ‘void InferenceContext::SynchComp()’:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:254:23: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result]
254 | hipEventRecord(_comp_event, _comp_stream);
| ~~~~^~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:2501:12: note: in call to ‘hipError_t hipEventRecord(hipEvent_t, hipStream_t)’, declared here
2501 | hipError_t hipEventRecord(hipEvent_t event, hipStream_t stream = NULL);
| ^~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here
332 | } hipError_t;
| ^~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:255:27: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result]
255 | hipStreamWaitEvent(_comm_stream, _comp_event, 0);
| ~~~~^~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:2211:12: note: in call to ‘hipError_t hipStreamWaitEvent(hipStream_t, hipEvent_t, unsigned int)’, declared here
2211 | hipError_t hipStreamWaitEvent(hipStream_t stream, hipEvent_t event, unsigned int flags);
| ^~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here
332 | } hipError_t;
| ^~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h: In member function ‘void InferenceContext::SynchComm()’:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:259:23: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result]
259 | hipEventRecord(_comm_event, _comm_stream);
| ~~~~^~~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:2501:12: note: in call to ‘hipError_t hipEventRecord(hipEvent_t, hipStream_t)’, declared here
2501 | hipError_t hipEventRecord(hipEvent_t event, hipStream_t stream = NULL);
| ^~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here
332 | } hipError_t;
| ^~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:11:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h:260:27: warning: ignoring returned value of type ‘hipError_t’, declared with attribute nodiscard [-Wunused-result]
260 | hipStreamWaitEvent(_comp_stream, _comm_event, 0);
| ~~~~^~~~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:2211:12: note: in call to ‘hipError_t hipStreamWaitEvent(hipStream_t, hipEvent_t, unsigned int)’, declared here
2211 | hipError_t hipStreamWaitEvent(hipStream_t stream, hipEvent_t event, unsigned int flags);
| ^~~~~~
In file included from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/c10/hip/HIPStream.h:7,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:3,
from /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:7:
/opt/rocm/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here
332 | } hipError_t;
| ^~~~~~
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp: In instantiation of ‘std::vector~~~^~~~~~~~~~~~~~
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:542:50: warning: narrowing conversion of ‘(((size_t)hidden_dim) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:543:41: warning: narrowing conversion of ‘(((size_t)k) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
543 | k * InferenceContext::Instance().GetMaxTokenLength(),
| ^~~~~~~~~~~~
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:543:41: warning: narrowing conversion of ‘(((size_t)k) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:551:38: warning: narrowing conversion of ‘(((size_t)hidden_dim) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
551 | {hidden_dim * InferenceContext::Instance().GetMaxTokenLength(),
| ~~~^~~~~~~~~~~~~~
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:551:38: warning: narrowing conversion of ‘(((size_t)hidden_dim) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:552:29: warning: narrowing conversion of ‘(((size_t)k) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
552 | k * InferenceContext::Instance().GetMaxTokenLength(),
| ^~~~~~~~~~~~
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:552:29: warning: narrowing conversion of ‘(((size_t)k) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp: In instantiation of ‘std::vector~~~~
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:1582:72: warning: narrowing conversion of ‘mlp_1_out_neurons’ from ‘const size_t’ {aka ‘const long unsigned int’} to ‘long int’ [-Wnarrowing]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp: In instantiation of ‘std::vector~~~^~~~~~~~~~~~~~
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:542:50: warning: narrowing conversion of ‘(((size_t)hidden_dim) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:543:41: warning: narrowing conversion of ‘(((size_t)k) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
543 | k * InferenceContext::Instance().GetMaxTokenLength(),
| ^~~~~~~~~~~~
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:543:41: warning: narrowing conversion of ‘(((size_t)k) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:551:38: warning: narrowing conversion of ‘(((size_t)hidden_dim) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
551 | {hidden_dim * InferenceContext::Instance().GetMaxTokenLength(),
| ~~~^~~~~~~~~~~~~~
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:551:38: warning: narrowing conversion of ‘(((size_t)hidden_dim) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:552:29: warning: narrowing conversion of ‘(((size_t)k) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
552 | k * InferenceContext::Instance().GetMaxTokenLength(),
| ^~~~~~~~~~~~
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:552:29: warning: narrowing conversion of ‘(((size_t)k) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp: In instantiation of ‘std::vector~~~~
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp:1582:72: warning: narrowing conversion of ‘mlp_1_out_neurons’ from ‘const size_t’ {aka ‘const long unsigned int’} to ‘long int’ [-Wnarrowing]
[10/11] /opt/rocm/bin/hipcc -DWITH_HIP -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/TH -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THC -isystem /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/THH -isystem /opt/rocm/include -isystem /opt/conda/envs/py_3.9/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=1 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -DTORCH_USE_HIP_DSA
to enable device-side assertions.
Please try with this updated image:
rocm/deepspeed:rocm6.0_ubuntu20.04_py3.9_pytorch_2.0.1_DeepSpeed
Hi, I tried the updated image, but I still get the same error.
root@ecd8dd23e891:/var/lib/jenkins# python test.py [2024-01-05 03:06:26,705] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2024-01-05 03:06:58,949] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.12.7+83427253, git-hash=83427253, git-branch=master [2024-01-05 03:06:58,952] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter mp_size is deprecated use tensor_parallel.tp_size instead [2024-01-05 03:06:58,954] [INFO] [logging.py:96:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1 Using /root/.cache/torch_extensions/py39_cpu as PyTorch extensions root... /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_cublas_wrappers.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_cublas_wrappers_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/ds_kernel_utils.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/ds_kernel_utils_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_cuda_layers.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_hip_layers.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding_hip.cpp 
[skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/conversion_utils.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/conversion_utils_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/memory_access_utils.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/memory_access_utils_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/gelu.cu -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/gelu.hip [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/relu.cu -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/relu.hip [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/layer_norm.cu -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/layer_norm.hip [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/rms_norm.cu -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/rms_norm.hip [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/softmax.cu -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/softmax.hip [skipped, already hipified] 
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/dequantize.cu -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/dequantize.hip [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.cu -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/transform.cu -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/transform.hip [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pointwise_ops.cu -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pointwise_ops.hip [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_cublas_wrappers.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_cublas_wrappers_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_cuda_layers.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_hip_layers.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/transformer/inference/includes/inference_context_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/dropout.h -> 
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/dropout_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/StopWatch.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/StopWatch.h [skipped, no changes] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/cublas_wrappers.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/cublas_wrappers_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/gemm_test.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/gemm_test_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/context.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/context_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/custom_cuda_layers.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/custom_hip_layers.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/feed_forward.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/feed_forward_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/gelu.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/gelu_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/general_kernels.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/general_kernels_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/normalize_layer.h -> 
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/normalize_layer_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/softmax.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/softmax_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/strided_batch_gemm.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/strided_batch_gemm_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/ds_transformer_cuda.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/ds_transformer_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/quantization.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/quantization_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/quantization_utils.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/quantization_utils_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/dequantization_utils.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/dequantization_utils_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/quantizer.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/quantizer_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/dropout.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/dropout_hip.h [skipped, already hipified] 
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/quantization.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/quantization_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/simd.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/simd.h [skipped, no changes] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/custom_cuda_layers.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/custom_hip_layers.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/softmax.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/softmax_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/conversion_utils.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/conversion_utils_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/StopWatch.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/StopWatch.h [skipped, no changes] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/context.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/context_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/general_kernels.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/general_kernels_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/compat.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/compat.h [skipped, no changes] 
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/gemm_test.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/gemm_test_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/cpu_lion.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/cpu_lion_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/Timer.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/Timer_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/cublas_wrappers.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/cublas_wrappers_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/strided_batch_gemm.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/strided_batch_gemm_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/ds_kernel_utils.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/ds_kernel_utils_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/feed_forward.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/feed_forward_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/normalize_layer.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/normalize_layer_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/cpu_adagrad.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/cpu_adagrad_hip.h [skipped, already 
hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/activation_type.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/activation_type.h [skipped, no changes] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/cpu_adam.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/cpu_adam_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/gelu.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/gelu_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/memory_access_utils.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/memory_access_utils_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/reduction_utils_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/quantization_utils.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/quantization_utils_hip.h [skipped, already hipified] /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/type_shim.h -> /opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/ops/csrc/includes/type_shim_hip.h [skipped, already hipified] Successfully preprocessed all matching files. Total number of unsupported CUDA function calls: 0
Total number of replaced kernel launches: 25
Detected CUDA files, patching ldflags
Emitting ninja build file /root/.cache/torch_extensions/py39_cpu/transformer_inference/build.ninja...
Building extension module transformer_inference...
Using envvar MAX_JOBS (32) as the number of workers...
[1/1] c++ pointwise_ops.cuda.o softmax.cuda.o relu.cuda.o layer_norm.cuda.o transform.cuda.o rms_norm.cuda.o dequantize.cuda.o gelu.cuda.o pt_binding_hip.o apply_rotary_pos_emb.cuda.o -shared -L/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib -lc10 -lc10_hip -ltorch_cpu -ltorch_hip -ltorch -ltorch_python -L/opt/rocm/lib -lamdhip64 -o transformer_inference.so
Loading extension module transformer_inference...
Time to load transformer_inference op: 1.1228477954864502 seconds
[2024-01-05 03:07:00,794] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed-Inference config: {'layer_id': 0, 'hidden_size': 1024, 'intermediate_size': 4096, 'heads': 16, 'num_hidden_layers': -1, 'dtype': torch.float32, 'pre_layer_norm': True, 'norm_type': <NormType.LayerNorm: 1>, 'local_rank': -1, 'stochastic_mode': False, 'epsilon': 1e-05, 'mp_size': 1, 'scale_attention': True, 'triangular_masking': True, 'local_attention': False, 'window_size': 1, 'rotary_dim': -1, 'rotate_half': False, 'rotate_every_two': True, 'return_tuple': True, 'mlp_after_attn': True, 'mlp_act_func_type': <ActivationFuncType.GELU: 1>, 'specialized_mode': False, 'training_mp_size': 1, 'bigscience_bloom': True, 'max_out_tokens': 1024, 'min_out_tokens': 1, 'scale_attn_by_inverse_layer_idx': False, 'enable_qkv_quantization': False, 'use_mup': False, 'return_single_tuple': False, 'set_empty_params': False, 'transposed_mode': False, 'use_triton': False, 'triton_autotune': False, 'num_kv': -1, 'rope_theta': 10000}
Traceback (most recent call last):
File "/var/lib/jenkins/test.py", line 18, in TORCH_USE_HIP_DSA
to enable device-side assertions.
Could you please post the output of rocminfo | grep gfx and the output log of AMD_LOG_LEVEL=3 python test.py as an attachment? Thanks.
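For anyone collecting the same diagnostics, here is a minimal Python sketch of the two requests above. The helper names (`extract_gfx_targets`, `run_with_amd_log`) and the log file name are made up for illustration; only `rocminfo`, `AMD_LOG_LEVEL=3`, and `test.py` come from the thread.

```python
import os
import re
import subprocess
import sys


def extract_gfx_targets(rocminfo_text):
    """Return the unique gfx architecture names found in rocminfo output."""
    seen = []
    for name in re.findall(r"gfx[0-9a-f]+", rocminfo_text):
        if name not in seen:
            seen.append(name)
    return seen


def run_with_amd_log(script, log_path, level="3"):
    """Run `python script` with AMD_LOG_LEVEL set; write stdout and stderr to log_path."""
    env = dict(os.environ, AMD_LOG_LEVEL=level)
    with open(log_path, "w") as log:
        result = subprocess.run(
            [sys.executable, script],
            env=env,
            stdout=log,
            stderr=subprocess.STDOUT,
        )
    return result.returncode


# Hypothetical usage on a machine where rocminfo and test.py exist:
#   info = subprocess.run(["rocminfo"], capture_output=True, text=True).stdout
#   print(extract_gfx_targets(info))
#   run_with_amd_log("test.py", "amd_log_level3.txt")
```

Writing the log to a plain text file this way should also avoid the unreadable-attachment problem that can come from saving a terminal session log.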
root@0b110045b111:/var/lib/jenkins# rocminfo|grep gfx
Name: gfx1100
Name: amdgcn-amd-amdhsa--gfx1100
@sunpian1 Please check your text file; it doesn't seem to contain readable characters.
Sorry about that. Please check this text file. MobaXterm_10.12.70.47susie.sun_20240108_094818_Unencrypted.txt
import torch
torch.cuda.is_available()
rocminfo
root@bb34d3a5c58f:/var/lib/jenkins# python3
Python 3.9.18 | packaged by conda-forge | (main, Aug 30 2023, 03:49:32) [GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.cuda.is_available())
True
>>> exit()
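The check above only prints True or False. A slightly fuller sketch, assuming a ROCm build of PyTorch, also reports the HIP version and the device names; the function name `rocm_torch_summary` is hypothetical, and `torch.version.hip` is expected to be None on non-ROCm builds.

```python
def rocm_torch_summary():
    """Collect basic facts about the installed PyTorch and the GPUs it can see."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment.
        return {"torch_installed": False}

    summary = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # Set on ROCm builds; None on CUDA or CPU-only builds.
        "hip_version": getattr(torch.version, "hip", None),
        "gpu_available": torch.cuda.is_available(),
        "devices": [],
    }
    if summary["gpu_available"]:
        summary["devices"] = [
            torch.cuda.get_device_name(i)
            for i in range(torch.cuda.device_count())
        ]
    return summary


print(rocm_torch_summary())
```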
Runtime Version: 1.1 System Timestamp Freq.: 1000.000000MHz Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count) Machine Model: LARGE System Endianness: LITTLE Mwaitx: DISABLED DMAbuf Support: YES
Agent 1
Name: Intel(R) Xeon(R) Silver 4210R CPU @ 2.40GHz Uuid: CPU-XX Marketing Name: Intel(R) Xeon(R) Silver 4210R CPU @ 2.40GHz Vendor Name: CPU Feature: None specified Profile: FULL_PROFILE Float Round Mode: NEAR Max Queue Number: 0(0x0) Queue Min Size: 0(0x0) Queue Max Size: 0(0x0) Queue Type: MULTI Node: 0 Device Type: CPU Cache Info: L1: 32768(0x8000) KB Chip ID: 0(0x0) ASIC Revision: 0(0x0) Cacheline Size: 64(0x40) Max Clock Freq. (MHz): 0 BDFID: 0 Internal Node ID: 0 Compute Unit: 24 SIMDs per CU: 0 Shader Engines: 0 Shader Arrs. per Eng.: 0 WatchPts on Addr. Ranges:1 Features: None Pool Info: Pool 1 Segment: GLOBAL; FLAGS: FINE GRAINED Size: 65834496(0x3ec8e00) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: TRUE Pool 2 Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED Size: 65834496(0x3ec8e00) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: TRUE Pool 3 Segment: GLOBAL; FLAGS: COARSE GRAINED Size: 65834496(0x3ec8e00) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: TRUE ISA Info:
Agent 2
Name: gfx1100 Uuid: GPU-afa7be7439782f1c Marketing Name: Radeon RX 7900 XTX Vendor Name: AMD Feature: KERNEL_DISPATCH Profile: BASE_PROFILE Float Round Mode: NEAR Max Queue Number: 128(0x80) Queue Min Size: 64(0x40) Queue Max Size: 131072(0x20000) Queue Type: MULTI Node: 1 Device Type: GPU Cache Info: L1: 32(0x20) KB L2: 6144(0x1800) KB L3: 98304(0x18000) KB Chip ID: 29772(0x744c) ASIC Revision: 0(0x0) Cacheline Size: 64(0x40) Max Clock Freq. (MHz): 2371 BDFID: 256 Internal Node ID: 1 Compute Unit: 96 SIMDs per CU: 2 Shader Engines: 6 Shader Arrs. per Eng.: 2 WatchPts on Addr. Ranges:4 Coherent Host Access: FALSE Features: KERNEL_DISPATCH Fast F16 Operation: TRUE Wavefront Size: 32(0x20) Workgroup Max Size: 1024(0x400) Workgroup Max Size per Dimension: x 1024(0x400) y 1024(0x400) z 1024(0x400) Max Waves Per CU: 32(0x20) Max Work-item Per CU: 1024(0x400) Grid Max Size: 4294967295(0xffffffff) Grid Max Size per Dimension: x 4294967295(0xffffffff) y 4294967295(0xffffffff) z 4294967295(0xffffffff) Max fbarriers/Workgrp: 32 Packet Processor uCode:: 550 SDMA engine uCode:: 19 IOMMU Support:: None Pool Info: Pool 1 Segment: GLOBAL; FLAGS: COARSE GRAINED Size: 25149440(0x17fc000) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: FALSE Pool 2 Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED Size: 25149440(0x17fc000) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: FALSE Pool 3 Segment: GLOBAL; FLAGS: FINE GRAINED Size: 25149440(0x17fc000) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: FALSE Pool 4 Segment: GROUP Size: 64(0x40) KB Allocatable: FALSE Alloc Granule: 0KB Alloc Alignment: 0KB Accessible by all: FALSE ISA Info: ISA 1 Name: amdgcn-amd-amdhsa--gfx1100 Machine Models: HSA_MACHINE_MODEL_LARGE Profiles: HSA_PROFILE_BASE Default Rounding Mode: NEAR Default Rounding Mode: NEAR Fast f16: TRUE Workgroup Max Size: 1024(0x400) Workgroup Max Size per Dimension: x 1024(0x400) y 
1024(0x400) z 1024(0x400) Grid Max Size: 4294967295(0xffffffff) Grid Max Size per Dimension: x 4294967295(0xffffffff) y 4294967295(0xffffffff) z 4294967295(0xffffffff) FBarrier Max Size: 32 Done root@bb34d3a5c58f:/var/lib/jenkins#
Hi @sunpian1, could you please try this image: rocm/deepspeed:rocm5.7_ubuntu20.04_py3.9_pytorch_2.0.1_DeepSpeed_Inference
Hi, I tried the image rocm/deepspeed:rocm5.7_ubuntu20.04_py3.9_pytorch_2.0.1_DeepSpeed_Inference and it works.
But when I tried the image with a Llama-2 model, I got errors.
Memory access fault by GPU node-1 (Agent handle: 0x564cd7ba91d0) on address 0x7f5ccfe2c000. Reason: Page not present or supervisor privilege.
[2024-02-19 08:36:43,155] [INFO] [launch.py:316:sigkill_handler] Killing subprocess 3349
[2024-02-19 08:36:43,156] [ERROR] [launch.py:322:sigkill_handler] ['/opt/conda/envs/py_3.9/bin/python', '-u', 'test.py', '--local_rank=0'] exits with return code = -6
Which Docker image should I use for inference?
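A note on reading the launcher log above: assuming the launcher reports the child's status using Python's subprocess convention, a negative return code means the process was terminated by that signal number. The sketch below decodes such a code; `describe_return_code` is a hypothetical helper, not part of DeepSpeed.

```python
import signal


def describe_return_code(code):
    """Translate a subprocess-style return code into a readable description."""
    if code < 0:
        number = -code
        try:
            name = signal.Signals(number).name
        except ValueError:
            name = "unknown"
        return f"killed by signal {number} ({name})"
    return f"exited with status {code}"


print(describe_return_code(-6))
```

On Linux, signal 6 is SIGABRT, which would be consistent with the runtime aborting after the reported memory access fault.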
susie.sun@yz-amd1:~$ docker run -it rocm/deepspeed:rocm5.7_ubuntu20.04_py3.9_pytorch_2.0.1_DeepSpeed /bin/bash
root@c50e90963e1a:/var/lib/jenkins# deepspeed --num_gpus 1 deploy.py
[2023-12-14 01:52:04,385] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-12-14 01:52:05,180] [WARNING] [runner.py:203:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
Traceback (most recent call last):
File "/opt/conda/envs/py_3.9/bin/deepspeed", line 6, in
main()
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/launcher/runner.py", line 422, in main
raise RuntimeError("Unable to proceed, no GPU resources available")
RuntimeError: Unable to proceed, no GPU resources available
Our AMD GPU is an AMD Radeon™ RX 7900 XTX.
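The docker run command in the log above passes no device flags. ROCm containers normally need the host's /dev/kfd and /dev/dri nodes passed through (for example with --device=/dev/kfd --device=/dev/dri), so one possible cause of "no GPU resources available" is that the container cannot see them. That is an assumption about this setup, not something the log confirms. A minimal sketch to check from inside the container, with a hypothetical helper name:

```python
import os


def missing_rocm_devices(paths=("/dev/kfd", "/dev/dri")):
    """Return the device nodes from `paths` that are not visible in this environment."""
    return [path for path in paths if not os.path.exists(path)]


missing = missing_rocm_devices()
if missing:
    print("Not visible in this container:", ", ".join(missing))
else:
    print("ROCm device nodes are visible.")
```

If either path is reported missing, restarting the container with the device flags is the first thing to try before looking at DeepSpeed itself.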