bigscience-workshop / Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Fatal error: cuda_fp16.h: No such file or directory on ROCm #360

Open lvcc2018 opened 1 year ago

lvcc2018 commented 1 year ago

The following errors occur when I try to run the example GPT-2 code:

[default0]:Building extension module scaled_upper_triang_masked_softmax_cuda...
[default0]:Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[default0]:[1/3] c++ -MMD -MF scaled_upper_triang_masked_softmax.o.d -DTORCH_EXTENSION_NAME=scaled_upper_triang_masked_softmax_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /public/home/ach2ha8oau/miniconda3/envs/megatron-deepspeed-dtk/lib/python3.9/site-packages/torch/include -isystem /public/home/ach2ha8oau/miniconda3/envs/megatron-deepspeed-dtk/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /public/home/ach2ha8oau/miniconda3/envs/megatron-deepspeed-dtk/lib/python3.9/site-packages/torch/include/TH -isystem /public/home/ach2ha8oau/miniconda3/envs/megatron-deepspeed-dtk/lib/python3.9/site-packages/torch/include/THC -isystem /public/home/ach2ha8oau/miniconda3/envs/megatron-deepspeed-dtk/lib/python3.9/site-packages/torch/include/THH -isystem /public/software/compiler/rocm/dtk-22.10/include -isystem /public/software/compiler/rocm/dtk-22.10/miopen/include -isystem /public/home/ach2ha8oau/miniconda3/envs/megatron-deepspeed-dtk/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -c /public/home/ach2ha8oau/megatron-deepspeed/Megatron-DeepSpeed-bigscience/megatron/fused_kernels/scaled_upper_triang_masked_softmax.cpp -o scaled_upper_triang_masked_softmax.o 
[default0]:FAILED: scaled_upper_triang_masked_softmax.o 
[default0]:c++ -MMD -MF scaled_upper_triang_masked_softmax.o.d -DTORCH_EXTENSION_NAME=scaled_upper_triang_masked_softmax_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /public/home/ach2ha8oau/miniconda3/envs/megatron-deepspeed-dtk/lib/python3.9/site-packages/torch/include -isystem /public/home/ach2ha8oau/miniconda3/envs/megatron-deepspeed-dtk/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /public/home/ach2ha8oau/miniconda3/envs/megatron-deepspeed-dtk/lib/python3.9/site-packages/torch/include/TH -isystem /public/home/ach2ha8oau/miniconda3/envs/megatron-deepspeed-dtk/lib/python3.9/site-packages/torch/include/THC -isystem /public/home/ach2ha8oau/miniconda3/envs/megatron-deepspeed-dtk/lib/python3.9/site-packages/torch/include/THH -isystem /public/software/compiler/rocm/dtk-22.10/include -isystem /public/software/compiler/rocm/dtk-22.10/miopen/include -isystem /public/home/ach2ha8oau/miniconda3/envs/megatron-deepspeed-dtk/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -c /public/home/ach2ha8oau/megatron-deepspeed/Megatron-DeepSpeed-bigscience/megatron/fused_kernels/scaled_upper_triang_masked_softmax.cpp -o scaled_upper_triang_masked_softmax.o 
[default0]:/public/home/ach2ha8oau/megatron-deepspeed/Megatron-DeepSpeed-bigscience/megatron/fused_kernels/scaled_upper_triang_masked_softmax.cpp:17:10: fatal error: cuda_fp16.h: No such file or directory
[default0]: #include <cuda_fp16.h>
[default0]:          ^~~~~~~~~~~~~
[default0]:compilation terminated.
[default0]:[2/3] /public/software/compiler/rocm/dtk-22.10/bin/hipcc  -DWITH_HIP -DTORCH_EXTENSION_NAME=scaled_upper_triang_masked_softmax_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /public/home/ach2ha8oau/miniconda3/envs/megatron-deepspeed-dtk/lib/python3.9/site-packages/torch/include -isystem /public/home/ach2ha8oau/miniconda3/envs/megatron-deepspeed-dtk/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /public/home/ach2ha8oau/miniconda3/envs/megatron-deepspeed-dtk/lib/python3.9/site-packages/torch/include/TH -isystem /public/home/ach2ha8oau/miniconda3/envs/megatron-deepspeed-dtk/lib/python3.9/site-packages/torch/include/THC -isystem /public/home/ach2ha8oau/miniconda3/envs/megatron-deepspeed-dtk/lib/python3.9/site-packages/torch/include/THH -isystem /public/software/compiler/rocm/dtk-22.10/include -isystem /public/software/compiler/rocm/dtk-22.10/miopen/include -isystem /public/home/ach2ha8oau/miniconda3/envs/megatron-deepspeed-dtk/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -fPIC -D__HIP_PLATFORM_HCC__=1 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --amdgpu-target=gfx900 --amdgpu-target=gfx906 -fno-gpu-rdc -c /public/home/ach2ha8oau/megatron-deepspeed/Megatron-DeepSpeed-bigscience/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.hip -o scaled_upper_triang_masked_softmax_hip.cuda.o 
[default0]:FAILED: scaled_upper_triang_masked_softmax_hip.cuda.o 
[default0]:/public/software/compiler/rocm/dtk-22.10/bin/hipcc  -DWITH_HIP -DTORCH_EXTENSION_NAME=scaled_upper_triang_masked_softmax_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /public/home/ach2ha8oau/miniconda3/envs/megatron-deepspeed-dtk/lib/python3.9/site-packages/torch/include -isystem /public/home/ach2ha8oau/miniconda3/envs/megatron-deepspeed-dtk/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /public/home/ach2ha8oau/miniconda3/envs/megatron-deepspeed-dtk/lib/python3.9/site-packages/torch/include/TH -isystem /public/home/ach2ha8oau/miniconda3/envs/megatron-deepspeed-dtk/lib/python3.9/site-packages/torch/include/THC -isystem /public/home/ach2ha8oau/miniconda3/envs/megatron-deepspeed-dtk/lib/python3.9/site-packages/torch/include/THH -isystem /public/software/compiler/rocm/dtk-22.10/include -isystem /public/software/compiler/rocm/dtk-22.10/miopen/include -isystem /public/home/ach2ha8oau/miniconda3/envs/megatron-deepspeed-dtk/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -fPIC -D__HIP_PLATFORM_HCC__=1 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --amdgpu-target=gfx900 --amdgpu-target=gfx906 -fno-gpu-rdc -c /public/home/ach2ha8oau/megatron-deepspeed/Megatron-DeepSpeed-bigscience/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.hip -o scaled_upper_triang_masked_softmax_hip.cuda.o 
[default0]:clang-14: error: unsupported option '--use_fast_math'
[default0]:clang-14: error: unsupported option '--expt-relaxed-constexpr'
[default0]:clang-14: error: unsupported option '--expt-extended-lambda'
[default0]:ninja: build stopped: subcommand failed.
[default0]:Traceback (most recent call last):
[default0]:  File "/public/home/ach2ha8oau/miniconda3/envs/megatron-deepspeed-dtk/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1718, in _run_ninja_build
[default0]:    subprocess.run(
[default0]:  File "/public/home/ach2ha8oau/miniconda3/envs/megatron-deepspeed-dtk/lib/python3.9/subprocess.py", line 528, in run
[default0]:    raise CalledProcessError(retcode, process.args,
[default0]:subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

It seems that the extension module scaled_upper_triang_masked_softmax_cuda is not being built properly.

flyingdown commented 1 year ago

On ROCm, the fp16 header is hip/hip_fp16.h rather than cuda_fp16.h.
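
The log also shows a second, separate failure: hipcc (clang-14) rejects `--use_fast_math`, `--expt-relaxed-constexpr` and `--expt-extended-lambda`, which are nvcc options. One possible workaround is to drop those flags before they reach the device compiler on a ROCm build. The following is only a minimal sketch of that idea, not the project's actual build code: `NVCC_ONLY_FLAGS`, `is_rocm_build` and `device_compiler_flags` are hypothetical names introduced here, and the only PyTorch attribute relied on is `torch.version.hip`, which is `None` on CUDA builds.

```python
# Hypothetical helper: the names below are illustrative and do not come
# from Megatron-DeepSpeed.

# Flags that clang-14 (invoked by hipcc) rejected in the log above.
NVCC_ONLY_FLAGS = frozenset({
    "--use_fast_math",
    "--expt-relaxed-constexpr",
    "--expt-extended-lambda",
})


def is_rocm_build():
    """Return True when the installed PyTorch was built for ROCm/HIP."""
    try:
        import torch
    except ImportError:
        # Assumption: without PyTorch, treat the build as non-ROCm.
        return False
    return getattr(torch.version, "hip", None) is not None


def device_compiler_flags(flags, rocm):
    """Drop nvcc-only flags for hipcc; return the flags unchanged otherwise."""
    if not rocm:
        return list(flags)
    return [flag for flag in flags if flag not in NVCC_ONLY_FLAGS]


if __name__ == "__main__":
    requested = [
        "-O3",
        "--use_fast_math",
        "-U__CUDA_NO_HALF_OPERATORS__",
        "--expt-relaxed-constexpr",
        "--expt-extended-lambda",
    ]
    print(device_compiler_flags(requested, rocm=is_rocm_build()))
```

The filtered list would be passed wherever the extension's device compiler flags are assembled. Filtering by an explicit set keeps the CUDA build unchanged and makes it easy to extend if hipcc rejects further options.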