Closed AnyaCoder closed 1 month ago
OK, I made some modifications and it seems to work.
But I then ran into another issue caused by the upstream repo cudnn-frontend.
Maybe I can ask the upstream repository to add the compatibility library dlfcn-win32.
[9/33] Building CUDA object CMakeFiles\transformer_engine.dir\fused_attn\fused_attn_f16_max512_seqlen.cu.obj
FAILED: CMakeFiles/transformer_engine.dir/fused_attn/fused_attn_f16_max512_seqlen.cu.obj
C:\Miniconda3\envs\build\bin\nvcc.exe -forward-unknown-to-host-compiler -DNV_CUDNN_FRONTEND_USE_DYNAMIC_LOADING -Dtransformer_engine_EXPORTS -ID:\a\TransformerEngine\TransformerEngine\transformer_engine\common\.. -ID:\a\TransformerEngine\TransformerEngine\transformer_engine\common\include -ID:\a\TransformerEngine\TransformerEngine\transformer_engine\common\..\..\3rdparty\cudnn-frontend\include -ID:\a\TransformerEngine\TransformerEngine\build\cmake\string_headers -isystem C:\Miniconda3\envs\build\include -D_WINDOWS -Xcompiler=" /EHsc" --expt-relaxed-constexpr -O3 -Xcompiler="-O2 -Ob2" -DNDEBUG -std=c++17 "--generate-code=arch=compute_70,code=[compute_70,sm_70]" "--generate-code=arch=compute_80,code=[compute_80,sm_80]" "--generate-code=arch=compute_89,code=[compute_89,sm_89]" "--generate-code=arch=compute_90,code=[compute_90,sm_90]" -Xcompiler=-MD -MD -MT CMakeFiles\transformer_engine.dir\fused_attn\fused_attn_f16_max512_seqlen.cu.obj -MF CMakeFiles\transformer_engine.dir\fused_attn\fused_attn_f16_max512_seqlen.cu.obj.d -x cu -c D:\a\TransformerEngine\TransformerEngine\transformer_engine\common\fused_attn\fused_attn_f16_max512_seqlen.cu -o CMakeFiles\transformer_engine.dir\fused_attn\fused_attn_f16_max512_seqlen.cu.obj -Xcompiler=-FdCMakeFiles\transformer_engine.dir\,-FS
D:\a\TransformerEngine\TransformerEngine\3rdparty\cudnn-frontend\include\cudnn_frontend_shim.h(28): fatal error C1083: Cannot open include file: 'dlfcn.h': No such file or directory
fused_attn_f16_max512_seqlen.cu
ninja: build stopped: subcommand failed.
Building CMake extension transformer_engine
Running command C:\Program Files\CMake\bin\cmake.exe -S D:\a\TransformerEngine\TransformerEngine\transformer_engine\common -B D:\a\TransformerEngine\TransformerEngine\build\cmake -DPython_EXECUTABLE=C:\Miniconda3\envs\build\python.exe -DPython_INCLUDE_DIR=C:\Miniconda3\envs\build\Include -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=D:\a\TransformerEngine\TransformerEngine\build\lib.win-amd64-cpython-310 -Dpybind11_DIR=D:\a\TransformerEngine\TransformerEngine\.eggs\pybind11-2.13.1-py3.10.egg\pybind11\share\cmake\pybind11 -GNinja
Running command C:\Program Files\CMake\bin\cmake.exe --build D:\a\TransformerEngine\TransformerEngine\build\cmake --parallel 1
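For readers unfamiliar with the `dlfcn.h` failure in the log above: that header is a POSIX interface and MSVC does not ship it, which is why a compatibility layer such as dlfcn-win32 comes up. The following is only a minimal sketch of the general idea, assuming a thin wrapper over `LoadLibraryA`/`GetProcAddress` on Windows and `dlopen`/`dlsym` elsewhere. The `dl_shim` namespace and its function names are hypothetical and are not part of cudnn-frontend or dlfcn-win32.

```cpp
#include <cassert>

#if defined(_WIN32)
#  ifndef NOMINMAX
#    define NOMINMAX
#  endif
#  include <windows.h>
#else
#  include <dlfcn.h>
#endif

// Hypothetical portability shim; all names here are illustrative only.
namespace dl_shim {

// Load a shared library by name; returns nullptr on failure.
inline void* open_library(const char* name) {
#if defined(_WIN32)
  return reinterpret_cast<void*>(LoadLibraryA(name));
#else
  return dlopen(name, RTLD_NOW);
#endif
}

// Look up an exported symbol; returns nullptr if it is missing.
inline void* find_symbol(void* handle, const char* symbol) {
  assert(handle != nullptr && symbol != nullptr);
#if defined(_WIN32)
  return reinterpret_cast<void*>(
      GetProcAddress(static_cast<HMODULE>(handle), symbol));
#else
  return dlsym(handle, symbol);
#endif
}

// Release a handle; returns true on success.
inline bool close_library(void* handle) {
#if defined(_WIN32)
  return FreeLibrary(static_cast<HMODULE>(handle)) != 0;
#else
  return dlclose(handle) == 0;
#endif
}

// Usage sketch: a library that does not exist should yield a null handle.
inline bool missing_library_yields_null() {
  return open_library("no_such_library_for_shim_demo") == nullptr;
}

}  // namespace dl_shim
```

On older glibc versions the POSIX branch may need to be linked with `-ldl`. Whether upstream would prefer a wrapper like this or the dlfcn-win32 library itself is a question for the cudnn-frontend maintainers.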
After some struggling, I am now stuck on the built-in functions of the nvcc compiler:
https://docs.nvidia.com/cuda/cuda-math-api/group__CUDA__MATH__INTRINSIC__CAST.html
Could anyone help?
2024-08-04T13:57:38.0152640Z running build_ext
2024-08-04T13:57:51.1267194Z -- The CUDA compiler identification is NVIDIA 12.4.99
2024-08-04T13:57:51.5210896Z -- The CXX compiler identification is MSVC 19.39.33523.0
2024-08-04T13:57:51.6432579Z -- Detecting CUDA compiler ABI info
2024-08-04T13:57:56.4257080Z -- Detecting CUDA compiler ABI info - done
2024-08-04T13:57:56.4607149Z -- Check for working CUDA compiler: C:/Miniconda3/envs/build/bin/nvcc.exe - skipped
2024-08-04T13:57:57.3469432Z -- Detecting CUDA compile features
2024-08-04T13:57:57.3475842Z -- Detecting CUDA compile features - done
2024-08-04T13:57:57.3737106Z -- Detecting CXX compiler ABI info
2024-08-04T13:57:57.9295720Z -- Detecting CXX compiler ABI info - done
2024-08-04T13:57:57.9419836Z -- Check for working CXX compiler: C:/Program Files (x86)/Microsoft Visual Studio/2022/BuildTools/VC/Tools/MSVC/14.39.33519/bin/Hostx64/x64/cl.exe - skipped
2024-08-04T13:57:57.9423682Z -- Detecting CXX compile features
2024-08-04T13:57:57.9439766Z -- Detecting CXX compile features - done
2024-08-04T13:57:58.0638285Z -- Found CUDAToolkit: C:/Miniconda3/envs/build/include (found version "12.4.99")
2024-08-04T13:57:59.3327426Z -- cudnn found at C:/Miniconda3/envs/build/Lib/site-packages/nvidia/cudnn/lib/x64/cudnn.lib.
2024-08-04T13:57:59.3337712Z -- Found LIBRARY: C:/Miniconda3/envs/build/Lib/site-packages/nvidia/cudnn/include
2024-08-04T13:57:59.3338849Z -- cuDNN: C:/Miniconda3/envs/build/Lib/site-packages/nvidia/cudnn/lib/x64/cudnn.lib
2024-08-04T13:57:59.3339612Z -- cuDNN: C:/Miniconda3/envs/build/Lib/site-packages/nvidia/cudnn/include
2024-08-04T13:57:59.3346771Z -- cudnn_adv_infer found at C:/Miniconda3/envs/build/Lib/site-packages/nvidia/cudnn/lib/x64/cudnn_adv_infer.lib.
2024-08-04T13:57:59.3352302Z -- cudnn_adv_train found at C:/Miniconda3/envs/build/Lib/site-packages/nvidia/cudnn/lib/x64/cudnn_adv_train.lib.
2024-08-04T13:57:59.3358841Z -- cudnn_cnn_infer found at C:/Miniconda3/envs/build/Lib/site-packages/nvidia/cudnn/lib/x64/cudnn_cnn_infer.lib.
2024-08-04T13:57:59.3364112Z -- cudnn_cnn_train found at C:/Miniconda3/envs/build/Lib/site-packages/nvidia/cudnn/lib/x64/cudnn_cnn_train.lib.
2024-08-04T13:57:59.3371710Z -- cudnn_ops_infer found at C:/Miniconda3/envs/build/Lib/site-packages/nvidia/cudnn/lib/x64/cudnn_ops_infer.lib.
2024-08-04T13:57:59.3376887Z -- cudnn_ops_train found at C:/Miniconda3/envs/build/Lib/site-packages/nvidia/cudnn/lib/x64/cudnn_ops_train.lib.
2024-08-04T13:58:00.1166419Z -- Found Python: C:\Miniconda3\envs\build\python.exe (found version "3.10.14") found components: Interpreter Development.Module
2024-08-04T13:58:00.1256975Z -- Configuring done (22.1s)
2024-08-04T13:58:00.2562336Z CMake Warning:
2024-08-04T13:58:00.2563016Z -- Generating done (0.1s)
2024-08-04T13:58:00.2563602Z Manually-specified variables were not used by the project:
2024-08-04T13:58:00.2563937Z
2024-08-04T13:58:00.2564020Z pybind11_DIR
2024-08-04T13:58:00.2564176Z
2024-08-04T13:58:00.2564181Z
2024-08-04T13:58:00.2573429Z -- Build files have been written to: D:/a/TransformerEngine/TransformerEngine/build/cmake
2024-08-04T13:58:10.0792975Z [1/33] Building CXX object CMakeFiles\transformer_engine.dir\pycudnn.cpp.obj
2024-08-04T13:58:10.0794795Z [2/33] Building CXX object CMakeFiles\transformer_engine.dir\transformer_engine.cpp.obj
2024-08-04T13:58:10.0797032Z D:\a\TransformerEngine\TransformerEngine\transformer_engine\common\transformer_engine.cpp(103):
...
2024-08-04T14:36:17.5863062Z
2024-08-04T14:36:17.5863550Z tmpxft_0000077c_00000000-10_rmsnorm_fwd_cuda_kernel.compute_90.cudafe1.cpp
2024-08-04T14:36:17.5863916Z
2024-08-04T14:36:17.5864180Z [23/33] Building CUDA object CMakeFiles\transformer_engine.dir\util\cast.cu.obj
2024-08-04T14:36:17.5864767Z FAILED: CMakeFiles/transformer_engine.dir/util/cast.cu.obj
2024-08-04T14:36:17.5873209Z C:\Miniconda3\envs\build\bin\nvcc.exe -forward-unknown-to-host-compiler -DNOMINMAX -Dtransformer_engine_EXPORTS -ID:\a\TransformerEngine\TransformerEngine\transformer_engine\common\.. -ID:\a\TransformerEngine\TransformerEngine\transformer_engine\common\include -ID:\a\TransformerEngine\TransformerEngine\transformer_engine\common\..\..\3rdparty\cudnn-frontend\include -ID:\a\TransformerEngine\TransformerEngine\build\cmake\string_headers -isystem C:\Miniconda3\envs\build\include -D_WINDOWS -Xcompiler=" /EHsc" --expt-relaxed-constexpr -O3 -Xcompiler="-O2 -Ob2" -DNDEBUG -std=c++17 "--generate-code=arch=compute_70,code=[compute_70,sm_70]" "--generate-code=arch=compute_80,code=[compute_80,sm_80]" "--generate-code=arch=compute_89,code=[compute_89,sm_89]" "--generate-code=arch=compute_90,code=[compute_90,sm_90]" -Xcompiler=-MD -MD -MT CMakeFiles\transformer_engine.dir\util\cast.cu.obj -MF CMakeFiles\transformer_engine.dir\util\cast.cu.obj.d -x cu -c D:\a\TransformerEngine\TransformerEngine\transformer_engine\common\util\cast.cu -o CMakeFiles\transformer_engine.dir\util\cast.cu.obj -Xcompiler=-FdCMakeFiles\transformer_engine.dir\,-FS
2024-08-04T14:36:17.5882254Z C:\Miniconda3\envs\build\include\cuda_fp16.hpp(2944): error: identifier "__fmaf_rn" is undefined
2024-08-04T14:36:17.5883343Z const float ar = __fmaf_rn(a, 0.636619772F, 12582912.0F);
2024-08-04T14:36:17.5883947Z ^
2024-08-04T14:36:17.5884344Z
2024-08-04T14:36:17.5884931Z C:\Miniconda3\envs\build\include\cuda_fp16.hpp(2945): error: identifier "__float_as_uint" is undefined
2024-08-04T14:36:17.5885893Z const unsigned q = __float_as_uint(ar);
2024-08-04T14:36:17.5886361Z ^
2024-08-04T14:36:17.5886598Z
2024-08-04T14:36:17.5887127Z C:\Miniconda3\envs\build\include\cuda_fp16.hpp(2946): error: identifier "__fsub_rn" is undefined
2024-08-04T14:36:17.5888033Z const float j = __fsub_rn(ar, 12582912.0F);
2024-08-04T14:36:17.5888608Z ^
2024-08-04T14:36:17.5888831Z
2024-08-04T14:36:17.5889371Z C:\Miniconda3\envs\build\include\cuda_fp16.hpp(2982): error: identifier "__fmaf_rn" is undefined
2024-08-04T14:36:17.5890230Z z = __fmaf_rn(a8, x2, a6);
2024-08-04T14:36:17.5890621Z ^
2024-08-04T14:36:17.5890792Z
2024-08-04T14:36:17.5891318Z C:\Miniconda3\envs\build\include\cuda_fp16.hpp(3471): error: identifier "atomicCAS" is undefined
2024-08-04T14:36:17.5893861Z { unsigned int* address_as_uint = (unsigned int*)address; unsigned int old = *address_as_uint; unsigned int assumed; do { assumed = old; __half2 new_val = __hadd2(val, *(__half2*)&assumed); old = atomicCAS(address_as_uint, assumed, *(unsigned int*)&new_val); } while (assumed != old); return *(__half2*)&old; }
2024-08-04T14:36:17.5896281Z ^
2024-08-04T14:36:17.5896806Z
2024-08-04T14:36:17.5897506Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(718): error: identifier "__float2int_rn" is undefined
2024-08-04T14:36:17.5898470Z { return __float2int_rn(__bfloat162float(h)); }
2024-08-04T14:36:17.5898988Z ^
2024-08-04T14:36:17.5899177Z
2024-08-04T14:36:17.5899779Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(791): error: identifier "__int2float_ru" is undefined
2024-08-04T14:36:17.5901877Z { const float ru = __int2float_ru(i); const float rd = __int2float_rd(i); float rz = __int2float_rz(i); if (ru != rd) { rz = __uint_as_float(__float_as_uint(rz) | 1U); } return __float2bfloat16_rn(rz); }
2024-08-04T14:36:17.5903361Z ^
2024-08-04T14:36:17.5903615Z
2024-08-04T14:36:17.5904241Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(791): error: identifier "__int2float_rd" is undefined
2024-08-04T14:36:17.5906129Z { const float ru = __int2float_ru(i); const float rd = __int2float_rd(i); float rz = __int2float_rz(i); if (ru != rd) { rz = __uint_as_float(__float_as_uint(rz) | 1U); } return __float2bfloat16_rn(rz); }
2024-08-04T14:36:17.5907640Z ^
2024-08-04T14:36:17.5908113Z
2024-08-04T14:36:17.5908707Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(791): error: identifier "__int2float_rz" is undefined
2024-08-04T14:36:17.5910601Z { const float ru = __int2float_ru(i); const float rd = __int2float_rd(i); float rz = __int2float_rz(i); if (ru != rd) { rz = __uint_as_float(__float_as_uint(rz) | 1U); } return __float2bfloat16_rn(rz); }
2024-08-04T14:36:17.5912085Z ^
2024-08-04T14:36:17.5912534Z
2024-08-04T14:36:17.5913121Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(791): error: identifier "__float_as_uint" is undefined
2024-08-04T14:36:17.5914983Z { const float ru = __int2float_ru(i); const float rd = __int2float_rd(i); float rz = __int2float_rz(i); if (ru != rd) { rz = __uint_as_float(__float_as_uint(rz) | 1U); } return __float2bfloat16_rn(rz); }
2024-08-04T14:36:17.5916522Z ^
2024-08-04T14:36:17.5917010Z
2024-08-04T14:36:17.5917618Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(791): error: identifier "__uint_as_float" is undefined
2024-08-04T14:36:17.5920363Z { const float ru = __int2float_ru(i); const float rd = __int2float_rd(i); float rz = __int2float_rz(i); if (ru != rd) { rz = __uint_as_float(__float_as_uint(rz) | 1U); } return __float2bfloat16_rn(rz); }
2024-08-04T14:36:17.5921986Z ^
2024-08-04T14:36:17.5922491Z
2024-08-04T14:36:17.5923140Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(884): error: identifier "__int2float_rz" is undefined
2024-08-04T14:36:17.5924263Z { return __float2bfloat16_rz(__int2float_rz(i)); }
2024-08-04T14:36:17.5924874Z ^
2024-08-04T14:36:17.5925165Z
2024-08-04T14:36:17.5925795Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(894): error: identifier "__int2float_rd" is undefined
2024-08-04T14:36:17.5926855Z { return __float2bfloat16_rd(__int2float_rd(i)); }
2024-08-04T14:36:17.5927433Z ^
2024-08-04T14:36:17.5927741Z
2024-08-04T14:36:17.5928353Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(905): error: identifier "__int2float_ru" is undefined
2024-08-04T14:36:17.5929439Z { return __float2bfloat16_ru(__int2float_ru(i)); }
2024-08-04T14:36:17.5930026Z ^
2024-08-04T14:36:17.5930327Z
2024-08-04T14:36:17.5930953Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(1010): error: identifier "__int2float_rz" is undefined
2024-08-04T14:36:17.5932146Z { return __float2bfloat16_rz(__int2float_rz(static_cast<int>(i))); }
2024-08-04T14:36:17.5932844Z ^
2024-08-04T14:36:17.5933154Z
2024-08-04T14:36:17.5933850Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(1020): error: identifier "__int2float_rd" is undefined
2024-08-04T14:36:17.5935037Z { return __float2bfloat16_rd(__int2float_rd(static_cast<int>(i))); }
2024-08-04T14:36:17.5935710Z ^
2024-08-04T14:36:17.5936035Z
2024-08-04T14:36:17.5936780Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(1030): error: identifier "__int2float_ru" is undefined
2024-08-04T14:36:17.5937951Z { return __float2bfloat16_ru(__int2float_ru(static_cast<int>(i))); }
2024-08-04T14:36:17.5938642Z ^
2024-08-04T14:36:17.5938932Z
2024-08-04T14:36:17.5939515Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(1041): error: identifier "__float2uint_rn" is undefined
2024-08-04T14:36:17.5940571Z { return __float2uint_rn(__bfloat162float(h)); }
2024-08-04T14:36:17.5941140Z ^
2024-08-04T14:36:17.5941354Z
2024-08-04T14:36:17.5942353Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(1090): error: identifier "__float2uint_rd" is undefined
2024-08-04T14:36:17.5943739Z { return __float2uint_rd(__bfloat162float(h)); }
2024-08-04T14:36:17.5944179Z ^
2024-08-04T14:36:17.5944333Z
2024-08-04T14:36:17.5944832Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(1112): error: identifier "__uint2float_ru" is undefined
2024-08-04T14:36:17.5946034Z { const float ru = __uint2float_ru(i); const float rd = __uint2float_rd(i); float rz = __uint2float_rz(i); if (ru != rd) { rz = __uint_as_float(__float_as_uint(rz) | 1U); } return __float2bfloat16_rn(rz); }
2024-08-04T14:36:17.5946900Z ^
2024-08-04T14:36:17.5947055Z
2024-08-04T14:36:17.5947450Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(1112): error: identifier "__uint2float_rd" is undefined
2024-08-04T14:36:17.5948628Z { const float ru = __uint2float_ru(i); const float rd = __uint2float_rd(i); float rz = __uint2float_rz(i); if (ru != rd) { rz = __uint_as_float(__float_as_uint(rz) | 1U); } return __float2bfloat16_rn(rz); }
2024-08-04T14:36:17.5949526Z ^
2024-08-04T14:36:17.5949780Z
2024-08-04T14:36:17.5950155Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(1112): error: identifier "__uint2float_rz" is undefined
2024-08-04T14:36:17.5951425Z { const float ru = __uint2float_ru(i); const float rd = __uint2float_rd(i); float rz = __uint2float_rz(i); if (ru != rd) { rz = __uint_as_float(__float_as_uint(rz) | 1U); } return __float2bfloat16_rn(rz); }
2024-08-04T14:36:17.5952379Z ^
2024-08-04T14:36:17.5952805Z
2024-08-04T14:36:17.5953256Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(1112): error: identifier "__float_as_uint" is undefined
2024-08-04T14:36:17.5954522Z { const float ru = __uint2float_ru(i); const float rd = __uint2float_rd(i); float rz = __uint2float_rz(i); if (ru != rd) { rz = __uint_as_float(__float_as_uint(rz) | 1U); } return __float2bfloat16_rn(rz); }
2024-08-04T14:36:17.5956280Z ^
2024-08-04T14:36:17.5956590Z
2024-08-04T14:36:17.5956974Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(1112): error: identifier "__uint_as_float" is undefined
2024-08-04T14:36:17.5958119Z { const float ru = __uint2float_ru(i); const float rd = __uint2float_rd(i); float rz = __uint2float_rz(i); if (ru != rd) { rz = __uint_as_float(__float_as_uint(rz) | 1U); } return __float2bfloat16_rn(rz); }
2024-08-04T14:36:17.5959038Z ^
2024-08-04T14:36:17.5959338Z
2024-08-04T14:36:17.5959838Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(1139): error: identifier "__uint2float_rz" is undefined
2024-08-04T14:36:17.5960577Z { return __float2bfloat16_rz(__uint2float_rz(i)); }
2024-08-04T14:36:17.5960927Z ^
2024-08-04T14:36:17.5961198Z
2024-08-04T14:36:17.5961567Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(1149): error: identifier "__uint2float_rd" is undefined
2024-08-04T14:36:17.5962213Z { return __float2bfloat16_rd(__uint2float_rd(i)); }
2024-08-04T14:36:17.5962568Z ^
2024-08-04T14:36:17.5962795Z
2024-08-04T14:36:17.5963158Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(1159): error: identifier "__uint2float_ru" is undefined
2024-08-04T14:36:17.5963792Z { return __float2bfloat16_ru(__uint2float_ru(i)); }
2024-08-04T14:36:17.5964148Z ^
2024-08-04T14:36:17.5964327Z
2024-08-04T14:36:17.5964682Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(1264): error: identifier "__uint2float_rz" is undefined
2024-08-04T14:36:17.5965423Z { return __float2bfloat16_rz(__uint2float_rz(static_cast<unsigned int>(i))); }
2024-08-04T14:36:17.5965875Z ^
2024-08-04T14:36:17.5966051Z
2024-08-04T14:36:17.5966414Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(1274): error: identifier "__uint2float_rd" is undefined
2024-08-04T14:36:17.5967138Z { return __float2bfloat16_rd(__uint2float_rd(static_cast<unsigned int>(i))); }
2024-08-04T14:36:17.5967581Z ^
2024-08-04T14:36:17.5967755Z
2024-08-04T14:36:17.5968119Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(1284): error: identifier "__uint2float_ru" is undefined
2024-08-04T14:36:17.5968826Z { return __float2bfloat16_ru(__uint2float_ru(static_cast<unsigned int>(i))); }
2024-08-04T14:36:17.5969256Z ^
2024-08-04T14:36:17.5969426Z
2024-08-04T14:36:17.5969791Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(1295): error: identifier "__float2ull_rn" is undefined
2024-08-04T14:36:17.5970399Z { return __float2ull_rn(__bfloat162float(h)); }
2024-08-04T14:36:17.5970729Z ^
2024-08-04T14:36:17.5970860Z
2024-08-04T14:36:17.5971218Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(1307): error: identifier "__float2ull_rz" is undefined
2024-08-04T14:36:17.5971979Z { const float f = __bfloat162float(h); i = __float2ull_rz(f); }
2024-08-04T14:36:17.5972674Z ^
2024-08-04T14:36:17.5972951Z
2024-08-04T14:36:17.5973312Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(1345): error: identifier "__float2ull_rd" is undefined
2024-08-04T14:36:17.5974119Z { return __float2ull_rd(__bfloat162float(h)); }
2024-08-04T14:36:17.5974452Z ^
2024-08-04T14:36:17.5974580Z
2024-08-04T14:36:17.5974938Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(1367): error: identifier "__ull2float_ru" is undefined
2024-08-04T14:36:17.5976054Z { const float ru = __ull2float_ru(i); const float rd = __ull2float_rd(i); float rz = __ull2float_rz(i); if (ru != rd) { rz = __uint_as_float(__float_as_uint(rz) | 1U); } return __float2bfloat16_rn(rz); }
2024-08-04T14:36:17.5976916Z ^
2024-08-04T14:36:17.5977065Z
2024-08-04T14:36:17.5977425Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(1367): error: identifier "__ull2float_rd" is undefined
2024-08-04T14:36:17.5978526Z { const float ru = __ull2float_ru(i); const float rd = __ull2float_rd(i); float rz = __ull2float_rz(i); if (ru != rd) { rz = __uint_as_float(__float_as_uint(rz) | 1U); } return __float2bfloat16_rn(rz); }
2024-08-04T14:36:17.5979376Z ^
2024-08-04T14:36:17.5979615Z
2024-08-04T14:36:17.5979974Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(1367): error: identifier "__ull2float_rz" is undefined
2024-08-04T14:36:17.5981370Z { const float ru = __ull2float_ru(i); const float rd = __ull2float_rd(i); float rz = __ull2float_rz(i); if (ru != rd) { rz = __uint_as_float(__float_as_uint(rz) | 1U); } return __float2bfloat16_rn(rz); }
2024-08-04T14:36:17.5982270Z ^
2024-08-04T14:36:17.5982577Z
2024-08-04T14:36:17.5983073Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(1367): error: identifier "__float_as_uint" is undefined
2024-08-04T14:36:17.5984178Z { const float ru = __ull2float_ru(i); const float rd = __ull2float_rd(i); float rz = __ull2float_rz(i); if (ru != rd) { rz = __uint_as_float(__float_as_uint(rz) | 1U); } return __float2bfloat16_rn(rz); }
2024-08-04T14:36:17.5985122Z ^
2024-08-04T14:36:17.5985421Z
2024-08-04T14:36:17.5985783Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(1367): error: identifier "__uint_as_float" is undefined
2024-08-04T14:36:17.5986885Z { const float ru = __ull2float_ru(i); const float rd = __ull2float_rd(i); float rz = __ull2float_rz(i); if (ru != rd) { rz = __uint_as_float(__float_as_uint(rz) | 1U); } return __float2bfloat16_rn(rz); }
2024-08-04T14:36:17.5987793Z ^
2024-08-04T14:36:17.5988077Z
2024-08-04T14:36:17.5988434Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(1405): error: identifier "__ull2float_rz" is undefined
2024-08-04T14:36:17.5989061Z { return __float2bfloat16_rz(__ull2float_rz(i)); }
2024-08-04T14:36:17.5989414Z ^
2024-08-04T14:36:17.5989597Z
2024-08-04T14:36:17.5990088Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(1415): error: identifier "__ull2float_rd" is undefined
2024-08-04T14:36:17.5990713Z { return __float2bfloat16_rd(__ull2float_rd(i)); }
2024-08-04T14:36:17.5991058Z ^
2024-08-04T14:36:17.5991239Z
2024-08-04T14:36:17.5991593Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(1425): error: identifier "__ull2float_ru" is undefined
2024-08-04T14:36:17.5992205Z { return __float2bfloat16_ru(__ull2float_ru(i)); }
2024-08-04T14:36:17.5992552Z ^
2024-08-04T14:36:17.5992728Z
2024-08-04T14:36:17.5993091Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(1435): error: identifier "__float2ll_rn" is undefined
2024-08-04T14:36:17.5993695Z { return __float2ll_rn(__bfloat162float(h)); }
2024-08-04T14:36:17.5994012Z ^
2024-08-04T14:36:17.5994136Z
2024-08-04T14:36:17.5994478Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(1447): error: identifier "__float2ll_rz" is undefined
2024-08-04T14:36:17.5995194Z { const float f = __bfloat162float(h); i = __float2ll_rz(f); }
2024-08-04T14:36:17.5995605Z ^
2024-08-04T14:36:17.5995834Z
2024-08-04T14:36:17.5996184Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(1508): error: identifier "__ll2float_ru" is undefined
2024-08-04T14:36:17.5997303Z { const float ru = __ll2float_ru(i); const float rd = __ll2float_rd(i); float rz = __ll2float_rz(i); if (ru != rd) { rz = __uint_as_float(__float_as_uint(rz) | 1U); } return __float2bfloat16_rn(rz); }
2024-08-04T14:36:17.5998584Z ^
2024-08-04T14:36:17.5998820Z
2024-08-04T14:36:17.5999357Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(1508): error: identifier "__ll2float_rd" is undefined
2024-08-04T14:36:17.6000818Z { const float ru = __ll2float_ru(i); const float rd = __ll2float_rd(i); float rz = __ll2float_rz(i); if (ru != rd) { rz = __uint_as_float(__float_as_uint(rz) | 1U); } return __float2bfloat16_rn(rz); }
2024-08-04T14:36:17.6001667Z ^
2024-08-04T14:36:17.6001903Z
2024-08-04T14:36:17.6002260Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(1508): error: identifier "__ll2float_rz" is undefined
2024-08-04T14:36:17.6003330Z { const float ru = __ll2float_ru(i); const float rd = __ll2float_rd(i); float rz = __ll2float_rz(i); if (ru != rd) { rz = __uint_as_float(__float_as_uint(rz) | 1U); } return __float2bfloat16_rn(rz); }
2024-08-04T14:36:17.6004352Z ^
2024-08-04T14:36:17.6004780Z
2024-08-04T14:36:17.6005268Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(1508): error: identifier "__float_as_uint" is undefined
2024-08-04T14:36:17.6006369Z { const float ru = __ll2float_ru(i); const float rd = __ll2float_rd(i); float rz = __ll2float_rz(i); if (ru != rd) { rz = __uint_as_float(__float_as_uint(rz) | 1U); } return __float2bfloat16_rn(rz); }
2024-08-04T14:36:17.6007294Z ^
2024-08-04T14:36:17.6007584Z
2024-08-04T14:36:17.6007942Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(1508): error: identifier "__uint_as_float" is undefined
2024-08-04T14:36:17.6009036Z { const float ru = __ll2float_ru(i); const float rd = __ll2float_rd(i); float rz = __ll2float_rz(i); if (ru != rd) { rz = __uint_as_float(__float_as_uint(rz) | 1U); } return __float2bfloat16_rn(rz); }
2024-08-04T14:36:17.6009915Z ^
2024-08-04T14:36:17.6010196Z
2024-08-04T14:36:17.6010548Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(1549): error: identifier "__ll2float_rz" is undefined
2024-08-04T14:36:17.6011165Z { return __float2bfloat16_rz(__ll2float_rz(i)); }
2024-08-04T14:36:17.6011512Z ^
2024-08-04T14:36:17.6011690Z
2024-08-04T14:36:17.6012048Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(1559): error: identifier "__ll2float_rd" is undefined
2024-08-04T14:36:17.6012642Z { return __float2bfloat16_rd(__ll2float_rd(i)); }
2024-08-04T14:36:17.6012969Z ^
2024-08-04T14:36:17.6013141Z
2024-08-04T14:36:17.6013501Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(1569): error: identifier "__ll2float_ru" is undefined
2024-08-04T14:36:17.6014085Z { return __float2bfloat16_ru(__ll2float_ru(i)); }
2024-08-04T14:36:17.6014413Z ^
2024-08-04T14:36:17.6014580Z
2024-08-04T14:36:17.6014946Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(2968): error: identifier "__fmaf_ieee_rn" is undefined
2024-08-04T14:36:17.6015854Z { const float fa = __bfloat162float(a); const float fb = __bfloat162float(b); val = __float2bfloat16(__fmaf_ieee_rn(fa, 1.0f, fb)); }
2024-08-04T14:36:17.6016591Z ^
2024-08-04T14:36:17.6016875Z
2024-08-04T14:36:17.6017233Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(2981): error: identifier "__fmaf_ieee_rn" is undefined
2024-08-04T14:36:17.6018380Z { const float fa = __bfloat162float(a); const float fb = __bfloat162float(b); val = __float2bfloat16(__fmaf_ieee_rn(fb, -1.0f, fa)); }
2024-08-04T14:36:17.6019070Z ^
2024-08-04T14:36:17.6019335Z
2024-08-04T14:36:17.6020280Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(2994): error: identifier "__fmaf_ieee_rn" is undefined
2024-08-04T14:36:17.6021303Z { const float fa = __bfloat162float(a); const float fb = __bfloat162float(b); val = __float2bfloat16(__fmaf_ieee_rn(fa, fb, -0.0f)); }
2024-08-04T14:36:17.6022000Z ^
2024-08-04T14:36:17.6022274Z
2024-08-04T14:36:17.6022692Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(3179): error: identifier "__uint_as_float" is undefined
2024-08-04T14:36:17.6023344Z if ((__uint_as_float(0x00000001U) > 0.0f) || (f != 0.0f))
2024-08-04T14:36:17.6023691Z ^
2024-08-04T14:36:17.6023807Z
2024-08-04T14:36:17.6024175Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(3209): error: identifier "__uint_as_float" is undefined
2024-08-04T14:36:17.6024809Z const float log2e_up = __uint_as_float(0x3FB8AA3CU);
2024-08-04T14:36:17.6025166Z ^
2024-08-04T14:36:17.6025337Z
2024-08-04T14:36:17.6025752Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(3285): error: identifier "__uint_as_float" is undefined
2024-08-04T14:36:17.6026379Z const float log10_2 = __uint_as_float(0x40549A78U);
2024-08-04T14:36:17.6026722Z ^
2024-08-04T14:36:17.6026890Z
2024-08-04T14:36:17.6027239Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(3299): error: identifier "__uint_as_float" is undefined
2024-08-04T14:36:17.6029791Z { const float log10_2 = __uint_as_float(0x40549A78U); float fl = __low2float(a) * log10_2; asm("{ ex2.approx.f32 %0, %0; }" : "+f"(fl)); float fh = __high2float(a) * log10_2; asm("{ ex2.approx.f32 %0, %0; }" : "+f"(fh)); r = __floats2bfloat162_rn( fl, fh ); const __nv_bfloat162_raw araw = static_cast<__nv_bfloat162_raw>(a); if (araw.x == (unsigned short)0xBC95U) { __nv_bfloat16_raw raw_fix; raw_fix.x = (unsigned short)0x3f75U; r.x = static_cast<__nv_bfloat16>(raw_fix); } if (araw.y == (unsigned short)0xBC95U) { __nv_bfloat16_raw raw_fix; raw_fix.x = (unsigned short)0x3f75U; r.y = static_cast<__nv_bfloat16>(raw_fix); } }
2024-08-04T14:36:17.6031967Z ^
2024-08-04T14:36:17.6032132Z
2024-08-04T14:36:17.6032487Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(3368): error: identifier "__uint_as_float" is undefined
2024-08-04T14:36:17.6033117Z const float flt_ln2 = __uint_as_float(0x3f317218U);
2024-08-04T14:36:17.6033466Z ^
2024-08-04T14:36:17.6033633Z
2024-08-04T14:36:17.6033985Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(3375): error: identifier "__uint_as_float" is undefined
2024-08-04T14:36:17.6035333Z { const float flt_ln2 = __uint_as_float(0x3f317218U); float fl = __low2float(a); fl = __internal_device_fast_bf16log2(fl); fl = fl * flt_ln2; float fh = __high2float(a); fh = __internal_device_fast_bf16log2(fh); fh = fh * flt_ln2; return __floats2bfloat162_rn( fl, fh ); }
2024-08-04T14:36:17.6036389Z ^
2024-08-04T14:36:17.6036551Z
2024-08-04T14:36:17.6036917Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(3410): error: identifier "__uint_as_float" is undefined
2024-08-04T14:36:17.6037559Z const float flt_log10_2 = __uint_as_float(0x3E9A209BU);
2024-08-04T14:36:17.6037921Z ^
2024-08-04T14:36:17.6038105Z
2024-08-04T14:36:17.6038482Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(3417): error: identifier "__uint_as_float" is undefined
2024-08-04T14:36:17.6039909Z { const float flt_log10_2 = __uint_as_float(0x3E9A209BU); float fl = __low2float(a); fl = __internal_device_fast_bf16log2(fl); fl = fl * flt_log10_2; float fh = __high2float(a); fh = __internal_device_fast_bf16log2(fh); fh = fh * flt_log10_2; return __floats2bfloat162_rn( fl, fh ); }
2024-08-04T14:36:17.6041005Z ^
2024-08-04T14:36:17.6041180Z
2024-08-04T14:36:17.6041535Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(3553): error: identifier "__fmaf_ieee_rn" is undefined
2024-08-04T14:36:17.6042456Z { const float fa = __bfloat162float(a); return __float2bfloat16(__fmaf_ieee_rn(fa, -1.0f, -0.0f)); }
2024-08-04T14:36:17.6043015Z ^
2024-08-04T14:36:17.6043266Z
2024-08-04T14:36:17.6043609Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(3813): error: identifier "atomicCAS" is undefined
2024-08-04T14:36:17.6045207Z { unsigned int* address_as_uint = (unsigned int*)address; unsigned int old = *address_as_uint; unsigned int assumed; do { assumed = old; __nv_bfloat162 new_val = __hadd2(val, *(__nv_bfloat162*)&assumed); old = atomicCAS(address_as_uint, assumed, *(unsigned int*)&new_val); } while (assumed != old); return *(__nv_bfloat162*)&old; }
2024-08-04T14:36:17.6046643Z ^
2024-08-04T14:36:17.6046963Z
2024-08-04T14:36:17.6047305Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(3835): error: identifier "atomicCAS" is undefined
2024-08-04T14:36:17.6048945Z { unsigned short int* address_as_us = (unsigned short int*)address; unsigned short int old = *address_as_us; unsigned short int assumed; do { assumed = old; old = atomicCAS(address_as_us, assumed, __bfloat16_as_ushort(__hadd(val, __ushort_as_bfloat16(assumed)))); } while (assumed != old); return __ushort_as_bfloat16(old); }
2024-08-04T14:36:17.6050406Z ^
2024-08-04T14:36:17.6050701Z
2024-08-04T14:36:17.6050706Z
2024-08-04T14:36:17.6050711Z
2024-08-04T14:36:17.6051185Z 65 errors detected in the compilation of "D:/a/TransformerEngine/TransformerEngine/transformer_engine/common/util/cast.cu".
2024-08-04T14:36:17.6051769Z
2024-08-04T14:36:17.6051837Z cast.cu
2024-08-04T14:36:17.6051949Z
2024-08-04T14:36:17.6052065Z ninja: build stopped: subcommand failed.
2024-08-04T14:36:17.6052461Z Traceback (most recent call last):
2024-08-04T14:36:17.6081472Z File "D:\a\TransformerEngine\TransformerEngine\build_tools\build_ext.py", line 88, in _build_cmake
2024-08-04T14:36:17.6082500Z subprocess.run(command, cwd=build_dir, check=True)
2024-08-04T14:36:17.6083074Z File "C:\Miniconda3\envs\build\lib\subprocess.py", line 526, in run
2024-08-04T14:36:17.6083569Z Building CMake extension transformer_engine
2024-08-04T14:36:17.6086941Z Running command C:\Program Files\CMake\bin\cmake.exe -S D:\a\TransformerEngine\TransformerEngine\transformer_engine\common -B D:\a\TransformerEngine\TransformerEngine\build\cmake -DPython_EXECUTABLE=C:\Miniconda3\envs\build\python.exe -DPython_INCLUDE_DIR=C:\Miniconda3\envs\build\Include -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=D:\a\TransformerEngine\TransformerEngine\build\lib.win-amd64-cpython-310 -Dpybind11_DIR=D:\a\TransformerEngine\TransformerEngine\.eggs\pybind11-2.13.1-py3.10.egg\pybind11\share\cmake\pybind11 -GNinja
2024-08-04T14:36:17.6090339Z Running command C:\Program Files\CMake\bin\cmake.exe --build D:\a\TransformerEngine\TransformerEngine\build\cmake --parallel 1
2024-08-04T14:36:17.6091151Z raise CalledProcessError(retcode, process.args,
2024-08-04T14:36:17.6092700Z subprocess.CalledProcessError: Command '['C:\\Program Files\\CMake\\bin\\cmake.exe', '--build', 'D:\\a\\TransformerEngine\\TransformerEngine\\build\\cmake', '--parallel', '1']' returned non-zero exit status 1.
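Several of the identifiers reported as undefined in the log above, such as `__float_as_uint` and `__uint_as_float`, are CUDA device intrinsics that reinterpret the bits of a value without converting it. As a rough illustration of what those two compute, here is a host-only C++ sketch. The `ref_*` names are hypothetical, and this is not a fix for the build error, only a reference for what the missing functions do.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Host-side reference for the bit-cast performed by __float_as_uint:
// copy the 32 bits of a float into an unsigned integer unchanged.
inline std::uint32_t ref_float_as_uint(float value) {
  static_assert(sizeof(float) == sizeof(std::uint32_t),
                "this sketch assumes a 32-bit float");
  std::uint32_t bits;
  std::memcpy(&bits, &value, sizeof(bits));
  return bits;
}

// Host-side reference for the inverse bit-cast, __uint_as_float.
inline float ref_uint_as_float(std::uint32_t bits) {
  float value;
  std::memcpy(&value, &bits, sizeof(value));
  return value;
}

// Usage sketch with constants that appear in the log above.
inline void check_examples() {
  // 1.0f has the IEEE-754 single-precision encoding 0x3F800000.
  assert(ref_float_as_uint(1.0f) == 0x3F800000u);
  // 12582912.0F is 1.5 * 2^23, encoded as 0x4B400000.
  assert(ref_float_as_uint(12582912.0f) == 0x4B400000u);
  // A bit-cast round trip returns the original value.
  assert(ref_uint_as_float(ref_float_as_uint(0.636619772f)) == 0.636619772f);
}
```

The constant `0x3f317218U` passed to `__uint_as_float` in `cuda_bf16.hpp` decodes this way to roughly 0.6931472, the single-precision value of ln 2, which matches the variable name `flt_ln2` in the log.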
Thanks for reporting these issues. We currently don't have plans for Windows support, although we welcome contributions.
Those error messages are strange to me too, since those symbols should be provided by NVCC. A Google search shows some other MSVC users experiencing undefined CUDA intrinsics when including CUDA runtime headers.
I see we include CUDA runtime headers in a few .cu files. For common/util/cast.cu, I think it's coming from:
https://github.com/NVIDIA/TransformerEngine/blob/6717554f11f9b8bd79f917560e525d538c95b3bc/transformer_engine/common/include/transformer_engine/transformer_engine.h#L14
I wonder if it helps to wrap these includes in something like:
#if !defined(__CUDACC__)
#include <cuda_runtime_api.h>
#endif
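To check whether a guard like the one above takes effect, it can help to confirm which compilation pass actually sees the header. The sketch below relies on two documented nvcc macros: `__CUDACC__` is defined whenever nvcc compiles a CUDA source file, and `__CUDA_ARCH__` is defined only during the device pass. The names `kCompiledByNvcc`, `compilation_context`, and `guard_would_include_runtime_header` are hypothetical helpers for illustration, not part of TransformerEngine.

```cpp
#include <cassert>
#include <string>

// True when this translation unit is compiled by nvcc (host or device pass).
#if defined(__CUDACC__)
constexpr bool kCompiledByNvcc = true;
#else
constexpr bool kCompiledByNvcc = false;
#endif

// Describe which compilation pass is seeing this code.
inline std::string compilation_context() {
#if defined(__CUDACC__) && defined(__CUDA_ARCH__)
  return "nvcc device pass";
#elif defined(__CUDACC__)
  return "nvcc host pass";
#else
  return "host compiler only";
#endif
}

// Mirrors the suggested guard: the runtime header would be included
// only when nvcc is NOT the compiler for this translation unit.
inline bool guard_would_include_runtime_header() {
  return !kCompiledByNvcc;
}

// Usage sketch: the two views of the guard must agree.
inline void self_check() {
  assert(!compilation_context().empty());
  assert(guard_would_include_runtime_header() ==
         (compilation_context() == "host compiler only"));
}
```

Compiled as part of a `.cu` file, `compilation_context()` reports an nvcc pass and the guarded include is skipped. Compiled as a plain `.cpp` file by MSVC or GCC, it reports the host compiler and the include is kept.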
I changed the code as you suggested, but it still reports the same error. I also looked at cuda_bf16.hpp, the header file that contains the nvcc built-in functions, and found that it does not include the runtime headers. I suspect the root cause is that nvcc behaves differently on Linux and Windows.
Thanks for helping!
I am trying to build the PyTorch version on GitHub Actions. I think there is hope that it will compile on Windows, but the error report shows errors about subscripts being out of bounds when template functions are expanded during compilation.
Here is the detailed report: AnyaCoder.TransformerEngine.actions.run
Here is the workflow file: AnyaCoder.TransformerEngine.actions.workflow
Here are some of the crucial errors reported:
Here is the list of configurations.
OS: windows-2022
python: 3.10
cuda: 12.4
cudnn: nvidia-cudnn-cu12==8.9.7.29
visualstudio2022buildtools: version=117.9.7.0
MAX_JOBS=4
I wish someone would take a look.