NVIDIA / TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/index.html
Apache License 2.0

Problems trying to build on windows platforms #1076

Closed AnyaCoder closed 1 month ago

AnyaCoder commented 1 month ago

I'm trying to build the PyTorch version on GitHub Actions.

I think there's hope that it will compile on Windows, but the error report shows tuple index out-of-bounds errors when templates are instantiated during compilation.

Here is the detailed report: AnyaCoder.TransformerEngine.actions.run

Here is the workflow file: AnyaCoder.TransformerEngine.actions.workflow

Here are some of the crucial errors reported:

running build_ext
-- The CUDA compiler identification is NVIDIA 12.4.99
-- The CXX compiler identification is MSVC 19.39.33523.0
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: C:/Miniconda3/envs/build/bin/nvcc.exe - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: C:/Program Files (x86)/Microsoft Visual Studio/2022/BuildTools/VC/Tools/MSVC/14.39.33519/bin/Hostx64/x64/cl.exe - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found CUDAToolkit: C:/Miniconda3/envs/build/include (found version "12.4.99")
-- cudnn found at C:/Miniconda3/envs/build/Lib/site-packages/nvidia/cudnn/lib/x64/cudnn.lib.
-- Found LIBRARY: C:/Miniconda3/envs/build/Lib/site-packages/nvidia/cudnn/include
-- cuDNN: C:/Miniconda3/envs/build/Lib/site-packages/nvidia/cudnn/lib/x64/cudnn.lib
-- cuDNN: C:/Miniconda3/envs/build/Lib/site-packages/nvidia/cudnn/include
-- cudnn_adv_infer found at C:/Miniconda3/envs/build/Lib/site-packages/nvidia/cudnn/lib/x64/cudnn_adv_infer.lib.
-- cudnn_adv_train found at C:/Miniconda3/envs/build/Lib/site-packages/nvidia/cudnn/lib/x64/cudnn_adv_train.lib.
-- cudnn_cnn_infer found at C:/Miniconda3/envs/build/Lib/site-packages/nvidia/cudnn/lib/x64/cudnn_cnn_infer.lib.
-- cudnn_cnn_train found at C:/Miniconda3/envs/build/Lib/site-packages/nvidia/cudnn/lib/x64/cudnn_cnn_train.lib.
-- cudnn_ops_infer found at C:/Miniconda3/envs/build/Lib/site-packages/nvidia/cudnn/lib/x64/cudnn_ops_infer.lib.
-- cudnn_ops_train found at C:/Miniconda3/envs/build/Lib/site-packages/nvidia/cudnn/lib/x64/cudnn_ops_train.lib.
-- Found Python: C:\Miniconda3\envs\build\python.exe (found version "3.10.14") found components: Interpreter Development.Module
-- Configuring done (16.1s)
-- Generating done (0.1s)
CMake Warning:
  Manually-specified variables were not used by the project:

    pybind11_DIR

-- Build files have been written to: D:/a/TransformerEngine/TransformerEngine/build/cmake
[1/33] Building CXX object CMakeFiles\transformer_engine.dir\pycudnn.cpp.obj
[2/33] Building CXX object CMakeFiles\transformer_engine.dir\transformer_engine.cpp.obj
FAILED: CMakeFiles/transformer_engine.dir/transformer_engine.cpp.obj 
C:\PROGRA~2\MICROS~2\2022\BUILDT~1\VC\Tools\MSVC\1439~1.335\bin\Hostx64\x64\cl.exe  /nologo /TP -DNV_CUDNN_FRONTEND_USE_DYNAMIC_LOADING -Dtransformer_engine_EXPORTS -ID:\a\TransformerEngine\TransformerEngine\transformer_engine\common\.. -ID:\a\TransformerEngine\TransformerEngine\transformer_engine\common\include -ID:\a\TransformerEngine\TransformerEngine\transformer_engine\common\..\..\3rdparty\cudnn-frontend\include -ID:\a\TransformerEngine\TransformerEngine\build\cmake\string_headers -external:IC:\Miniconda3\envs\build\include -external:W0 /DWIN32 /D_WINDOWS /EHsc /O2 /Ob2 /DNDEBUG -std:c++17 -MD /showIncludes /FoCMakeFiles\transformer_engine.dir\transformer_engine.cpp.obj /FdCMakeFiles\transformer_engine.dir\ /FS -c D:\a\TransformerEngine\TransformerEngine\transformer_engine\common\transformer_engine.cpp
D:\a\TransformerEngine\TransformerEngine\transformer_engine\common\transformer_engine.cpp(103): warning C4297: 'nvte_tensor_amax': function assumed not to throw an exception but does
D:\a\TransformerEngine\TransformerEngine\transformer_engine\common\transformer_engine.cpp(103): note: __declspec(nothrow), throw(), noexcept(true), or noexcept was specified on the function
D:\a\TransformerEngine\TransformerEngine\transformer_engine\common\transformer_engine.cpp(110): warning C4297: 'nvte_tensor_scale': function assumed not to throw an exception but does
D:\a\TransformerEngine\TransformerEngine\transformer_engine\common\transformer_engine.cpp(110): note: __declspec(nothrow), throw(), noexcept(true), or noexcept was specified on the function
D:\a\TransformerEngine\TransformerEngine\transformer_engine\common\transformer_engine.cpp(117): warning C4297: 'nvte_tensor_scale_inv': function assumed not to throw an exception but does
D:\a\TransformerEngine\TransformerEngine\transformer_engine\common\transformer_engine.cpp(117): note: __declspec(nothrow), throw(), noexcept(true), or noexcept was specified on the function
C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.39.33519\include\utility(646): error C2338: static_assert failed: 'tuple index out of bounds'
C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.39.33519\include\utility(646): note: the template instantiation context (the oldest one first) is
D:\a\TransformerEngine\TransformerEngine\transformer_engine\common\transformer_engine.cpp(14): note: see reference to class template instantiation 'transformer_engine::TypeInfo<T>' being compiled
D:\a\TransformerEngine\TransformerEngine\transformer_engine\common\common.h(111): note: see reference to function template instantiation 'transformer_engine::DType transformer_engine::TypeInfo<T>::getType<T>(void)' being compiled
.......... (expanding templates...)
D:\a\TransformerEngine\TransformerEngine\transformer_engine\common\common.h(93): note: see reference to class template instantiation 'std::tuple_element<7,transformer_engine::TypeInfo<T>::types>' being compiled
C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.39.33519\include\utility(658): note: see reference to class template instantiation 'std::tuple_element<6,std::tuple<transformer_engine::int32,transformer_engine::fp32,transformer_engine::fp16,transformer_engine::bf16,transformer_engine::fp8e4m3,transformer_engine::fp8e5m2>>' being compiled
C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.39.33519\include\utility(658): note: see reference to class template instantiation 'std::tuple_element<5,std::tuple<transformer_engine::fp32,transformer_engine::fp16,transformer_engine::bf16,transformer_engine::fp8e4m3,transformer_engine::fp8e5m2>>' being compiled
C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.39.33519\include\utility(658): note: see reference to class template instantiation 'std::tuple_element<4,std::tuple<transformer_engine::fp16,transformer_engine::bf16,transformer_engine::fp8e4m3,transformer_engine::fp8e5m2>>' being compiled
C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.39.33519\include\utility(658): note: see reference to class template instantiation 'std::tuple_element<3,std::tuple<transformer_engine::bf16,transformer_engine::fp8e4m3,transformer_engine::fp8e5m2>>' being compiled
C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.39.33519\include\utility(658): note: see reference to class template instantiation 'std::tuple_element<2,std::tuple<transformer_engine::fp8e4m3,transformer_engine::fp8e5m2>>' being compiled
C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.39.33519\include\utility(658): note: see reference to class template instantiation 'std::tuple_element<1,std::tuple<transformer_engine::fp8e5m2>>' being compiled
C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.39.33519\include\utility(658): note: see reference to class template instantiation 'std::tuple_element<0,std::tuple<>>' being compiled
D:\a\TransformerEngine\TransformerEngine\transformer_engine\common\common.h(93): error C2039: 'type': is not a member of 'std::tuple_element<7,transformer_engine::TypeInfo<T>::types>'
C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.39.33519\include\utility(156): note: see declaration of 'std::tuple_element<7,transformer_engine::TypeInfo<T>::types>'
D:\a\TransformerEngine\TransformerEngine\transformer_engine\common\common.h(93): error C2146: syntax error: missing '>' before identifier 'type'
D:\a\TransformerEngine\TransformerEngine\transformer_engine\common\common.h(93): error C2039: 'value': is not a member of '`global namespace''
D:\a\TransformerEngine\TransformerEngine\transformer_engine\common\common.h(93): error C2059: syntax error: ')'
D:\a\TransformerEngine\TransformerEngine\transformer_engine\common\common.h(93): error C2143: syntax error: missing ';' before '{'
D:\a\TransformerEngine\TransformerEngine\transformer_engine\common\common.h(95): error C2181: illegal else without matching if

Here is the list of configurations.

OS: windows-2022
python: 3.10
cuda: 12.4
cudnn: nvidia-cudnn-cu12==8.9.7.29
visualstudio2022buildtools: version=117.9.7.0
MAX_JOBS=4

I hope someone can take a look.
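
For reference, the `tuple index out of bounds` assertion in the log fires because `std::tuple_element<7, ...>` gets instantiated on a seven-element tuple, whose valid indices are 0 to 6. One way to avoid this class of error is to look a type up without ever naming `std::tuple_element` at a computed index. The sketch below is only an illustration of that idea, assuming C++17; `find_type_index`, `ExampleTypes`, and `example_index` are hypothetical names, not the code in `common.h` and not necessarily how the library should fix it.

```cpp
#include <cstddef>
#include <cstdint>
#include <tuple>
#include <type_traits>

// Returns the position of U inside std::tuple<Ts...>, or sizeof...(Ts) if U
// is absent. The pointer parameter only carries the tuple type so that Ts can
// be deduced; it is never dereferenced.
template <typename U, typename... Ts>
constexpr std::size_t find_type_index(const std::tuple<Ts...>*) {
  // One flag per candidate type; the trailing `false` keeps the array
  // non-empty even for std::tuple<>.
  constexpr bool matches[] = {std::is_same<U, Ts>::value..., false};
  for (std::size_t i = 0; i < sizeof...(Ts); ++i) {
    if (matches[i]) return i;
  }
  return sizeof...(Ts);
}

// Hypothetical type list for illustration (not the library's own list).
using ExampleTypes = std::tuple<std::int32_t, float, double>;

template <typename U>
constexpr std::size_t example_index() {
  return find_type_index<U>(static_cast<const ExampleTypes*>(nullptr));
}

static_assert(example_index<float>() == 1, "float is the second entry");
static_assert(example_index<char>() == 3, "absent types map to the tuple size");
```

Because the search is a plain loop over a pack expansion, no template is instantiated past the end of the tuple, so the result does not depend on how a particular compiler treats the untaken branch of a recursive helper.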

AnyaCoder commented 1 month ago

OK, I made some modifications and it seems to work. But I ran into another issue caused by the upstream repo cudnn-frontend. Maybe I can ask the upstream repository to add the compatibility library dlfcn-win32.

modified code workflow

[9/33] Building CUDA object CMakeFiles\transformer_engine.dir\fused_attn\fused_attn_f16_max512_seqlen.cu.obj
FAILED: CMakeFiles/transformer_engine.dir/fused_attn/fused_attn_f16_max512_seqlen.cu.obj 
C:\Miniconda3\envs\build\bin\nvcc.exe -forward-unknown-to-host-compiler -DNV_CUDNN_FRONTEND_USE_DYNAMIC_LOADING -Dtransformer_engine_EXPORTS -ID:\a\TransformerEngine\TransformerEngine\transformer_engine\common\.. -ID:\a\TransformerEngine\TransformerEngine\transformer_engine\common\include -ID:\a\TransformerEngine\TransformerEngine\transformer_engine\common\..\..\3rdparty\cudnn-frontend\include -ID:\a\TransformerEngine\TransformerEngine\build\cmake\string_headers -isystem C:\Miniconda3\envs\build\include -D_WINDOWS -Xcompiler=" /EHsc" --expt-relaxed-constexpr -O3 -Xcompiler="-O2 -Ob2" -DNDEBUG -std=c++17 "--generate-code=arch=compute_70,code=[compute_70,sm_70]" "--generate-code=arch=compute_80,code=[compute_80,sm_80]" "--generate-code=arch=compute_89,code=[compute_89,sm_89]" "--generate-code=arch=compute_90,code=[compute_90,sm_90]" -Xcompiler=-MD -MD -MT CMakeFiles\transformer_engine.dir\fused_attn\fused_attn_f16_max512_seqlen.cu.obj -MF CMakeFiles\transformer_engine.dir\fused_attn\fused_attn_f16_max512_seqlen.cu.obj.d -x cu -c D:\a\TransformerEngine\TransformerEngine\transformer_engine\common\fused_attn\fused_attn_f16_max512_seqlen.cu -o CMakeFiles\transformer_engine.dir\fused_attn\fused_attn_f16_max512_seqlen.cu.obj -Xcompiler=-FdCMakeFiles\transformer_engine.dir\,-FS
D:\a\TransformerEngine\TransformerEngine\3rdparty\cudnn-frontend\include\cudnn_frontend_shim.h(28): fatal error C1083: Cannot open include file: 'dlfcn.h': No such file or directory

fused_attn_f16_max512_seqlen.cu

ninja: build stopped: subcommand failed.
Building CMake extension transformer_engine
Running command C:\Program Files\CMake\bin\cmake.exe -S D:\a\TransformerEngine\TransformerEngine\transformer_engine\common -B D:\a\TransformerEngine\TransformerEngine\build\cmake -DPython_EXECUTABLE=C:\Miniconda3\envs\build\python.exe -DPython_INCLUDE_DIR=C:\Miniconda3\envs\build\Include -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=D:\a\TransformerEngine\TransformerEngine\build\lib.win-amd64-cpython-310 -Dpybind11_DIR=D:\a\TransformerEngine\TransformerEngine\.eggs\pybind11-2.13.1-py3.10.egg\pybind11\share\cmake\pybind11 -GNinja
Running command C:\Program Files\CMake\bin\cmake.exe --build D:\a\TransformerEngine\TransformerEngine\build\cmake --parallel 1
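
For context on the missing `dlfcn.h`: that header is POSIX-only, so code that calls `dlopen`/`dlsym` needs either a compatibility layer such as dlfcn-win32 or a small platform switch. Below is a minimal sketch of such a switch, assuming only the standard Win32 calls `LoadLibraryA`, `GetProcAddress`, and `FreeLibrary`. The wrapper names `open_library`, `find_symbol`, and `close_library` are hypothetical; this is not code from cudnn-frontend or dlfcn-win32.

```cpp
#if defined(_WIN32)
#include <windows.h>
#else
#include <dlfcn.h>
#endif

// Hypothetical wrapper names, used here only for illustration.

// Loads a shared library and returns an opaque handle, or nullptr on failure.
inline void* open_library(const char* path) {
#if defined(_WIN32)
  return reinterpret_cast<void*>(LoadLibraryA(path));
#else
  return dlopen(path, RTLD_NOW);
#endif
}

// Looks up an exported symbol in a library opened with open_library.
inline void* find_symbol(void* handle, const char* name) {
#if defined(_WIN32)
  return reinterpret_cast<void*>(
      GetProcAddress(static_cast<HMODULE>(handle), name));
#else
  return dlsym(handle, name);
#endif
}

// Releases the library handle; returns true on success.
inline bool close_library(void* handle) {
#if defined(_WIN32)
  return FreeLibrary(static_cast<HMODULE>(handle)) != 0;
#else
  return dlclose(handle) == 0;
#endif
}
```

On older Linux toolchains, actually calling these wrappers may also require linking with `-ldl`; on Windows the Win32 functions come from `kernel32`, which is linked by default.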
AnyaCoder commented 1 month ago

After more struggling, I'm now stuck on the type-casting intrinsic functions in the nvcc compiler: https://docs.nvidia.com/cuda/cuda-math-api/group__CUDA__MATH__INTRINSIC__CAST.html Could anyone help?

2024-08-04T13:57:38.0152640Z running build_ext
2024-08-04T13:57:51.1267194Z -- The CUDA compiler identification is NVIDIA 12.4.99
2024-08-04T13:57:51.5210896Z -- The CXX compiler identification is MSVC 19.39.33523.0
2024-08-04T13:57:51.6432579Z -- Detecting CUDA compiler ABI info
2024-08-04T13:57:56.4257080Z -- Detecting CUDA compiler ABI info - done
2024-08-04T13:57:56.4607149Z -- Check for working CUDA compiler: C:/Miniconda3/envs/build/bin/nvcc.exe - skipped
2024-08-04T13:57:57.3469432Z -- Detecting CUDA compile features
2024-08-04T13:57:57.3475842Z -- Detecting CUDA compile features - done
2024-08-04T13:57:57.3737106Z -- Detecting CXX compiler ABI info
2024-08-04T13:57:57.9295720Z -- Detecting CXX compiler ABI info - done
2024-08-04T13:57:57.9419836Z -- Check for working CXX compiler: C:/Program Files (x86)/Microsoft Visual Studio/2022/BuildTools/VC/Tools/MSVC/14.39.33519/bin/Hostx64/x64/cl.exe - skipped
2024-08-04T13:57:57.9423682Z -- Detecting CXX compile features
2024-08-04T13:57:57.9439766Z -- Detecting CXX compile features - done
2024-08-04T13:57:58.0638285Z -- Found CUDAToolkit: C:/Miniconda3/envs/build/include (found version "12.4.99")
2024-08-04T13:57:59.3327426Z -- cudnn found at C:/Miniconda3/envs/build/Lib/site-packages/nvidia/cudnn/lib/x64/cudnn.lib.
2024-08-04T13:57:59.3337712Z -- Found LIBRARY: C:/Miniconda3/envs/build/Lib/site-packages/nvidia/cudnn/include
2024-08-04T13:57:59.3338849Z -- cuDNN: C:/Miniconda3/envs/build/Lib/site-packages/nvidia/cudnn/lib/x64/cudnn.lib
2024-08-04T13:57:59.3339612Z -- cuDNN: C:/Miniconda3/envs/build/Lib/site-packages/nvidia/cudnn/include
2024-08-04T13:57:59.3346771Z -- cudnn_adv_infer found at C:/Miniconda3/envs/build/Lib/site-packages/nvidia/cudnn/lib/x64/cudnn_adv_infer.lib.
2024-08-04T13:57:59.3352302Z -- cudnn_adv_train found at C:/Miniconda3/envs/build/Lib/site-packages/nvidia/cudnn/lib/x64/cudnn_adv_train.lib.
2024-08-04T13:57:59.3358841Z -- cudnn_cnn_infer found at C:/Miniconda3/envs/build/Lib/site-packages/nvidia/cudnn/lib/x64/cudnn_cnn_infer.lib.
2024-08-04T13:57:59.3364112Z -- cudnn_cnn_train found at C:/Miniconda3/envs/build/Lib/site-packages/nvidia/cudnn/lib/x64/cudnn_cnn_train.lib.
2024-08-04T13:57:59.3371710Z -- cudnn_ops_infer found at C:/Miniconda3/envs/build/Lib/site-packages/nvidia/cudnn/lib/x64/cudnn_ops_infer.lib.
2024-08-04T13:57:59.3376887Z -- cudnn_ops_train found at C:/Miniconda3/envs/build/Lib/site-packages/nvidia/cudnn/lib/x64/cudnn_ops_train.lib.
2024-08-04T13:58:00.1166419Z -- Found Python: C:\Miniconda3\envs\build\python.exe (found version "3.10.14") found components: Interpreter Development.Module
2024-08-04T13:58:00.1256975Z -- Configuring done (22.1s)
2024-08-04T13:58:00.2562336Z CMake Warning:
2024-08-04T13:58:00.2563016Z -- Generating done (0.1s)
2024-08-04T13:58:00.2563602Z   Manually-specified variables were not used by the project:
2024-08-04T13:58:00.2563937Z 
2024-08-04T13:58:00.2564020Z     pybind11_DIR
2024-08-04T13:58:00.2564176Z 
2024-08-04T13:58:00.2564181Z 
2024-08-04T13:58:00.2573429Z -- Build files have been written to: D:/a/TransformerEngine/TransformerEngine/build/cmake
2024-08-04T13:58:10.0792975Z [1/33] Building CXX object CMakeFiles\transformer_engine.dir\pycudnn.cpp.obj
2024-08-04T13:58:10.0794795Z [2/33] Building CXX object CMakeFiles\transformer_engine.dir\transformer_engine.cpp.obj
2024-08-04T13:58:10.0797032Z D:\a\TransformerEngine\TransformerEngine\transformer_engine\common\transformer_engine.cpp(103):

...

2024-08-04T14:36:17.5863062Z 
2024-08-04T14:36:17.5863550Z tmpxft_0000077c_00000000-10_rmsnorm_fwd_cuda_kernel.compute_90.cudafe1.cpp
2024-08-04T14:36:17.5863916Z 
2024-08-04T14:36:17.5864180Z [23/33] Building CUDA object CMakeFiles\transformer_engine.dir\util\cast.cu.obj
2024-08-04T14:36:17.5864767Z FAILED: CMakeFiles/transformer_engine.dir/util/cast.cu.obj 
2024-08-04T14:36:17.5873209Z C:\Miniconda3\envs\build\bin\nvcc.exe -forward-unknown-to-host-compiler -DNOMINMAX -Dtransformer_engine_EXPORTS -ID:\a\TransformerEngine\TransformerEngine\transformer_engine\common\.. -ID:\a\TransformerEngine\TransformerEngine\transformer_engine\common\include -ID:\a\TransformerEngine\TransformerEngine\transformer_engine\common\..\..\3rdparty\cudnn-frontend\include -ID:\a\TransformerEngine\TransformerEngine\build\cmake\string_headers -isystem C:\Miniconda3\envs\build\include -D_WINDOWS -Xcompiler=" /EHsc" --expt-relaxed-constexpr -O3 -Xcompiler="-O2 -Ob2" -DNDEBUG -std=c++17 "--generate-code=arch=compute_70,code=[compute_70,sm_70]" "--generate-code=arch=compute_80,code=[compute_80,sm_80]" "--generate-code=arch=compute_89,code=[compute_89,sm_89]" "--generate-code=arch=compute_90,code=[compute_90,sm_90]" -Xcompiler=-MD -MD -MT CMakeFiles\transformer_engine.dir\util\cast.cu.obj -MF CMakeFiles\transformer_engine.dir\util\cast.cu.obj.d -x cu -c D:\a\TransformerEngine\TransformerEngine\transformer_engine\common\util\cast.cu -o CMakeFiles\transformer_engine.dir\util\cast.cu.obj -Xcompiler=-FdCMakeFiles\transformer_engine.dir\,-FS
2024-08-04T14:36:17.5882254Z C:\Miniconda3\envs\build\include\cuda_fp16.hpp(2944): error: identifier "__fmaf_rn" is undefined
2024-08-04T14:36:17.5883343Z       const float ar = __fmaf_rn(a, 0.636619772F, 12582912.0F);
2024-08-04T14:36:17.5883947Z                        ^
2024-08-04T14:36:17.5884344Z 
2024-08-04T14:36:17.5884931Z C:\Miniconda3\envs\build\include\cuda_fp16.hpp(2945): error: identifier "__float_as_uint" is undefined
2024-08-04T14:36:17.5885893Z       const unsigned q = __float_as_uint(ar);
2024-08-04T14:36:17.5886361Z                          ^
2024-08-04T14:36:17.5886598Z 
2024-08-04T14:36:17.5887127Z C:\Miniconda3\envs\build\include\cuda_fp16.hpp(2946): error: identifier "__fsub_rn" is undefined
2024-08-04T14:36:17.5888033Z       const float j = __fsub_rn(ar, 12582912.0F);
2024-08-04T14:36:17.5888608Z                       ^
2024-08-04T14:36:17.5888831Z 
2024-08-04T14:36:17.5889371Z C:\Miniconda3\envs\build\include\cuda_fp16.hpp(2982): error: identifier "__fmaf_rn" is undefined
2024-08-04T14:36:17.5890230Z       z = __fmaf_rn(a8, x2, a6);
2024-08-04T14:36:17.5890621Z           ^
2024-08-04T14:36:17.5890792Z 
2024-08-04T14:36:17.5891318Z C:\Miniconda3\envs\build\include\cuda_fp16.hpp(3471): error: identifier "atomicCAS" is undefined
2024-08-04T14:36:17.5893861Z   { unsigned int* address_as_uint = (unsigned int*)address; unsigned int old = *address_as_uint; unsigned int assumed; do { assumed = old; __half2 new_val = __hadd2(val, *(__half2*)&assumed); old = atomicCAS(address_as_uint, assumed, *(unsigned int*)&new_val); } while (assumed != old); return *(__half2*)&old; }
2024-08-04T14:36:17.5896281Z                                                                                                                                                                                                       ^
2024-08-04T14:36:17.5896806Z 
2024-08-04T14:36:17.5897506Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(718): error: identifier "__float2int_rn" is undefined
2024-08-04T14:36:17.5898470Z   { return __float2int_rn(__bfloat162float(h)); }
2024-08-04T14:36:17.5898988Z            ^
2024-08-04T14:36:17.5899177Z 
2024-08-04T14:36:17.5899779Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(791): error: identifier "__int2float_ru" is undefined
2024-08-04T14:36:17.5901877Z   { const float ru = __int2float_ru(i); const float rd = __int2float_rd(i); float rz = __int2float_rz(i); if (ru != rd) { rz = __uint_as_float(__float_as_uint(rz) | 1U); } return __float2bfloat16_rn(rz); }
2024-08-04T14:36:17.5903361Z                      ^
2024-08-04T14:36:17.5903615Z 
2024-08-04T14:36:17.5904241Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(791): error: identifier "__int2float_rd" is undefined
2024-08-04T14:36:17.5906129Z   { const float ru = __int2float_ru(i); const float rd = __int2float_rd(i); float rz = __int2float_rz(i); if (ru != rd) { rz = __uint_as_float(__float_as_uint(rz) | 1U); } return __float2bfloat16_rn(rz); }
2024-08-04T14:36:17.5907640Z                                                          ^
2024-08-04T14:36:17.5908113Z 
2024-08-04T14:36:17.5908707Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(791): error: identifier "__int2float_rz" is undefined
2024-08-04T14:36:17.5910601Z   { const float ru = __int2float_ru(i); const float rd = __int2float_rd(i); float rz = __int2float_rz(i); if (ru != rd) { rz = __uint_as_float(__float_as_uint(rz) | 1U); } return __float2bfloat16_rn(rz); }
2024-08-04T14:36:17.5912085Z                                                                                        ^
2024-08-04T14:36:17.5912534Z 
2024-08-04T14:36:17.5913121Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(791): error: identifier "__float_as_uint" is undefined
2024-08-04T14:36:17.5914983Z   { const float ru = __int2float_ru(i); const float rd = __int2float_rd(i); float rz = __int2float_rz(i); if (ru != rd) { rz = __uint_as_float(__float_as_uint(rz) | 1U); } return __float2bfloat16_rn(rz); }
2024-08-04T14:36:17.5916522Z                                                                                                                                                ^
2024-08-04T14:36:17.5917010Z 
2024-08-04T14:36:17.5917618Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(791): error: identifier "__uint_as_float" is undefined
2024-08-04T14:36:17.5920363Z   { const float ru = __int2float_ru(i); const float rd = __int2float_rd(i); float rz = __int2float_rz(i); if (ru != rd) { rz = __uint_as_float(__float_as_uint(rz) | 1U); } return __float2bfloat16_rn(rz); }
2024-08-04T14:36:17.5921986Z                                                                                                                                ^
2024-08-04T14:36:17.5922491Z 
2024-08-04T14:36:17.5923140Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(884): error: identifier "__int2float_rz" is undefined
2024-08-04T14:36:17.5924263Z   { return __float2bfloat16_rz(__int2float_rz(i)); }
2024-08-04T14:36:17.5924874Z                                ^
2024-08-04T14:36:17.5925165Z 
2024-08-04T14:36:17.5925795Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(894): error: identifier "__int2float_rd" is undefined
2024-08-04T14:36:17.5926855Z   { return __float2bfloat16_rd(__int2float_rd(i)); }
2024-08-04T14:36:17.5927433Z                                ^
2024-08-04T14:36:17.5927741Z 
2024-08-04T14:36:17.5928353Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(905): error: identifier "__int2float_ru" is undefined
2024-08-04T14:36:17.5929439Z   { return __float2bfloat16_ru(__int2float_ru(i)); }
2024-08-04T14:36:17.5930026Z                                ^
2024-08-04T14:36:17.5930327Z 
2024-08-04T14:36:17.5930953Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(1010): error: identifier "__int2float_rz" is undefined
2024-08-04T14:36:17.5932146Z   { return __float2bfloat16_rz(__int2float_rz(static_cast<int>(i))); }
2024-08-04T14:36:17.5932844Z                                ^
2024-08-04T14:36:17.5933154Z 
2024-08-04T14:36:17.5933850Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(1020): error: identifier "__int2float_rd" is undefined
2024-08-04T14:36:17.5935037Z   { return __float2bfloat16_rd(__int2float_rd(static_cast<int>(i))); }
2024-08-04T14:36:17.5935710Z                                ^
2024-08-04T14:36:17.5936035Z 
2024-08-04T14:36:17.5936780Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(1030): error: identifier "__int2float_ru" is undefined
2024-08-04T14:36:17.5937951Z   { return __float2bfloat16_ru(__int2float_ru(static_cast<int>(i))); }
2024-08-04T14:36:17.5938642Z                                ^
2024-08-04T14:36:17.5938932Z 
2024-08-04T14:36:17.5939515Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(1041): error: identifier "__float2uint_rn" is undefined
2024-08-04T14:36:17.5940571Z   { return __float2uint_rn(__bfloat162float(h)); }
2024-08-04T14:36:17.5941140Z            ^
2024-08-04T14:36:17.5941354Z 
2024-08-04T14:36:17.5942353Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(1090): error: identifier "__float2uint_rd" is undefined
2024-08-04T14:36:17.5943739Z   { return __float2uint_rd(__bfloat162float(h)); }
2024-08-04T14:36:17.5944179Z            ^
2024-08-04T14:36:17.5944333Z 
2024-08-04T14:36:17.5944832Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(1112): error: identifier "__uint2float_ru" is undefined
2024-08-04T14:36:17.5946034Z   { const float ru = __uint2float_ru(i); const float rd = __uint2float_rd(i); float rz = __uint2float_rz(i); if (ru != rd) { rz = __uint_as_float(__float_as_uint(rz) | 1U); } return __float2bfloat16_rn(rz); }
2024-08-04T14:36:17.5946900Z                      ^
2024-08-04T14:36:17.5947055Z 
2024-08-04T14:36:17.5947450Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(1112): error: identifier "__uint2float_rd" is undefined
2024-08-04T14:36:17.5948628Z   { const float ru = __uint2float_ru(i); const float rd = __uint2float_rd(i); float rz = __uint2float_rz(i); if (ru != rd) { rz = __uint_as_float(__float_as_uint(rz) | 1U); } return __float2bfloat16_rn(rz); }
2024-08-04T14:36:17.5949526Z                                                           ^
2024-08-04T14:36:17.5949780Z 
2024-08-04T14:36:17.5950155Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(1112): error: identifier "__uint2float_rz" is undefined
2024-08-04T14:36:17.5951425Z   { const float ru = __uint2float_ru(i); const float rd = __uint2float_rd(i); float rz = __uint2float_rz(i); if (ru != rd) { rz = __uint_as_float(__float_as_uint(rz) | 1U); } return __float2bfloat16_rn(rz); }
2024-08-04T14:36:17.5952379Z                                                                                          ^
2024-08-04T14:36:17.5952805Z 
2024-08-04T14:36:17.5953256Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(1112): error: identifier "__float_as_uint" is undefined
2024-08-04T14:36:17.5954522Z   { const float ru = __uint2float_ru(i); const float rd = __uint2float_rd(i); float rz = __uint2float_rz(i); if (ru != rd) { rz = __uint_as_float(__float_as_uint(rz) | 1U); } return __float2bfloat16_rn(rz); }
2024-08-04T14:36:17.5956280Z                                                                                                                                                   ^
2024-08-04T14:36:17.5956590Z 
2024-08-04T14:36:17.5956974Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(1112): error: identifier "__uint_as_float" is undefined
2024-08-04T14:36:17.5958119Z   { const float ru = __uint2float_ru(i); const float rd = __uint2float_rd(i); float rz = __uint2float_rz(i); if (ru != rd) { rz = __uint_as_float(__float_as_uint(rz) | 1U); } return __float2bfloat16_rn(rz); }
2024-08-04T14:36:17.5959038Z                                                                                                                                   ^
2024-08-04T14:36:17.5959338Z 
2024-08-04T14:36:17.5959838Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(1139): error: identifier "__uint2float_rz" is undefined
2024-08-04T14:36:17.5960577Z   { return __float2bfloat16_rz(__uint2float_rz(i)); }
2024-08-04T14:36:17.5960927Z                                ^
2024-08-04T14:36:17.5961198Z 
2024-08-04T14:36:17.5961567Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(1149): error: identifier "__uint2float_rd" is undefined
2024-08-04T14:36:17.5962213Z   { return __float2bfloat16_rd(__uint2float_rd(i)); }
2024-08-04T14:36:17.5962568Z                                ^
2024-08-04T14:36:17.5962795Z 
2024-08-04T14:36:17.5963158Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(1159): error: identifier "__uint2float_ru" is undefined
2024-08-04T14:36:17.5963792Z   { return __float2bfloat16_ru(__uint2float_ru(i)); }
2024-08-04T14:36:17.5964148Z                                ^
2024-08-04T14:36:17.5964327Z 
2024-08-04T14:36:17.5964682Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(1264): error: identifier "__uint2float_rz" is undefined
2024-08-04T14:36:17.5965423Z   { return __float2bfloat16_rz(__uint2float_rz(static_cast<unsigned int>(i))); }
2024-08-04T14:36:17.5965875Z                                ^
2024-08-04T14:36:17.5966051Z 
2024-08-04T14:36:17.5966414Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(1274): error: identifier "__uint2float_rd" is undefined
2024-08-04T14:36:17.5967138Z   { return __float2bfloat16_rd(__uint2float_rd(static_cast<unsigned int>(i))); }
2024-08-04T14:36:17.5967581Z                                ^
2024-08-04T14:36:17.5967755Z 
2024-08-04T14:36:17.5968119Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(1284): error: identifier "__uint2float_ru" is undefined
2024-08-04T14:36:17.5968826Z   { return __float2bfloat16_ru(__uint2float_ru(static_cast<unsigned int>(i))); }
2024-08-04T14:36:17.5969256Z                                ^
2024-08-04T14:36:17.5969426Z 
2024-08-04T14:36:17.5969791Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(1295): error: identifier "__float2ull_rn" is undefined
2024-08-04T14:36:17.5970399Z   { return __float2ull_rn(__bfloat162float(h)); }
2024-08-04T14:36:17.5970729Z            ^
2024-08-04T14:36:17.5970860Z 
2024-08-04T14:36:17.5971218Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(1307): error: identifier "__float2ull_rz" is undefined
2024-08-04T14:36:17.5971979Z   { const float f = __bfloat162float(h); i = __float2ull_rz(f); }
2024-08-04T14:36:17.5972674Z                                              ^
2024-08-04T14:36:17.5972951Z 
2024-08-04T14:36:17.5973312Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(1345): error: identifier "__float2ull_rd" is undefined
2024-08-04T14:36:17.5974119Z   { return __float2ull_rd(__bfloat162float(h)); }
2024-08-04T14:36:17.5974452Z            ^
2024-08-04T14:36:17.5974580Z 
2024-08-04T14:36:17.5974938Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(1367): error: identifier "__ull2float_ru" is undefined
2024-08-04T14:36:17.5976054Z   { const float ru = __ull2float_ru(i); const float rd = __ull2float_rd(i); float rz = __ull2float_rz(i); if (ru != rd) { rz = __uint_as_float(__float_as_uint(rz) | 1U); } return __float2bfloat16_rn(rz); }
2024-08-04T14:36:17.5976916Z                      ^
2024-08-04T14:36:17.5977065Z 
2024-08-04T14:36:17.5977425Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(1367): error: identifier "__ull2float_rd" is undefined
2024-08-04T14:36:17.5978526Z   { const float ru = __ull2float_ru(i); const float rd = __ull2float_rd(i); float rz = __ull2float_rz(i); if (ru != rd) { rz = __uint_as_float(__float_as_uint(rz) | 1U); } return __float2bfloat16_rn(rz); }
2024-08-04T14:36:17.5979376Z                                                          ^
2024-08-04T14:36:17.5979615Z 
2024-08-04T14:36:17.5979974Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(1367): error: identifier "__ull2float_rz" is undefined
2024-08-04T14:36:17.5981370Z   { const float ru = __ull2float_ru(i); const float rd = __ull2float_rd(i); float rz = __ull2float_rz(i); if (ru != rd) { rz = __uint_as_float(__float_as_uint(rz) | 1U); } return __float2bfloat16_rn(rz); }
2024-08-04T14:36:17.5982270Z                                                                                        ^
2024-08-04T14:36:17.5982577Z 
2024-08-04T14:36:17.5983073Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(1367): error: identifier "__float_as_uint" is undefined
2024-08-04T14:36:17.5984178Z   { const float ru = __ull2float_ru(i); const float rd = __ull2float_rd(i); float rz = __ull2float_rz(i); if (ru != rd) { rz = __uint_as_float(__float_as_uint(rz) | 1U); } return __float2bfloat16_rn(rz); }
2024-08-04T14:36:17.5985122Z                                                                                                                                                ^
2024-08-04T14:36:17.5985421Z 
2024-08-04T14:36:17.5985783Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(1367): error: identifier "__uint_as_float" is undefined
2024-08-04T14:36:17.5986885Z   { const float ru = __ull2float_ru(i); const float rd = __ull2float_rd(i); float rz = __ull2float_rz(i); if (ru != rd) { rz = __uint_as_float(__float_as_uint(rz) | 1U); } return __float2bfloat16_rn(rz); }
2024-08-04T14:36:17.5987793Z                                                                                                                                ^
2024-08-04T14:36:17.5988077Z 
2024-08-04T14:36:17.5988434Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(1405): error: identifier "__ull2float_rz" is undefined
2024-08-04T14:36:17.5989061Z   { return __float2bfloat16_rz(__ull2float_rz(i)); }
2024-08-04T14:36:17.5989414Z                                ^
2024-08-04T14:36:17.5989597Z 
2024-08-04T14:36:17.5990088Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(1415): error: identifier "__ull2float_rd" is undefined
2024-08-04T14:36:17.5990713Z   { return __float2bfloat16_rd(__ull2float_rd(i)); }
2024-08-04T14:36:17.5991058Z                                ^
2024-08-04T14:36:17.5991239Z 
2024-08-04T14:36:17.5991593Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(1425): error: identifier "__ull2float_ru" is undefined
2024-08-04T14:36:17.5992205Z   { return __float2bfloat16_ru(__ull2float_ru(i)); }
2024-08-04T14:36:17.5992552Z                                ^
2024-08-04T14:36:17.5992728Z 
2024-08-04T14:36:17.5993091Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(1435): error: identifier "__float2ll_rn" is undefined
2024-08-04T14:36:17.5993695Z   { return __float2ll_rn(__bfloat162float(h)); }
2024-08-04T14:36:17.5994012Z            ^
2024-08-04T14:36:17.5994136Z 
2024-08-04T14:36:17.5994478Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(1447): error: identifier "__float2ll_rz" is undefined
2024-08-04T14:36:17.5995194Z   { const float f = __bfloat162float(h); i = __float2ll_rz(f); }
2024-08-04T14:36:17.5995605Z                                              ^
2024-08-04T14:36:17.5995834Z 
2024-08-04T14:36:17.5996184Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(1508): error: identifier "__ll2float_ru" is undefined
2024-08-04T14:36:17.5997303Z   { const float ru = __ll2float_ru(i); const float rd = __ll2float_rd(i); float rz = __ll2float_rz(i); if (ru != rd) { rz = __uint_as_float(__float_as_uint(rz) | 1U); } return __float2bfloat16_rn(rz); }
2024-08-04T14:36:17.5998584Z                      ^
2024-08-04T14:36:17.5998820Z 
2024-08-04T14:36:17.5999357Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(1508): error: identifier "__ll2float_rd" is undefined
2024-08-04T14:36:17.6000818Z   { const float ru = __ll2float_ru(i); const float rd = __ll2float_rd(i); float rz = __ll2float_rz(i); if (ru != rd) { rz = __uint_as_float(__float_as_uint(rz) | 1U); } return __float2bfloat16_rn(rz); }
2024-08-04T14:36:17.6001667Z                                                         ^
2024-08-04T14:36:17.6001903Z 
2024-08-04T14:36:17.6002260Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(1508): error: identifier "__ll2float_rz" is undefined
2024-08-04T14:36:17.6003330Z   { const float ru = __ll2float_ru(i); const float rd = __ll2float_rd(i); float rz = __ll2float_rz(i); if (ru != rd) { rz = __uint_as_float(__float_as_uint(rz) | 1U); } return __float2bfloat16_rn(rz); }
2024-08-04T14:36:17.6004352Z                                                                                      ^
2024-08-04T14:36:17.6004780Z 
2024-08-04T14:36:17.6005268Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(1508): error: identifier "__float_as_uint" is undefined
2024-08-04T14:36:17.6006369Z   { const float ru = __ll2float_ru(i); const float rd = __ll2float_rd(i); float rz = __ll2float_rz(i); if (ru != rd) { rz = __uint_as_float(__float_as_uint(rz) | 1U); } return __float2bfloat16_rn(rz); }
2024-08-04T14:36:17.6007294Z                                                                                                                                             ^
2024-08-04T14:36:17.6007584Z 
2024-08-04T14:36:17.6007942Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(1508): error: identifier "__uint_as_float" is undefined
2024-08-04T14:36:17.6009036Z   { const float ru = __ll2float_ru(i); const float rd = __ll2float_rd(i); float rz = __ll2float_rz(i); if (ru != rd) { rz = __uint_as_float(__float_as_uint(rz) | 1U); } return __float2bfloat16_rn(rz); }
2024-08-04T14:36:17.6009915Z                                                                                                                             ^
2024-08-04T14:36:17.6010196Z 
2024-08-04T14:36:17.6010548Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(1549): error: identifier "__ll2float_rz" is undefined
2024-08-04T14:36:17.6011165Z   { return __float2bfloat16_rz(__ll2float_rz(i)); }
2024-08-04T14:36:17.6011512Z                                ^
2024-08-04T14:36:17.6011690Z 
2024-08-04T14:36:17.6012048Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(1559): error: identifier "__ll2float_rd" is undefined
2024-08-04T14:36:17.6012642Z   { return __float2bfloat16_rd(__ll2float_rd(i)); }
2024-08-04T14:36:17.6012969Z                                ^
2024-08-04T14:36:17.6013141Z 
2024-08-04T14:36:17.6013501Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(1569): error: identifier "__ll2float_ru" is undefined
2024-08-04T14:36:17.6014085Z   { return __float2bfloat16_ru(__ll2float_ru(i)); }
2024-08-04T14:36:17.6014413Z                                ^
2024-08-04T14:36:17.6014580Z 
2024-08-04T14:36:17.6014946Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(2968): error: identifier "__fmaf_ieee_rn" is undefined
2024-08-04T14:36:17.6015854Z   { const float fa = __bfloat162float(a); const float fb = __bfloat162float(b); val = __float2bfloat16(__fmaf_ieee_rn(fa, 1.0f, fb)); }
2024-08-04T14:36:17.6016591Z                                                                                                        ^
2024-08-04T14:36:17.6016875Z 
2024-08-04T14:36:17.6017233Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(2981): error: identifier "__fmaf_ieee_rn" is undefined
2024-08-04T14:36:17.6018380Z   { const float fa = __bfloat162float(a); const float fb = __bfloat162float(b); val = __float2bfloat16(__fmaf_ieee_rn(fb, -1.0f, fa)); }
2024-08-04T14:36:17.6019070Z                                                                                                        ^
2024-08-04T14:36:17.6019335Z 
2024-08-04T14:36:17.6020280Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(2994): error: identifier "__fmaf_ieee_rn" is undefined
2024-08-04T14:36:17.6021303Z   { const float fa = __bfloat162float(a); const float fb = __bfloat162float(b); val = __float2bfloat16(__fmaf_ieee_rn(fa, fb, -0.0f)); }
2024-08-04T14:36:17.6022000Z                                                                                                        ^
2024-08-04T14:36:17.6022274Z 
2024-08-04T14:36:17.6022692Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(3179): error: identifier "__uint_as_float" is undefined
2024-08-04T14:36:17.6023344Z       if ((__uint_as_float(0x00000001U) > 0.0f) || (f != 0.0f))
2024-08-04T14:36:17.6023691Z            ^
2024-08-04T14:36:17.6023807Z 
2024-08-04T14:36:17.6024175Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(3209): error: identifier "__uint_as_float" is undefined
2024-08-04T14:36:17.6024809Z       const float log2e_up = __uint_as_float(0x3FB8AA3CU);
2024-08-04T14:36:17.6025166Z                              ^
2024-08-04T14:36:17.6025337Z 
2024-08-04T14:36:17.6025752Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(3285): error: identifier "__uint_as_float" is undefined
2024-08-04T14:36:17.6026379Z       const float log10_2 = __uint_as_float(0x40549A78U);
2024-08-04T14:36:17.6026722Z                             ^
2024-08-04T14:36:17.6026890Z 
2024-08-04T14:36:17.6027239Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(3299): error: identifier "__uint_as_float" is undefined
2024-08-04T14:36:17.6029791Z   { const float log10_2 = __uint_as_float(0x40549A78U); float fl = __low2float(a) * log10_2; asm("{ ex2.approx.f32 %0, %0; }" : "+f"(fl)); float fh = __high2float(a) * log10_2; asm("{ ex2.approx.f32 %0, %0; }" : "+f"(fh)); r = __floats2bfloat162_rn( fl, fh ); const __nv_bfloat162_raw araw = static_cast<__nv_bfloat162_raw>(a); if (araw.x == (unsigned short)0xBC95U) { __nv_bfloat16_raw raw_fix; raw_fix.x = (unsigned short)0x3f75U; r.x = static_cast<__nv_bfloat16>(raw_fix); } if (araw.y == (unsigned short)0xBC95U) { __nv_bfloat16_raw raw_fix; raw_fix.x = (unsigned short)0x3f75U; r.y = static_cast<__nv_bfloat16>(raw_fix); } }
2024-08-04T14:36:17.6031967Z                           ^
2024-08-04T14:36:17.6032132Z 
2024-08-04T14:36:17.6032487Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(3368): error: identifier "__uint_as_float" is undefined
2024-08-04T14:36:17.6033117Z       const float flt_ln2 = __uint_as_float(0x3f317218U);
2024-08-04T14:36:17.6033466Z                             ^
2024-08-04T14:36:17.6033633Z 
2024-08-04T14:36:17.6033985Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(3375): error: identifier "__uint_as_float" is undefined
2024-08-04T14:36:17.6035333Z   { const float flt_ln2 = __uint_as_float(0x3f317218U); float fl = __low2float(a); fl = __internal_device_fast_bf16log2(fl); fl = fl * flt_ln2; float fh = __high2float(a); fh = __internal_device_fast_bf16log2(fh); fh = fh * flt_ln2; return __floats2bfloat162_rn( fl, fh ); }
2024-08-04T14:36:17.6036389Z                           ^
2024-08-04T14:36:17.6036551Z 
2024-08-04T14:36:17.6036917Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(3410): error: identifier "__uint_as_float" is undefined
2024-08-04T14:36:17.6037559Z       const float flt_log10_2 = __uint_as_float(0x3E9A209BU);
2024-08-04T14:36:17.6037921Z                                 ^
2024-08-04T14:36:17.6038105Z 
2024-08-04T14:36:17.6038482Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(3417): error: identifier "__uint_as_float" is undefined
2024-08-04T14:36:17.6039909Z   { const float flt_log10_2 = __uint_as_float(0x3E9A209BU); float fl = __low2float(a); fl = __internal_device_fast_bf16log2(fl); fl = fl * flt_log10_2; float fh = __high2float(a); fh = __internal_device_fast_bf16log2(fh); fh = fh * flt_log10_2; return __floats2bfloat162_rn( fl, fh ); }
2024-08-04T14:36:17.6041005Z                               ^
2024-08-04T14:36:17.6041180Z 
2024-08-04T14:36:17.6041535Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(3553): error: identifier "__fmaf_ieee_rn" is undefined
2024-08-04T14:36:17.6042456Z   { const float fa = __bfloat162float(a); return __float2bfloat16(__fmaf_ieee_rn(fa, -1.0f, -0.0f)); }
2024-08-04T14:36:17.6043015Z                                                                   ^
2024-08-04T14:36:17.6043266Z 
2024-08-04T14:36:17.6043609Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(3813): error: identifier "atomicCAS" is undefined
2024-08-04T14:36:17.6045207Z   { unsigned int* address_as_uint = (unsigned int*)address; unsigned int old = *address_as_uint; unsigned int assumed; do { assumed = old; __nv_bfloat162 new_val = __hadd2(val, *(__nv_bfloat162*)&assumed); old = atomicCAS(address_as_uint, assumed, *(unsigned int*)&new_val); } while (assumed != old); return *(__nv_bfloat162*)&old; }
2024-08-04T14:36:17.6046643Z                                                                                                                                                                                                                     ^
2024-08-04T14:36:17.6046963Z 
2024-08-04T14:36:17.6047305Z C:\Miniconda3\envs\build\include\cuda_bf16.hpp(3835): error: identifier "atomicCAS" is undefined
2024-08-04T14:36:17.6048945Z   { unsigned short int* address_as_us = (unsigned short int*)address; unsigned short int old = *address_as_us; unsigned short int assumed; do { assumed = old; old = atomicCAS(address_as_us, assumed, __bfloat16_as_ushort(__hadd(val, __ushort_as_bfloat16(assumed)))); } while (assumed != old); return __ushort_as_bfloat16(old); }
2024-08-04T14:36:17.6050406Z                                                                                                                                                                      ^
2024-08-04T14:36:17.6050701Z 
2024-08-04T14:36:17.6050706Z 
2024-08-04T14:36:17.6050711Z 
2024-08-04T14:36:17.6051185Z 65 errors detected in the compilation of "D:/a/TransformerEngine/TransformerEngine/transformer_engine/common/util/cast.cu".
2024-08-04T14:36:17.6051769Z 
2024-08-04T14:36:17.6051837Z cast.cu
2024-08-04T14:36:17.6051949Z 
2024-08-04T14:36:17.6052065Z ninja: build stopped: subcommand failed.
2024-08-04T14:36:17.6052461Z Traceback (most recent call last):
2024-08-04T14:36:17.6081472Z   File "D:\a\TransformerEngine\TransformerEngine\build_tools\build_ext.py", line 88, in _build_cmake
2024-08-04T14:36:17.6082500Z     subprocess.run(command, cwd=build_dir, check=True)
2024-08-04T14:36:17.6083074Z   File "C:\Miniconda3\envs\build\lib\subprocess.py", line 526, in run
2024-08-04T14:36:17.6083569Z Building CMake extension transformer_engine
2024-08-04T14:36:17.6086941Z Running command C:\Program Files\CMake\bin\cmake.exe -S D:\a\TransformerEngine\TransformerEngine\transformer_engine\common -B D:\a\TransformerEngine\TransformerEngine\build\cmake -DPython_EXECUTABLE=C:\Miniconda3\envs\build\python.exe -DPython_INCLUDE_DIR=C:\Miniconda3\envs\build\Include -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=D:\a\TransformerEngine\TransformerEngine\build\lib.win-amd64-cpython-310 -Dpybind11_DIR=D:\a\TransformerEngine\TransformerEngine\.eggs\pybind11-2.13.1-py3.10.egg\pybind11\share\cmake\pybind11 -GNinja
2024-08-04T14:36:17.6090339Z Running command C:\Program Files\CMake\bin\cmake.exe --build D:\a\TransformerEngine\TransformerEngine\build\cmake --parallel 1
2024-08-04T14:36:17.6091151Z     raise CalledProcessError(retcode, process.args,
2024-08-04T14:36:17.6092700Z subprocess.CalledProcessError: Command '['C:\\Program Files\\CMake\\bin\\cmake.exe', '--build', 'D:\\a\\TransformerEngine\\TransformerEngine\\build\\cmake', '--parallel', '1']' returned non-zero exit status 1.
timmoon10 commented 1 month ago

Thanks for reporting these issues. We currently don't have plans for Windows support, although we welcome contributions.

Those error messages are strange to me too, since those symbols should be provided by NVCC. A Google search turns up other MSVC users who hit undefined CUDA intrinsics when including CUDA runtime headers.

I see we include CUDA runtime headers in a few .cu files. For common/util/cast.cu, I think it's coming from: https://github.com/NVIDIA/TransformerEngine/blob/6717554f11f9b8bd79f917560e525d538c95b3bc/transformer_engine/common/include/transformer_engine/transformer_engine.h#L14

I wonder if it helps to wrap these includes in something like:

#if !defined(__CUDACC__)
#include <cuda_runtime_api.h>
#endif
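
To check which branch of that guard a given compiler actually takes, a small diagnostic along the following lines might help. This is only a sketch: the `EXAMPLE_RUNTIME_API_INCLUDED` macro and the helper functions `compiled_by_nvcc`, `runtime_api_included`, and `print_include_path` are hypothetical names made up for illustration, and the `__has_include` check is an extra assumption added so the file also builds on machines without the CUDA toolkit. None of it is part of Transformer Engine.

```cpp
#include <cstdio>

// Sketch of the guard suggested above: include the CUDA runtime API header
// only when this translation unit is NOT being compiled as CUDA by NVCC.
// NVCC defines __CUDACC__ when it compiles CUDA source files.
#if !defined(__CUDACC__)
// Extra check (an assumption for this sketch, not in the original suggestion):
// only include the header if the toolchain can actually find it.
#ifdef __has_include
#if __has_include(<cuda_runtime_api.h>)
#include <cuda_runtime_api.h>
#define EXAMPLE_RUNTIME_API_INCLUDED 1
#endif
#endif
#endif

#ifndef EXAMPLE_RUNTIME_API_INCLUDED
#define EXAMPLE_RUNTIME_API_INCLUDED 0
#endif

// Hypothetical helper: true when NVCC is compiling this file as CUDA.
constexpr bool compiled_by_nvcc() {
#if defined(__CUDACC__)
  return true;
#else
  return false;
#endif
}

// Hypothetical helper: true when the guarded include above was taken.
constexpr bool runtime_api_included() {
  return EXAMPLE_RUNTIME_API_INCLUDED != 0;
}

// The guard guarantees the two conditions are never true at the same time.
static_assert(!(compiled_by_nvcc() && runtime_api_included()),
              "runtime header must be skipped under NVCC");

// Prints which path the preprocessor took, so the output of an MSVC-hosted
// build can be compared with a GCC-hosted one.
void print_include_path() {
  std::printf("compiled by nvcc: %s\n", compiled_by_nvcc() ? "yes" : "no");
  std::printf("cuda_runtime_api.h included here: %s\n",
              runtime_api_included() ? "yes" : "no");
}
```

Calling `print_include_path()` from a small `main` in both a `.cpp` and a `.cu` copy of the file would show whether the guard behaves the same way under both host compilers.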
AnyaCoder commented 1 month ago

I changed the code as you suggested, and it still reports the same errors. I also looked at cuda_bf16.hpp, the header file that contains the NVCC built-in functions, and found that it does not include the runtime headers. I think the root cause is that NVCC behaves differently on Linux and Windows.

AnyaCoder commented 1 month ago

Thanks for helping!