wrench1997 opened 1 month ago
I seem to have succeeded, but there were many problems. The key is the build.ninja parameters:

```ninja
ninja_required_version = 1.3
cxx = cl
nvcc = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin\nvcc

cflags = -DTORCH_EXTENSION_NAME=slstm_HS128BS8NH4NS4DBfDRbDWbDGbDSbDAfNG4SA1GRCV0GRC0d0FCV0FC0d0 -DTORCH_API_INCLUDE_EXTENSION_H -IC:\ProgramData\Anaconda3\envs\py310torch\lib\site-packages\torch\include -IC:\ProgramData\Anaconda3\envs\py310torch\lib\site-packages\torch\include\torch\csrc\api\include -IC:\ProgramData\Anaconda3\envs\py310torch\lib\site-packages\torch\include\TH -IC:\ProgramData\Anaconda3\envs\py310torch\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include" -IC:\ProgramData\Anaconda3\envs\py310torch\Include -D_GLIBCXX_USE_CXX11_ABI=0 /MD /wd4819 /wd4251 /wd4244 /wd4267 /wd4275 /wd4018 /wd4190 /wd4624 /wd4067 /wd4068 /EHsc /std:c++17 -DSLSTM_HIDDEN_SIZE=128 -DSLSTM_BATCH_SIZE=8 -DSLSTM_NUM_HEADS=4 -DSLSTM_NUM_STATES=4 -DSLSTM_DTYPE_B=float -DSLSTM_DTYPE_R=__nv_bfloat16 -DSLSTM_DTYPE_W=__nv_bfloat16 -DSLSTM_DTYPE_G=__nv_bfloat16 -DSLSTM_DTYPE_S=__nv_bfloat16 -DSLSTM_DTYPE_A=float -DSLSTM_NUM_GATES=4 -DSLSTM_SIMPLE_AGG=true -DSLSTM_GRADIENT_RECURRENT_CLIPVAL_VALID=false -DSLSTM_GRADIENT_RECURRENT_CLIPVAL=0.0 -DSLSTM_FORWARD_CLIPVAL_VALID=false -DSLSTM_FORWARD_CLIPVAL=0.0 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__
post_cflags =
cuda_cflags = -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcompiler /EHsc -Xcompiler /wd4068 -Xcompiler /wd4067 -Xcompiler /wd4624 -Xcompiler /wd4190 -Xcompiler /wd4018 -Xcompiler /wd4275 -Xcompiler /wd4267 -Xcompiler /wd4244 -Xcompiler /wd4251 -Xcompiler /wd4819 -Xcompiler /MD -DTORCH_EXTENSION_NAME=slstm_HS128BS8NH4NS4DBfDRbDWbDGbDSbDAfNG4SA1GRCV0GRC0d0FCV0FC0d0 -DTORCH_API_INCLUDE_EXTENSION_H -IC:\ProgramData\Anaconda3\envs\py310torch\lib\site-packages\torch\include -IC:\ProgramData\Anaconda3\envs\py310torch\lib\site-packages\torch\include\torch\csrc\api\include -IC:\ProgramData\Anaconda3\envs\py310torch\lib\site-packages\torch\include\TH -IC:\ProgramData\Anaconda3\envs\py310torch\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include" -IC:\ProgramData\Anaconda3\envs\py310torch\Include -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 -std=c++17 -Xptxas="-v" -gencode arch=compute_80,code=compute_80 -res-usage --use_fast_math -O3 --extra-device-vectorization -DSLSTM_HIDDEN_SIZE=128 -DSLSTM_BATCH_SIZE=8 -DSLSTM_NUM_HEADS=4 -DSLSTM_NUM_STATES=4 -DSLSTM_DTYPE_B=float -DSLSTM_DTYPE_R=__nv_bfloat16 -DSLSTM_DTYPE_W=__nv_bfloat16 -DSLSTM_DTYPE_G=__nv_bfloat16 -DSLSTM_DTYPE_S=__nv_bfloat16 -DSLSTM_DTYPE_A=float -DSLSTM_NUM_GATES=4 -DSLSTM_SIMPLE_AGG=true -DSLSTM_GRADIENT_RECURRENT_CLIPVAL_VALID=false -DSLSTM_GRADIENT_RECURRENT_CLIPVAL=0.0 -DSLSTM_FORWARD_CLIPVAL_VALID=false -DSLSTM_FORWARD_CLIPVAL=0.0 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__
cuda_post_cflags =
cuda_dlink_post_cflags =
ldflags = /DLL c10.lib c10_cuda.lib torch_cpu.lib torch_cuda.lib -INCLUDE:?warp_size@cuda@at@@YAHXZ torch.lib /LIBPATH:C:\ProgramData\Anaconda3\envs\py310torch\lib\site-packages\torch\lib torch_python.lib /LIBPATH:C:\ProgramData\Anaconda3\envs\py310torch\libs "/LIBPATH:C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.1/lib" cublas.lib "/LIBPATH:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\lib\x64" cudart.lib

rule compile
  command = cl /showIncludes $cflags -c $in /Fo$out $post_cflags
  deps = msvc

rule cuda_compile
  depfile = $out.d
  deps = msvc
  command = $nvcc --generate-dependencies-with-compile --dependency-output $out.d $cuda_cflags -c $in -o $out $cuda_post_cflags

rule link
  command = "link.exe" $in /nologo $ldflags /out:$out

build slstm.o: compile C$:\ProgramData\Anaconda3\envs\py310torch\lib\site-packages\xlstm\blocks\slstm\src\cuda\slstm.cc
build slstm_forward.cuda.o: cuda_compile C$:\ProgramData\Anaconda3\envs\py310torch\lib\site-packages\xlstm\blocks\slstm\src\cuda\slstm_forward.cu
build slstm_backward.cuda.o: cuda_compile C$:\ProgramData\Anaconda3\envs\py310torch\lib\site-packages\xlstm\blocks\slstm\src\cuda\slstm_backward.cu
build slstm_backward_cut.cuda.o: cuda_compile C$:\ProgramData\Anaconda3\envs\py310torch\lib\site-packages\xlstm\blocks\slstm\src\cuda\slstm_backward_cut.cu
build slstm_pointwise.cuda.o: cuda_compile C$:\ProgramData\Anaconda3\envs\py310torch\lib\site-packages\xlstm\blocks\slstm\src\cuda\slstm_pointwise.cu
build blas.cuda.o: cuda_compile C$:\ProgramData\Anaconda3\envs\py310torch\lib\site-packages\xlstm\blocks\slstm\src\util\blas.cu
build cuda_error.cuda.o: cuda_compile C$:\ProgramData\Anaconda3\envs\py310torch\lib\site-packages\xlstm\blocks\slstm\src\util\cuda_error.cu

build slstm_HS128BS8NH4NS4DBfDRbDWbDGbDSbDAfNG4SA1GRCV0GRC0d0FCV0FC0d0.pyd: link slstm.o slstm_forward.cuda.o slstm_backward.cuda.o slstm_backward_cut.cuda.o slstm_pointwise.cuda.o blas.cuda.o cuda_error.cuda.o

default slstm_HS128BS8NH4NS4DBfDRbDWbDGbDSbDAfNG4SA1GRCV0GRC0d0FCV0FC0d0.pyd
```
Have you successfully built this on Windows 10? Did you succeed?
Of course.
I still get an error even though I made every change you posted. Do you mind taking a look?
Here is my build.ninja:

```ninja
ninja_required_version = 1.3
cxx = cl
nvcc = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\bin\nvcc

cflags = -DTORCH_EXTENSION_NAME=slstm_HS128BS8NH4NS4DBfDRbDWbDGbDSbDAfNG4SA1GRCV0GRC0d0FCV0FC0d0 -DTORCH_API_INCLUDE_EXTENSION_H -ID:\Anaconda\envs\xlstm\Lib\site-packages\torch\include -ID:\Anaconda\envs\xlstm\Lib\site-packages\torch\include\torch\csrc\api\include -ID:\Anaconda\envs\xlstm\Lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\include" -ID:\Anaconda\envs\xlstm\include -D_GLIBCXX_USE_CXX11_ABI=0 /MD /wd4819 /wd4251 /wd4244 /wd4267 /wd4275 /wd4018 /wd4190 /wd4624 /wd4067 /wd4068 /EHsc /std:c++17 -DSLSTM_HIDDEN_SIZE=128 -DSLSTM_BATCH_SIZE=8 -DSLSTM_NUM_HEADS=4 -DSLSTM_NUM_STATES=4 -DSLSTM_DTYPE_B=float -DSLSTM_DTYPE_R=__nv_bfloat16 -DSLSTM_DTYPE_W=__nv_bfloat16 -DSLSTM_DTYPE_G=__nv_bfloat16 -DSLSTM_DTYPE_S=__nv_bfloat16 -DSLSTM_DTYPE_A=float -DSLSTM_NUM_GATES=4 -DSLSTM_SIMPLE_AGG=true -DSLSTM_GRADIENT_RECURRENT_CLIPVAL_VALID=false -DSLSTM_GRADIENT_RECURRENT_CLIPVAL=0.0 -DSLSTM_FORWARD_CLIPVAL_VALID=false -DSLSTM_FORWARD_CLIPVAL=0.0 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__
post_cflags =
cuda_cflags = -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcompiler /EHsc -Xcompiler /wd4068 -Xcompiler /wd4067 -Xcompiler /wd4624 -Xcompiler /wd4190 -Xcompiler /wd4018 -Xcompiler /wd4275 -Xcompiler /wd4267 -Xcompiler /wd4244 -Xcompiler /wd4251 -Xcompiler /wd4819 -Xcompiler /MD -DTORCH_EXTENSION_NAME=slstm_HS128BS8NH4NS4DBfDRbDWbDGbDSbDAfNG4SA1GRCV0GRC0d0FCV0FC0d0 -DTORCH_API_INCLUDE_EXTENSION_H -ID:\Anaconda\envs\xlstm\Lib\site-packages\torch\include -ID:\Anaconda\envs\xlstm\Lib\site-packages\torch\include\torch\csrc\api\include -ID:\Anaconda\envs\xlstm\Lib\site-packages\torch\include "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\include" -ID:\Anaconda\envs\xlstm\include -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 -std=c++17 -Xptxas="-v" -gencode arch=compute_80,code=compute_80 -res-usage --use_fast_math -O3 --extra-device-vectorization -DSLSTM_HIDDEN_SIZE=128 -DSLSTM_BATCH_SIZE=8 -DSLSTM_NUM_HEADS=4 -DSLSTM_NUM_STATES=4 -DSLSTM_DTYPE_B=float -DSLSTM_DTYPE_R=__nv_bfloat16 -DSLSTM_DTYPE_W=__nv_bfloat16 -DSLSTM_DTYPE_G=__nv_bfloat16 -DSLSTM_DTYPE_S=__nv_bfloat16 -DSLSTM_DTYPE_A=float -DSLSTM_NUM_GATES=4 -DSLSTM_SIMPLE_AGG=true -DSLSTM_GRADIENT_RECURRENT_CLIPVAL_VALID=false -DSLSTM_GRADIENT_RECURRENT_CLIPVAL=0.0 -DSLSTM_FORWARD_CLIPVAL_VALID=false -DSLSTM_FORWARD_CLIPVAL=0.0 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__
cuda_post_cflags =
cuda_dlink_post_cflags =
ldflags = /DLL c10.lib c10_cuda.lib torch_cpu.lib torch_cuda.lib -INCLUDE:?warp_size@cuda@at@@yahxz torch.lib /LIBPATH:D:\Anaconda\envs\xlstm\Lib\site-packages\torch\lib /LIBPATH:D:\Anaconda\envs\xlstm\libs "/LIBPATH:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\lib" cublas.lib "/LIBPATH:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\lib\x64" cudart.lib

rule compile
  command = cl /showIncludes $cflags -c $in /Fo$out $post_cflags
  deps = msvc

rule cuda_compile
  depfile = $out.d
  deps = msvc
  command = $nvcc --generate-dependencies-with-compile --dependency-output $out.d $cuda_cflags -c $in -o $out $cuda_post_cflags

rule link
  command = "link.exe" $in /nologo $ldflags /out:$out

build slstm.o: compile D$:\Anaconda3\envs\xlstm\lib\site-packages\xlstm\blocks\slstm\src\cuda\slstm.cc
build slstm_forward.cuda.o: cuda_compile D$:\Anaconda3\envs\xlstm\site-packages\xlstm\blocks\slstm\src\cuda\slstm_forward.cu
build slstm_backward.cuda.o: cuda_compile D$:\Anaconda3\envs\xlstm\lib\site-packages\xlstm\blocks\slstm\src\cuda\slstm_backward.cu
build slstm_backward_cut.cuda.o: cuda_compile D$:\Anaconda3\envs\xlstm\lib\site-packages\xlstm\blocks\slstm\src\cuda\slstm_backward_cut.cu
build slstm_pointwise.cuda.o: cuda_compile D$:\Anaconda3\envs\xlstm\lib\site-packages\xlstm\blocks\slstm\src\cuda\slstm_pointwise.cu
build blas.cuda.o: cuda_compile D$:\Anaconda3\envs\xlstm\lib\site-packages\xlstm\blocks\slstm\src\util\blas.cu
build cuda_error.cuda.o: cuda_compile D$:\Anaconda3\envs\xlstm\lib\site-packages\xlstm\blocks\slstm\src\util\cuda_error.cu

build slstm_HS128BS8NH4NS4DBfDRbDWbDGbDSbDAfNG4SA1GRCV0GRC0d0FCV0FC0d0.pyd: link slstm.o slstm_forward.cuda.o slstm_backward.cuda.o slstm_backward_cut.cuda.o slstm_pointwise.cuda.o blas.cuda.o cuda_error.cuda.o

default slstm_HS128BS8NH4NS4DBfDRbDWbDGbDSbDAfNG4SA1GRCV0GRC0d0FCV0FC0d0.pyd
```
@vanclouds7 Please ensure that ninja, CUDA, and cuDNN are installed, and add the include directories and dynamic libraries to the INCLUDE and LIB environment variables:

INCLUDE:

```
D:\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\include
C:\Program Files (x86)\Windows Kits\10\Include\10.0.19041.0\shared
C:\Program Files (x86)\Windows Kits\10\Include\10.0.19041.0\ucrt
C:\Program Files (x86)\Windows Kits\10\Include\10.0.19041.0\um
C:\Program Files (x86)\Windows Kits\10\Include\10.0.19041.0\winrt
```

LIB:

```
D:\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\lib\x64
C:\Program Files (x86)\Windows Kits\10\Lib\10.0.19041.0\um\x64
C:\Program Files (x86)\Windows Kits\10\Lib\10.0.19041.0\ucrt\x64
C:\Program Files (x86)\Microsoft SDKs\Windows\v7.1A\Lib
```
You can use `ninja -v` to get more verbose output.
@vanclouds7 I forgot one thing: delete

`"extra_ldflags": [f"-L{os.environ['CUDA_LIB']}", "-lcublas"],`

This is Linux syntax.
@wrench1997 I believe I've done everything you asked, but there's still an error.
@vanclouds7 Go to the slstm build.ninja directory to view the error output. Also, I noticed that you did not specify your CUDA version in the Python variable.
@vanclouds7 Edit `torch.utils.cpp_extension.py`: in the function `_write_ninja_file()`, add this line at the end:

`_maybe_write('build.ninja', content)`

Now the code will generate `build.ninja` in your workspace root, and you can run `ninja -v` to see what's wrong with the build.
In my case, I solved the problem by editing `load()` in `xlstm\blocks\slstm\src\cuda_init.py`:

`(torch dir)/include` was missing while including `ATen/ATen.h`, and `(torch dir)/include/torch/csrc/api/include` was missing while including `torch/all.h`:

```python
# TORCH_HOME = os.path.abspath(torch.__file__).replace('\\__init__.py', '')
# edit: add to `extra_cflags`
f"-I{TORCH_HOME}/include",
f"-I{TORCH_HOME}/include/torch/csrc/api/include",
```
`(torch dir)/lib` was missing while searching for `c10.lib`.

`-Xptxas -O3` raises an error, so I removed it from `extra_cuda_cflags` in my args.
As @wrench1997 said, `f"-L{os.environ['CUDA_LIB']}", "-lcublas"` doesn't work on Windows; replace it with `f"/LIBPATH:{CUDA_HOME}/lib/x64", "cublas.lib"`:

```python
# CUDA_HOME = os.environ.get('CUDA_HOME') or os.environ.get('CUDA_PATH')
# edit: add to `extra_ldflags`
f"/LIBPATH:{TORCH_HOME}/lib",
f"/LIBPATH:{CUDA_HOME}/lib/x64", "cublas.lib",
```
Lastly, run `pip uninstall xlstm` to remove the xlstm package from your env, to make sure the edited files take effect.
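Putting the edits above together, the keyword arguments handed to `torch.utils.cpp_extension.load()` in `cuda_init.py` would look roughly like this sketch. The paths and the `TORCH_HOME`/`CUDA_HOME` fallbacks here are illustrative assumptions, not the package's exact code:

```python
import os

# Illustrative roots; in cuda_init.py TORCH_HOME would come from
# torch.__file__ and CUDA_HOME from the environment.
TORCH_HOME = os.environ.get("TORCH_HOME", r"C:\envs\xlstm\Lib\site-packages\torch")
CUDA_HOME = os.environ.get("CUDA_HOME") or os.environ.get("CUDA_PATH") or r"C:\CUDA\v12.1"

# MSVC needs these include paths so ATen/ATen.h and torch/all.h resolve.
extra_cflags = [
    f"-I{TORCH_HOME}/include",
    f"-I{TORCH_HOME}/include/torch/csrc/api/include",
]

# Windows linker syntax: /LIBPATH: instead of -L, cublas.lib instead of -lcublas.
extra_ldflags = [
    f"/LIBPATH:{TORCH_HOME}/lib",      # where c10.lib lives
    f"/LIBPATH:{CUDA_HOME}/lib/x64",
    "cublas.lib",
]
```

These lists are then passed straight through as the `extra_cflags` and `extra_ldflags` arguments of `load()`.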
Could you share your code and environment with me? Thank you a lot!
Hello, could I add you on WeChat? Liz18326042653. Thank you very much!
@Adapter525 I just uploaded my solution; you can try it. Hope it works for you.
> And add include files and dynamic libraries:

Could you please indicate where to add these files and libraries? I feel like I am following all the instructions, but the build still fails.
Hello, are you still having problems? Can you send me the error message?
Hi, yes, I attached the error log to the message and also build.ninja file and cuda_init.py file. The .ninja_log is not very informative.
@kristinaste I just tried xlstm 1.0.5 on Windows 11; the compatibility has improved. Just two steps make it work:

Step 1: disable the line `"-Xptxas -O3"` in `extra_cuda_cflags`;

Step 2: replace the content of `extra_ldflags` with `f"/LIBPATH:{CUDA_HOME}/lib/x64", "cublas.lib"`, where `CUDA_HOME` is obtained with `CUDA_HOME = os.environ.get('CUDA_HOME') or os.environ.get('CUDA_PATH')`.

Good luck!
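If the flags in `cuda_init.py` are kept in a plain Python list, the two steps amount to dropping the `-Xptxas`/`-O3` pair and swapping in Windows-style linker arguments. A hedged sketch (the flag values here are examples, not the package's full list):

```python
import os

# Step 2's CUDA_HOME lookup, with a placeholder fallback for illustration.
CUDA_HOME = os.environ.get("CUDA_HOME") or os.environ.get("CUDA_PATH") or r"C:\CUDA\v12.6"

# Example flag list; the real one in cuda_init.py is longer.
extra_cuda_cflags = ["--use_fast_math", "-Xptxas", "-O3", "--extra-device-vectorization"]

# Step 1: remove "-Xptxas" together with the value that follows it.
cleaned = []
skip_next = False
for flag in extra_cuda_cflags:
    if skip_next:
        skip_next = False
        continue
    if flag == "-Xptxas":
        skip_next = True
        continue
    cleaned.append(flag)

# Step 2: Windows-style linker arguments replacing -L/-lcublas.
extra_ldflags = [f"/LIBPATH:{CUDA_HOME}/lib/x64", "cublas.lib"]
```

Note that `-O3` on its own is still a valid nvcc host flag; only the `-Xptxas -O3` pair needs to go.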
Here are my relevant changes; make sure both link.exe and cl.exe can run directly from the command line: build_copy_ninjia.txt, cpp_extension.txt (line 1876).
I have been trying for a few days now, reinstalling CUDA 12.1 and cuDNN and building ninja from scratch, but Win10 still reports an error. The Windows compatibility is too poor.