NX-AI / xlstm

Official repository of the xLSTM.
GNU Affero General Public License v3.0

Has anyone successfully built on Windows 10? #44

Open wrench1997 opened 1 month ago

wrench1997 commented 1 month ago

I have been trying for a few days, switching to CUDA 12.1, reinstalling cuDNN, and building ninja from scratch, but Win10 still reports an error. The Windows compatibility is too poor.

wrench1997 commented 1 month ago

I seem to have succeeded, but there are many problems. The core lies in the build.ninja parameters:

```
ninja_required_version = 1.3
cxx = cl
nvcc = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin\nvcc

cflags = -DTORCH_EXTENSION_NAME=slstm_HS128BS8NH4NS4DBfDRbDWbDGbDSbDAfNG4SA1GRCV0GRC0d0FCV0FC0d0 -DTORCH_API_INCLUDE_EXTENSION_H -IC:\ProgramData\Anaconda3\envs\py310torch\lib\site-packages\torch\include -IC:\ProgramData\Anaconda3\envs\py310torch\lib\site-packages\torch\include\torch\csrc\api\include -IC:\ProgramData\Anaconda3\envs\py310torch\lib\site-packages\torch\include\TH -IC:\ProgramData\Anaconda3\envs\py310torch\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include" -IC:\ProgramData\Anaconda3\envs\py310torch\Include -D_GLIBCXX_USE_CXX11_ABI=0 /MD /wd4819 /wd4251 /wd4244 /wd4267 /wd4275 /wd4018 /wd4190 /wd4624 /wd4067 /wd4068 /EHsc /std:c++17 -DSLSTM_HIDDEN_SIZE=128 -DSLSTM_BATCH_SIZE=8 -DSLSTM_NUM_HEADS=4 -DSLSTM_NUM_STATES=4 -DSLSTM_DTYPE_B=float -DSLSTM_DTYPE_R=__nv_bfloat16 -DSLSTM_DTYPE_W=__nv_bfloat16 -DSLSTM_DTYPE_G=__nv_bfloat16 -DSLSTM_DTYPE_S=__nv_bfloat16 -DSLSTM_DTYPE_A=float -DSLSTM_NUM_GATES=4 -DSLSTM_SIMPLE_AGG=true -DSLSTM_GRADIENT_RECURRENT_CLIPVAL_VALID=false -DSLSTM_GRADIENT_RECURRENT_CLIPVAL=0.0 -DSLSTM_FORWARD_CLIPVAL_VALID=false -DSLSTM_FORWARD_CLIPVAL=0.0 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__
post_cflags =
cuda_cflags = -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcompiler /EHsc -Xcompiler /wd4068 -Xcompiler /wd4067 -Xcompiler /wd4624 -Xcompiler /wd4190 -Xcompiler /wd4018 -Xcompiler /wd4275 -Xcompiler /wd4267 -Xcompiler /wd4244 -Xcompiler /wd4251 -Xcompiler /wd4819 -Xcompiler /MD -DTORCH_EXTENSION_NAME=slstm_HS128BS8NH4NS4DBfDRbDWbDGbDSbDAfNG4SA1GRCV0GRC0d0FCV0FC0d0 -DTORCH_API_INCLUDE_EXTENSION_H -IC:\ProgramData\Anaconda3\envs\py310torch\lib\site-packages\torch\include -IC:\ProgramData\Anaconda3\envs\py310torch\lib\site-packages\torch\include\torch\csrc\api\include -IC:\ProgramData\Anaconda3\envs\py310torch\lib\site-packages\torch\include\TH -IC:\ProgramData\Anaconda3\envs\py310torch\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include" -IC:\ProgramData\Anaconda3\envs\py310torch\Include -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 -std=c++17 -Xptxas="-v" -gencode arch=compute_80,code=compute_80 -res-usage --use_fast_math -O3 --extra-device-vectorization -DSLSTM_HIDDEN_SIZE=128 -DSLSTM_BATCH_SIZE=8 -DSLSTM_NUM_HEADS=4 -DSLSTM_NUM_STATES=4 -DSLSTM_DTYPE_B=float -DSLSTM_DTYPE_R=__nv_bfloat16 -DSLSTM_DTYPE_W=__nv_bfloat16 -DSLSTM_DTYPE_G=__nv_bfloat16 -DSLSTM_DTYPE_S=__nv_bfloat16 -DSLSTM_DTYPE_A=float -DSLSTM_NUM_GATES=4 -DSLSTM_SIMPLE_AGG=true -DSLSTM_GRADIENT_RECURRENT_CLIPVAL_VALID=false -DSLSTM_GRADIENT_RECURRENT_CLIPVAL=0.0 -DSLSTM_FORWARD_CLIPVAL_VALID=false -DSLSTM_FORWARD_CLIPVAL=0.0 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__
cuda_post_cflags =
cuda_dlink_post_cflags =
ldflags = /DLL c10.lib c10_cuda.lib torch_cpu.lib torch_cuda.lib -INCLUDE:?warp_size@cuda@at@@YAHXZ torch.lib /LIBPATH:C:\ProgramData\Anaconda3\envs\py310torch\lib\site-packages\torch\lib torch_python.lib /LIBPATH:C:\ProgramData\Anaconda3\envs\py310torch\libs "/LIBPATH:C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.1/lib" cublas.lib "/LIBPATH:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\lib\x64" cudart.lib

rule compile
  command = cl /showIncludes $cflags -c $in /Fo$out $post_cflags
  deps = msvc

rule cuda_compile
  depfile = $out.d
  deps = msvc
  command = $nvcc --generate-dependencies-with-compile --dependency-output $out.d $cuda_cflags -c $in -o $out $cuda_post_cflags

rule link
  command = "link.exe" $in /nologo $ldflags /out:$out

build slstm.o: compile C$:\ProgramData\Anaconda3\envs\py310torch\lib\site-packages\xlstm\blocks\slstm\src\cuda\slstm.cc
build slstm_forward.cuda.o: cuda_compile C$:\ProgramData\Anaconda3\envs\py310torch\lib\site-packages\xlstm\blocks\slstm\src\cuda\slstm_forward.cu
build slstm_backward.cuda.o: cuda_compile C$:\ProgramData\Anaconda3\envs\py310torch\lib\site-packages\xlstm\blocks\slstm\src\cuda\slstm_backward.cu
build slstm_backward_cut.cuda.o: cuda_compile C$:\ProgramData\Anaconda3\envs\py310torch\lib\site-packages\xlstm\blocks\slstm\src\cuda\slstm_backward_cut.cu
build slstm_pointwise.cuda.o: cuda_compile C$:\ProgramData\Anaconda3\envs\py310torch\lib\site-packages\xlstm\blocks\slstm\src\cuda\slstm_pointwise.cu
build blas.cuda.o: cuda_compile C$:\ProgramData\Anaconda3\envs\py310torch\lib\site-packages\xlstm\blocks\slstm\src\util\blas.cu
build cuda_error.cuda.o: cuda_compile C$:\ProgramData\Anaconda3\envs\py310torch\lib\site-packages\xlstm\blocks\slstm\src\util\cuda_error.cu

build slstm_HS128BS8NH4NS4DBfDRbDWbDGbDSbDAfNG4SA1GRCV0GRC0d0FCV0FC0d0.pyd: link slstm.o slstm_forward.cuda.o slstm_backward.cuda.o slstm_backward_cut.cuda.o slstm_pointwise.cuda.o blas.cuda.o cuda_error.cuda.o

default slstm_HS128BS8NH4NS4DBfDRbDWbDGbDSbDAfNG4SA1GRCV0GRC0d0FCV0FC0d0.pyd
```

Adapter525 commented 1 month ago

Have you successfully built on Windows 10? Did you succeed?

wrench1997 commented 1 month ago

Have you successfully built on Windows 10? Did you succeed?

Of course. [screenshots attached]

vanclouds7 commented 1 month ago

I still got a problem even though I made every change you posted. Do you mind taking a look?

My build.ninja:

```
ninja_required_version = 1.3
cxx = cl
nvcc = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\bin\nvcc

cflags = -DTORCH_EXTENSION_NAME=slstm_HS128BS8NH4NS4DBfDRbDWbDGbDSbDAfNG4SA1GRCV0GRC0d0FCV0FC0d0 -DTORCH_API_INCLUDE_EXTENSION_H -ID:\Anaconda\envs\xlstm\Lib\site-packages\torch\include -ID:\Anaconda\envs\xlstm\Lib\site-packages\torch\include\torch\csrc\api\include -ID:\Anaconda\envs\xlstm\Lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\include" -ID:\Anaconda\envs\xlstm\include -D_GLIBCXX_USE_CXX11_ABI=0 /MD /wd4819 /wd4251 /wd4244 /wd4267 /wd4275 /wd4018 /wd4190 /wd4624 /wd4067 /wd4068 /EHsc /std:c++17 -DSLSTM_HIDDEN_SIZE=128 -DSLSTM_BATCH_SIZE=8 -DSLSTM_NUM_HEADS=4 -DSLSTM_NUM_STATES=4 -DSLSTM_DTYPE_B=float -DSLSTM_DTYPE_R=__nv_bfloat16 -DSLSTM_DTYPE_W=__nv_bfloat16 -DSLSTM_DTYPE_G=__nv_bfloat16 -DSLSTM_DTYPE_S=__nv_bfloat16 -DSLSTM_DTYPE_A=float -DSLSTM_NUM_GATES=4 -DSLSTM_SIMPLE_AGG=true -DSLSTM_GRADIENT_RECURRENT_CLIPVAL_VALID=false -DSLSTM_GRADIENT_RECURRENT_CLIPVAL=0.0 -DSLSTM_FORWARD_CLIPVAL_VALID=false -DSLSTM_FORWARD_CLIPVAL=0.0 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__
post_cflags =
cuda_cflags = -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcompiler /EHsc -Xcompiler /wd4068 -Xcompiler /wd4067 -Xcompiler /wd4624 -Xcompiler /wd4190 -Xcompiler /wd4018 -Xcompiler /wd4275 -Xcompiler /wd4267 -Xcompiler /wd4244 -Xcompiler /wd4251 -Xcompiler /wd4819 -Xcompiler /MD -DTORCH_EXTENSION_NAME=slstm_HS128BS8NH4NS4DBfDRbDWbDGbDSbDAfNG4SA1GRCV0GRC0d0FCV0FC0d0 -DTORCH_API_INCLUDE_EXTENSION_H -ID:\Anaconda\envs\xlstm\Lib\site-packages\torch\include -ID:\Anaconda\envs\xlstm\Lib\site-packages\torch\include\torch\csrc\api\include -ID:\Anaconda\envs\xlstm\Lib\site-packages\torch\include "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\include" -ID:\Anaconda\envs\xlstm\include -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 -std=c++17 -Xptxas="-v" -gencode arch=compute_80,code=compute_80 -res-usage --use_fast_math -O3 --extra-device-vectorization -DSLSTM_HIDDEN_SIZE=128 -DSLSTM_BATCH_SIZE=8 -DSLSTM_NUM_HEADS=4 -DSLSTM_NUM_STATES=4 -DSLSTM_DTYPE_B=float -DSLSTM_DTYPE_R=__nv_bfloat16 -DSLSTM_DTYPE_W=__nv_bfloat16 -DSLSTM_DTYPE_G=__nv_bfloat16 -DSLSTM_DTYPE_S=__nv_bfloat16 -DSLSTM_DTYPE_A=float -DSLSTM_NUM_GATES=4 -DSLSTM_SIMPLE_AGG=true -DSLSTM_GRADIENT_RECURRENT_CLIPVAL_VALID=false -DSLSTM_GRADIENT_RECURRENT_CLIPVAL=0.0 -DSLSTM_FORWARD_CLIPVAL_VALID=false -DSLSTM_FORWARD_CLIPVAL=0.0 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__
cuda_post_cflags =
cuda_dlink_post_cflags =
ldflags = /DLL c10.lib c10_cuda.lib torch_cpu.lib torch_cuda.lib -INCLUDE:?warp_size@cuda@at@@yahxz torch.lib /LIBPATH:D:\Anaconda\envs\xlstm\Lib\site-packages\torch\lib /LIBPATH:D:\Anaconda\envs\xlstm\libs "/LIBPATH:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\lib" cublas.lib "/LIBPATH:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\lib\x64" cudart.lib

rule compile
  command = cl /showIncludes $cflags -c $in /Fo$out $post_cflags
  deps = msvc

rule cuda_compile
  depfile = $out.d
  deps = msvc
  command = $nvcc --generate-dependencies-with-compile --dependency-output $out.d $cuda_cflags -c $in -o $out $cuda_post_cflags

rule link
  command = "link.exe" $in /nologo $ldflags /out:$out

build slstm.o: compile D$:\Anaconda3\envs\xlstm\lib\site-packages\xlstm\blocks\slstm\src\cuda\slstm.cc
build slstm_forward.cuda.o: cuda_compile D$:\Anaconda3\envs\xlstm\site-packages\xlstm\blocks\slstm\src\cuda\slstm_forward.cu
build slstm_backward.cuda.o: cuda_compile D$:\Anaconda3\envs\xlstm\lib\site-packages\xlstm\blocks\slstm\src\cuda\slstm_backward.cu
build slstm_backward_cut.cuda.o: cuda_compile D$:\Anaconda3\envs\xlstm\lib\site-packages\xlstm\blocks\slstm\src\cuda\slstm_backward_cut.cu
build slstm_pointwise.cuda.o: cuda_compile D$:\Anaconda3\envs\xlstm\lib\site-packages\xlstm\blocks\slstm\src\cuda\slstm_pointwise.cu
build blas.cuda.o: cuda_compile D$:\Anaconda3\envs\xlstm\lib\site-packages\xlstm\blocks\slstm\src\util\blas.cu
build cuda_error.cuda.o: cuda_compile D$:\Anaconda3\envs\xlstm\lib\site-packages\xlstm\blocks\slstm\src\util\cuda_error.cu

build slstm_HS128BS8NH4NS4DBfDRbDWbDGbDSbDAfNG4SA1GRCV0GRC0d0FCV0FC0d0.pyd: link slstm.o slstm_forward.cuda.o slstm_backward.cuda.o slstm_backward_cut.cuda.o slstm_pointwise.cuda.o blas.cuda.o cuda_error.cuda.o

default slstm_HS128BS8NH4NS4DBfDRbDWbDGbDSbDAfNG4SA1GRCV0GRC0d0FCV0FC0d0.pyd
```

[screenshots attached]

wrench1997 commented 1 month ago

@vanclouds7 Please ensure that ninja, CUDA, and cuDNN are installed, and add the include files and dynamic libraries to your environment:

INCLUDE:
D:\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\include
C:\Program Files (x86)\Windows Kits\10\Include\10.0.19041.0\shared
C:\Program Files (x86)\Windows Kits\10\Include\10.0.19041.0\ucrt
C:\Program Files (x86)\Windows Kits\10\Include\10.0.19041.0\um
C:\Program Files (x86)\Windows Kits\10\Include\10.0.19041.0\winrt

LIB:
D:\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\lib\x64
C:\Program Files (x86)\Windows Kits\10\Lib\10.0.19041.0\um\x64
C:\Program Files (x86)\Windows Kits\10\Lib\10.0.19041.0\ucrt\x64
C:\Program Files (x86)\Microsoft SDKs\Windows\v7.1A\Lib

You can run `ninja -v` to get more verbose output.
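The INCLUDE/LIB setup above can also be scripted from Python before triggering the build. A minimal sketch, assuming the MSVC and Windows SDK version numbers shown above (they will differ on other machines):

```python
import os

# Example paths from the comment above; adjust MSVC/SDK versions to your install.
MSVC = r"D:\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133"
SDK_INC = r"C:\Program Files (x86)\Windows Kits\10\Include\10.0.19041.0"
SDK_LIB = r"C:\Program Files (x86)\Windows Kits\10\Lib\10.0.19041.0"

def prepend_paths(var, paths):
    """Prepend paths to a ;-separated environment variable (Windows convention)."""
    existing = os.environ.get(var, "")
    os.environ[var] = ";".join(paths + ([existing] if existing else []))

prepend_paths("INCLUDE", [
    MSVC + r"\include",
    SDK_INC + r"\shared", SDK_INC + r"\ucrt", SDK_INC + r"\um", SDK_INC + r"\winrt",
])
prepend_paths("LIB", [
    MSVC + r"\lib\x64",
    SDK_LIB + r"\um\x64", SDK_LIB + r"\ucrt\x64",
])
```

Running this in the same process before importing xlstm lets cl.exe and link.exe find the MSVC headers and libraries without a Developer Command Prompt.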

wrench1997 commented 1 month ago

@vanclouds7 I forgot one thing [screenshot attached]: delete `"extra_ldflags": [f"-L{os.environ['CUDA_LIB']}", "-lcublas"]`. That is Linux syntax.
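The difference between the two linker-flag conventions can be sketched as follows. This is only an illustration; the `C:\CUDA` fallback path is an assumption, not something from this thread:

```python
import os
import sys

# Resolve the CUDA install directory; CUDA_PATH is set by the Windows installer.
# The final fallback is a hypothetical placeholder.
cuda_home = os.environ.get("CUDA_HOME") or os.environ.get("CUDA_PATH") or r"C:\CUDA"

if sys.platform == "win32":
    # MSVC link.exe syntax: /LIBPATH:<dir> plus the .lib file name itself
    extra_ldflags = [f"/LIBPATH:{cuda_home}/lib/x64", "cublas.lib"]
else:
    # GNU ld syntax: -L<dir> to add a search path, -l<name> to link libcublas
    extra_ldflags = [f"-L{cuda_home}/lib64", "-lcublas"]
```

Passing the `-L`/`-l` form to link.exe fails because MSVC's linker simply does not understand those options.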

vanclouds7 commented 1 month ago

@wrench1997 I believe I've done everything you asked, but there's still an error. [screenshots attached]

wrench1997 commented 1 month ago

@vanclouds7 Go to the sLSTM build.ninja directory and look at the error output there. In addition, I noticed that you did not specify your CUDA version in the Python-side variable.

gutaihai commented 1 month ago

@vanclouds7 Edit torch\utils\cpp_extension.py: in the function _write_ninja_file(), add a line at the end: _maybe_write('build.ninja', content).
Now the code will generate build.ninja in your workspace root, and you can run `ninja -v` to see what is wrong. In my case, I solved the problem by editing xlstm\blocks\slstm\src\cuda_init.py, load(): (torch dir)/include was missing while including ATen/ATen.h, and (torch dir)/include/torch/csrc/api/include was missing while including torch/all.h:

        # TORCH_HOME = os.path.dirname(os.path.abspath(torch.__file__))
        # edit: add to `extra_cflags`
        f"-I{TORCH_HOME}/include",
        f"-I{TORCH_HOME}/include/torch/csrc/api/include",

(torch dir)/lib was missing while searching for c10.lib. `-Xptxas -O3` raised an error, so I removed it from extra_cuda_cflags in my args. As @wrench1997 said, f"-L{os.environ['CUDA_LIB']}", "-lcublas" does not work on Windows; replace it with f"/LIBPATH:{CUDA_HOME}/lib/x64", "cublas.lib":

        # CUDA_HOME = os.environ.get('CUDA_HOME') or os.environ.get('CUDA_PATH')
        # edit: add to `extra_ldflags`
        f"/LIBPATH:{TORCH_HOME}/lib",
        f"/LIBPATH:{CUDA_HOME}/lib/x64", "cublas.lib",

Lastly, run `pip uninstall xlstm` to remove the xlstm package from your env, so that your edited files are actually the ones used.
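Putting the pieces together, the Windows-adjusted flag lists for torch.utils.cpp_extension.load() might look like the sketch below. This is an illustration of the shape of the fix, not the upstream code; the hard-coded fallback paths are hypothetical examples:

```python
import os

# TORCH_HOME: directory containing the installed torch package.
try:
    import torch
    TORCH_HOME = os.path.dirname(os.path.abspath(torch.__file__))
except ImportError:
    # Hypothetical fallback for illustration only.
    TORCH_HOME = r"C:\ProgramData\Anaconda3\envs\py310torch\lib\site-packages\torch"

# CUDA_HOME: resolved the same way as in the comments above.
CUDA_HOME = os.environ.get("CUDA_HOME") or os.environ.get("CUDA_PATH") or r"C:\CUDA"

extra_cflags = [
    f"-I{TORCH_HOME}/include",                         # for ATen/ATen.h
    f"-I{TORCH_HOME}/include/torch/csrc/api/include",  # for torch/all.h
]
extra_ldflags = [
    f"/LIBPATH:{TORCH_HOME}/lib",     # for c10.lib, torch_cpu.lib, etc.
    f"/LIBPATH:{CUDA_HOME}/lib/x64",  # CUDA import libraries on Windows
    "cublas.lib",
]
```

These lists would then be passed as the `extra_cflags` and `extra_ldflags` arguments inside xlstm's cuda_init.py.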

Adapter525 commented 1 month ago

Could you share your code and environment with me? Thank you a lot!


Adapter525 commented 1 month ago

Hello, could I add you on WeChat (Liz18326042653)? Thank you very much.

gutaihai commented 1 month ago

@Adapter525 I just uploaded my solution; you can try it. Hope it works for you.

kristinaste commented 2 weeks ago

And add include files and dynamic libraries:

Could you please indicate where to include these files and libraries? I feel like I am following all the instructions, but the build still fails.

wrench1997 commented 1 week ago

And add include files and dynamic libraries:

Could you please indicate where to include these files and libraries? I feel like I am following all the instructions, but the build still fails.

Hello, are there still problems? Can you send me the error message?

kristinaste commented 1 week ago

error_log.txt [screenshots attached]

Hello, do you have any questions? Can you send me an error message?

Hi, yes, I attached the error log, the build.ninja file, and the cuda_init.py file to the message. The .ninja_log is not very informative.

gutaihai commented 1 week ago

@kristinaste I just tried xlstm 1.0.5 on Windows 11, and the compatibility has improved. Just two steps make it work:
Step 1: disable the line "-Xptxas -O3" in "extra_cuda_cflags";
Step 2: replace the content of "extra_ldflags" with f"/LIBPATH:{CUDA_HOME}/lib/x64", "cublas.lib", where CUDA_HOME is obtained via CUDA_HOME = os.environ.get('CUDA_HOME') or os.environ.get('CUDA_PATH').
[screenshot attached] Good luck!
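The two steps above amount to a small flag transformation, sketched below. The example input list and the `C:\CUDA` fallback path are hypothetical; only the "-Xptxas -O3" removal and the MSVC-style linker flags come from the comment:

```python
import os

cuda_home = os.environ.get("CUDA_HOME") or os.environ.get("CUDA_PATH") or r"C:\CUDA"

# Hypothetical example of what extra_cuda_cflags might contain.
extra_cuda_cflags = ["--use_fast_math", "-O3", "-Xptxas", "-O3"]

# Step 1: drop the "-Xptxas -O3" pair (the flag and its argument).
cleaned = []
skip_next = False
for flag in extra_cuda_cflags:
    if skip_next:
        skip_next = False
        continue
    if flag == "-Xptxas":
        skip_next = True
        continue
    cleaned.append(flag)

# Step 2: MSVC-style linker flags instead of the GNU -L/-l form.
extra_ldflags = [f"/LIBPATH:{cuda_home}/lib/x64", "cublas.lib"]
```

Note that host-side `-O3` passed to nvcc is fine; it is specifically the ptxas form that was removed here.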


wrench1997 commented 1 week ago

error_log.txt [screenshots attached]

Hello, do you have any questions? Can you send me an error message?

Hi, yes, I attached the error log to the message and also build.ninja file and cuda_init.py file. The .ninja_log is not very informative.

Here are my relevant changes. Make sure both link.exe and cl.exe can run directly: build_copy_ninjia.txt, cpp_extension.txt (line 1876).