ggerganov / llama.cpp

LLM inference in C/C++
MIT License

Bug: Compilation failure with CUDA support on Windows: nvcc input file error #8732

Closed RedHeartSecretMan closed 1 month ago

RedHeartSecretMan commented 3 months ago

What happened?

Description: I am attempting to compile the llama.cpp project with CUDA support enabled (GGML_CUDA=1) on a Windows system using MinGW. I have set the CUDA_DOCKER_ARCH environment variable as per the requirements, but I am encountering a compilation error related to nvcc.

Steps to Reproduce:

  1. Set the CUDA_DOCKER_ARCH environment variable: export CUDA_DOCKER_ARCH=compute_89
  2. Run the make command: make GGML_CUDA=1

Expected Behavior: The project should compile successfully with CUDA support enabled.

Details about my terminal environment: W:\Code\CCpp\FunnyProject\llama.cpp ❯ w64devkit ~ $ printenv SCOOP=J:/Software/Scoop CONDA_PROMPT_MODIFIER=False HTTPS_PROXY=http://127.0.0.1:7890 PROGRAMFILESX86=C:/Program Files (x86) USER=WangHao LOGONSERVER=//WANGHAOITX PROGRAMFILES=C:/Program Files ALLUSERSPROFILE=C:/ProgramData POSH_THEMES_PATH=C:/Users/WangHao/AppData/Local/Programs/oh-my-posh/themes PROGRAMW6432=C:/Program Files POWERLINE_COMMAND=oh-my-posh WT_PROFILE_ID={574e775e-4f2a-5b96-ac1e-a2962a402336} SHLVL=1 HOME=C:/Users/WangHao CONDA_SHLVL=1 POSH_GIT_ENABLED=False SYSTEMDRIVE=C: ProgramFiles(x86)=C:\Program Files (x86) POWERSHELL_DISTRIBUTION_CHANNEL=MSI:Windows 10 Pro PROCESSOR_IDENTIFIER=AMD64 Family 25 Model 97 Stepping 2, AuthenticAMD SSL_CERT_FILE=J:/Software/Scoop/apps/miniconda/current/envs/tensor/Library/ssl/cacert.pem PROCESSOR_REVISION=6102 PUBLIC=C:/Users/Public W64DEVKIT=1.23.0 _CONDA_ROOT=J:/Software/Scoop/apps/miniconda/current USERDOMAIN=WANGHAOITX POSH_AZURE_ENABLED=False PROCESSOR_ARCHITECTURE=AMD64 PSMODULEPATH=C:/Users/WangHao/Documents/PowerShell/Modules;C:/Program Files/PowerShell/Modules;c:/program files/powershell/7/Modules;C:/Program Files/WindowsPowerShell/Modules;C:/Windows/system32/WindowsPowerShell/v1.0/Modules W64DEVKIT_HOME=J:/Software/MinGW LOGNAME=WangHao COMMONPROGRAMFILESX86=C:/Program Files (x86)/Common Files TEMP=C:/Users/WangHao/AppData/Local/Temp COMMONPROGRAMFILES=C:/Program Files/Common Files USERNAME=WangHao COMMONPROGRAMW6432=C:/Program Files/Common Files LOCALAPPDATA=C:/Users/WangHao/AppData/Local POSH_SHELL_VERSION=7.4.2 SESSIONNAME=Console WINDIR=C:/Windows 
PATH=J:/Software/MinGW/bin;J:/Software/Scoop/apps/openjdk22/current/bin;J:/Software/Scoop/apps/miniconda/current/envs/tensor;J:/Software/Scoop/apps/miniconda/current/envs/tensor/Library/mingw-w64/bin;J:/Software/Scoop/apps/miniconda/current/envs/tensor/Library/usr/bin;J:/Software/Scoop/apps/miniconda/current/envs/tensor/Library/bin;J:/Software/Scoop/apps/miniconda/current/envs/tensor/Scripts;J:/Software/Scoop/apps/miniconda/current/envs/tensor/bin;J:/Software/Scoop/apps/miniconda/current/condabin;C:/Program Files/PowerShell/7;C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.3/bin;C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.3/libnvvp;C:/Windows/system32;C:/Windows;C:/Windows/System32/Wbem;C:/Windows/System32/WindowsPowerShell/v1.0;C:/Windows/System32/OpenSSH;C:/Program Files (x86)/NVIDIA Corporation/PhysX/Common;C:/Program Files/NVIDIA Corporation/Nsight Compute 2023.3.0;C:/Program Files/dotnet;C:/Program Files/PowerShell/7;J:/Software/Scoop/shims;C:/Users/WangHao/AppData/Local/Microsoft/WindowsApps;J:/Software/VSCode/bin;C:/Users/WangHao/AppData/Local/Programs/oh-my-posh/bin;C:/Users/WangHao/.dotnet/tools;J:/Software/LLVM;J:/Software/LLVM/bin;J:/Software/MinGW;J:/Software/MinGW/bin;J:/Software/RuntimeLibrary/libtorch/2.4.0/build;J:/Software/RuntimeLibrary/openblas/0.3.27/build;J:/Software/RuntimeLibrary/opencv/4.8.0/build SCOOP_GLOBAL=J:/Software/Scoop OS=Windows_NT WT_SESSION=68881e97-3db7-4fea-a4d4-75c276bd15ec NUMBER_OF_PROCESSORS=16 POSH_CURSOR_LINE=4 USERPROFILE=C:/Users/WangHao TMP=C:/Users/WangHao/AppData/Local/Temp APPDATA=C:/Users/WangHao/AppData/Roaming CONDA_PYTHON_EXE=J:/Software/Scoop/apps/miniconda/current/python.exe SHELL=/bin/sh PATHEXT=.COM;.EXE;.BAT;.CMD;.VBS;.VBE;.JS;.JSE;.WSF;.WSH;.MSC;.CPL CommonProgramFiles(x86)=C:\Program Files (x86)\Common Files CONDA_DEFAULT_ENV=tensor BB_GLOBBING=0 ONEDRIVECONSUMER=C:/Users/WangHao/OneDrive PROGRAMDATA=C:/ProgramData SYSTEMROOT=C:\Windows USERDOMAIN_ROAMINGPROFILE=WANGHAOITX 
_CONDA_EXE=J:/Software/Scoop/apps/miniconda/current/Scripts/conda.exe __CONDA_OPENSLL_CERT_FILE_SET=1 NVTOOLSEXT_PATH=C:/Program Files/NVIDIA Corporation/NvToolsExt/ POSH_THEME=C:/Users/WangHao/AppData/Local/Programs/oh-my-posh/themes/peru.omp.json HOMEDRIVE=C: JAVA_HOME=J:/Software/Scoop/apps/openjdk22/current POSH_CURSOR_COLUMN=1 PWD=C:/Users/WangHao COMPUTERNAME=WANGHAOITX COMSPEC=C:\Windows\system32\cmd.exe CONDA_EXE=J:/Software/Scoop/apps/miniconda/current/Scripts/conda.exe CUDA_PATH_V12_3=C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.3 GIT_INSTALL_ROOT=J:/Software/Scoop/apps/git/current HOMEPATH=/Users/WangHao HTTP_PROXY=http://127.0.0.1:7890 ONEDRIVE=C:/Users/WangHao/OneDrive POSH_INSTALLER=winget CONDA_PREFIX=J:/Software/Scoop/apps/miniconda/current/envs/tensor DRIVERDATA=C:/Windows/System32/Drivers/DriverData CUDA_PATH=C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.3 PROCESSOR_LEVEL=25 POSH_PID=144612 WSLENV=WT_SESSION:WT_PROFILE_ID:

Name and Version

llama.cpp commit: e54c35e4fb5777c76316a50671640e6e144c9538
Operating System: Windows 10
Compiler: GCC 14.1.0
CUDA Version: 12.3
GPU: NVIDIA GeForce RTX 4080

What operating system are you seeing the problem on?

Windows

Relevant log output

The compilation process fails with the following error message:
W:/Code/CCpp/FunnyProject/llama.cpp $ make GGML_CUDA=1
I ccache not found. Consider installing it for faster compilation.
I llama.cpp build info:
I UNAME_S:   Windows_NT
I UNAME_P:   unknown
I UNAME_M:   x86_64
I CFLAGS:    -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -DNDEBUG -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_CUDA -IC:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.3/include -IC:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.3/targets/x86_64-linux/include -DGGML_CUDA_USE_GRAPHS  -std=c11   -fPIC -O3 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -march=native -mtune=native -Xassembler -muse-unaligned-vector-move -fopenmp -Wdouble-promotion
I CXXFLAGS:  -std=c++11 -fPIC -O3 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -DNDEBUG -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_CUDA -IC:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.3/include -IC:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.3/targets/x86_64-linux/include -DGGML_CUDA_USE_GRAPHS
I NVCCFLAGS: -std=c++11 -O3 -g -use_fast_math --forward-unknown-to-host-compiler -Wno-deprecated-gpu-targets -arch=compute_89 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128
I LDFLAGS:   -lcuda -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -LC:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.3/lib64 -L/usr/lib64 -LC:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.3/targets/x86_64-linux/lib -LC:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.3/lib64/stubs -L/usr/lib/wsl/lib
I CC:        cc (GCC) 14.1.0
I CXX:       c++ (GCC) 14.1.0
I NVCC:      Build cuda_12.3.r12.3/compiler.33281558_0
grep: unknown option -- P
BusyBox v1.37.0.git-5301-gda71f7c57 (2024-05-08 15:37:43 UTC)

Usage: grep [-HhnlLoqvsrRiwFE] [-m N] [-A|B|C N] { PATTERN | -e PATTERN... | -f FILE... } [FILE]...

Search for PATTERN in FILEs (or stdin)

        -H      Add 'filename:' prefix
        -h      Do not add 'filename:' prefix
        -n      Add 'line_no:' prefix
        -l      Show only names of files that match
        -L      Show only names of files that don't match
        -c      Show only count of matching lines
        -o      Show only the matching part of line
        -q      Quiet. Return 0 if PATTERN is found, 1 otherwise
        -v      Select non-matching lines
        -s      Suppress open and read errors
        -r      Recurse
        -R      Recurse and dereference symlinks
        -i      Ignore case
        -w      Match whole words only
        -x      Match whole lines only
        -F      PATTERN is a literal (not regexp)
        -E      PATTERN is an extended regexp
        -m N    Match up to N times per file
        -A N    Print N lines of trailing context
        -B N    Print N lines of leading context
        -C N    Same as '-A N -B N'
        -e PTRN Pattern to match
        -f FILE Read pattern from file

nvcc -std=c++11 -O3 -g -use_fast_math --forward-unknown-to-host-compiler -Wno-deprecated-gpu-targets -arch=compute_89 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128  -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -DNDEBUG -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_CUDA -IC:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.3/include -IC:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.3/targets/x86_64-linux/include -DGGML_CUDA_USE_GRAPHS  -Xcompiler "-std=c++11 -fPIC -O3 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -Wno-array-bounds -Wno-pedantic" -c ggml/src/ggml-cuda.cu -o ggml/src/ggml-cuda.o
nvcc fatal   : A single input file is required for a non-link phase when an outputfile is specified
make: *** [Makefile:745: ggml/src/ggml-cuda.o] Error 1
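
The `grep: unknown option -- P` line in the log above also stands out: w64devkit ships BusyBox grep, which does not support GNU grep's `-P` (Perl-regex) option, so any Makefile check that relies on it will print the usage text instead of matching in this environment. As an illustration only (not the actual Makefile check), the same kind of pattern can usually be expressed with the more portable `-E`:

```shell
# BusyBox grep rejects -P (PCRE) and prints its usage text instead of matching.
# POSIX -E (extended regular expressions) works in both GNU and BusyBox grep.
printf 'compute_89\n' | grep -E 'compute_[0-9]+'   # prints: compute_89
```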
dzbbdawang commented 3 months ago

I'm also encountering this issue. I checked the Makefile syntax but found no problems, which is strange.

zealoct commented 2 months ago

Same problem here, any progress? I tried modifying the Makefile to run the failing `nvcc ***` command without `-o ggml/src/ggml-cuda.o`; that error goes away, but further problems follow, such as spaces in the include directories passed with `-I`, which nvcc cannot handle.

It seems to me that building llama.cpp with CUDA on Windows needs a different environment from a pure-CPU build.
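
The space problem can be reproduced without nvcc at all: when an unquoted variable holding a path like `C:/Program Files/...` is expanded, the shell splits it at the space, so `-IC:/Program` and `Files/...` arrive as two separate arguments, and nvcc then treats the stray word as an extra input file. A minimal sketch (the path is made up):

```shell
# Hypothetical include path containing a space, as under C:/Program Files.
dir="C:/Program Files/CUDA/include"

set -- -I$dir          # unquoted: word-splits into two arguments
echo "unquoted: $# arguments"   # prints: unquoted: 2 arguments

set -- "-I$dir"        # quoted: stays a single -I argument
echo "quoted: $# arguments"     # prints: quoted: 1 arguments
```

This matches the "A single input file is required" error: the split-off half of the path looks like a second input file to nvcc.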

slaren commented 2 months ago

As far as I know, CUDA under windows is only supported with MSVC.

zealoct commented 2 months ago

As far as I know, CUDA under windows is only supported with MSVC.

Yes, I assume that means I can only compile CUDA with the Visual Studio toolchain? I did successfully build llama.cpp with CUDA using VS.
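
For anyone landing here: the MSVC route does not require the full IDE workflow; the CMake command line works too. A sketch, assuming CMake, Visual Studio 2022, and the CUDA toolkit's Visual Studio integration are installed (the `GGML_CUDA` option name matches llama.cpp around this commit; older trees used `LLAMA_CUDA`):

```shell
# Run from an "x64 Native Tools" / Developer prompt so MSVC is on PATH.
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release
```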

github-actions[bot] commented 1 month ago

This issue was closed because it has been inactive for 14 days since being marked as stale.

dkdk22 commented 1 month ago

I am hitting exactly the same failure as the original report above, on a similar setup (Windows 10, MinGW/w64devkit, GCC 14.1.0, CUDA 12.3, `make GGML_CUDA=1`): the build dies with `nvcc fatal : A single input file is required for a non-link phase when an outputfile is specified`.

Any improvements?

RedHeartSecretMan commented 3 weeks ago

I eventually gave up compiling natively on Windows and switched to WSL (Windows Subsystem for Linux) for the build, as I didn't have enough time to dig deeply into the issue and there wasn't enough reference material to work from.
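
For completeness, the WSL route might look like the following, assuming an Ubuntu distribution with NVIDIA's CUDA-on-WSL toolkit already installed (package names are illustrative; see NVIDIA's CUDA-on-WSL guide for the toolkit setup):

```shell
# Inside the WSL shell, from the llama.cpp checkout:
sudo apt update && sudo apt install -y build-essential
make GGML_CUDA=1 -j"$(nproc)"
```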