ggerganov / llama.cpp

LLM inference in C/C++
MIT License

make LLAMA_CUBLAS=1 fails on AWS #2633

Closed Gideonah closed 1 year ago

Gideonah commented 1 year ago

Prerequisites

Hi, this happens when running `make LLAMA_CUBLAS=1`.

Expected Behavior

When running plain `make` on an AWS instance I can build the llama.cpp package. However, when trying to add GPU support by building with `make LLAMA_CUBLAS=1`, I get the following error.

Current Behavior

Compilation of `ggml-cuda.cu` fails with errors on the lines shown in the output below.

Environment and Context


```
sh-4.2$ lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              8
On-line CPU(s) list: 0-7
Thread(s) per core:  2
Core(s) per socket:  4
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               85
Model name:          Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
Stepping:            7
CPU MHz:             3140.118
BogoMIPS:            4999.99
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            1024K
L3 cache:            36608K
NUMA node0 CPU(s):   0-7
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni
```

```
$ uname -a

Python 3.10.8
GNU Make 3.82
g++ (GCC) 7.3.1 20180712 (Red Hat 7.3.1-15)
```

Failure Information (for bugs)

I tried with CUDA versions 11.2 through 11.8.
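For completeness, the toolkit and driver versions actually visible on the instance can be confirmed with the standard CUDA tools:

```sh
# Report the CUDA toolkit (nvcc) version used for the build
nvcc --version
# Report the NVIDIA driver version and the GPU attached to the instance (a T4 on g4dn)
nvidia-smi
```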

Steps to Reproduce

Step 1. Boot up an ml.g4dn.2xlarge instance.
Step 2. Clone the llama.cpp repository.
Step 3. Run `make LLAMA_CUBLAS=1` (see the command sketch below).
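As a rough sketch, the reproduction on the instance looks like this (using the upstream repository URL; the failing step is the cuBLAS build):

```sh
# Clone the upstream repository
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
# A plain CPU build succeeds
make
# Enabling cuBLAS fails with the errors shown below
make LLAMA_CUBLAS=1
```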

Failure Logs


Build output from the failing `make LLAMA_CUBLAS=1`:
```
sh-4.2$ make LLAMA_CUBLAS=1
I llama.cpp build info: 
I UNAME_S:  Linux
I UNAME_P:  x86_64
I UNAME_M:  x86_64
I CFLAGS:   -I.              -O3 -std=c11   -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -pthread -march=native -mtune=native -DGGML_USE_K_QUANTS -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include
I CXXFLAGS: -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -DGGML_USE_K_QUANTS -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include
I LDFLAGS:   -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L/usr/local/cuda/lib64 -L/opt/cuda/lib64 -L/targets/x86_64-linux/lib
I CC:       cc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-15)
I CXX:      g++ (GCC) 7.3.1 20180712 (Red Hat 7.3.1-15)

nvcc --forward-unknown-to-host-compiler -use_fast_math -arch=native -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -DGGML_USE_K_QUANTS -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -Wno-pedantic -c ggml-cuda.cu -o ggml-cuda.o
ggml-cuda.cu(5691): error: too few arguments in function call

ggml-cuda.cu(5888): error: too few arguments in function call
```

bigmover commented 1 year ago

Try cmake. Maybe it will succeed.

Gideonah commented 1 year ago

When trying with cmake everything appears to run fine, but I still can't build with the original Makefile, so the package still doesn't build that way. Or is there something else I'm meant to run afterwards?

```sh
mkdir build
cd build
cmake .. -DLLAMA_CUBLAS=ON
cmake --build . --config Release
```

```
sh-4.2$ cmake .. -DLLAMA_CUBLAS=ON
-- The C compiler identification is GNU 7.3.1
-- The CXX compiler identification is GNU 7.3.1
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Git: /usr/bin/git (found version "2.40.1")
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Check if compiler accepts -pthread
-- Check if compiler accepts -pthread - yes
-- Found Threads: TRUE
-- Found CUDAToolkit: /usr/local/cuda-11.8/include (found version "11.8.89")
-- cuBLAS found
-- The CUDA compiler identification is NVIDIA 11.8.89
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda-11.8/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Using CUDA architectures: 52;61;70
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- x86 detected
-- Configuring done (1.9s)
-- Generating done (0.1s)
-- Build files have been written to: /home/ec2-user/SageMaker/llama.cpp/llama.cpp/build
sh-4.2$ cmake --build . --config Release
[ 1%] Built target BUILD_INFO
[ 3%] Building C object CMakeFiles/ggml.dir/ggml.c.o
[ 5%] Building C object CMakeFiles/ggml.dir/ggml-alloc.c.o
[ 7%] Building CUDA object CMakeFiles/ggml.dir/ggml-cuda.cu.o
[ 8%] Building C object CMakeFiles/ggml.dir/k_quants.c.o
[ 8%] Built target ggml
[ 10%] Linking CUDA static library libggml_static.a
[ 10%] Built target ggml_static
[ 12%] Building CXX object CMakeFiles/llama.dir/llama.cpp.o
[ 14%] Linking CXX static library libllama.a
[ 14%] Built target llama
[ 15%] Building CXX object tests/CMakeFiles/test-quantize-fns.dir/test-quantize-fns.cpp.o
[ 17%] Linking CXX executable ../bin/test-quantize-fns
[ 17%] Built target test-quantize-fns
[ 19%] Building CXX object tests/CMakeFiles/test-quantize-perf.dir/test-quantize-perf.cpp.o
[ 21%] Linking CXX executable ../bin/test-quantize-perf
[ 21%] Built target test-quantize-perf
[ 22%] Building CXX object tests/CMakeFiles/test-sampling.dir/test-sampling.cpp.o
[ 24%] Linking CXX executable ../bin/test-sampling
[ 24%] Built target test-sampling
[ 26%] Building CXX object tests/CMakeFiles/test-tokenizer-0.dir/test-tokenizer-0.cpp.o
/home/ec2-user/SageMaker/llama.cpp/llama.cpp/tests/test-tokenizer-0.cpp:19:2: warning: extra ‘;’ [-Wpedantic]
 };
  ^
[ 28%] Linking CXX executable ../bin/test-tokenizer-0
[ 28%] Built target test-tokenizer-0
[ 29%] Building CXX object tests/CMakeFiles/test-grammar-parser.dir/test-grammar-parser.cpp.o
[ 31%] Linking CXX executable ../bin/test-grammar-parser
[ 31%] Built target test-grammar-parser
[ 33%] Building CXX object tests/CMakeFiles/test-llama-grammar.dir/test-llama-grammar.cpp.o
[ 35%] Linking CXX executable ../bin/test-llama-grammar
[ 35%] Built target test-llama-grammar
[ 36%] Building CXX object tests/CMakeFiles/test-grad0.dir/test-grad0.cpp.o
[ 38%] Linking CXX executable ../bin/test-grad0
[ 38%] Built target test-grad0
[ 40%] Building CXX object examples/CMakeFiles/common.dir/common.cpp.o
[ 42%] Building CXX object examples/CMakeFiles/common.dir/console.cpp.o
[ 43%] Building CXX object examples/CMakeFiles/common.dir/grammar-parser.cpp.o
[ 43%] Built target common
[ 45%] Building CXX object examples/main/CMakeFiles/main.dir/main.cpp.o
[ 47%] Linking CXX executable ../../bin/main
[ 47%] Built target main
[ 49%] Building CXX object examples/quantize/CMakeFiles/quantize.dir/quantize.cpp.o
[ 50%] Linking CXX executable ../../bin/quantize
[ 50%] Built target quantize
[ 52%] Building CXX object examples/quantize-stats/CMakeFiles/quantize-stats.dir/quantize-stats.cpp.o
[ 54%] Linking CXX executable ../../bin/quantize-stats
[ 54%] Built target quantize-stats
[ 56%] Building CXX object examples/perplexity/CMakeFiles/perplexity.dir/perplexity.cpp.o
[ 57%] Linking CXX executable ../../bin/perplexity
[ 57%] Built target perplexity
[ 59%] Building CXX object examples/embedding/CMakeFiles/embedding.dir/embedding.cpp.o
[ 61%] Linking CXX executable ../../bin/embedding
[ 61%] Built target embedding
[ 63%] Building CXX object examples/save-load-state/CMakeFiles/save-load-state.dir/save-load-state.cpp.o
[ 64%] Linking CXX executable ../../bin/save-load-state
[ 64%] Built target save-load-state
[ 66%] Building CXX object examples/benchmark/CMakeFiles/benchmark.dir/benchmark-matmult.cpp.o
[ 68%] Linking CXX executable ../../bin/benchmark
[ 68%] Built target benchmark
[ 70%] Building CXX object examples/baby-llama/CMakeFiles/baby-llama.dir/baby-llama.cpp.o
/home/ec2-user/SageMaker/llama.cpp/llama.cpp/examples/baby-llama/baby-llama.cpp: In function ‘int main(int, char**)’:
/home/ec2-user/SageMaker/llama.cpp/llama.cpp/examples/baby-llama/baby-llama.cpp:1620:32: warning: variable ‘opt_params_adam’ set but not used [-Wunused-but-set-variable]
     struct ggml_opt_params opt_params_adam = ggml_opt_default_params(GGML_OPT_ADAM);
                            ^~~~~~~
[ 71%] Linking CXX executable ../../bin/baby-llama
[ 71%] Built target baby-llama
[ 73%] Building CXX object examples/train-text-from-scratch/CMakeFiles/train-text-from-scratch.dir/train-text-from-scratch.cpp.o
[ 75%] Linking CXX executable ../../bin/train-text-from-scratch
[ 75%] Built target train-text-from-scratch
[ 77%] Building CXX object examples/convert-llama2c-to-ggml/CMakeFiles/convert-llama2c-to-ggml.dir/convert-llama2c-to-ggml.cpp.o
[ 78%] Linking CXX executable ../../bin/convert-llama2c-to-ggml
[ 78%] Built target convert-llama2c-to-ggml
[ 80%] Building CXX object examples/simple/CMakeFiles/simple.dir/simple.cpp.o
[ 82%] Linking CXX executable ../../bin/simple
[ 82%] Built target simple
[ 84%] Building CXX object examples/embd-input/CMakeFiles/embdinput.dir/embd-input-lib.cpp.o
[ 85%] Linking CXX static library libembdinput.a
[ 85%] Built target embdinput
[ 87%] Building CXX object examples/embd-input/CMakeFiles/embd-input-test.dir/embd-input-test.cpp.o
[ 89%] Linking CXX executable ../../bin/embd-input-test
[ 89%] Built target embd-input-test
[ 91%] Building CXX object examples/server/CMakeFiles/server.dir/server.cpp.o
[ 92%] Linking CXX executable ../../bin/server
[ 92%] Built target server
[ 94%] Building CXX object pocs/vdot/CMakeFiles/vdot.dir/vdot.cpp.o
[ 96%] Linking CXX executable ../../bin/vdot
[ 96%] Built target vdot
[ 98%] Building CXX object pocs/vdot/CMakeFiles/q8dot.dir/q8dot.cpp.o
[100%] Linking CXX executable ../../bin/q8dot
[100%] Built target q8dot
```

Gideonah commented 1 year ago

For those who may stumble across this in the future: the files are built under the `bin` path.

So if the build finished with no problems, then under `build/bin/` you'll see all the binaries you would get by running `make` in the llama.cpp directory.

You can run `CUDA_VISIBLE_DEVICES=0 ./bin/server -m </path/to/model>/llama-2-7b-chat.ggmlv3.q8_0.bin -ngl 35` to start the server, but essentially all the files are in the `bin` path.
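As a quick usage sketch, assuming the server example's defaults of host 127.0.0.1, port 8080, and its /completion endpoint (these may differ between versions), you can check that the running server responds:

```sh
# Send a short completion request to the running server.
# The endpoint, port, and JSON fields below are assumptions based on the
# server example's defaults; check examples/server/README.md for your version.
curl -s http://127.0.0.1:8080/completion \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello, my name is", "n_predict": 32}'
```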

Good luck.