go-skynet / go-llama.cpp

LLama.cpp golang bindings
MIT License

Compiling with cuBLAS fails with an error. How to resolve? #216

Closed hiqsociety closed 11 months ago

hiqsociety commented 11 months ago

CGO_LDFLAGS="-lcublas -lcudart -L/usr/local/cuda/lib64/" LIBRARY_PATH=$PWD C_INCLUDE_PATH=$PWD go build ./examples/main.go

# github.com/go-skynet/go-llama.cpp
binding.cpp: In function ‘int llama_predict(void*, void*, char*, bool)’:
binding.cpp:332:53: warning: format ‘%zu’ expects argument of type ‘size_t’, but argument 2 has type ‘int’ [-Wformat=]
  332 |                 printf("<<input too long: skipped %zu token%s>>", skipped_tokens, skipped_tokens != 1 ? "s" : "");
      |                                                   ~~^             ~~~~~~~~~~~~~~
      |                                                     |             |
      |                                                     |             int
      |                                                     long unsigned int
      |                                                   %u
binding.cpp: In function ‘void llama_binding_free_model(void*)’:
binding.cpp:797:5: warning: possible problem detected in invocation of ‘operator delete’ [-Wdelete-incomplete]
  797 |     delete ctx->model;
      |     ^~~~~~~~~~~~~~~~~
binding.cpp:797:17: warning: invalid use of incomplete type ‘struct llama_model’
  797 |     delete ctx->model;
      |            ~~~~~^~~~~
In file included from ./llama.cpp/common/common.h:5,
                 from binding.cpp:1:
./llama.cpp/llama.h:60:12: note: forward declaration of ‘struct llama_model’
   60 |     struct llama_model;
      |            ^~~~~~~~~~~
binding.cpp:797:5: note: neither the destructor nor the class-specific ‘operator delete’ will be called, even if they are declared when the class is defined
  797 |     delete ctx->model;
      |     ^~~~~~~~~~~~~~~~~
hiqsociety commented 11 months ago

In a related attempt, I took a llama.cpp checkout that had compiled successfully with cuBLAS, replaced the llama.cpp folder with it, and ran the cuBLAS build of libbinding.a, which failed with the errors below. How do I resolve this?

root@ubuntu:/usr/local/src/go-llama.cpp# BUILD_TYPE=cublas make libbinding.a
I llama.cpp build info: 
I UNAME_S:  Linux
I UNAME_P:  x86_64
I UNAME_M:  x86_64
I CFLAGS:   -I./llama.cpp -I. -O3 -DNDEBUG -std=c11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -Wno-unused-function -pthread -march=native -mtune=native
I CXXFLAGS: -I./llama.cpp -I. -I./llama.cpp/common -I./common -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -pthread
I CGO_LDFLAGS:  
I LDFLAGS:  
I BUILD_TYPE:  cublas
I CMAKE_ARGS:  -DLLAMA_CUBLAS=ON
I EXTRA_TARGETS:  llama.cpp/ggml-cuda.o
I CC:       cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
I CXX:      g++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0

g++ -I./llama.cpp -I. -I./llama.cpp/common -I./common -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -pthread -I./llama.cpp -I./llama.cpp/common binding.cpp -o binding.o -c 
binding.cpp: In function ‘int get_embeddings(void*, void*, float*)’:
binding.cpp:39:5: error: ‘llama_binding_state’ was not declared in this scope; did you mean ‘llama_beams_state’?
   39 |     llama_binding_state* state = (llama_binding_state*) state_pr;
      |     ^~~~~~~~~~~~~~~~~~~
      |     llama_beams_state
binding.cpp:39:26: error: ‘state’ was not declared in this scope; did you mean ‘_xstate’?
   39 |     llama_binding_state* state = (llama_binding_state*) state_pr;
      |                          ^~~~~
      |                          _xstate
binding.cpp:39:55: error: expected primary-expression before ‘)’ token
   39 |     llama_binding_state* state = (llama_binding_state*) state_pr;
      |                                                       ^
binding.cpp:37:44: warning: unused parameter ‘state_pr’ [-Wunused-parameter]
   37 | int get_embeddings(void* params_ptr, void* state_pr, float * res_embeddings) {
      |                                      ~~~~~~^~~~~~~~
binding.cpp: In function ‘int get_token_embeddings(void*, void*, int*, int, float*)’:
binding.cpp:78:5: error: ‘llama_binding_state’ was not declared in this scope; did you mean ‘llama_beams_state’?
   78 |     llama_binding_state* state = (llama_binding_state*) state_pr;
      |     ^~~~~~~~~~~~~~~~~~~
      |     llama_beams_state
binding.cpp:78:26: error: ‘state’ was not declared in this scope; did you mean ‘_xstate’?
   78 |     llama_binding_state* state = (llama_binding_state*) state_pr;
      |                          ^~~~~
      |                          _xstate
binding.cpp:78:55: error: expected primary-expression before ‘)’ token
   78 |     llama_binding_state* state = (llama_binding_state*) state_pr;
      |                                                       ^
binding.cpp: In function ‘int eval(void*, void*, char*)’:
binding.cpp:94:5: error: ‘llama_binding_state’ was not declared in this scope; did you mean ‘llama_beams_state’?
   94 |     llama_binding_state* state = (llama_binding_state*) state_pr;
      |     ^~~~~~~~~~~~~~~~~~~
      |     llama_beams_state
binding.cpp:94:26: error: ‘state’ was not declared in this scope; did you mean ‘_xstate’?
   94 |     llama_binding_state* state = (llama_binding_state*) state_pr;
      |                          ^~~~~
      |                          _xstate
binding.cpp:94:55: error: expected primary-expression before ‘)’ token
   94 |     llama_binding_state* state = (llama_binding_state*) state_pr;
      |                                                       ^
binding.cpp:92:33: warning: unused parameter ‘state_pr’ [-Wunused-parameter]
   92 | int eval(void* params_ptr,void* state_pr,char *text) {
      |                           ~~~~~~^~~~~~~~
binding.cpp: In function ‘int llama_predict(void*, void*, char*, bool)’:
binding.cpp:120:5: error: ‘llama_binding_state’ was not declared in this scope; did you mean ‘llama_beams_state’?
  120 |     llama_binding_state* state = (llama_binding_state*) state_pr;
      |     ^~~~~~~~~~~~~~~~~~~
      |     llama_beams_state
binding.cpp:120:26: error: ‘state’ was not declared in this scope; did you mean ‘_xstate’?
  120 |     llama_binding_state* state = (llama_binding_state*) state_pr;
      |                          ^~~~~
      |                          _xstate
binding.cpp:120:55: error: expected primary-expression before ‘)’ token
  120 |     llama_binding_state* state = (llama_binding_state*) state_pr;
      |                                                       ^
binding.cpp:332:53: warning: format ‘%zu’ expects argument of type ‘size_t’, but argument 2 has type ‘int’ [-Wformat=]
  332 |                 printf("<<input too long: skipped %zu token%s>>", skipped_tokens, skipped_tokens != 1 ? "s" : "");
      |                                                   ~~^             ~~~~~~~~~~~~~~
      |                                                     |             |
      |                                                     |             int
      |                                                     long unsigned int
      |                                                   %u
binding.cpp:464:42: warning: cast from type ‘const char*’ to type ‘char*’ casts away qualifiers [-Wcast-qual]
  464 |             if (!tokenCallback(state_pr, (char*)token_str.c_str())) {
      |                                          ^~~~~~~~~~~~~~~~~~~~~~~~
binding.cpp:290:10: warning: unused variable ‘is_antiprompt’ [-Wunused-variable]
  290 |     bool is_antiprompt        = false;
      |          ^~~~~~~~~~~~~
binding.cpp:291:10: warning: unused variable ‘input_echo’ [-Wunused-variable]
  291 |     bool input_echo           = true;
      |          ^~~~~~~~~~
binding.cpp:531:1: warning: label ‘end’ defined but not used [-Wunused-label]
  531 | end:
      | ^~~
binding.cpp: In function ‘int speculative_sampling(void*, void*, void*, char*, bool)’:
binding.cpp:556:5: error: ‘llama_binding_state’ was not declared in this scope; did you mean ‘llama_beams_state’?
  556 |     llama_binding_state* target_model_state = (llama_binding_state*) target_model;
      |     ^~~~~~~~~~~~~~~~~~~
      |     llama_beams_state
binding.cpp:556:26: error: ‘target_model_state’ was not declared in this scope; did you mean ‘target_model’?
  556 |     llama_binding_state* target_model_state = (llama_binding_state*) target_model;
      |                          ^~~~~~~~~~~~~~~~~~
      |                          target_model
binding.cpp:556:68: error: expected primary-expression before ‘)’ token
  556 |     llama_binding_state* target_model_state = (llama_binding_state*) target_model;
      |                                                                    ^
binding.cpp:557:26: error: ‘draft_model_state’ was not declared in this scope; did you mean ‘draft_model’?
  557 |     llama_binding_state* draft_model_state = (llama_binding_state*) draft_model;
      |                          ^~~~~~~~~~~~~~~~~
      |                          draft_model
binding.cpp:557:67: error: expected primary-expression before ‘)’ token
  557 |     llama_binding_state* draft_model_state = (llama_binding_state*) draft_model;
      |                                                                   ^
binding.cpp:654:45: warning: cast from type ‘const char*’ to type ‘char*’ casts away qualifiers [-Wcast-qual]
  654 |             if (!tokenCallback(draft_model, (char*)token_str.c_str())) {
      |                                             ^~~~~~~~~~~~~~~~~~~~~~~~
binding.cpp:563:19: warning: unused variable ‘model_tgt’ [-Wunused-variable]
  563 |     llama_model * model_tgt = target_model_state->model;
      |                   ^~~~~~~~~
binding.cpp:564:19: warning: unused variable ‘model_dft’ [-Wunused-variable]
  564 |     llama_model * model_dft = draft_model_state->model;
      |                   ^~~~~~~~~
binding.cpp:553:50: warning: unused parameter ‘target_model’ [-Wunused-parameter]
  553 | int speculative_sampling(void* params_ptr, void* target_model, void* draft_model, char* result, bool debug) {
      |                                            ~~~~~~^~~~~~~~~~~~
binding.cpp: In function ‘void llama_binding_free_model(void*)’:
binding.cpp:795:5: error: ‘llama_binding_state’ was not declared in this scope; did you mean ‘llama_beams_state’?
  795 |     llama_binding_state* ctx = (llama_binding_state*) state_ptr;
      |     ^~~~~~~~~~~~~~~~~~~
      |     llama_beams_state
binding.cpp:795:26: error: ‘ctx’ was not declared in this scope
  795 |     llama_binding_state* ctx = (llama_binding_state*) state_ptr;
      |                          ^~~
binding.cpp:795:53: error: expected primary-expression before ‘)’ token
  795 |     llama_binding_state* ctx = (llama_binding_state*) state_ptr;
      |                                                     ^
binding.cpp:794:37: warning: unused parameter ‘state_ptr’ [-Wunused-parameter]
  794 | void llama_binding_free_model(void *state_ptr) {
      |                               ~~~~~~^~~~~~~~~
binding.cpp: In function ‘int llama_tokenize_string(void*, void*, int*)’:
binding.cpp:807:5: error: ‘llama_binding_state’ was not declared in this scope; did you mean ‘llama_beams_state’?
  807 |     llama_binding_state* state = (llama_binding_state*) state_pr;
      |     ^~~~~~~~~~~~~~~~~~~
      |     llama_beams_state
binding.cpp:807:26: error: ‘state’ was not declared in this scope; did you mean ‘_xstate’?
  807 |     llama_binding_state* state = (llama_binding_state*) state_pr;
      |                          ^~~~~
      |                          _xstate
binding.cpp:807:55: error: expected primary-expression before ‘)’ token
  807 |     llama_binding_state* state = (llama_binding_state*) state_pr;
      |                                                       ^
binding.cpp:805:51: warning: unused parameter ‘state_pr’ [-Wunused-parameter]
  805 | int llama_tokenize_string(void* params_ptr, void* state_pr, int* result) {
      |                                             ~~~~~~^~~~~~~~
binding.cpp: In function ‘void* load_model(const char*, int, int, bool, bool, bool, bool, bool, int, int, const char*, const char*, bool, float, float, bool, const char*, const char*, bool)’:
binding.cpp:946:11: error: ‘load_binding_model’ was not declared in this scope
  946 |    return load_binding_model(fname, n_ctx, n_seed, memory_f16, mlock, embeddings, mmap, low_vram, n_gpu_layers, n_batch, maingpu, tensorsplit, numa, rope_freq_base, rope_freq_scale, mul_mat_q, lora, lora_base, perplexity);
      |           ^~~~~~~~~~~~~~~~~~
make: *** [Makefile:207: binding.o] Error 1
MathiasGrund commented 11 months ago

Your original output contains only warnings; these happen with the current HEAD, but it works fine despite them, so you can just ignore them.
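One quick way to confirm that a build log holds only warnings is to count the two kinds of diagnostic lines. The sketch below assumes GCC's usual `file:line:col: warning:` / `error:` format, as seen in the logs in this thread; `count_diagnostics` is a hypothetical helper name, not part of go-llama.cpp.

```shell
#!/bin/sh
# count_diagnostics: read compiler output on stdin and print how many
# lines are errors and how many are warnings.
# Hypothetical helper; relies on the "file:line:col: error: ..." format.
# "note:" lines are deliberately not counted.
count_diagnostics() {
    awk '
        / error: /   { errors++ }
        / warning: / { warnings++ }
        END { printf "errors=%d warnings=%d\n", errors, warnings }
    '
}
```

Piping a build through it, for example `go build ./examples/main.go 2>&1 | count_diagnostics`, gives a one-line summary; a result with `errors=0` means the diagnostics were warnings only.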

go-llama.cpp patches a few files inside the llama.cpp submodule, so you need to apply that patch and then compile llama.cpp (running make -B prepare from the go-llama.cpp folder forces the patching if make doesn't detect that it should re-run it). I think you should just do what you did at first and ignore the warnings (including when they appear while you compile your Go program).
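Whether the patch has been applied can be checked before rebuilding. The sketch below assumes that the patch is what introduces the `llama_binding_state` name into the llama.cpp tree, which is what the compile errors in this thread suggest; `patch_applied` and `rebuild_binding` are hypothetical helper names.

```shell
#!/bin/sh
# patch_applied DIR: succeed if the llama.cpp tree under DIR mentions
# llama_binding_state, one of the names binding.cpp expects to find there.
# Assumption: only the go-llama.cpp patch adds this name to llama.cpp.
patch_applied() {
    grep -rq 'llama_binding_state' "${1:-llama.cpp}" 2>/dev/null
}

# rebuild_binding: run from the go-llama.cpp folder. Forces the patch step
# when the name is missing, then rebuilds the static library with cuBLAS.
rebuild_binding() {
    if ! patch_applied llama.cpp; then
        make -B prepare || return 1
    fi
    BUILD_TYPE=cublas make libbinding.a
}
```

Calling `rebuild_binding` from the go-llama.cpp folder runs the same two make commands mentioned in this thread. If the llama.cpp folder was swapped for a different version, the patch may not apply cleanly, and a fresh checkout is likely the safer route.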

hiqsociety commented 11 months ago

@MathiasGS thanks for the prompt reply.

I can't ignore them. I ran the cuBLAS build of libbinding.a and it works, but when I try to run the example as shown, it throws the errors below. How do I resolve this?

# github.com/go-skynet/go-llama.cpp
binding.cpp: In function ‘int llama_predict(void*, void*, char*, bool)’:
binding.cpp:332:53: warning: format ‘%zu’ expects argument of type ‘size_t’, but argument 2 has type ‘int’ [-Wformat=]
  332 |                 printf("<<input too long: skipped %zu token%s>>", skipped_tokens, skipped_tokens != 1 ? "s" : "");
      |                                                   ~~^             ~~~~~~~~~~~~~~
      |                                                     |             |
      |                                                     |             int
      |                                                     long unsigned int
      |                                                   %u
binding.cpp: In function ‘void llama_binding_free_model(void*)’:
binding.cpp:797:5: warning: possible problem detected in invocation of ‘operator delete’ [-Wdelete-incomplete]
  797 |     delete ctx->model;
      |     ^~~~~~~~~~~~~~~~~
binding.cpp:797:17: warning: invalid use of incomplete type ‘struct llama_model’
  797 |     delete ctx->model;
      |            ~~~~~^~~~~
In file included from ../llama.cpp/common/common.h:5,
                 from binding.cpp:1:
../llama.cpp/llama.h:60:12: note: forward declaration of ‘struct llama_model’
   60 |     struct llama_model;
      |            ^~~~~~~~~~~
binding.cpp:797:5: note: neither the destructor nor the class-specific ‘operator delete’ will be called, even if they are declared when the class is defined
  797 |     delete ctx->model;
      |     ^~~~~~~~~~~~~~~~~
# github.com/go-skynet/go-llama.cpp/examples
/usr/local/go/pkg/tool/linux_amd64/link: running g++ failed: exit status 1
/usr/bin/ld: /tmp/go-link-1649869791/000002.o: in function `load_model':
/usr/local/src/go-llama.cpp/binding.cpp:946: undefined reference to `load_binding_model(char const*, int, int, bool, bool, bool, bool, bool, int, int, char const*, char const*, bool, float, float, bool, char const*, char const*, bool)'
/usr/bin/ld: /usr/local/src/go-llama.cpp//libbinding.a(ggml-cuda.o): in function `quantize_row_q8_1_cuda(float const*, void*, int, int, int, CUstream_st*)':
tmpxft_00011567_00000000-6_ggml-cuda.compute_70.cudafe1.cpp:(.text+0xcb): undefined reference to `__cudaPushCallConfiguration'
/usr/bin/ld: tmpxft_00011567_00000000-6_ggml-cuda.compute_70.cudafe1.cpp:(.text+0x172): undefined reference to `__cudaPopCallConfiguration'
/usr/bin/ld: tmpxft_00011567_00000000-6_ggml-cuda.compute_70.cudafe1.cpp:(.text+0x1b1): undefined reference to `cudaLaunchKernel'
/usr/bin/ld: /usr/local/src/go-llama.cpp//libbinding.a(ggml-cuda.o): in function `ggml_cuda_cpy_tensor_2d(void*, ggml_tensor const*, long, long, long, long, CUstream_st*)':
tmpxft_00011567_00000000-6_ggml-cuda.compute_70.cudafe1.cpp:(.text+0x325): undefined reference to `cudaMemcpy2DAsync'
/usr/bin/ld: tmpxft_00011567_00000000-6_ggml-cuda.compute_70.cudafe1.cpp:(.text+0x391): undefined reference to `cudaGetDevice'
/usr/bin/ld: tmpxft_00011567_00000000-6_ggml-cuda.compute_70.cudafe1.cpp:(.text+0x407): undefined reference to `cudaMemcpy2DAsync'
/usr/bin/ld: tmpxft_00011567_00000000-6_ggml-cuda.compute_70.cudafe1.cpp:(.text+0x4a1): undefined reference to `cudaGetDevice'
/usr/bin/ld: tmpxft_00011567_00000000-6_ggml-cuda.compute_70.cudafe1.cpp:(.text+0x4a9): undefined reference to `cudaGetErrorString'
/usr/bin/ld: /usr/local/src/go-llama.cpp//libbinding.a(ggml-cuda.o): in function `ggml_cuda_pool_malloc(unsigned long, unsigned long*)':
tmpxft_00011567_00000000-6_ggml-cuda.compute_70.cudafe1.cpp:(.text+0x561): undefined reference to `cudaGetDevice'
/usr/bin/ld: tmpxft_00011567_00000000-6_ggml-cuda.compute_70.cudafe1.cpp:(.text+0x66f): undefined reference to `cudaMalloc'
/usr/bin/ld: tmpxft_00011567_00000000-6_ggml-cuda.compute_70.cudafe1.cpp:(.text+0x6d8): undefined reference to `cudaGetDevice'
/usr/bin/ld: tmpxft_00011567_00000000-6_ggml-cuda.compute_70.cudafe1.cpp:(.text+0x6e0): undefined reference to `cudaGetErrorString'
/usr/bin/ld: tmpxft_00011567_00000000-6_ggml-cuda.compute_70.cudafe1.cpp:(.text+0x73e): undefined reference to `cudaGetDevice'
/usr/bin/ld: tmpxft_00011567_00000000-6_ggml-cuda.compute_70.cudafe1.cpp:(.text+0x746): undefined reference to `cudaGetErrorString'
/usr/bin/ld: /usr/local/src/go-llama.cpp//libbinding.a(ggml-cuda.o): in function `ggml_cuda_pool_free(void*, unsigned long)':
tmpxft_00011567_00000000-6_ggml-cuda.compute_70.cudafe1.cpp:(.text+0x7d1): undefined reference to `cudaGetDevice'
/usr/bin/ld: tmpxft_00011567_00000000-6_ggml-cuda.compute_70.cudafe1.cpp:(.text+0x876): undefined reference to `cudaFree'
/usr/bin/ld: tmpxft_00011567_00000000-6_ggml-cuda.compute_70.cudafe1.cpp:(.text+0x88f): undefined reference to `cudaGetDevice'
/usr/bin/ld: tmpxft_00011567_00000000-6_ggml-cuda.compute_70.cudafe1.cpp:(.text+0x897): undefined reference to `cudaGetErrorString'
/usr/bin/ld: tmpxft_00011567_00000000-6_ggml-cuda.compute_70.cudafe1.cpp:(.text+0x8f5): undefined reference to `cudaGetDevice'
/usr/bin/ld: tmpxft_00011567_00000000-6_ggml-cuda.compute_70.cudafe1.cpp:(.text+0x8fd): undefined reference to `cudaGetErrorString'

continuing...

tmpxft_00011567_00000000-6_ggml-cuda.compute_70.cudafe1.cpp:(.text._Z17ggml_cuda_op_ropePK11ggml_tensorS1_PS_PKfS4_PfRKP11CUstream_st[_Z17ggml_cuda_op_ropePK11ggml_tensorS1_PS_PKfS4_PfRKP11CUstream_st]+0x2cc): undefined reference to `__cudaPushCallConfiguration'
/usr/bin/ld: tmpxft_00011567_00000000-6_ggml-cuda.compute_70.cudafe1.cpp:(.text._Z17ggml_cuda_op_ropePK11ggml_tensorS1_PS_PKfS4_PfRKP11CUstream_st[_Z17ggml_cuda_op_ropePK11ggml_tensorS1_PS_PKfS4_PfRKP11CUstream_st]+0x3a9): undefined reference to `__cudaPushCallConfiguration'
/usr/bin/ld: tmpxft_00011567_00000000-6_ggml-cuda.compute_70.cudafe1.cpp:(.text._Z17ggml_cuda_op_ropePK11ggml_tensorS1_PS_PKfS4_PfRKP11CUstream_st[_Z17ggml_cuda_op_ropePK11ggml_tensorS1_PS_PKfS4_PfRKP11CUstream_st]+0x47b): undefined reference to `__cudaPopCallConfiguration'
/usr/bin/ld: tmpxft_00011567_00000000-6_ggml-cuda.compute_70.cudafe1.cpp:(.text._Z17ggml_cuda_op_ropePK11ggml_tensorS1_PS_PKfS4_PfRKP11CUstream_st[_Z17ggml_cuda_op_ropePK11ggml_tensorS1_PS_PKfS4_PfRKP11CUstream_st]+0x4d8): undefined reference to `cudaLaunchKernel'
/usr/bin/ld: tmpxft_00011567_00000000-6_ggml-cuda.compute_70.cudafe1.cpp:(.text._Z17ggml_cuda_op_ropePK11ggml_tensorS1_PS_PKfS4_PfRKP11CUstream_st[_Z17ggml_cuda_op_ropePK11ggml_tensorS1_PS_PKfS4_PfRKP11CUstream_st]+0x569): undefined reference to `__cudaPushCallConfiguration'
/usr/bin/ld: tmpxft_00011567_00000000-6_ggml-cuda.compute_70.cudafe1.cpp:(.text._Z17ggml_cuda_op_ropePK11ggml_tensorS1_PS_PKfS4_PfRKP11CUstream_st[_Z17ggml_cuda_op_ropePK11ggml_tensorS1_PS_PKfS4_PfRKP11CUstream_st]+0x63c): undefined reference to `__cudaPopCallConfiguration'
/usr/bin/ld: tmpxft_00011567_00000000-6_ggml-cuda.compute_70.cudafe1.cpp:(.text._Z17ggml_cuda_op_ropePK11ggml_tensorS1_PS_PKfS4_PfRKP11CUstream_st[_Z17ggml_cuda_op_ropePK11ggml_tensorS1_PS_PKfS4_PfRKP11CUstream_st]+0x699): undefined reference to `cudaLaunchKernel'
/usr/bin/ld: tmpxft_00011567_00000000-6_ggml-cuda.compute_70.cudafe1.cpp:(.text._Z17ggml_cuda_op_ropePK11ggml_tensorS1_PS_PKfS4_PfRKP11CUstream_st[_Z17ggml_cuda_op_ropePK11ggml_tensorS1_PS_PKfS4_PfRKP11CUstream_st]+0x766): undefined reference to `__cudaPopCallConfiguration'
/usr/bin/ld: tmpxft_00011567_00000000-6_ggml-cuda.compute_70.cudafe1.cpp:(.text._Z17ggml_cuda_op_ropePK11ggml_tensorS1_PS_PKfS4_PfRKP11CUstream_st[_Z17ggml_cuda_op_ropePK11ggml_tensorS1_PS_PKfS4_PfRKP11CUstream_st]+0x7c3): undefined reference to `cudaLaunchKernel'
/usr/bin/ld: /usr/local/src/go-llama.cpp//libbinding.a(ggml-cuda.o): in function `ggml_cuda_op_mul_mat_cublas(ggml_tensor const*, ggml_tensor const*, ggml_tensor*, char const*, float const*, char const*, float*, long, long, long, long, CUstream_st* const&)':
tmpxft_00011567_00000000-6_ggml-cuda.compute_70.cudafe1.cpp:(.text._Z27ggml_cuda_op_mul_mat_cublasPK11ggml_tensorS1_PS_PKcPKfS4_PfllllRKP11CUstream_st[_Z27ggml_cuda_op_mul_mat_cublasPK11ggml_tensorS1_PS_PKcPKfS4_PfllllRKP11CUstream_st]+0xac): undefined reference to `cudaGetDevice'
/usr/bin/ld: tmpxft_00011567_00000000-6_ggml-cuda.compute_70.cudafe1.cpp:(.text._Z27ggml_cuda_op_mul_mat_cublasPK11ggml_tensorS1_PS_PKcPKfS4_PfllllRKP11CUstream_st[_Z27ggml_cuda_op_mul_mat_cublasPK11ggml_tensorS1_PS_PKcPKfS4_PfllllRKP11CUstream_st]+0xe3): undefined reference to `cublasSetStream_v2'
/usr/bin/ld: tmpxft_00011567_00000000-6_ggml-cuda.compute_70.cudafe1.cpp:(.text._Z27ggml_cuda_op_mul_mat_cublasPK11ggml_tensorS1_PS_PKcPKfS4_PfllllRKP11CUstream_st[_Z27ggml_cuda_op_mul_mat_cublasPK11ggml_tensorS1_PS_PKcPKfS4_PfllllRKP11CUstream_st]+0x134): undefined reference to `cublasSgemm_v2'
/usr/bin/ld: tmpxft_00011567_00000000-6_ggml-cuda.compute_70.cudafe1.cpp:(.text._Z27ggml_cuda_op_mul_mat_cublasPK11ggml_tensorS1_PS_PKcPKfS4_PfllllRKP11CUstream_st[_Z27ggml_cuda_op_mul_mat_cublasPK11ggml_tensorS1_PS_PKcPKfS4_PfllllRKP11CUstream_st]+0x314): undefined reference to `cudaGetDevice'
/usr/bin/ld: tmpxft_00011567_00000000-6_ggml-cuda.compute_70.cudafe1.cpp:(.text._Z27ggml_cuda_op_mul_mat_cublasPK11ggml_tensorS1_PS_PKcPKfS4_PfllllRKP11CUstream_st[_Z27ggml_cuda_op_mul_mat_cublasPK11ggml_tensorS1_PS_PKcPKfS4_PfllllRKP11CUstream_st]+0x373): undefined reference to `cudaGetDevice'
/usr/bin/ld: tmpxft_00011567_00000000-6_ggml-cuda.compute_70.cudafe1.cpp:(.text._Z27ggml_cuda_op_mul_mat_cublasPK11ggml_tensorS1_PS_PKcPKfS4_PfllllRKP11CUstream_st[_Z27ggml_cuda_op_mul_mat_cublasPK11ggml_tensorS1_PS_PKcPKfS4_PfllllRKP11CUstream_st]+0x392): undefined reference to `cudaGetDevice'
/usr/bin/ld: tmpxft_00011567_00000000-6_ggml-cuda.compute_70.cudafe1.cpp:(.text._Z27ggml_cuda_op_mul_mat_cublasPK11ggml_tensorS1_PS_PKcPKfS4_PfllllRKP11CUstream_st[_Z27ggml_cuda_op_mul_mat_cublasPK11ggml_tensorS1_PS_PKcPKfS4_PfllllRKP11CUstream_st]+0x39a): undefined reference to `cudaGetErrorString'
/usr/bin/ld: /usr/local/src/go-llama.cpp//libbinding.a(ggml-cuda.o): in function `ggml_cuda_op_alibi(ggml_tensor const*, ggml_tensor const*, ggml_tensor*, float const*, float const*, float*, CUstream_st* const&)':
tmpxft_00011567_00000000-6_ggml-cuda.compute_70.cudafe1.cpp:(.text._Z18ggml_cuda_op_alibiPK11ggml_tensorS1_PS_PKfS4_PfRKP11CUstream_st[_Z18ggml_cuda_op_alibiPK11ggml_tensorS1_PS_PKfS4_PfRKP11CUstream_st]+0x1e7): undefined reference to `__cudaPushCallConfiguration'
/usr/bin/ld: tmpxft_00011567_00000000-6_ggml-cuda.compute_70.cudafe1.cpp:(.text._Z18ggml_cuda_op_alibiPK11ggml_tensorS1_PS_PKfS4_PfRKP11CUstream_st[_Z18ggml_cuda_op_alibiPK11ggml_tensorS1_PS_PKfS4_PfRKP11CUstream_st]+0x2d5): undefined reference to `__cudaPopCallConfiguration'
/usr/bin/ld: tmpxft_00011567_00000000-6_ggml-cuda.compute_70.cudafe1.cpp:(.text._Z18ggml_cuda_op_alibiPK11ggml_tensorS1_PS_PKfS4_PfRKP11CUstream_st[_Z18ggml_cuda_op_alibiPK11ggml_tensorS1_PS_PKfS4_PfRKP11CUstream_st]+0x314): undefined reference to `cudaLaunchKernel'
/usr/bin/ld: /usr/local/src/go-llama.cpp//libbinding.a(ggml-cuda.o): in function `__sti____cudaRegisterAll()':
tmpxft_00011567_00000000-6_ggml-cuda.compute_70.cudafe1.cpp:(.text.startup+0xd): undefined reference to `__cudaRegisterFatBinary'
/usr/bin/ld: tmpxft_00011567_00000000-6_ggml-cuda.compute_70.cudafe1.cpp:(.text.startup+0x41): undefined reference to `__cudaRegisterFunction'
/usr/bin/ld: tmpxft_00011567_00000000-6_ggml-cuda.compute_70.cudafe1.cpp:(.text.startup+0x6f): undefined reference to `__cudaRegisterFunction'
/usr/bin/ld: tmpxft_00011567_00000000-6_ggml-cuda.compute_70.cudafe1.cpp:(.text.startup+0x9d): undefined reference to `__cudaRegisterFunction'
/usr/bin/ld: tmpxft_00011567_00000000-6_ggml-cuda.compute_70.cudafe1.cpp:(.text.startup+0xcb): undefined reference to `__cudaRegisterFunction'
/usr/bin/ld: tmpxft_00011567_00000000-6_ggml-cuda.compute_70.cudafe1.cpp:(.text.startup+0xf9): undefined reference to `__cudaRegisterFunction'
/usr/bin/ld: /usr/local/src/go-llama.cpp//libbinding.a(ggml-cuda.o):tmpxft_00011567_00000000-6_ggml-cuda.compute_70.cudafe1.cpp:(.text.startup+0x127): more undefined references to `__cudaRegisterFunction' follow
/usr/bin/ld: /usr/local/src/go-llama.cpp//libbinding.a(ggml-cuda.o): in function `__sti____cudaRegisterAll()':
tmpxft_00011567_00000000-6_ggml-cuda.compute_70.cudafe1.cpp:(.text.startup+0xd41): undefined reference to `__cudaRegisterFatBinaryEnd'
collect2: error: ld returned 1 exit status
MathiasGrund commented 11 months ago

It fails with undefined reference to load_binding_model[...], which is exactly the function being added by the patch (well, one of them). This leads me to believe, again, that the patch was not applied correctly. I think the "[I] replaced the llama.cpp folder" step is what is biting you here; I would just delete the go-llama.cpp folder and start from scratch with a new checkout, following the guide as you did originally. It seems to me that just ignoring the warnings would have left you with a working binary on your first try. Hope this gets you going!
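To see at a glance which symbols the linker could not resolve, the log can be reduced to a list of unique names. The sketch below assumes GNU ld's "undefined reference to" message format shown in this thread; `undefined_symbols` and `missing_cuda_libs` are hypothetical helpers. In the log above, `load_binding_model` points at the missing patch, while the `cuda*`, `__cuda*` and `cublas*` names are provided by the CUDA runtime and cuBLAS libraries (`-lcudart`, `-lcublas`), so their presence may mean those flags did not reach the link step.

```shell
#!/bin/sh
# undefined_symbols: read linker output on stdin and print each unresolved
# symbol name once, with any C++ argument list stripped.
undefined_symbols() {
    sed -n "s/.*undefined references\{0,1\} to \`\([^'(]*\).*/\1/p" | sort -u
}

# missing_cuda_libs: read symbol names on stdin and print the linker flags
# for the libraries that provide them:
#   cuda*, __cuda*  -> CUDA runtime (-lcudart)
#   cublas*         -> cuBLAS       (-lcublas)
# Names that match neither pattern are ignored.
missing_cuda_libs() {
    awk '
        /^cublas/    { flags["-lcublas"] = 1 }
        /^(__)?cuda/ { flags["-lcudart"] = 1 }
        END { for (f in flags) print f }
    ' | sort
}
```

A possible use is `go build ./examples/main.go 2>&1 | undefined_symbols`, optionally followed by `| missing_cuda_libs`. If flags are printed, it is worth checking that `CGO_LDFLAGS` contains `-lcublas -lcudart` and the CUDA library path, as in the first command in this thread.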

hiqsociety commented 11 months ago

@MathiasGS closing this as new issue: https://github.com/go-skynet/go-llama.cpp/issues/218