go-skynet / go-llama.cpp

LLama.cpp golang bindings
MIT License

Apple Silicon Support? #42

Open ControlCplusControlV opened 1 year ago

ControlCplusControlV commented 1 year ago

Hey!

I was testing this repo locally, hoping to use it for one of my projects in Go; however, I ran into the following issue. I ran:

❯ git clone --recurse-submodules https://github.com/go-skynet/go-llama.cpp
❯ cd go-llama.cpp
❯ make libbinding.a
❯ LIBRARY_PATH=$PWD C_INCLUDE_PATH=$PWD go run ./examples -m "../llama.cpp/models/7B/" -t 14

I encounter:

# github.com/go-skynet/go-llama.cpp/examples
/usr/local/go/pkg/tool/darwin_amd64/link: running clang++ failed: exit status 1
ld: warning: -no_pie is deprecated when targeting new OS versions
ld: warning: ignoring file /Users/controlc/code/Magi/go-llama.cpp/libbinding.a, building for macOS-x86_64 but attempting to link with file built for macOS-arm64
Undefined symbols for architecture x86_64:

...

ld: symbol(s) not found for architecture x86_64
clang-16: error: linker command failed with exit code 1 (use -v to see invocation)

Is this user error on my side, or is Apple Silicon not yet supported? I'm running on an M1 Pro with 32 GB of RAM, with a 7B LLaMA model that has already been quantized.
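
For anyone comparing notes: the /usr/local/go/pkg/tool/darwin_amd64/link path in the error above is a strong hint that the Go toolchain itself is the amd64 build (running under Rosetta 2), while libbinding.a was compiled natively for arm64. A quick way to check both sides, as a sketch (the expected outputs in the comments are assumptions for a native Apple Silicon setup):

❯ go version                # an amd64 toolchain prints "... darwin/amd64"
❯ go env GOOS GOARCH        # a native toolchain prints darwin, then arm64
❯ lipo -info libbinding.a   # reports which architecture the library was built for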

Aisuko commented 1 year ago

Hi @ControlCplusControlV, thanks for your feedback. We will check soon.

Aisuko commented 1 year ago

I have tested it in my devcontainer, and it looks like I do not hit the issue on my M1 Pro. The details of my environment are below, and the open-source AI model can be found here: https://huggingface.co/TheBloke/wizardLM-7B-GGML/tree/main

I guess we could add some documentation for loading open-source AI models. Would that be a good idea? @mudler
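
A minimal walkthrough would be something like the following sketch (the exact download URL is an assumption based on the tree linked above and the file name in the run below):

# fetch a quantized GGML model from Hugging Face (URL assumed from the tree link above)
curl -LO https://huggingface.co/TheBloke/wizardLM-7B-GGML/resolve/main/wizardLM-7B.ggml.q4_0.bin
# then run the example against it, as shown below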

vscode ➜ /workspaces/go-llama.cpp (master) $ LIBRARY_PATH=$PWD C_INCLUDE_PATH=$PWD go run ./examples -m "./wizardLM-7B.ggml.q4_0.bin" -t 14
llama.cpp: loading model from ./wizardLM-7B.ggml.q4_0.bin
llama_model_load_internal: format     = ggjt v2 (latest)
llama_model_load_internal: n_vocab    = 32001
llama_model_load_internal: n_ctx      = 128
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =  72.75 KB
llama_model_load_internal: mem required  = 5809.34 MB (+ 2052.00 MB per state)
llama_init_from_file: kv self size  =  128.00 MB
Model loaded successfully.
>>> I want to .....

Sending I want to .....

 A an AI language model, I don't have a preference or desire. However, I can help you find information on anything you want to know about. Just let
vscode ➜ /workspaces/go-llama.cpp (master) $ uname -a
Linux 652da11c33d5 5.15.49-linuxkit #1 SMP PREEMPT Tue Sep 13 07:51:32 UTC 2022 aarch64 GNU/Linux
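
Worth noting: the uname output above shows an aarch64 Linux devcontainer (i.e. Docker on the M1), where the Go toolchain naturally targets linux/arm64, so a darwin_amd64 mismatch like the one reported cannot occur in that environment. As a sketch of the comparison (expected outputs in the comments are assumptions):

go env GOOS GOARCH   # in the devcontainer: linux, arm64
go env GOOS GOARCH   # on the affected Macs, per the linker path: darwin, amd64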
sa- commented 1 year ago

I can confirm this error on MacBook Pro (13-inch, M1, 2020)

▶ CGO_LDFLAGS="-framework Foundation -framework Metal -framework MetalKit -framework MetalPerformanceShaders" LIBRARY_PATH=$PWD C_INCLUDE_PATH=$PWD go run ./examples -m "./ggml-vicuna-7B-1.1-f16.bin" -t 14
# github.com/go-skynet/go-llama.cpp/examples
/usr/local/go/pkg/tool/darwin_amd64/link: running clang++ failed: exit status 1
ld: warning: -no_pie is deprecated when targeting new OS versions
ld: warning: ignoring file /Users/sa-/Code/go-llama.cpp/libbinding.a, building for macOS-x86_64 but attempting to link with file built for macOS-arm64
Undefined symbols for architecture x86_64:
  "llama_tokenize(llama_context*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, bool)", referenced from:
      _get_embeddings in 000002.o
      _llama_predict in 000002.o
  "get_num_physical_cores()", referenced from:
      _llama_allocate_params in 000002.o
  "_llama_context_default_params", referenced from:
      _load_model in 000002.o
  "_llama_copy_state_data", referenced from:
      _save_state in 000002.o
  "_llama_eval", referenced from:
      _get_embeddings in 000002.o
      _eval in 000002.o
      _llama_predict in 000002.o
  "_llama_free", referenced from:
      _llama_free_model in 000002.o
     (maybe you meant: __cgo_bfa0f8386f5f_Cfunc_llama_free_model, _llama_free_params , _llama_free_model , __cgo_bfa0f8386f5f_Cfunc_llama_free_params )
  "_llama_get_embeddings", referenced from:
      _get_embeddings in 000002.o
  "_llama_get_logits", referenced from:
      _llama_predict in 000002.o
  "_llama_get_state_size", referenced from:
      _load_state in 000002.o
      _save_state in 000002.o
  "_llama_init_backend", referenced from:
      _get_embeddings in 000002.o
      _llama_predict in 000002.o
  "_llama_init_from_file", referenced from:
      _load_model in 000002.o
  "_llama_load_session_file", referenced from:
      _llama_predict in 000002.o
  "_llama_n_ctx", referenced from:
      _llama_predict in 000002.o
  "_llama_n_embd", referenced from:
      _get_embeddings in 000002.o
  "_llama_n_vocab", referenced from:
      _llama_predict in 000002.o
  "_llama_print_timings", referenced from:
      _llama_predict in 000002.o
  "_llama_reset_timings", referenced from:
      _llama_predict in 000002.o
  "_llama_sample_frequency_and_presence_penalties", referenced from:
      _llama_predict in 000002.o
  "_llama_sample_repetition_penalty", referenced from:
      _llama_predict in 000002.o
  "_llama_sample_tail_free", referenced from:
      _llama_predict in 000002.o
  "_llama_sample_temperature", referenced from:
      _llama_predict in 000002.o
  "_llama_sample_token", referenced from:
      _llama_predict in 000002.o
  "_llama_sample_token_greedy", referenced from:
      _llama_predict in 000002.o
  "_llama_sample_token_mirostat", referenced from:
      _llama_predict in 000002.o
  "_llama_sample_token_mirostat_v2", referenced from:
      _llama_predict in 000002.o
  "_llama_sample_top_k", referenced from:
      _llama_predict in 000002.o
  "_llama_sample_top_p", referenced from:
      _llama_predict in 000002.o
  "_llama_sample_typical", referenced from:
      _llama_predict in 000002.o
  "_llama_save_session_file", referenced from:
      _llama_predict in 000002.o
  "_llama_set_rng_seed", referenced from:
      _llama_predict in 000002.o
  "_llama_set_state_data", referenced from:
      _load_state in 000002.o
  "_llama_token_eos", referenced from:
      _llama_predict in 000002.o
      _llama_allocate_params in 000002.o
  "_llama_token_nl", referenced from:
      _llama_predict in 000002.o
  "_llama_token_to_str", referenced from:
      _get_token_embeddings in 000002.o
      _llama_predict in 000002.o
  "_llama_tokenize", referenced from:
      _eval in 000002.o
ld: symbol(s) not found for architecture x86_64
deep-pipeline commented 1 year ago

@ControlCplusControlV @sa- both of your error messages show that on your M1 machines you are trying to link against an AMD64 (i.e. x86_64) library. Now, some stuff might work that way (Rosetta 2 may absorb some issues and allow some AMD64, i.e. x86, binaries to run apparently fine), but clearly a mismatch is being detected at link time here for some reason, and the issue doesn't occur for the repo owner, who is able to get things working on their M1 machine. (There's a sketch of the likely check and fix at the end of this comment.)

Sorry, I don't know the Go compilation pathway on M1 well enough to point out what you need to reinstall, but given it's likely a sufficiently old (i.e. pre-September 2021) type of issue, you can probably Stack Overflow or ChatGPT your way out of it.

Good Luck!
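
For reference, a sketch of the likely fix implied above, assuming the Go toolchain on the affected machines was installed as the darwin/amd64 build and is running under Rosetta 2 (the model path below is just the example from earlier in the thread):

# install the native arm64 Go toolchain (the darwin-arm64 package from https://go.dev/dl/),
# then confirm it no longer reports amd64:
go version            # should now print "... darwin/arm64"

# rebuild so every artifact targets the same architecture:
rm -f libbinding.a
make libbinding.a
LIBRARY_PATH=$PWD C_INCLUDE_PATH=$PWD go run ./examples -m "./wizardLM-7B.ggml.q4_0.bin" -t 14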

tmc commented 1 year ago

For what it's worth, this error isn't happening on my M2.