edgenai / edgen

⚡ Edgen: Local, private GenAI server alternative to OpenAI. No GPU required. Run AI models locally: LLMs (Llama2, Mistral, Mixtral...), Speech-to-text (whisper) and many others.
https://docs.edgen.co/
Apache License 2.0

How do I build edgen locally on Mac? #110

Open prabirshrestha opened 7 months ago

prabirshrestha commented 7 months ago

What is the correct way to build edgen locally on a Mac with Metal?

```
git clone https://github.com/edgenai/edgen.git
cd edgen/edgen
npm run tauri build
```

This always crashes with a segfault, with or without the llama_metal feature. It used to work but has started failing recently.

```
cargo run --release --features llama_metal -- serve
   Compiling edgen v0.1.3 (/Users/username/code/tmp/edgen/edgen/src-tauri)
    Finished release [optimized] target(s) in 3.10s
     Running `/Users/username/code/tmp/edgen/target/release/edgen serve`
Segmentation fault: 11
```
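
For reference, the CPU-only run (the same command without the feature flag) crashes in the same way:

```
# CPU-only variant; per the above, the segfault reproduces with or
# without the llama_metal feature.
cargo run --release -- serve
```
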
```
curl http://localhost:33322/v1/chat/completions -H "Content-Type: application/json" -H "Authorization: Bearer no-key-required" -d '{
  "model": "default",
  "messages": [
    {
      "role": "system",
      "content": "You are EdgenChat, a helpful AI assistant."
    },
    {
      "role": "user",
      "content": "Hello!"
    }
  ]
}'
```

I'm using the default config and have also tried resetting it.
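
(For anyone else resetting: the settings file path appears in the server logs further down this thread; deleting the file should make edgen regenerate the defaults on the next start, though that regeneration behaviour is an assumption on my part.)

```
# Settings path taken from the edgen startup log; removal is assumed to
# trigger regeneration of the default settings on the next launch.
rm "$HOME/Library/Application Support/com.EdgenAI.Edgen/edgen.conf.yaml"
```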

opeolluwa commented 7 months ago

@prabirshrestha when the build fails, does it output a specific error message?

prabirshrestha commented 6 months ago

The build works, but running the server fails with the error I mentioned. That is the only output I see, even with RUST_BACKTRACE=1.
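
In case it helps, `RUST_BACKTRACE=full` prints a more verbose trace than `1`, though a segfault inside the native llama.cpp/ggml code will usually kill the process before any Rust backtrace can be printed:

```
# `full` requests an unabridged backtrace; note a SIGSEGV in C/C++ code
# aborts before Rust's panic handler runs, so output may be unchanged.
RUST_BACKTRACE=full cargo run --release --features llama_metal -- serve
```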

opeolluwa commented 6 months ago

@prabirshrestha this one: "Segmentation fault: 11"?

prabirshrestha commented 6 months ago

Yes. The official release version also seems to fail on Mac now. It's probably some change in master that is causing the issue.

prabirshrestha commented 6 months ago

Now I'm getting this error.

```
    Finished dev [unoptimized + debuginfo] target(s) in 8.58s
     Running `target/debug/edgen serve`
Assertion failed: (ne % ggml_blck_size(type) == 0), function ggml_row_size, file ggml.c, line 2126.
Abort trap: 6
```
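
A native backtrace would show which ggml call trips this assertion; a minimal sketch using lldb (which ships with Xcode's command-line tools):

```
# Run the dev binary under lldb and print the stack after the abort.
lldb -- target/debug/edgen serve
# (lldb) run
# ...wait for the assertion to fire, then:
# (lldb) bt
```
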
opeolluwa commented 6 months ago

> Yes. The official release version also seems to fail on Mac now. It's probably some change in master that is causing the issue.

Most likely. I'll inspect the CI build; it might be a system dependency issue or something similar.

prabirshrestha commented 6 months ago
Here are the new logs, from 45f2a7d7034621832891518b13a5855948c89771:

```
/Users/prabirshrestha/code/tmp/edgen$ cargo run --release
   Compiling edgen v0.1.5 (/Users/prabirshrestha/code/tmp/edgen/edgen/src-tauri)
    Finished release [optimized] target(s) in 2.99s
     Running `target/release/edgen`
2024-03-27T02:34:21.218710Z INFO edgen_core::settings: Loading existing settings file: /Users/prabirshrestha/Library/Application Support/com.EdgenAI.Edgen/edgen.conf.yaml
2024-03-27T02:34:21.221257Z INFO edgen_server: Using default URI
2024-03-27T02:34:21.221333Z INFO edgen_server: Listening in on: http://127.0.0.1:33322
2024-03-27T02:34:33.235666Z INFO edgen_server::model: Loading existing model patterns file
2024-03-27T02:34:33.235867Z INFO hf_hub: Token file not found "/Users/prabirshrestha/.cache/huggingface/token"
2024-03-27T02:34:33.236960Z INFO edgen_server::status: progress observer: no download necessary, file is already there
2024-03-27T02:34:33.237134Z INFO edgen_core::perishable: (Re)Creating a new llama_cpp::model::LlamaModel
2024-03-27T02:34:33.237180Z INFO edgen_rt_llama_cpp: Loading /Users/prabirshrestha/Library/Application Support/com.EdgenAI.Edgen/models/chat/completions/models--TheBloke--neural-chat-7B-v3-3-GGUF/snapshots/5a354dacb2b2e2014cd239755920b2362be64d13/neural-chat-7b-v3-3.Q4_K_M.gguf into memory
2024-03-27T02:34:33.238119Z INFO llama_cpp::model: Loading model "/Users/prabirshrestha/Library/Application Support/com.EdgenAI.Edgen/models/chat/completions/models--TheBloke--neural-chat-7B-v3-3-GGUF/snapshots/5a354dacb2b2e2014cd239755920b2362be64d13/neural-chat-7b-v3-3.Q4_K_M.gguf"
2024-03-27T02:34:33.242906Z INFO llama.cpp: llama_model_loader: loaded meta data with 21 key-value pairs and 291 tensors from /Users/prabirshrestha/Library/Application Support/com.EdgenAI.Edgen/models/chat/completions/models--TheBloke--neural-chat-7B-v3-3-GGUF/snapshots/5a354dacb2b2e2014cd239755920b2362be64d13/neural-chat-7b-v3-3.Q4_K_M.gguf (version GGUF V3 (latest))
2024-03-27T02:34:33.242920Z INFO llama.cpp: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
2024-03-27T02:34:33.242926Z INFO llama.cpp: llama_model_loader: - kv 0: general.architecture str = llama
2024-03-27T02:34:33.242929Z INFO llama.cpp: llama_model_loader: - kv 1: general.name str = intel_neural-chat-7b-v3-3
2024-03-27T02:34:33.242932Z INFO llama.cpp: llama_model_loader: - kv 2: llama.context_length u32 = 32768
2024-03-27T02:34:33.242934Z INFO llama.cpp: llama_model_loader: - kv 3: llama.embedding_length u32 = 4096
2024-03-27T02:34:33.242936Z INFO llama.cpp: llama_model_loader: - kv 4: llama.block_count u32 = 32
2024-03-27T02:34:33.242939Z INFO llama.cpp: llama_model_loader: - kv 5: llama.feed_forward_length u32 = 14336
2024-03-27T02:34:33.242941Z INFO llama.cpp: llama_model_loader: - kv 6: llama.rope.dimension_count u32 = 128
2024-03-27T02:34:33.242943Z INFO llama.cpp: llama_model_loader: - kv 7: llama.attention.head_count u32 = 32
2024-03-27T02:34:33.242946Z INFO llama.cpp: llama_model_loader: - kv 8: llama.attention.head_count_kv u32 = 8
2024-03-27T02:34:33.242950Z INFO llama.cpp: llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
2024-03-27T02:34:33.242954Z INFO llama.cpp: llama_model_loader: - kv 10: llama.rope.freq_base f32 = 10000.000000
2024-03-27T02:34:33.242956Z INFO llama.cpp: llama_model_loader: - kv 11: general.file_type u32 = 15
2024-03-27T02:34:33.242958Z INFO llama.cpp: llama_model_loader: - kv 12: tokenizer.ggml.model str = llama
2024-03-27T02:34:33.247335Z INFO llama.cpp: llama_model_loader: - kv 13: tokenizer.ggml.tokens arr[str,32000] = ["", "", "", "<0x00>", "<...
2024-03-27T02:34:33.255357Z INFO llama.cpp: llama_model_loader: - kv 14: tokenizer.ggml.scores arr[f32,32000] = [0.000000, 0.000000, 0.000000, 0.0000...
2024-03-27T02:34:33.256454Z INFO llama.cpp: llama_model_loader: - kv 15: tokenizer.ggml.token_type arr[i32,32000] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
2024-03-27T02:34:33.256457Z INFO llama.cpp: llama_model_loader: - kv 16: tokenizer.ggml.bos_token_id u32 = 1
2024-03-27T02:34:33.256459Z INFO llama.cpp: llama_model_loader: - kv 17: tokenizer.ggml.eos_token_id u32 = 2
2024-03-27T02:34:33.256461Z INFO llama.cpp: llama_model_loader: - kv 18: tokenizer.ggml.unknown_token_id u32 = 0
2024-03-27T02:34:33.256462Z INFO llama.cpp: llama_model_loader: - kv 19: tokenizer.ggml.padding_token_id u32 = 0
2024-03-27T02:34:33.256464Z INFO llama.cpp: llama_model_loader: - kv 20: general.quantization_version u32 = 2
2024-03-27T02:34:33.256466Z INFO llama.cpp: llama_model_loader: - type f32: 65 tensors
2024-03-27T02:34:33.256468Z INFO llama.cpp: llama_model_loader: - type q4_K: 193 tensors
2024-03-27T02:34:33.256470Z INFO llama.cpp: llama_model_loader: - type q6_K: 33 tensors
2024-03-27T02:34:33.266441Z INFO llama.cpp: llm_load_vocab: special tokens definition check successful ( 259/32000 ).
2024-03-27T02:34:33.266445Z INFO llama.cpp: llm_load_print_meta: format = GGUF V3 (latest)
2024-03-27T02:34:33.266447Z INFO llama.cpp: llm_load_print_meta: arch = llama
2024-03-27T02:34:33.266448Z INFO llama.cpp: llm_load_print_meta: vocab type = SPM
2024-03-27T02:34:33.266450Z INFO llama.cpp: llm_load_print_meta: n_vocab = 32000
2024-03-27T02:34:33.266451Z INFO llama.cpp: llm_load_print_meta: n_merges = 0
2024-03-27T02:34:33.266453Z INFO llama.cpp: llm_load_print_meta: n_ctx_train = 32768
2024-03-27T02:34:33.266454Z INFO llama.cpp: llm_load_print_meta: n_embd = 4096
2024-03-27T02:34:33.266456Z INFO llama.cpp: llm_load_print_meta: n_head = 32
2024-03-27T02:34:33.266458Z INFO llama.cpp: llm_load_print_meta: n_head_kv = 8
2024-03-27T02:34:33.266459Z INFO llama.cpp: llm_load_print_meta: n_layer = 32
2024-03-27T02:34:33.266460Z INFO llama.cpp: llm_load_print_meta: n_rot = 128
2024-03-27T02:34:33.266462Z INFO llama.cpp: llm_load_print_meta: n_embd_head_k = 128
2024-03-27T02:34:33.266463Z INFO llama.cpp: llm_load_print_meta: n_embd_head_v = 128
2024-03-27T02:34:33.266465Z INFO llama.cpp: llm_load_print_meta: n_gqa = 4
2024-03-27T02:34:33.266466Z INFO llama.cpp: llm_load_print_meta: n_embd_k_gqa = 1024
2024-03-27T02:34:33.266468Z INFO llama.cpp: llm_load_print_meta: n_embd_v_gqa = 1024
2024-03-27T02:34:33.266469Z INFO llama.cpp: llm_load_print_meta: f_norm_eps = 0.0e+00
2024-03-27T02:34:33.266471Z INFO llama.cpp: llm_load_print_meta: f_norm_rms_eps = 1.0e-05
2024-03-27T02:34:33.266473Z INFO llama.cpp: llm_load_print_meta: f_clamp_kqv = 0.0e+00
2024-03-27T02:34:33.266474Z INFO llama.cpp: llm_load_print_meta: f_max_alibi_bias = 0.0e+00
2024-03-27T02:34:33.266475Z INFO llama.cpp: llm_load_print_meta: f_logit_scale = 0.0e+00
2024-03-27T02:34:33.266477Z INFO llama.cpp: llm_load_print_meta: n_ff = 14336
2024-03-27T02:34:33.266478Z INFO llama.cpp: llm_load_print_meta: n_expert = 0
2024-03-27T02:34:33.266480Z INFO llama.cpp: llm_load_print_meta: n_expert_used = 0
2024-03-27T02:34:33.266481Z INFO llama.cpp: llm_load_print_meta: causal attn = 1
2024-03-27T02:34:33.266483Z INFO llama.cpp: llm_load_print_meta: pooling type = 0
2024-03-27T02:34:33.266484Z INFO llama.cpp: llm_load_print_meta: rope type = 0
2024-03-27T02:34:33.266485Z INFO llama.cpp: llm_load_print_meta: rope scaling = linear
2024-03-27T02:34:33.266487Z INFO llama.cpp: llm_load_print_meta: freq_base_train = 10000.0
2024-03-27T02:34:33.266489Z INFO llama.cpp: llm_load_print_meta: freq_scale_train = 1
2024-03-27T02:34:33.266490Z INFO llama.cpp: llm_load_print_meta: n_yarn_orig_ctx = 32768
2024-03-27T02:34:33.266492Z INFO llama.cpp: llm_load_print_meta: rope_finetuned = unknown
2024-03-27T02:34:33.266493Z INFO llama.cpp: llm_load_print_meta: ssm_d_conv = 0
2024-03-27T02:34:33.266495Z INFO llama.cpp: llm_load_print_meta: ssm_d_inner = 0
2024-03-27T02:34:33.266496Z INFO llama.cpp: llm_load_print_meta: ssm_d_state = 0
2024-03-27T02:34:33.266497Z INFO llama.cpp: llm_load_print_meta: ssm_dt_rank = 0
2024-03-27T02:34:33.266499Z INFO llama.cpp: llm_load_print_meta: model type = 7B
2024-03-27T02:34:33.266521Z INFO llama.cpp: llm_load_print_meta: model ftype = Q4_K - Medium
2024-03-27T02:34:33.266523Z INFO llama.cpp: llm_load_print_meta: model params = 7.24 B
2024-03-27T02:34:33.266525Z INFO llama.cpp: llm_load_print_meta: model size = 4.07 GiB (4.83 BPW)
2024-03-27T02:34:33.266526Z INFO llama.cpp: llm_load_print_meta: general.name = intel_neural-chat-7b-v3-3
2024-03-27T02:34:33.266528Z INFO llama.cpp: llm_load_print_meta: BOS token = 1 ''
2024-03-27T02:34:33.266529Z INFO llama.cpp: llm_load_print_meta: EOS token = 2 ''
2024-03-27T02:34:33.266531Z INFO llama.cpp: llm_load_print_meta: UNK token = 0 ''
2024-03-27T02:34:33.266533Z INFO llama.cpp: llm_load_print_meta: PAD token = 0 ''
2024-03-27T02:34:33.266534Z INFO llama.cpp: llm_load_print_meta: LF token = 13 '<0x0A>'
2024-03-27T02:34:33.266550Z INFO llama.cpp: llm_load_tensors: ggml ctx size = 0.11 MiB
2024-03-27T02:34:33.267130Z INFO llama.cpp: llm_load_tensors: CPU buffer size = 4165.37 MiB
2024-03-27T02:34:33.267526Z WARN llama_cpp::model: Could not find metadata key="%s.attention.key_length"
2024-03-27T02:34:33.267530Z WARN llama_cpp::model: Could not find metadata key="%s.attention.value_length"
2024-03-27T02:34:33.267533Z WARN llama_cpp::model: Could not find metadata key="%s.ssm.conv_kernel"
2024-03-27T02:34:33.267535Z WARN llama_cpp::model: Could not find metadata key="%s.ssm.inner_size"
2024-03-27T02:34:33.267536Z WARN llama_cpp::model: Could not find metadata key="%s.ssm.state_size"
2024-03-27T02:34:33.267556Z INFO edgen_rt_llama_cpp: No matching session found, creating new one
2024-03-27T02:34:33.267567Z INFO edgen_core::perishable: (Re)Creating a new llama_cpp::session::LlamaSession
2024-03-27T02:34:33.267569Z INFO edgen_rt_llama_cpp: Allocating new LLM session
2024-03-27T02:34:33.267581Z INFO llama.cpp: llama_new_context_with_model: n_ctx = 4096
2024-03-27T02:34:33.267584Z INFO llama.cpp: llama_new_context_with_model: n_batch = 2048
2024-03-27T02:34:33.267585Z INFO llama.cpp: llama_new_context_with_model: n_ubatch = 512
2024-03-27T02:34:33.267587Z INFO llama.cpp: llama_new_context_with_model: freq_base = 10000.0
2024-03-27T02:34:33.267589Z INFO llama.cpp: llama_new_context_with_model: freq_scale = 1
2024-03-27T02:34:33.304810Z INFO llama.cpp: llama_kv_cache_init: CPU KV buffer size = 512.00 MiB
2024-03-27T02:34:33.304822Z INFO llama.cpp: llama_new_context_with_model: KV self size = 512.00 MiB, K (f16): 256.00 MiB, V (f16): 256.00 MiB
2024-03-27T02:34:33.321556Z INFO llama.cpp: llama_new_context_with_model: CPU output buffer size = 250.00 MiB
GGML_ASSERT: /Users/prabirshrestha/.cargo/git/checkouts/whisper_cpp-rs-bf1f9509542b2c2d/fa76538/crates/whisper_cpp_sys/thirdparty/whisper.cpp/ggml.c:4906: b->type == GGML_TYPE_I32
Abort trap: 6
```

opeolluwa commented 6 months ago

Could you share your environment: OS version, Rust and Node.js toolchain versions, and so on?

opeolluwa commented 6 months ago

I'll build this on my Mac and see where we stand

opeolluwa commented 6 months ago

@francis2tm see this

opeolluwa commented 6 months ago

@prabirshrestha I tried building it on a Mac; I think there might be some missing system deps:

I made a fork and added some README instructions (https://github.com/opeolluwa/edgen/tree/main/edgen). Follow the instructions and let's see where we go from there.

```
Looking for "nm" or an equivalent tool
  NM_PATH not set, looking for ["nm", "llvm-nm"] in PATH
  Valid tool found:
  llvm-nm, compatible with GNU nm
  Apple LLVM version 14.0.3 (clang-1403.0.22.14.1)
    Optimized build.
    Default target: arm64-apple-darwin22.6.0
    Host CPU: apple-m1

  cargo:rerun-if-env-changed=OBJCOPY_PATH
  Looking for "objcopy" or an equivalent tool..
  OBJCOPY_PATH not set, looking for ["llvm-objcopy"] in PATH

  --- stderr
  CMake Warning:
    Manually-specified variables were not used by the project:

      CMAKE_ASM_COMPILER
      CMAKE_ASM_FLAGS

  make: warning: jobserver unavailable: using -j1.  Add `+' to parent make rule.
  /Users/USER/.cargo/git/checkouts/whisper_cpp-rs-bf1f9509542b2c2d/fa76538/crates/whisper_cpp_sys/thirdparty/whisper.cpp/whisper.cpp:1026:75: warning: unused parameter 'params' [-Wunused-parameter]
  static ggml_backend_t whisper_backend_init(const whisper_context_params & params) {
                                                                            ^
  /Users/USER/.cargo/git/checkouts/whisper_cpp-rs-bf1f9509542b2c2d/fa76538/crates/whisper_cpp_sys/thirdparty/whisper.cpp/whisper.cpp:1620:27: warning: unused parameter 'mel_offset' [-Wunused-parameter]
                const int   mel_offset) {
                            ^
  /Users/USER/.cargo/git/checkouts/whisper_cpp-rs-bf1f9509542b2c2d/fa76538/crates/whisper_cpp_sys/thirdparty/whisper.cpp/whisper.cpp:202:29: warning: unused function 'ggml_mul_mat_pad' [-Wunused-function]
  static struct ggml_tensor * ggml_mul_mat_pad(struct ggml_context * ctx, struct ggml_tensor * x, struct ggml_tensor * y, int pad = 32) {
                              ^
  3 warnings generated.
  thread 'main' panicked at /Users/USER/.cargo/git/checkouts/whisper_cpp-rs-bf1f9509542b2c2d/fa76538/crates/whisper_cpp_sys/build.rs:295:9:
  No suitable tool equivalent to "objcopy" has been found in PATH, if one is already installed, either add its directory to PATH or set OBJCOPY_PATH to its full path. For your Operating System we recommend:
  "llvm-objcopy" from LLVM 17
  note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
warning: build failed, waiting for other jobs to finish...
       Error failed to build app: failed to build app
```

opeolluwa commented 6 months ago

You can also check out https://docs.edgen.co

prabirshrestha commented 6 months ago

I can build, but when I call the completions API it crashes.

```
Assertion failed: (ne % ggml_blck_size(type) == 0), function ggml_row_size, file ggml.c, line 2126.
 ELIFECYCLE  Command failed.
```

I was able to run v0.1.2, but it started crashing as of v0.1.3.
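
Since v0.1.2 worked and v0.1.3 doesn't, a `git bisect` between the two could pin down the offending commit; a sketch, assuming the release tags exist in the repo:

```
# Bisect between the last-known-good and first-bad releases; at each
# step, rebuild, hit the completions endpoint, then mark good/bad.
git bisect start
git bisect bad v0.1.3
git bisect good v0.1.2
cargo run --release --features llama_metal -- serve
```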

opeolluwa commented 6 months ago

That's after the instructions above, correct?

prabirshrestha commented 6 months ago

Yes, that is after following the instructions. I uninstalled all Rust toolchains too. It's probably also worth adding this line to the doc in case you have multiple toolchains:

```
rustup override set beta-2023-11-21
```

One thing I did was add this to my profile after `brew install llvm`, to get past that error:

```
export PATH="/opt/homebrew/opt/llvm/bin:$PATH"
```
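
A quick way to confirm the Homebrew LLVM tools now win in PATH:

```
# Both should resolve to the Homebrew keg once the export is in place.
which llvm-objcopy      # expect /opt/homebrew/opt/llvm/bin/llvm-objcopy
llvm-objcopy --version
```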

If I remove it, I get the same error as you do.

opeolluwa commented 6 months ago

Let me get this straight: the application now builds, after you've installed llvm and removed the existing Rust toolchain?

prabirshrestha commented 6 months ago

The application builds once I run the following commands. The Rust toolchain didn't have much impact, as I was able to build and run with other toolchains too; just to be sure, I removed all the Rust toolchains and kept only beta-2023-11-21.

```
brew install llvm
export PATH="/opt/homebrew/opt/llvm/bin:$PATH"
```
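
With the `rustup override set beta-2023-11-21` from earlier applied as well, the server then builds and starts with:

```
# Assumes Homebrew's LLVM is first in PATH (as above) and the toolchain
# override is set inside the edgen checkout.
cargo run --release --features llama_metal -- serve
```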

I'm also able to run the edgen app: I can see it in the taskbar and the window opens. But as soon as I make a request to http://localhost:33322/v1/chat/completions, it crashes.

opeolluwa commented 6 months ago

OK, good! 👍 We're making progress. Let's pick up again tomorrow; it's midnight my time.