Atome-FE / llama-node

Believe in AI democratization. llama for Node.js, backed by llama-rs, llama.cpp, and rwkv.cpp; works locally on your laptop CPU. Supports llama/alpaca/gpt4all/vicuna/rwkv models.
https://llama-node.vercel.app/
Apache License 2.0

[ERROR] cublas `TypeError: this.instance.inference is not a function` #73

Open tchereau opened 1 year ago

tchereau commented 1 year ago

Since I compiled with CUDA (cuBLAS) support, I first had to add nGpuLayers (which seems logical, as it's an option available in llama.cpp).

Then I get this error:

TypeError: this.instance.inference is not a function
    at file:///root/git/llama-selfbot/node_modules/llama-node/dist/llm/llama-cpp.js:54:23
    at new Promise (<anonymous>)
    at LLamaCpp.<anonymous> (file:///root/git/llama-selfbot/node_modules/llama-node/dist/llm/llama-cpp.js:53:14)
    at Generator.next (<anonymous>)
    at file:///root/git/llama-selfbot/node_modules/llama-node/dist/llm/llama-cpp.js:33:61
    at new Promise (<anonymous>)
    at __async (file:///root/git/llama-selfbot/node_modules/llama-node/dist/llm/llama-cpp.js:17:10)
    at LLamaCpp.createCompletion (file:///root/git/llama-selfbot/node_modules/llama-node/dist/llm/llama-cpp.js:50:12)
    at LLM.<anonymous> (/root/git/llama-selfbot/node_modules/llama-node/dist/index.cjs:56:23)
    at Generator.next (<anonymous>)

Did I miss something or do something wrong?
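
For context, the call path that triggers this error follows the standard llama-node usage. Below is a minimal sketch: the import paths come from the stack trace above, the config field names from the config dump later in the thread, and the model path, nGpuLayers value, and field name `modelPath` are placeholders/assumptions.

```js
import { LLM } from "llama-node";
// This import path matches the file that throws in the stack trace above.
import { LLamaCpp } from "llama-node/dist/llm/llama-cpp.js";

const llama = new LLM(LLamaCpp);

// nGpuLayers mirrors llama.cpp's --n-gpu-layers option and only takes effect
// when the native addon was compiled with cuBLAS support.
await llama.load({
  modelPath: "/root/git/llama-selfbot/model.bin", // placeholder path; field name assumed
  nGpuLayers: 32,                                 // value assumed
  nCtx: 2048,
  nParts: -1,
  seed: 0,
  f16Kv: false,
  logitsAll: false,
  vocabOnly: false,
  useMlock: false,
  embedding: false,
  useMmap: true,
});
```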

hlhr202 commented 1 year ago

Hmm, maybe your llama-node dist is still on an old version. Could you try running pnpm build under the root folder?

TekSiDoT commented 1 year ago

Same issue here. Attempting to use 0.1.15 of both llama-node and llama-cpp with ggml q5_1 versions of different models; it looks like LLama is not properly initialized.

Loading LLamaCpp
project/ggml-vic13b-uncensored-q5_1.bin {
  nGpuLayers: 1,
  nCtx: 2048,
  nParts: -1,
  seed: 0,
  f16Kv: false,
  logitsAll: false,
  vocabOnly: false,
  useMlock: false,
  embedding: false,
  useMmap: true
} true
LLamaCpp {}
llama.cpp: loading model from project/ggml-vic13b-uncensored-q5_1.bin
TypeError: Cannot read properties of undefined (reading 'inference')
    at project/node_modules/llama-node/dist/llm/llama-cpp.js:75:39
    at new Promise (<anonymous>)
    at LLamaCpp.<anonymous> (project/node_modules/llama-node/dist/llm/llama-cpp.js:73:14)
    at Generator.next (<anonymous>)
    at project/node_modules/llama-node/dist/llm/llama-cpp.js:33:61
    at new Promise (<anonymous>)
    at __async (project/node_modules/llama-node/dist/llm/llama-cpp.js:17:10)
    at LLamaCpp.createCompletion (project/node_modules/llama-node/dist/llm/llama-cpp.js:69:12)
    at LLM.<anonymous> (project/node_modules/llama-node/dist/index.cjs:56:23)
    at Generator.next (<anonymous>)
llama_model_load_internal: format     = ggjt v2 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 5120
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 40
llama_model_load_internal: n_layer    = 40
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 9 (mostly Q5_1)
llama_model_load_internal: n_ff       = 13824
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size =  90.75 KB
llama_model_load_internal: mem required  = 11359.05 MB (+ 3216.00 MB per state)
llama_init_from_file: kv self size  = 3200.00 MB
INFO - AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 |
INFO - tokenized_stop_prompt: None
{ tokens: [ '', '\n\n<end>\n' ], completed: true }

TekSiDoT commented 1 year ago

This is probably caused by calling llama.load(config); synchronously, without awaiting it.

Properly awaiting this async call before running a completion solves the problem.
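
A minimal sketch of the fix, assuming the import paths from the stack traces above; the config and completion parameters are abbreviated placeholders, and the `modelPath` field name is an assumption.

```js
import { LLM } from "llama-node";
import { LLamaCpp } from "llama-node/dist/llm/llama-cpp.js";

const llama = new LLM(LLamaCpp);
// Abbreviated config; values taken from the dump in the comment above.
const config = {
  modelPath: "project/ggml-vic13b-uncensored-q5_1.bin", // field name assumed
  nCtx: 2048,
  nGpuLayers: 1,
};

// Problematic: load() is async. Without await, createCompletion can run before
// the native instance exists, so this.instance is undefined
// ("Cannot read properties of undefined (reading 'inference')").
// llama.load(config);

// Correct: wait for the model to finish loading before generating.
await llama.load(config);
await llama.createCompletion(
  { prompt: "Hello", nTokPredict: 128 }, // assumed completion params
  (response) => process.stdout.write(response.token),
);
```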