tchereau opened this issue 1 year ago
Hmm, maybe your llama-node dist is still on an old version. Could you try running `pnpm build` under the root folder?
Same issue here. Attempting to use 0.1.15 of both llama-node and llama-cpp with GGML q5_1 versions of different models, it looks like LLama is not being properly initialized.
Loading LLamaCpp
project/ggml-vic13b-uncensored-q5_1.bin {
nGpuLayers: 1,
nCtx: 2048,
nParts: -1,
seed: 0,
f16Kv: false,
logitsAll: false,
vocabOnly: false,
useMlock: false,
embedding: false,
useMmap: true
} true
LLamaCpp {}
llama.cpp: loading model from project/ggml-vic13b-uncensored-q5_1.bin
TypeError: Cannot read properties of undefined (reading 'inference')
at project/node_modules/llama-node/dist/llm/llama-cpp.js:75:39
at new Promise (<anonymous>)
at LLamaCpp.<anonymous> (project/node_modules/llama-node/dist/llm/llama-cpp.js:73:14)
at Generator.next (<anonymous>)
at project/node_modules/llama-node/dist/llm/llama-cpp.js:33:61
at new Promise (<anonymous>)
at __async (project/node_modules/llama-node/dist/llm/llama-cpp.js:17:10)
at LLamaCpp.createCompletion (project/node_modules/llama-node/dist/llm/llama-cpp.js:69:12)
at LLM.<anonymous> (project/node_modules/llama-node/dist/index.cjs:56:23)
at Generator.next (<anonymous>)
llama_model_load_internal: format = ggjt v2 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 2048
llama_model_load_internal: n_embd = 5120
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 40
llama_model_load_internal: n_layer = 40
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 9 (mostly Q5_1)
llama_model_load_internal: n_ff = 13824
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size = 90.75 KB
llama_model_load_internal: mem required = 11359.05 MB (+ 3216.00 MB per state)
llama_init_from_file: kv self size = 3200.00 MB
INFO - AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 |
INFO - tokenized_stop_prompt: None
{ tokens: [ '', '\n\n<end>\n' ], completed: true }
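For reference, this is roughly the loading setup the log above corresponds to (a sketch based on the llama-node 0.1.x example code; the model path and option values are simply copied from the log output, and field names such as `enableLogging` are assumptions that may differ between versions):

```ts
import path from "path";
import { LLM } from "llama-node";
import { LLamaCpp, LoadConfig } from "llama-node/dist/llm/llama-cpp.js";

const llama = new LLM(LLamaCpp);

// Option values mirror the config object printed in the log above;
// field names follow the llama-node 0.1.x examples and may vary by version.
const config: LoadConfig = {
  modelPath: path.resolve("project/ggml-vic13b-uncensored-q5_1.bin"),
  enableLogging: true,
  nGpuLayers: 1,
  nCtx: 2048,
  nParts: -1,
  seed: 0,
  f16Kv: false,
  logitsAll: false,
  vocabOnly: false,
  useMlock: false,
  embedding: false,
  useMmap: true,
};
```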
This is probably caused by calling llama.load(config) synchronously; properly awaiting it solves the problem.
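In code, reusing `llama` and `config` from the sketch above (a minimal sketch; the `createCompletion` parameters are placeholder values, not taken from this thread):

```ts
const run = async () => {
  // Let load() resolve before inferring. If createCompletion runs first,
  // the native instance is presumably still undefined, which matches the
  // "Cannot read properties of undefined (reading 'inference')" TypeError.
  await llama.load(config);

  await llama.createCompletion(
    {
      prompt: "### Human: Hello\n### Assistant:",
      nThreads: 4,
      nTokPredict: 256,
      topK: 40,
      topP: 0.1,
      temp: 0.2,
      repeatPenalty: 1,
    },
    (response) => process.stdout.write(response.token)
  );
};

run().catch(console.error);
```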
Since I compiled it to use CUDA, I first had to add nGpuLayers (which seems logical, as it's an option available in llama.cpp). Then I obtain this error. Did I miss something or do something wrong?
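For a CUDA build, the GPU offload is controlled by the same nGpuLayers field already visible in the log; a hypothetical variant of the config sketched above might look like this (32 is only an illustrative layer count, not a value from this thread):

```ts
// Hypothetical CUDA-oriented variant of the config sketched above:
// offload part of the model to the GPU via nGpuLayers.
// 32 is an illustrative value, not taken from this thread.
const cudaConfig: LoadConfig = {
  ...config,
  nGpuLayers: 32,
};
```

The load still has to be awaited before any createCompletion call, exactly as described above.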