Atome-FE / llama-node

Believe in AI democratization. llama for Node.js, backed by llama-rs, llama.cpp, and rwkv.cpp; works locally on your laptop CPU. Supports llama/alpaca/gpt4all/vicuna/rwkv models.
https://llama-node.vercel.app/
Apache License 2.0

GPU version build not using GPU #114

Open dspasyuk opened 1 year ago

dspasyuk commented 1 year ago

Hi Everyone,

I am trying to build llama-node for GPU. I followed the guide in the readme (https://llama-node.vercel.app/docs/cuda), but the version of llama-cpp I get from a manual build uses the CPU, not the GPU. When I build llama.cpp directly in the llama-sys folder using the following command:

make clean && LLAMA_CUBLAS=1 make -j

it gives me a perfectly fine GPU executable that works with no problem.

Am I missing something? Here are my full build commands:

git clone https://github.com/Atome-FE/llama-node.git
cd llama-node/
rustup target add x86_64-unknown-linux-musl
git submodule update --init --recursive
pnpm install --ignore-scripts
cd packages/llama-cpp/
pnpm build:cuda

Then I get a libllama.so file in ~/.llama-node which, when used, does not use the GPU. Here is my script to run it:

import { LLM } from "llama-node";
import { LLamaCpp } from "llama-node/dist/llm/llama-cpp.js";
import path from "path";

const model = path.resolve(process.cwd(), "~/CODE/models/vicuna-7b-v1.3.ggmlv3.q4_0.bin");
const llama = new LLM(LLamaCpp);

const config = {
  modelPath: model,
  enableLogging: true,
  nCtx: 1024,
  seed: 0,
  f16Kv: false,
  logitsAll: false,
  vocabOnly: false,
  useMlock: false,
  embedding: false,
  useMmap: true,
  nGpuLayers: 40
};

const template = `How do I train you to read my documents?`;
const prompt = `A chat between a user and an assistant. USER: ${template} ASSISTANT:`;

const params = {
  nThreads: 4,
  nTokPredict: 2048,
  topK: 40,
  topP: 0.1,
  temp: 0.2,
  repeatPenalty: 1,
  prompt,
};

const run = async () => {
  await llama.load(config);
  await llama.createCompletion(params, (response) => {
    process.stdout.write(response.token);
  });
};

run();
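As an aside, one simple way to verify whether the GPU is actually being used by this script is to compare GPU memory before and after the model loads. The following is only an illustrative sketch (it assumes an NVIDIA GPU with nvidia-smi on the PATH) and is not part of the original report:

```js
import { execSync } from "child_process";

// Query GPU memory usage via nvidia-smi (assumes an NVIDIA GPU and nvidia-smi on PATH).
// If nGpuLayers is honored, used memory should jump by several GB once the model loads.
const gpuMemoryUsedMiB = () => {
  const out = execSync(
    "nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits"
  ).toString();
  return parseInt(out.trim().split("\n")[0], 10);
};

console.log("GPU memory used before load:", gpuMemoryUsedMiB(), "MiB");
// ...call llama.load(config) here, then check again:
// console.log("GPU memory used after load:", gpuMemoryUsedMiB(), "MiB");
```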

Any help is appreciated.

shaileshminsnapsys commented 1 year ago

I am facing the same issue. Can anyone please guide us on this?

dspasyuk commented 1 year ago

I ended up just using llama.cpp directly. It works very well on the GPU. You can write a simple wrapper in Node.js without Rust; I can share the code if you want.
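A minimal sketch of what such a wrapper can look like: spawn a cuBLAS-built llama.cpp `main` binary from Node.js with child_process and stream its output. The model path, prompt, and flag values below are placeholders, and this is not the code dspasyuk shares later in the thread, only an illustration of the approach:

```js
import { spawn } from "child_process";

// Minimal Node.js wrapper around a locally built llama.cpp binary (no Rust bindings).
// -m, -p, -n and -ngl are standard llama.cpp options; paths/values are placeholders.
const runLlama = (prompt) =>
  new Promise((resolve, reject) => {
    const proc = spawn("./main", [
      "-m", "models/vicuna-7b-v1.3.ggmlv3.q4_0.bin",
      "-p", prompt,
      "-n", "2048",   // max tokens to predict
      "-ngl", "40",   // layers to offload to the GPU
    ]);

    let output = "";
    proc.stdout.on("data", (chunk) => {
      output += chunk;
      process.stdout.write(chunk); // stream tokens as they arrive
    });
    proc.stderr.on("data", () => {}); // llama.cpp logs timings/system info to stderr
    proc.on("error", reject);
    proc.on("close", () => resolve(output));
  });

runLlama("A chat between a user and an assistant. USER: How do I train you to read my documents? ASSISTANT:");
```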

shaileshminsnapsys commented 1 year ago

@deonis1 That would be a great help. Please share the code.

dspasyuk commented 1 year ago

@shaileshminsnapsys No problem, the code is here: https://github.com/deonis1/llcui

shaileshminsnapsys commented 1 year ago

Thank you @deonis1, I'll check out the code.

Thank you for your help.

dspasyuk commented 1 year ago

Let me know if you have any issues.

shaileshminsnapsys commented 1 year ago

@deonis1 Thank you so much, your code helped me a lot in achieving my goal.

Many thanks!

dspasyuk commented 1 year ago

@shaileshminsnapsys No problem. There is a new version if you are interested.

shaileshminsnapsys commented 1 year ago

@deonis1 I would love to see the new version. Thank you.

dspasyuk commented 1 year ago

@shaileshminsnapsys The new version, which supports embeddings (MongoDB or text documents), has been released. You can find it at the new URL: https://github.com/deonis1/llama.cui

shaileshminsnapsys commented 1 year ago

@deonis1 Wow, it's amazing. Thanks, I'll give it a try for sure.