Atome-FE / llama-node

Believe in AI democratization. llama for Node.js, backed by llama-rs, llama.cpp, and rwkv.cpp; works locally on your laptop CPU. Supports llama/alpaca/gpt4all/vicuna/rwkv models.
https://llama-node.vercel.app/
Apache License 2.0

GPU version build not using GPU #114

Open dspasyuk opened 1 year ago

dspasyuk commented 1 year ago

Hi Everyone,

I am trying to build llama-node for GPU. I followed the guide in the readme (https://llama-node.vercel.app/docs/cuda), but the version of llama-cpp I get from a manual build uses the CPU, not the GPU. When I build llama.cpp directly in the llama-sys folder using the following command:

make clean && LLAMA_CUBLAS=1 make -j

it gives me a perfectly fine GPU executable that works with no problem.

Am I missing something? Here are my full build commands:

git clone https://github.com/Atome-FE/llama-node.git
cd llama-node/
rustup target add x86_64-unknown-linux-musl
git submodule update --init --recursive
pnpm install --ignore-scripts
cd packages/llama-cpp/
pnpm build:cuda

Then I get a libllama.so file in ~/.llama-node which, when used, does not use the GPU. Here is my script to run it:

import { LLM } from "llama-node";
import { LLamaCpp } from "llama-node/dist/llm/llama-cpp.js";
import path from "path";

const model = path.resolve(process.cwd(), "~/CODE/models/vicuna-7b-v1.3.ggmlv3.q4_0.bin");
const llama = new LLM(LLamaCpp);

const config = {
  modelPath: model,
  enableLogging: true,
  nCtx: 1024,
  seed: 0,
  f16Kv: false,
  logitsAll: false,
  vocabOnly: false,
  useMlock: false,
  embedding: false,
  useMmap: true,
  nGpuLayers: 40
};

const template = `How do I train you to read my documents?`;
const prompt = `A chat between a user and an assistant. USER: ${template} ASSISTANT:`;

const params = {
  nThreads: 4,
  nTokPredict: 2048,
  topK: 40,
  topP: 0.1,
  temp: 0.2,
  repeatPenalty: 1,
  prompt,
};

const run = async () => {
  await llama.load(config);
  await llama.createCompletion(params, (response) => {
    process.stdout.write(response.token);
  });
};

run();
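As an aside, one simple way to verify whether the GPU is actually being used by this script is to compare GPU memory before and after the model loads. The following is only an illustrative sketch (it assumes an NVIDIA GPU with nvidia-smi on the PATH) and is not part of the original report:

```js
import { execSync } from "child_process";

// Query GPU memory usage via nvidia-smi (assumes an NVIDIA GPU and nvidia-smi on PATH).
// If nGpuLayers is honored, used memory should jump by several GB once the model loads.
const gpuMemoryUsedMiB = () => {
  const out = execSync(
    "nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits"
  ).toString();
  return parseInt(out.trim().split("\n")[0], 10);
};

console.log("GPU memory used before load:", gpuMemoryUsedMiB(), "MiB");
// ...call llama.load(config) here, then check again:
// console.log("GPU memory used after load:", gpuMemoryUsedMiB(), "MiB");
```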

Any help is appreciated.

shaileshminsnapsys commented 1 year ago

I am facing the same issue. Can anyone please guide us on this?

dspasyuk commented 1 year ago

I ended up just using llama.cpp directly. It works very well on the GPU. You can write a simple wrapper in Node.js without Rust; I can share the code if you want.
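A minimal sketch of what such a wrapper can look like: spawn a cuBLAS-built llama.cpp `main` binary from Node.js with child_process and stream its output. The model path, prompt, and flag values below are placeholders, and this is not the code dspasyuk shares later in the thread, only an illustration of the approach:

```js
import { spawn } from "child_process";

// Minimal Node.js wrapper around a locally built llama.cpp binary (no Rust bindings).
// -m, -p, -n and -ngl are standard llama.cpp options; paths/values are placeholders.
const runLlama = (prompt) =>
  new Promise((resolve, reject) => {
    const proc = spawn("./main", [
      "-m", "models/vicuna-7b-v1.3.ggmlv3.q4_0.bin",
      "-p", prompt,
      "-n", "2048",   // max tokens to predict
      "-ngl", "40",   // layers to offload to the GPU
    ]);

    let output = "";
    proc.stdout.on("data", (chunk) => {
      output += chunk;
      process.stdout.write(chunk); // stream tokens as they arrive
    });
    proc.stderr.on("data", () => {}); // llama.cpp logs timings/system info to stderr
    proc.on("error", reject);
    proc.on("close", () => resolve(output));
  });

runLlama("A chat between a user and an assistant. USER: How do I train you to read my documents? ASSISTANT:");
```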

shaileshminsnapsys commented 1 year ago

@deonis1 That would be a great help. Please share the code.

dspasyuk commented 1 year ago

@shaileshminsnapsys No problem, the code is here: https://github.com/deonis1/llcui

shaileshminsnapsys commented 1 year ago

Thank you @deonis1, I'll check out the code.

Thank you for your help.

dspasyuk commented 1 year ago

Let me know if you have any issues.

shaileshminsnapsys commented 1 year ago

@deonis1 Thank you so much, your code helped me a lot in achieving my goal.

Many thanks!

dspasyuk commented 1 year ago

@shaileshminsnapsys No problem. There is a new version if you are interested.

shaileshminsnapsys commented 1 year ago

@deonis1 I would love to see the new version. Thank you.

dspasyuk commented 1 year ago

@shaileshminsnapsys The new version, which supports embeddings (MongoDB or text documents), has been released. You can find it at the new URL: https://github.com/deonis1/llama.cui

shaileshminsnapsys commented 1 year ago

@deonis1 Wow, it's amazing. Thanks, I'll give it a try for sure.