Open dspasyuk opened 1 year ago
I am facing the same issue. Can anyone please guide us on this?
I ended up using just llama.cpp. It works very well on the GPU. You can write a simple wrapper in Node.js without Rust. I can share the code if you want.
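(For anyone curious what such a wrapper looks like: below is a minimal sketch of the idea, not the actual llcui code. It just spawns a llama.cpp `main` binary as a Node.js child process; the binary path, model file, and flag values are illustrative assumptions you would adjust to your own build.)

```javascript
// Minimal sketch: wrap a llama.cpp binary with a Node.js child process (no Rust).
// The binary path, model path, and flag values below are illustrative assumptions.
import { spawn } from "child_process";

function complete(prompt, onToken) {
  return new Promise((resolve, reject) => {
    const proc = spawn("./main", [
      "-m", "models/vicuna-7b-v1.3.ggmlv3.q4_0.bin", // model file
      "-ngl", "40", // number of layers to offload to the GPU
      "-n", "256",  // max tokens to predict
      "-p", prompt,
    ]);
    proc.stdout.on("data", (chunk) => onToken(chunk.toString())); // stream tokens as they arrive
    proc.on("error", reject);
    proc.on("close", (code) =>
      code === 0 ? resolve() : reject(new Error(`llama.cpp exited with code ${code}`))
    );
  });
}

const run = async () => {
  await complete("A chat between a user and an assistant. USER: Hello ASSISTANT:", (t) =>
    process.stdout.write(t)
  );
};
run();
```

Spawning the CLI keeps the Node side trivial: no native bindings to build, and GPU support comes entirely from however the binary itself was compiled.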
@deonis1 that would be a great help. Please share the code.
@shaileshminsnapsys no problem, the code is here: https://github.com/deonis1/llcui
Thank you @deonis1, I'll check out the code.
Thank you for your help.
Let me know if you have any issues
@deonis1 Thank you so much, your code helped me a lot to achieve my goal.
Many thanks!
@shaileshminsnapsys no problem, there is a new version if you are interested
@deonis1 would love to see the new version. Thank you
@shaileshminsnapsys The new version, which supports embedding (MongoDB or text documents), is released. You can find it at the new URL: https://github.com/deonis1/llama.cui
@deonis1 Wow, it's amazing. Thanks, I'll give it a try for sure.
Hi Everyone,
I am trying to build llama-node for GPU. I followed the guide in the readme (https://llama-node.vercel.app/docs/cuda), but the llama.cpp library I get from a manual build uses the CPU, not the GPU. When I build llama.cpp directly in the llama-sys folder using the following command:

```bash
make clean && LLAMA_CUBLAS=1 make -j
```

it gives me a perfectly fine GPU executable that works with no problem.
Am I missing something? Here are my full build commands:
```bash
git clone https://github.com/Atome-FE/llama-node.git
cd llama-node/
rustup target add x86_64-unknown-linux-musl
git submodule update --init --recursive
pnpm install --ignore-scripts
cd packages/llama-cpp/
pnpm build:cuda
```
Then I get a libllama.so file in my ~/.llama-node which, when used, does not use the GPU. Here is my script to run it:
```javascript
import { LLM } from "llama-node";
import { LLamaCpp } from "llama-node/dist/llm/llama-cpp.js";
import path from "path";

// note: path.resolve does not expand "~"; an absolute path (or os.homedir()) is safer here
const model = path.resolve(process.cwd(), "~/CODE/models/vicuna-7b-v1.3.ggmlv3.q4_0.bin");
const llama = new LLM(LLamaCpp);

const config = {
  modelPath: model,
  enableLogging: true,
  nCtx: 1024,
  seed: 0,
  f16Kv: false,
  logitsAll: false,
  vocabOnly: false,
  useMlock: false,
  embedding: false,
  useMmap: true,
  nGpuLayers: 40, // layers to offload to the GPU
};

const template = `How do I train you to read my documents?`;
const prompt = `A chat between a user and an assistant. USER: ${template} ASSISTANT:`;

const params = {
  nThreads: 4,
  nTokPredict: 2048,
  topK: 40,
  topP: 0.1,
  temp: 0.2,
  repeatPenalty: 1,
  prompt,
};

const run = async () => {
  await llama.load(config);
  await llama.createCompletion(params, (response) => {
    process.stdout.write(response.token);
  });
};
run();
```

Any help appreciated.
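(A quick way to tell whether those 40 layers are actually being offloaded is to watch GPU memory around the load call. Below is a rough sketch, assuming an NVIDIA card with nvidia-smi on the PATH; the polling idea is mine, not part of llama-node, and it reuses `llama` and `config` from the script above.)

```javascript
// Rough check: sample GPU memory before and after llama.load(). If the number
// barely moves, the layers were loaded into system RAM, not onto the GPU.
import { execSync } from "child_process";

// returns MiB used on the first GPU, as reported by nvidia-smi
function gpuMemMiB() {
  const out = execSync(
    "nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits"
  ).toString();
  return parseInt(out.trim().split("\n")[0], 10);
}

const check = async () => {
  const before = gpuMemMiB();
  await llama.load(config); // llama and config from the script above
  console.log(`GPU memory: ${before} MiB -> ${gpuMemMiB()} MiB`);
};
check();
```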