Atome-FE / llama-node

Believe in AI democratization. llama for Node.js, backed by llama-rs, llama.cpp and rwkv.cpp; works locally on your laptop CPU. Supports llama/alpaca/gpt4all/vicuna/rwkv models.
https://llama-node.vercel.app/
Apache License 2.0

[ASK] enable cuda with manual compilation #34

Open · tchereau opened this issue 1 year ago

tchereau commented 1 year ago

Hi,

According to the llama.cpp GitHub repo, it is now possible to use CUDA on NVIDIA GPUs via the cuBLAS build.

So you can probably see where I'm going with this: how do I do a manual compilation, using make or cmake arguments, to enable LLAMA_CUBLAS? :)

Greetings

hlhr202 commented 1 year ago

Hi, I plan to support this CUDA feature in the near future. Thanks for your suggestion.

hlhr202 commented 1 year ago

This issue is pending the new llama sampling logic here: https://github.com/Atome-FE/llama-node/issues/36

hlhr202 commented 1 year ago

Currently this issue is blocked by a static-linking problem in the llama.cpp CMake build: https://github.com/ggerganov/llama.cpp/pull/1128#issuecomment-1531661524

tchereau commented 1 year ago

I can compile with this:

// CMake arguments passed when configuring llama.cpp (build-script context assumed)
let command = command
        .arg("..")
        .arg("-DCMAKE_BUILD_TYPE=Release")
        // enable the OpenBLAS and cuBLAS backends
        .arg("-DLLAMA_OPENBLAS=ON")
        .arg("-DLLAMA_CUBLAS=ON")
        .arg("-DLLAMA_SHARED_LIBS=ON")
        // link llama.cpp statically into the addon
        .arg("-DLLAMA_STATIC=ON")
        .arg("-DLLAMA_ALL_WARNINGS=OFF")
        .arg("-DLLAMA_ALL_WARNINGS_3RD_PARTY=OFF")
        .arg("-DLLAMA_BUILD_TESTS=OFF")
        .arg("-DLLAMA_BUILD_EXAMPLES=OFF")
        // required so the static library can go into a shared .node module
        .arg("-DCMAKE_POSITION_INDEPENDENT_CODE=ON");

but as a result, OpenBLAS is not activated, and cuBLAS doesn't seem to be activated either; it doesn't use my GPU.

And worse: I've also compiled llama.cpp directly from the GitHub source. BLAS works fine, but CLBlast and cuBLAS don't; some data is put into VRAM, but it still doesn't use the GPU, only the CPU.

So I think we should wait a little bit for an update from llama.cpp.

hlhr202 commented 1 year ago

@tchereau Great work. I was struggling with an fPIC error for several hours and only solved it with dynamic linking. Thanks for your investigation. But it's true that the cuBLAS flag doesn't accelerate the evaluation properly; we have to wait a while.

hlhr202 commented 1 year ago

> I can compile with this: […] but as a result, OpenBLAS is not activated, and cuBLAS doesn't seem to be activated either […] So I think we should wait a little bit for an update from llama.cpp.

Hi, I think you can compile with this, but can you actually run it? I got an error related to cudaLaunchKernel, so I think there is something abnormal with static linking. As a consequence, I am going to provide a way to do dynamic linking, but I will not offer it as the default prebuilt binary.
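
For reference, a dynamically linked configuration would look roughly like the sketch below. This is only a sketch under assumptions: it reuses the build-script style from the snippet above, BUILD_SHARED_LIBS is the standard CMake switch rather than anything llama-node specific, and the flags the project actually settles on may differ.

// Sketch: configure llama.cpp for dynamic linking instead of static linking.
// The surrounding build-script context and the exact flag set are assumptions.
let command = command
        .arg("..")
        .arg("-DCMAKE_BUILD_TYPE=Release")
        .arg("-DLLAMA_CUBLAS=ON")
        // build libllama as a shared library rather than baking it into the addon
        .arg("-DBUILD_SHARED_LIBS=ON")
        .arg("-DLLAMA_STATIC=OFF")
        .arg("-DCMAKE_POSITION_INDEPENDENT_CODE=ON");

With a shared libllama, the CUDA libraries are resolved at load time instead of having to be linked into the .node addon itself.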

tchereau commented 1 year ago

> Hi, I think you can compile with this, but can you actually run it? I got an error related to cudaLaunchKernel. […]

Well, yes, I can compile it, but I can't run it:

pnpm build:llama-cpp

> llama-node@0.0.34 build:llama-cpp /root/git/llama-node
> pnpm run --filter=@llama-node/llama-cpp cross-compile

> @llama-node/llama-cpp@0.0.34 cross-compile /root/git/llama-node/packages/llama-cpp
> rimraf @llama-node && tsx scripts/cross-compile.mts

info: component 'rust-std' for target 'x86_64-unknown-linux-gnu' is up to date
info: component 'rust-std' for target 'x86_64-unknown-linux-musl' is up to date
warning: /root/git/llama-node/Cargo.toml: unused manifest key: workspace.package.name
/bin/sh: 1: zig: not found
warning: /root/git/llama-node/Cargo.toml: unused manifest key: workspace.package.name
    Blocking waiting for file lock on build directory
   Compiling llama-sys v0.0.1 (/root/git/llama-node/packages/llama-cpp/llama-sys)
   Compiling llama-node-cpp v0.1.0 (/root/git/llama-node/packages/llama-cpp)
warning: value assigned to `id` is never read
   --> packages/llama-cpp/src/context.rs:189:17
    |
189 |         let mut id = 0;
    |                 ^^
    |
    = help: maybe it is overwritten before being read?
    = note: `#[warn(unused_assignments)]` on by default

warning: `llama-node-cpp` (lib) generated 1 warning
    Finished release [optimized] target(s) in 55.77s
   Compiling llama-sys v0.0.1 (/root/git/llama-node/packages/llama-cpp/llama-sys)
   Compiling llama-node-cpp v0.1.0 (/root/git/llama-node/packages/llama-cpp)
warning: value assigned to `id` is never read
   --> packages/llama-cpp/src/context.rs:189:17
    |
189 |         let mut id = 0;
    |                 ^^
    |
    = help: maybe it is overwritten before being read?
    = note: `#[warn(unused_assignments)]` on by default

warning: `llama-node-cpp` (lib) generated 1 warning
    Finished release [optimized] target(s) in 1m 34s

nodejs:

node .
node:internal/modules/cjs/loader:1338
  return process.dlopen(module, path.toNamespacedPath(filename));
                 ^

Error: /root/git/llama-selfbot/node_modules/llama-node/node_modules/@llama-node/llama-cpp/@llama-node/llama-cpp.linux-x64-gnu.node: undefined symbol: cublasSetMathMode
    at Module._extensions..node (node:internal/modules/cjs/loader:1338:18)
    at Module.load (node:internal/modules/cjs/loader:1117:32)
    at Module._load (node:internal/modules/cjs/loader:958:12)
    at Module.require (node:internal/modules/cjs/loader:1141:19)
    at require (node:internal/modules/cjs/helpers:110:18)
    at Object.<anonymous> (/root/git/llama-selfbot/node_modules/llama-node/node_modules/@llama-node/llama-cpp/index.js:188:31)
    at Module._compile (node:internal/modules/cjs/loader:1254:14)
    at Module._extensions..js (node:internal/modules/cjs/loader:1308:10)
    at Module.load (node:internal/modules/cjs/loader:1117:32)
    at Module._load (node:internal/modules/cjs/loader:958:12) {
  code: 'ERR_DLOPEN_FAILED'
}

Node.js v18.16.0
tchereau commented 1 year ago

And with CLBlast:

pnpm build:llama-cpp

> llama-node@0.0.34 build:llama-cpp /root/git/llama-node
> pnpm run --filter=@llama-node/llama-cpp cross-compile

> @llama-node/llama-cpp@0.0.34 cross-compile /root/git/llama-node/packages/llama-cpp
> rimraf @llama-node && tsx scripts/cross-compile.mts

info: component 'rust-std' for target 'x86_64-unknown-linux-gnu' is up to date
info: component 'rust-std' for target 'x86_64-unknown-linux-musl' is up to date
/bin/sh: 1: zig: not found
warning: /root/git/llama-node/Cargo.toml: unused manifest key: workspace.package.name
warning: /root/git/llama-node/Cargo.toml: unused manifest key: workspace.package.name
    Blocking waiting for file lock on package cache
    Blocking waiting for file lock on package cache
    Blocking waiting for file lock on build directory
   Compiling llama-sys v0.0.1 (/root/git/llama-node/packages/llama-cpp/llama-sys)
   Compiling llama-node-cpp v0.1.0 (/root/git/llama-node/packages/llama-cpp)
warning: value assigned to `id` is never read
   --> packages/llama-cpp/src/context.rs:189:17
    |
189 |         let mut id = 0;
    |                 ^^
    |
    = help: maybe it is overwritten before being read?
    = note: `#[warn(unused_assignments)]` on by default

warning: `llama-node-cpp` (lib) generated 1 warning
    Finished release [optimized] target(s) in 49.25s
   Compiling llama-sys v0.0.1 (/root/git/llama-node/packages/llama-cpp/llama-sys)
   Compiling llama-node-cpp v0.1.0 (/root/git/llama-node/packages/llama-cpp)
warning: value assigned to `id` is never read
   --> packages/llama-cpp/src/context.rs:189:17
    |
189 |         let mut id = 0;
    |                 ^^
    |
    = help: maybe it is overwritten before being read?
    = note: `#[warn(unused_assignments)]` on by default

warning: `llama-node-cpp` (lib) generated 1 warning
    Finished release [optimized] target(s) in 1m 20s

nodejs:

node .
node:internal/modules/cjs/loader:1338
  return process.dlopen(module, path.toNamespacedPath(filename));
                 ^

Error: /root/git/llama-selfbot/node_modules/llama-node/node_modules/@llama-node/llama-cpp/@llama-node/llama-cpp.linux-x64-gnu.node: undefined symbol: clBuildProgram
    at Module._extensions..node (node:internal/modules/cjs/loader:1338:18)
    at Module.load (node:internal/modules/cjs/loader:1117:32)
    at Module._load (node:internal/modules/cjs/loader:958:12)
    at Module.require (node:internal/modules/cjs/loader:1141:19)
    at require (node:internal/modules/cjs/helpers:110:18)
    at Object.<anonymous> (/root/git/llama-selfbot/node_modules/llama-node/node_modules/@llama-node/llama-cpp/index.js:188:31)
    at Module._compile (node:internal/modules/cjs/loader:1254:14)
    at Module._extensions..js (node:internal/modules/cjs/loader:1308:10)
    at Module.load (node:internal/modules/cjs/loader:1117:32)
    at Module._load (node:internal/modules/cjs/loader:958:12) {
  code: 'ERR_DLOPEN_FAILED'
}

Node.js v18.16.0

The same happens with .arg("-DLLAMA_OPENBLAS=ON") only:

undefined symbol: cblas_sgemm

In all cases, the problem seems to be libraries that are not linked.

At this point I can't help further, because I don't have the necessary knowledge of C/Rust.
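
For reference, undefined symbols like cublasSetMathMode, clBuildProgram and cblas_sgemm at dlopen time are the classic sign of missing link directives: the addon references the symbols but was never told to link the corresponding shared libraries. A minimal build.rs sketch of linking them dynamically is shown below; the library names and search path assume a typical Linux install with CUDA under /usr/local/cuda and a system OpenBLAS, and this is not the crate's actual build script.

// build.rs sketch (assumes Linux, CUDA under /usr/local/cuda, system OpenBLAS).
// These directives tell cargo to link the shared libraries, so the resulting
// .node addon can resolve symbols such as cublasSetMathMode at load time.
fn main() {
    println!("cargo:rustc-link-search=native=/usr/local/cuda/lib64");
    println!("cargo:rustc-link-lib=dylib=cudart");
    println!("cargo:rustc-link-lib=dylib=cublas");
    println!("cargo:rustc-link-lib=dylib=openblas");
}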

hlhr202 commented 1 year ago

Oops, I will reopen this and leave it open until static linking is ready someday... Currently I only provide a self-built dynamic-linking version.

Here are the customizable build features for Cargo: https://github.com/Atome-FE/llama-node/pull/42/files. I will prepare separate docs for this.
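
As a sketch of what such a customizable feature could look like, the cuBLAS switch can be gated behind a cargo feature so that the default prebuilt binary stays CPU-only while a local build can opt in. The feature name "cublas" below is hypothetical; see the PR above for the real options.

// Sketch only: gate the cuBLAS cmake flag behind a cargo feature.
// The feature name "cublas" is hypothetical; PR #42 defines the actual setup.
let mut command = std::process::Command::new("cmake");
command
        .arg("..")
        .arg("-DCMAKE_BUILD_TYPE=Release");
if cfg!(feature = "cublas") {
    command.arg("-DLLAMA_CUBLAS=ON");
}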

hlhr202 commented 1 year ago

A manual compilation guide has been provided here: https://llama-node.vercel.app/docs/cuda