-
I am trying to run this through Docker with a LLaMA model (7B):
```
docker run --gpus all --shm-size 1g -p 8081:80 -v ./7B-transformed:/data/7B ghcr.io/yk/text-generation-inference:llama --model-i…
```
-
Hello, would it be possible to integrate something like GPT4All, which runs locally and, unlike OpenAI, doesn't cost anything?
-
`llama.cpp` now supports the new k-quants quantization formats, which achieve good model perplexity even at high compression ratios. See https://github.com/ggerganov/llama.cpp/pull/1684.
We should also support th…
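For reference, here is a minimal sketch of the tensor types a Rust-side loader would need to recognize after that PR. The variant names mirror the upstream `GGML_TYPE_*` constants; how `llama-rs` would actually model them is an assumption.

```rust
// Minimal sketch (assumed Rust-side mapping): GGML tensor types after
// llama.cpp PR #1684 added the k-quants.
#[allow(non_camel_case_types, dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum GgmlType {
    // Pre-existing types.
    F32,
    F16,
    Q4_0,
    Q4_1,
    Q5_0,
    Q5_1,
    Q8_0,
    // New k-quants (super-block quantization with per-block scales).
    Q2_K,
    Q3_K,
    Q4_K,
    Q5_K,
    Q6_K,
    Q8_K,
}

fn main() {
    // Example: a Q4_K tensor as it might appear in a model header.
    let ty = GgmlType::Q4_K;
    println!("{:?}", ty);
}
```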
-
I got a lot of error messages like this one:
```
Cannot find module 'llama-node/dist/llm/llama-rs' or its corresponding type declarations.
```
How can I install the types?
-
This may be something to keep an eye on: https://github.com/ggerganov/llama.cpp/pull/439
Looks like the corresponding code is here: https://github.com/rustformers/llama-rs/blob/bf7bdbcfff3114dcbdaf…
-
```
C:\Users\micro\Downloads\llamacord>cargo run --release
Finished release [optimized] target(s) in 0.16s
Running `target\release\llamacord.exe`
thread '' panicked at 'called `Result::un…
```
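As a side note, the message is opaque because `unwrap()` discards all context. A minimal sketch (hypothetical code, not llamacord's) of the difference `expect()` makes:

```rust
use std::fs::File;

fn main() {
    // `unwrap()` panics with only:
    //   called `Result::unwrap()` on an `Err` value: ...
    // let _f = File::open("Config.toml").unwrap();

    // `expect()` panics with a message that names the failing step,
    // which makes reports like the one above much easier to diagnose.
    // ("Config.toml" is a hypothetical file name for illustration.)
    let _f = File::open("Config.toml")
        .expect("failed to open Config.toml next to the executable");
}
```

Running with `RUST_BACKTRACE=1` also shows where the panic originated.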
-
The model runs successfully with `llama.cpp` but not with `llama-rs`.
Command:
```
cargo run --release -- -m C:\Users\Usuário\Downloads\LLaMA\7B\ggml-model-q4_0.bin -p "Tell me how cool the Rust programmin…
```
-
Trying any GPT-2 GGML model through the CLI appears to cause an immediate segfault:
```
llama-rs # cargo run --bin llm gpt2 infer -m models/gpt2/cerebras-2.7b-q4_0.bin -p "Now, this is a story all…
```
-
When using the `compression` feature, incremental builds where only the embedded files change do not trigger a rebuild. This is most obvious when using both `compression` and `debug-embed`.
I think this …
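For context, one common workaround is a `build.rs` in the consuming crate that tells Cargo to watch the asset directory; the `assets` path below is hypothetical.

```rust
// build.rs — minimal sketch: force Cargo to re-run the build (and thus
// re-expand the embedding macro) whenever anything under ./assets changes.
fn main() {
    println!("cargo:rerun-if-changed=assets");
}
```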
-
Support for Metal GPU acceleration on macOS (and, I assume, iOS) was just merged into llama.cpp master: https://github.com/ggerganov/llama.cpp/pull/1642
It would be great if this could also be employed fro…
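A minimal sketch of what the Rust-side plumbing might look like, assuming the backend ends up behind a Cargo feature; the `metal` feature name and the constants are hypothetical.

```rust
// Minimal sketch (hypothetical): selecting an accelerator at compile time
// via a Cargo feature, the way a wrapper crate might expose llama.cpp's
// new Metal backend.
#[cfg(feature = "metal")]
const BACKEND: &str = "metal";

#[cfg(not(feature = "metal"))]
const BACKEND: &str = "cpu";

fn main() {
    println!("inference backend: {BACKEND}");
}
```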