-
Running in WSL, all deps satisfied, most recent code pull, on an RTX 3090.
Command line:
`./build/bin/main -m models/7B/llama-7b-relu.powerinfer.gguf -n 128 -t 8 -p "Once upon a time" --vram-budg…
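For reference, PowerInfer's README documents an invocation of roughly this shape; the flag name and budget value below are assumptions based on that documentation, not taken from this report:
```
# Hedged sketch of a PowerInfer run; --vram-budget caps GPU memory in GiB.
# The budget value here is a placeholder.
./build/bin/main -m models/7B/llama-7b-relu.powerinfer.gguf \
  -n 128 -t 8 -p "Once upon a time" --vram-budget 8
```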
-
`chatglm3-6b -t q4_0 -o chatglm3-ggml.bin`
![1](https://github.com/li-plus/chatglm.cpp/assets/162705053/b0b680aa-8ab6-41ea-bbc6-a7e0779c6108)
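For context, the conversion step as documented in chatglm.cpp's README looks roughly like this; the script path is an assumption based on that README, not from this report:
```
# Sketch of the chatglm.cpp conversion step: quantize chatglm3-6b to q4_0.
python3 chatglm_cpp/convert.py -i THUDM/chatglm3-6b -t q4_0 -o chatglm3-ggml.bin
```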
-
Thanks for your tool.
I have a problem when I run the TheBloke/Llama-2-70b-Chat-GGUF model.
It loads well, but after I ask questions it crashes. Is this normal? I have dual 4090s.
The error code…
-
I've successfully used my RTX 3080 Ti with Stable Diffusion, Fooocus, and Stable Cascade, so my system is ready to work with the GPU.
Arch Linux.
```
$ ./run.sh --model 7b --with-cuda
...
✔ Container llam…
```
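As a generic sanity check (not specific to this project) that the container runtime can reach the GPU at all:
```
# Verify Docker can see the NVIDIA GPU; the CUDA image tag is an
# arbitrary example, any recent nvidia/cuda base image works.
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
```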
-
The README instructs the user to download `stable-vicuna-13B.ggml.q4_2.bin` from a linked repo. That file does not appear in the repo.
-
I've been trying to cross-compile whisper.cpp for the `powerpc64le-linux-gnu` platform, with the CMake build files and GCC 8.1.0, and I'm getting the following error. Any tips or ideas? Is there a minimum…
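A minimal sketch of a cross-compile configuration, assuming a `powerpc64le-linux-gnu` GCC toolchain is on `PATH`; the build directory name and compiler names are assumptions (Debian/Ubuntu-style cross toolchain packages):
```
# Hypothetical CMake cross-compile setup for powerpc64le.
cmake -B build-ppc64le \
  -DCMAKE_SYSTEM_NAME=Linux \
  -DCMAKE_SYSTEM_PROCESSOR=ppc64le \
  -DCMAKE_C_COMPILER=powerpc64le-linux-gnu-gcc \
  -DCMAKE_CXX_COMPILER=powerpc64le-linux-gnu-g++
cmake --build build-ppc64le
```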
-
### What happened?
I am not able to build llama.cpp for `HIPBLAS` using CMake, whereas make works.
```
$ echo "$(hipconfig -l)/clang"
/opt/rocm/lib/llvm/bin/clang
```
```
$ echo "$(hipconfi…
-
When I create a `conda` environment using these steps:
```
conda create --name gguf-to-torch python=3.12 -y
conda activate gguf-to-torch
conda install pytorch torchvision torchaudio pytorch-cuda…
```
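Once the environment resolves, a quick generic check (not from the issue) that the CUDA build of PyTorch actually installed:
```
# Verify torch imports in the new env and CUDA is visible.
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```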
-
Right now Llamero requires that you add the prompt template details yourself.
Not a big deal, but those details are already present in the model. So, using the GGUF spec, we should add support to r…
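For reference, the template typically lives under the `tokenizer.chat_template` metadata key defined by the GGUF spec. One way to inspect it is the dump helper shipped with the `gguf` Python package (tool name and flag are assumptions; the model path is a placeholder):
```
# Hedged sketch: dump GGUF metadata and look for the chat template key.
pip install gguf
gguf-dump --no-tensors model.gguf | grep chat_template
```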
-
```
otherarch/ggml_v3-cuda.cu(609): warning #177-D: function "warp_reduce_sum(half2)" was declared but never referenced
otherarch/ggml_v3-cuda.cu(630): warning #177-D: function "warp_reduce_max(half2)"…
```
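If the goal is just to quiet these, recent nvcc toolkits accept a diagnostic-suppression flag; availability depends on the CUDA version, and the output name below is a placeholder:
```
# Hedged sketch: suppress the unreferenced-function diagnostic (#177-D)
# when compiling the file from the log above.
nvcc -diag-suppress 177 -c otherarch/ggml_v3-cuda.cu -o ggml_v3-cuda.o
```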