-
When I try to run the chat example, I get an error.
ruby chat.rb --model /playingwithai/models/llama-2-7b-chat.Q8_0.gguf
llama_model_loader: loaded meta data with 19 key-value pairs and 291…
-
Following the instructions in the README, I get all the way to running the model. Then `./bin/gpt-2 -m models/gpt-2-117M/ggml-model.bin -p "This is an example"` gives me this output:
main: seed = …
-
I've tried several ggml bins (some from TheBloke, some from GPT4All), and it seems like this code only works with models not labeled as 4-bit. Or am I missing something?
-
I have a 3090 GPU; I converted falcon-40b-instruct and quantized it with Q3_K. But when I run the test, prediction is 3x slower than reported, so I checked the GPU and CPU usage, but GPU utilization is …
-
I'm trying to implement batched BERT inference based on the https://github.com/skeskinen/bert.cpp project. I'm running into the following assert error:
https://github.com/ggerganov/ggml/blob/3dd91c…
-
I'm noticing with v0.3.2 my CPU is getting slaughtered. The UI revamp is worse than the previous iteration, with GPU offload now hidden on the "My Models" page, but even with all the layers assigned to the GPU …
-
Is there any way to run it in 4 GB or less of VRAM?
ggml, or gptq?
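As a rough sanity check, the weights of a 7B model at 4-bit quantization should just about fit — a back-of-the-envelope sketch (the 0.5 GB overhead figure is an assumption; actual KV-cache and context costs vary with settings):

```python
def model_vram_gb(n_params_billion, bits_per_weight, overhead_gb=0.5):
    """Rough VRAM estimate: raw weight bytes plus a flat overhead guess.

    overhead_gb is a placeholder for KV cache, activations, and runtime
    buffers; it is not an exact figure.
    """
    weight_bytes = n_params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes / 1024**3 + overhead_gb

# A 7B model at 4 bits per weight (e.g. a ggml q4_0 file):
print(f"{model_vram_gb(7, 4):.2f} GB")  # ~3.76 GB, so 4 GB is tight but plausible
```

By the same arithmetic, 8-bit 7B weights alone are ~6.5 GiB, which is why the 4-bit variants are the ones people run on small cards.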
-
I am trying to execute the following script:
from llama_cpp import Llama
llm = Llama(model_path="~/llama-2-7b.ggmlv3.q8_0.bin", n_gqa=8)
output = llm("Q: Name the planets in the solar sy…
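One thing worth noting: Python does not expand `~` in plain strings, so `model_path="~/..."` is handed to the loader verbatim and the file will not be found. A minimal sketch of a fix, assuming llama-cpp-python is installed (the lazy import is just so the helper is testable without it):

```python
import os

def load_model(path, **kwargs):
    """Expand a leading ~ before handing the path to llama-cpp-python.

    Llama() does not expand ~ itself, so "~/model.bin" fails unless
    the caller expands it first.
    """
    from llama_cpp import Llama  # imported lazily; requires llama-cpp-python
    return Llama(model_path=os.path.expanduser(path), **kwargs)

# Usage (requires the model file to exist at that path):
# llm = load_model("~/llama-2-7b.ggmlv3.q8_0.bin")
```

Also, if I recall correctly, `n_gqa=8` is the grouped-query-attention setting for the 70B llama-2 models; a 7B model should not need it.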
-
Hello!
I built libwhisper.so and libggml.so on Linux with `make libwhisper.so`.
I have a Spring Boot application; I put the native libs into src/main/resources/lib and set the System property …
-
Using the command `$ CC="/opt/rocm/llvm/bin/clang" CXX="/opt/rocm/llvm/bin/clang++" CT_HIPBLAS=1 pip install ctransformers --no-binary ctransformers` I am unable to compile ctransformers for ROCm. I'v…