-
## Overview
## Tasklist
- [ ] Can this be solved via llama.cpp? (e.g. compiled for Vulkan and ROCm)
- [x] https://github.com/janhq/cortex.llamacpp/issues/9
- [ ] [https://github.com/janhq/jan/issues…
-
### 🚀 The feature, motivation and pitch
Currently, when using Automatic Prefix Caching, you may truncate the input (for chat-related generation) because of the context limit. The Automatic Prefix …
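To make the failure mode concrete, here is a toy sketch (not vLLM internals; the cache, its key, and the block size are invented for illustration) of why truncating the *front* of the prompt defeats a prefix cache:

```python
# Toy prefix cache keyed on the leading tokens of the prompt.
# The dict cache and 4-token block size are illustrative only.
cache: dict[tuple[int, ...], bool] = {}

def cached_prefill(tokens: list[int]) -> bool:
    """Return True if the leading block of `tokens` was seen before."""
    key = tuple(tokens[:4])
    hit = key in cache
    cache[key] = True
    return hit

prompt = [1, 2, 3, 4, 5, 6, 7, 8]
cached_prefill(prompt)             # miss: first request populates the cache
print(cached_prefill(prompt))      # True: an identical prefix is reused

truncated = prompt[2:]             # context-limit truncation drops the front
print(cached_prefill(truncated))   # False: leading tokens changed, no reuse
```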
-
**The bug**
Following the examples here (https://lightning.ai/lightning-ai/studios/structured-llm-output-and-function-calling-with-guidance#llm-tool-use), using tools with llama.cpp and a Mistral 8B GGUF model,…
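For reference, the repro follows the linked studio's pattern of loading a local GGUF through `models.LlamaCpp`; a minimal sketch (the model path is a placeholder):

```python
from guidance import models, gen

# Placeholder path to the Mistral GGUF file used in the linked examples.
lm = models.LlamaCpp("mistral-8b-instruct.gguf")

# Plain generation as a sanity check; the reported failure involves tool use.
lm += "Q: What is 2 + 2?\nA: " + gen("answer", max_tokens=16)
print(lm["answer"])
```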
-
Kinda self-explanatory from the title: right now, each Python version for a given target builds llama.cpp independently. This artificially limits how many platforms we can support by blowing up CI buil…
-
## Describe the bug
### My environment
Windows 11 Pro, Docker Desktop, WSL2 Ubuntu Engine, latest nvidia driver
### CUDA test
I made sure the Docker WSL2 CUDA implementation works correctly by…
-
What about custom/private LLMs? Will there be an option to use some of LangChain's local features, like llama.cpp?
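For context, this is roughly what the LangChain side looks like with a local llama.cpp model (a sketch; the model path is a placeholder):

```python
from langchain_community.llms import LlamaCpp

# Placeholder path to a local GGUF model file.
llm = LlamaCpp(model_path="models/mistral-7b.Q4_K_M.gguf", n_ctx=2048)
print(llm.invoke("Say hello in one short sentence."))
```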
-
# Prerequisites
Please answer the following questions for yourself before submitting an issue.
- [x] I am running the latest code. Development is very rapid so there are no tagged versions as of…
-
Hello, is it possible to compile with CMake?
With make, it doesn't detect CUDA.
-
**The bug**
When using `models.LlamaCpp`, the selected tokenizer is always gpt2 (this can be seen in the output when the `verbose=True` arg is set). I have pasted the dumped KV metadata keys below:
```
llama_mod…
```
-
I installed llama-cpp-python using the command below:
`CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python`
The speed:
`llama_print_timings: eval time = 81.91 ms / 2 runs ( 40…`
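In case it helps anyone comparing eval times with a cuBLAS build: GPU offload also has to be requested at load time, which is easy to miss. A minimal sketch (the model path is a placeholder, and `n_gpu_layers` being relevant here is only a guess):

```python
from llama_cpp import Llama

# n_gpu_layers=-1 asks llama.cpp to offload all layers to the GPU;
# verbose=True prints the llama_print_timings lines quoted above.
llm = Llama(model_path="models/model.gguf", n_gpu_layers=-1, verbose=True)

out = llm("Q: Name the planets in the solar system. A: ", max_tokens=32)
print(out["choices"][0]["text"])
```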