-
About your locally hosted example
```
from paperqa import Settings, ask

local_llm_config = dict(
    model_list=[
        dict(
            model_name="my_llm_model",
            litellm_pa…
```
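The truncated snippet above can be sketched out as a plain-Python config dict. This is a minimal sketch, not the issue author's actual code: the server URL, API key, and the commented-out paperqa call are placeholder assumptions for a local OpenAI-compatible endpoint (e.g. a llama.cpp server).

```python
# LiteLLM-style config for a locally hosted model (sketch; URL/key are
# placeholder assumptions for a local OpenAI-compatible server).
local_llm_config = {
    "model_list": [
        {
            "model_name": "my_llm_model",
            "litellm_params": {
                # The "openai/" prefix tells LiteLLM to use the
                # OpenAI-compatible protocol against a custom base URL.
                "model": "openai/my_llm_model",
                "api_base": "http://localhost:8080/v1",
                "api_key": "sk-no-key-required",
            },
        }
    ]
}

# With paperqa installed, the config would then be passed roughly as in
# the snippet above (assumed usage, shown commented out):
# from paperqa import Settings, ask
# settings = Settings(llm="my_llm_model", llm_config=local_llm_config)
# answer = ask("What is PaperQA?", settings=settings)
```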
-
It would be great to see OLMoE/OlmoeForCausalLM Llama.cpp/GGUF support.
Really neat project!
-
**Describe the package you'd like added**
`llama.cpp` has become a popular inference server for LLMs. Additionally, `llama-cpp-python` is commonly used to connect from Python to `llama.cpp`.
- `l…
-
Could we have support for [llama.cpp](https://github.com/ggerganov/llama.cpp)?
That would make the model accessible to many popular tools such as Ollama, LM Studio, Koboldcpp, text-generation-webui,…
-
## Overview
We need to add support for using [llama.cpp](https://github.com/ggerganov/llama.cpp) as an inference server in our project. llama.cpp is known for its speed, cross-platform compatibility,…
-
Does llama.cpp support `input_embeds`, the way `transformers` supports `input_embeds` in the `model.generate` function?
-
### System Info
Ubuntu 22.04, Python 3.11.8
### Running Xinference with Docker?
- [ ] docker
- [X] pip install
- [ ] installation from …
-
- Quantize your fine-tuned Llama model using [ggml-org/gguf-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo)
- Serve the model using [llama.cpp](https://github.com/ggerganov/llama.cpp)
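The two steps above can be sketched as shell commands. This is a sketch under assumptions: the GGUF filenames, quantization type, and port are placeholders, and the flags follow the `llama-quantize`/`llama-server` binaries shipped with current llama.cpp.

```shell
# 1. Quantize: the gguf-my-repo Space does this in the browser; locally,
#    llama.cpp's quantize tool produces e.g. a Q4_K_M file (placeholder names):
./llama-quantize my-model-f16.gguf my-model-Q4_K_M.gguf Q4_K_M

# 2. Serve the quantized model with llama.cpp's OpenAI-compatible HTTP server
#    (placeholder filename and port):
./llama-server -m my-model-Q4_K_M.gguf --port 8080
```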
-
### Describe the issue as clearly as possible:
When using `models.llamacpp` to generate JSON with a Pydantic model, I get an error when generating the first result (see the code to reproduce below). I h…
-
```
Traceback (most recent call last):
  File "/data/zhy/models/llama_cpp_python/model_test.py", line 1, in <module>
    from llama_cpp import Llama
  File "/data/zhy/models/llama_cpp_python/llama_cpp/__init__…
```