-
**Is your feature request related to a problem? Please describe.**
The current implementation of the title generation functionality in Open WebUI during chat initiation does not maintain context leng…
-
### Bug description
If you have a model whose name is in any language other than English, you get the filename '_{id}', which will not be exported.
This is evident from this code:
def get_filename(model_name: …
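The snippet above is cut off; as a hedged sketch of how a filename sanitizer can collapse a non-Latin name to `'_{id}'` (the regex below is an assumption for illustration, not the actual Open WebUI code):

```python
import re

# Hypothetical reconstruction of the failure mode, NOT the actual
# Open WebUI implementation: if the sanitizer keeps only ASCII word
# characters, a fully non-English name collapses to the empty string.
def get_filename(model_name: str, model_id: str) -> str:
    safe = re.sub(r"[^a-zA-Z0-9_-]", "", model_name)
    return f"{safe}_{model_id}"

print(get_filename("модель", "abc123"))  # "_abc123" – the reported filename
print(get_filename("llama3", "abc123"))  # "llama3_abc123" – English names survive
```

A Unicode-aware sanitizer (or transliteration before stripping) would avoid the empty prefix.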
-
### Question Validation
- [X] I have searched both the documentation and discord for an answer.
### Question
Hi, I am trying to fine-tune a linear adapter for the `bge-m3` embedding model.
To achi…
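The question is cut off above; for context, a linear adapter in this setting is usually just a trainable matrix applied on top of the frozen embedding output. A minimal NumPy sketch (the small dimension and the MSE objective are illustrative assumptions; `bge-m3` itself produces 1024-dim vectors):

```python
import numpy as np

# Hypothetical sketch of a linear adapter over frozen embeddings.
# bge-m3 outputs 1024-dim vectors; a small dim is used here for brevity.
rng = np.random.default_rng(0)
dim = 8
W = np.eye(dim)  # identity init: the adapter starts as a no-op

def adapt(x: np.ndarray) -> np.ndarray:
    # x: (batch, dim) frozen base embeddings -> adapted embeddings
    return x @ W.T

# one gradient-descent step on an MSE objective toward target embeddings
x = rng.normal(size=(16, dim))       # frozen base embeddings
target = rng.normal(size=(16, dim))  # desired (e.g. query-aligned) embeddings
lr = 0.1
grad = 2 * (adapt(x) - target).T @ x / len(x)
W -= lr * grad
print(adapt(x).shape)  # (16, 8)
```

Only `W` is trained; the base embedding model stays frozen, which is what makes the adapter cheap to fit.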
-
Hey, thanks for sharing such a great tool!
I might be missing something, but when I'm chatting with a Llama 3 model (either the original or a variant like dolphin 2.9), the context length seems maxxe…
-
### 🚀 The feature
Allow loading `.tar.gz` model archives on server startup via the `load_models=all` property, in addition to `.mar` and `.model` files.
### Motivation, pitch
I am currently using Torch…
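For reference, the startup loading the request refers to is driven by TorchServe's `config.properties`; a typical configuration looks like this (paths are placeholders):

```properties
# TorchServe config.properties (illustrative paths)
# load_models=all auto-loads every archive found in model_store at startup;
# today that only covers .mar and .model files, not .tar.gz
load_models=all
model_store=/home/model-server/model-store
```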
-
### Description
Tested GraphRAG with OpenAI.
In my understanding, the env variables `GRAPHRAG_LLM_MODEL` and `GRAPHRAG_EMBEDDING_MODEL` are used when creating the GraphRAG index. So I set `.env` like thi…
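The file contents are truncated above; for reference, a minimal GraphRAG `.env` using those two variables typically looks like the following. The values are placeholders, not necessarily the ones used here:

```properties
# Illustrative GraphRAG .env – values are placeholders
GRAPHRAG_API_KEY=<your-openai-key>
GRAPHRAG_LLM_MODEL=gpt-4-turbo-preview
GRAPHRAG_EMBEDDING_MODEL=text-embedding-3-small
```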
-
### Issue description
There is an error when generating a response, which looks like a Vulkan-related issue. But ollama runs the same model, which works well. Thanks for your time! Best regards!
### Ex…
-
### Describe the bug
Crash with abort when trying to use an AMD graphics card in the editor.
Model is mistral-7b-instruct-v0.2.Q4_K_M.gguf
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon RX…
-
- [ ] [hsiehjackson/RULER: This repo contains the source code for RULER: What’s the Real Context Size of Your Long-Context Language Models?](https://github.com/hsiehjackson/RULER?tab=readme-ov-file)
…
-
### What happened?
I'm using the `openai` library to interact with the `llama-server` docker image on an A6000:
`docker run -p 8080:8080 --name llama-server -v ~/gguf_models:/models --gpus all ghcr.io…