-
### Describe the bug
When passing a `response_format` of type `regex` to `chat_completion`, the generated output does not always match the regex.
### Reproduction
This does not follow the regex:
```
fro…
```
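For context, the call shape being described is roughly the sketch below. It assumes a recent `huggingface_hub` `InferenceClient` backed by a TGI-style endpoint; the model name, prompt, and regex are placeholders, not taken from the truncated reproduction:

```python
# Minimal sketch of a regex-constrained chat_completion call (assumption:
# recent huggingface_hub against a TGI-style backend; model and regex are
# placeholders, not the reporter's).
from huggingface_hub import InferenceClient

client = InferenceClient("meta-llama/Meta-Llama-3-8B-Instruct")  # hypothetical model

response = client.chat_completion(
    messages=[{"role": "user", "content": "Give me a date in YYYY-MM-DD format."}],
    # The grammar constraint: the backend is supposed to force output
    # matching this regex on every call.
    response_format={"type": "regex", "value": r"\d{4}-\d{2}-\d{2}"},
    max_tokens=20,
)
print(response.choices[0].message.content)
```

With a constrained backend, the returned content should match the regex on every call; the report is that it intermittently does not.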
-
I ran the command like this:
```bash
bun x humanifyjs local responsez.js
ggml_vulkan: Found 1 Vulkan devices:
Vulkan0: NVIDIA GeForce GTX 1070 (NVIDIA) | uma: 0 | fp16: 0 | warp size: 32
[nod…
```
-
### Bug Report
GPT4All was working well before the recent update. Today I updated to v3.1.0. After that, when I load a model, it fails instead of loading the model.
### Steps to Reproduce
Open gpt…
-
### What is the issue?
I get a CUDA out-of-memory error when sending a large prompt (about 20k+ tokens) to the Phi-3 Mini 128k model on a laptop with an Nvidia A2000 with 4 GB of VRAM. At first about 3.3 GB of GPU RAM and …
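For reference, a reproduction along these lines could look like the sketch below. It assumes Ollama's REST API on the default port and the `phi3` model tag; both are assumptions, and the prompt is synthetic rather than the reporter's:

```python
# Sketch of a long-prompt request against Ollama's REST API (assumptions:
# default port 11434, "phi3" tag for Phi-3 Mini; prompt is synthetic).
import requests

long_prompt = "word " * 20000  # roughly a 20k-token prompt

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "phi3",
        "prompt": long_prompt,
        "stream": False,
        # num_ctx sets the context window Ollama allocates; large values
        # grow the KV cache, which is what can exceed 4 GB of VRAM.
        "options": {"num_ctx": 32768},
    },
    timeout=600,
)
print(resp.json())
```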
-
### Before submitting your bug report
- [X] I believe this is a bug. I'll try to join the [Continue Discord](https://discord.gg/NWtdYexhMs) for questions
- [X] I'm not able to find an [open issue](ht…
-
### Your current environment
The output of `python collect_env.py`
```text
Your output of `python collect_env.py` here
```
### 🐛 Describe the bug
I deployed the vLLM server using the below…
-
Hi,
So I was training a new tokenizer from the Llama tokenizer (meta-llama/Llama-2-7b-hf) on a medium-sized corpus (a FineWeb-10BT sample: 15 million documents with an average length of 2,300 characters). A…
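For context, the standard recipe for this looks roughly like the sketch below. It assumes `transformers`' `train_new_from_iterator` and streaming FineWeb through `datasets`; the batch size and vocab size are placeholders, not the reporter's settings:

```python
# Sketch of retraining a tokenizer from an existing one (assumptions:
# transformers fast-tokenizer train_new_from_iterator, datasets streaming;
# vocab_size and batch_size are placeholders).
from datasets import load_dataset
from transformers import AutoTokenizer

base = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
corpus = load_dataset("HuggingFaceFW/fineweb", "sample-10BT",
                      split="train", streaming=True)

def batch_iterator(batch_size=1000):
    # Yield batches of raw text so the trainer never holds the full
    # 15M-document corpus in memory.
    batch = []
    for example in corpus:
        batch.append(example["text"])
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch

new_tokenizer = base.train_new_from_iterator(batch_iterator(), vocab_size=32000)
new_tokenizer.save_pretrained("llama-retrained-tokenizer")
```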
-
![image](https://github.com/user-attachments/assets/1231e002-8c11-4251-bba2-1fb02a067007)
Hi!
I am fine-tuning LLaMA3 on the hh-rlhf dataset using SimPO and noticed that the reward/chosen rewar…
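For reference, a small sketch of how the SimPO objective and its implicit "reward" terms are typically defined (following the SimPO paper's formulation; the variable names and numbers below are illustrative, not from this run). Note that `reward/chosen` here is beta times a length-normalized log-probability, so it is typically negative:

```python
# Illustrative SimPO loss (per the SimPO paper): implicit rewards are
# length-normalized log-probs scaled by beta, with a target margin gamma.
# All names and values are illustrative only.
import torch
import torch.nn.functional as F

def simpo_loss(logp_chosen, logp_rejected, len_chosen, len_rejected,
               beta=2.0, gamma=0.5):
    # Length-normalized average log-probs act as implicit rewards.
    reward_chosen = beta * logp_chosen / len_chosen
    reward_rejected = beta * logp_rejected / len_rejected
    # Bradley-Terry-style loss with a target margin gamma.
    loss = -F.logsigmoid(reward_chosen - reward_rejected - gamma)
    return loss, reward_chosen, reward_rejected

loss, r_c, r_r = simpo_loss(
    logp_chosen=torch.tensor(-120.0), logp_rejected=torch.tensor(-180.0),
    len_chosen=100, len_rejected=120,
)
print(loss.item(), r_c.item(), r_r.item())  # reward_chosen is -2.4 here
```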
-
Any ideas why it might be slow? For comparison, I'm using KoboldCPP with the same Mistral model and it answers immediately in real time, almost like ChatGPT (I have an RTX 4090). It also starts in like 15 seco…
-
When adding llama_cpp-rs to my Cargo.toml, the bundled llama.cpp seems to be locked to an older version. I'm trying to use Phi-3 128k in a project and I'm unable to, because the [PR that was merged into llama.cpp](h…