-
### Describe the bug
The web UI crashes after sending a prompt.
### Is there an existing issue for this?
- [X] I have searched the existing issues
### Reproduction
1) `./start_linux.sh`
2) load a model
…
-
### What happened?
Hi there.
My llama-server works well with the following command:
```bash
/llama.cpp-b3985/build_gpu/bin/llama-server -m ../artifact/models/Mistral-7B-Instruct-v0.3.Q4_1.g…
-
Ollama logs look awesome in Humanlog, but they could use a few improvements.
![image](https://github.com/user-attachments/assets/eb731310-f80d-4df1-b287-8efb046ef410)
Logs attached: [ollama_serve_output…
-
### System Info
When initializing LlamaTokenizer from the Transformers library, the returned tokenizer is a bool rather than a tokenizer instance. The issue persists across different environments and Python versions.
…
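One way to make the failure explicit is a type guard right after loading. The sketch below uses a hypothetical `load_tokenizer` as a stand-in for the real `LlamaTokenizer.from_pretrained` call, hard-wired to return the bad value so the guard fires:

```python
def load_tokenizer(path):
    # Hypothetical stand-in for LlamaTokenizer.from_pretrained(path),
    # hard-wired to reproduce the reported failure mode: a bool comes
    # back instead of a tokenizer instance.
    return False

tok = load_tokenizer("meta-llama/Llama-2-7b-hf")
if isinstance(tok, bool):
    print(f"load failed: got {type(tok).__name__}, expected a tokenizer instance")
else:
    print("tokenizer loaded:", type(tok).__name__)
```

With the real call in place of the stand-in, the same `isinstance` check turns a silent bad value into an early, explicit error.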
-
### System Info
Environment:
OS: Ubuntu 24.04
Python version: 3.11.8
Transformers version: transformers==4.45.2
Torch version: torch==2.3.0
Model: Meta-Llama-3.1-70B-Q2_K-GGUF — https://hugg…
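The pinned versions above can be double-checked against the running interpreter with the standard library alone (package names are the usual PyPI ones; this is a verification sketch, not part of the report):

```python
from importlib.metadata import version, PackageNotFoundError

# Versions pinned in the environment listing above.
expected = {"transformers": "4.45.2", "torch": "2.3.0"}

for pkg, want in expected.items():
    try:
        have = version(pkg)
        status = "OK" if have == want else f"MISMATCH (installed {have})"
    except PackageNotFoundError:
        status = "not installed"
    print(f"{pkg}=={want}: {status}")
```

This runs anywhere: missing packages are reported rather than raising, so the check works even on a machine without torch or transformers installed.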
-
### Self Checks
- [X] This template is only for bug reports. For questions, please visit [Discussions](https://github.com/fishaudio/fish-speech/discussions).
- [X] I have thoroughly reviewed the proj…
-
I am following the [instructions in the Llama2 README](https://github.com/pytorch/executorch/blob/d9aeca556566104c2594ec482a673b9ec5b11390/examples/models/llama2/README.md#instructions) to test llama m…
-
I would expect ein to be able to pick up my Poetry Jupyter kernel:
```bash
> poetry run jupyter kernelspec list
Available kernels:
python3 /Users/dguim/Library/Caches/pypoetry/virtualen…
-
### Your current environment
- vLLM version: v0.5.3.post1 (public Docker image)
- Model: Llama 3 70B
- Dtype: FP16
- GPU: Nvidia H100
### 🐛 Describe the bug
The vLLM metrics endpoint is showin…
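The endpoint serves Prometheus text exposition format, so a scraped line can be picked apart with the standard library alone. The sample line below is illustrative, not captured from the server:

```python
# Parse one line of Prometheus text format, as served by vLLM's /metrics
# endpoint. Sample line is illustrative, not real captured output.
sample = 'vllm:num_requests_running{model_name="llama3-70b"} 2.0'

name, rest = sample.split("{", 1)             # metric name before the label set
labels_raw, value_raw = rest.rsplit("} ", 1)  # label block, then the value
labels = {
    k: v.strip('"')
    for k, v in (pair.split("=", 1) for pair in labels_raw.split(","))
}
value = float(value_raw)
print(name, labels, value)
```

This naive split is good enough for simple lines like the one above; label values containing commas or escaped quotes need a real Prometheus client parser.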
-
When I quantized the Qwen2.5-1.5B-Instruct model according to **"Quantizing the GGUF with AWQ Scale"** in the [docs](https://qwen.readthedocs.io/en/latest/quantization/llama.cpp.html), it showed that th…