-
### What happened?
I am attempting to measure the perplexity of the gemma-2-9b-it-Q4_K_M.gguf model using llama.cpp. However, I encounter an issue where the process gets stuck at the "tokenizing th…
-
### Feature Request
Gemma2 support
👉👉👉[My Bilibili channel](https://space.bilibili.com/3493277319825652)
👉👉👉[My YouTube channel](https://www.youtube.com/@AIsuperdomain)
### Motivation
Gemma2 support
👉👉👉[My Bilibili chann…
-
Hi @hadley, thanks for sharing this, really exciting.
Very nice to see support for open models via ollama. I wonder if you would consider adding support for VLLM-hosted models as well, e.g. see ht…
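For context, vLLM serves models behind an OpenAI-compatible HTTP API, so a client mostly needs a configurable base URL; a minimal sketch with the `openai` Python client (the server URL, API key, and model name are placeholders):

```python
from openai import OpenAI

# A vLLM server (started with `vllm serve <model>`) exposes an OpenAI-compatible
# endpoint, so the stock client works once pointed at the server's base URL.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # placeholders

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # placeholder model name
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```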
-
### What happened?
When using llama.cpp models (e.g., granite-code and llama3) with Nvidia GPU acceleration (nvidia/cuda:12.6.1-devel-ubi9 and RTX 3080 10GB VRAM), the models occasionally return nons…
-
### Feature request
add gemma2
### Motivation
_No response_
### Other
_No response_
-
### What is the issue?
Hi,
Error: cudaMalloc failed: out of memory
### OS
Windows
### GPU
Nvidia
### CPU
Intel
### Ollama version
0.3.8
-
Why does initialization succeed when running the compressed gemma2-2b-it model on Android, but then fail with org.apache.tvm.Base$TVMError: TVMError: Assert fail: rotary_mode_code == 0, gemma2_q4f16_1_ bat…
-
### What happened?
The `lm_head` layer for a [Gemma2](https://huggingface.co/google/gemma-2-2b) LoRA adapter is not converted by `convert_lora_to_gguf.py`, and therefore not applied at inference (r…
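As a quick sanity check before digging into the converter, one can inspect the PEFT adapter config to see whether `lm_head` is listed at all; a minimal sketch (the adapter path is a placeholder):

```python
import json
from pathlib import Path

adapter_dir = Path("./gemma2-lora-adapter")  # placeholder path to the PEFT adapter
config = json.loads((adapter_dir / "adapter_config.json").read_text())

# LoRA-trained layers appear in target_modules; fully fine-tuned layers such as
# lm_head are sometimes listed in modules_to_save instead.
targets = config.get("target_modules") or []
saved = config.get("modules_to_save") or []
modules = (targets if isinstance(targets, list) else [targets]) + \
          (saved if isinstance(saved, list) else [saved])
print("adapter modules:", modules)
print("lm_head present:", any("lm_head" in str(m) for m in modules))
```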
-
### System Info
cuda: 12.6
transformers: 4.44.0
OS: Windows 10
python: 3.11.4
ollama: 0.3.8 & 0.2.3
Configuration: RTX 3090, 12700KF
### Who can help?
_No response_
### Information…
-
### Description
With certain language models, the output contains stray Markdown code-fence markers (```); a minimal post-processing sketch follows the environment details below.
### Environment
- **Operating System:** Fedora Workstation 40
- **Node.js Version:** …
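A common client-side workaround is to strip a wrapping fence before rendering the response; a minimal sketch of the idea (the fence pattern is an assumption about the observed output, and the affected project is Node.js-based, so this is only illustrative):

```python
import re

def strip_wrapping_fence(text: str) -> str:
    """Remove a single Markdown code fence (``` or ```lang) wrapping the whole text."""
    match = re.match(r"^```[^\n]*\n(.*?)\n?```\s*$", text, flags=re.DOTALL)
    return match.group(1) if match else text

# Example: a model reply wrapped in an unwanted fence
print(strip_wrapping_fence("```json\n{\"ok\": true}\n```"))
```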