-
Hey all, thanks for your work. It seems there's an official GGUF release for the 1.0 version of the Llama-based model, but not for the 1.1 version. Is that because llama.cpp changes would be required?…
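If the 1.1 checkpoint keeps the same architecture as 1.0, conversion may not require any llama.cpp changes at all; a sketch assuming a local llama.cpp checkout (the model path and output names here are illustrative):

```sh
# convert_hf_to_gguf.py ships with llama.cpp; the input path is a placeholder.
python convert_hf_to_gguf.py /path/to/model-1.1 --outfile model-1.1-f16.gguf

# Optionally quantize with the bundled llama-quantize tool:
./llama-quantize model-1.1-f16.gguf model-1.1-Q4_K_M.gguf Q4_K_M
```

If conversion fails with an unknown-architecture error, that would suggest llama.cpp changes really are needed for 1.1.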
-
Hello,
I have fine-tuned a Llama 3 model and now I would love to use it on a CPU. I tried to use `device_map = 'cpu'` when loading the model.
However, I am still encountering CUDA issues such as
…
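When `device_map='cpu'` alone is not enough (for example, a quantization config or another CUDA-aware library still probes the GPU), a common workaround, sketched here under that assumption, is to hide the GPUs from the process before importing torch or transformers:

```python
import os

# Hide all GPUs from CUDA-aware libraries. This must run BEFORE importing
# torch or transformers, because CUDA device visibility is read at import time.
os.environ["CUDA_VISIBLE_DEVICES"] = ""

# Hypothetical model path for illustration; load as usual afterwards:
# from transformers import AutoModelForCausalLM
# model = AutoModelForCausalLM.from_pretrained(
#     "path/to/finetuned-llama3", device_map="cpu"
# )
```

With no visible CUDA devices, any code path that would otherwise try to initialize CUDA falls back to CPU.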
-
When reading the [API Docs](https://github.com/ollama/ollama/blob/main/docs/api.md#request-7) many options are listed with no visible explanation for what they do. The only explanation I could find [h…
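For context, most of those options map directly onto llama.cpp sampling and runtime parameters. A sketch of an `/api/generate` request body setting a few of the commonly used ones (the values are illustrative, not recommendations):

```json
{
  "model": "llama3",
  "prompt": "Why is the sky blue?",
  "options": {
    "num_ctx": 4096,
    "num_predict": 128,
    "temperature": 0.7,
    "top_p": 0.9,
    "repeat_penalty": 1.1,
    "seed": 42
  }
}
```

Here `num_ctx` is the context window in tokens, `num_predict` caps how many tokens are generated, and `repeat_penalty` penalizes recently generated tokens to reduce repetition.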
-
Tried following the wiki
https://github.com/likelovewant/ROCmLibs-for-gfx1103-AMD780M-APU/wiki/Unlock-LM-Studio-on-Any-AMD-GPU-with-ROCm-Guide#using-amd-graphics-cards-with-lm-studio
Copied the fi…
-
When running the x86 model_service image, you hit this error:
```
Traceback (most recent call last):
File "/opt/app-root/lib64/python3.9/site-packages/llama_cpp/llama_cpp.py", line 74, in _load…
-
## Overview
## Tasklist
- [ ] Can this be solved via llama.cpp? (e.g. compiled for Vulkan and ROCm)
- [x] https://github.com/janhq/cortex.llamacpp/issues/9
- [ ] [https://github.com/janhq/jan/issues…
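On the first tasklist item: llama.cpp can indeed be compiled with Vulkan or ROCm backends. A build sketch, assuming a recent llama.cpp tree and an installed Vulkan SDK (flag names have changed across versions; older trees used `LLAMA_VULKAN` / `LLAMA_HIPBLAS` instead):

```sh
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp

# Vulkan backend:
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release

# For ROCm/HIP, use -DGGML_HIP=ON instead of -DGGML_VULKAN=ON.
```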
-
### System Info / 系統信息
Ubuntu 20.04
### Running Xinference with Docker? / 是否使用 Docker 运行 Xinference?
- [X] docker / docker
- [ ] pip install / 通过 pip install 安装
- [ ] installation from source / 从源码…
-
**Issue identified:** cuDNN SDPA JIT recompiles whenever the context length changes. As a result, training runs that do not use packing keep recompiling, which causes the observed ~500 ms overhead.
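Packing sidesteps the recompilation because every batch then has the same fixed sequence length, so the kernel is compiled once. A minimal greedy-packing sketch (pure Python; `max_len` and `pad_id` are illustrative parameters):

```python
def pack_sequences(seqs, max_len, pad_id=0):
    """Greedily pack token sequences into fixed-length rows.

    Every returned row has exactly max_len tokens, so the attention
    kernel always sees the same shape and is JIT-compiled once instead
    of recompiling for each new context length.
    """
    rows, current = [], []
    for seq in seqs:
        seq = seq[:max_len]  # truncate over-long sequences
        if len(current) + len(seq) > max_len:
            # Flush the current row, padded up to max_len.
            rows.append(current + [pad_id] * (max_len - len(current)))
            current = []
        current = current + seq
    if current:
        rows.append(current + [pad_id] * (max_len - len(current)))
    return rows

# Example: three sequences packed into rows of length 4
pack_sequences([[1, 2], [3, 4, 5], [6]], max_len=4)
# → [[1, 2, 0, 0], [3, 4, 5, 6]]
```

A real training setup would additionally track per-row sequence boundaries so attention does not cross packed examples; the point here is only that the output shape is constant.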
--…
-
I found that the benchmark suite reports time to first token. However, when I run `python benchmark.py --model meta-llama/Llama-2-7b-hf static --isl 128 --osl 128 --batch 1`, an error occurs:…
-
Apple released several open-source LLMs designed to run on-device.
[Huggingface Link](https://huggingface.co/apple/OpenELM)