-
Hello, I encountered the following error when running `blend.py`.
I changed the model to `Llama-3-8B-Instruct` since I don't have access to the Mixtral models. Could that be the cause of the error?
Log:
```
$ python example…
```
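Swapping the checkpoint alone usually isn't the problem as long as the loader accepts a Hugging Face model id, though Mixtral is an MoE model, so any Mixtral-specific code paths in the script would need to change. A minimal sketch of the substitution, assuming `blend.py` builds its model through vLLM's `LLM` class (the script's actual loading code may differ):

```python
# Hypothetical substitution: Llama-3-8B-Instruct in place of a Mixtral
# checkpoint, assuming a vLLM-based loader as in blend.py.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct", dtype="bfloat16")
outputs = llm.generate(["Hello!"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```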
-
Release date: Aug 8 2024
Branch cut: Aug 2 2024
## [Developer Facing API](https://github.com/pytorch/ao/issues/391)
- [x] static quantization flow example @jerryzh168
- [ ] QAT refactor to gener…
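For context on the quantization items above, the developer-facing entry point is a single `quantize_` call; a minimal sketch using int8 weight-only as the illustration (the static quantization flow example in the linked issue adds its own calibration steps):

```python
# Sketch of torchao's one-call quantization API, shown here with int8
# weight-only; the static flow additionally requires a calibration pass.
import torch
from torchao.quantization import quantize_, int8_weight_only

model = torch.nn.Sequential(torch.nn.Linear(1024, 1024)).cuda().to(torch.bfloat16)
quantize_(model, int8_weight_only())  # swaps Linear weights for int8 tensor subclasses
x = torch.randn(1, 1024, device="cuda", dtype=torch.bfloat16)
print(model(x).shape)  # torch.Size([1, 1024])
```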
-
### Your current environment
H100 40GB
### Model Input Dumps
_No response_
### 🐛 Describe the bug
```
docker run -d --restart=always \
--runtime=nvidia \
--gpus '"device=MIG-2ea01c20-8…
```
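Not part of the report, but a quick in-container sanity check that the MIG slice passed via `--gpus` is actually visible to PyTorch before vLLM starts:

```python
# Verify the MIG device mapped into the container is visible to CUDA.
import torch

print("CUDA available:", torch.cuda.is_available())
print("Device count:", torch.cuda.device_count())
if torch.cuda.is_available():
    print("Device 0:", torch.cuda.get_device_name(0))
```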
-
### Your current environment
The output of `python collect_env.py`
```text
Collecting environment information...
WARNING 09-23 09:07:16 _custom_ops.py:18] Failed to import from vllm._C with …
```
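That `Failed to import from vllm._C` warning usually means the compiled extension is missing or was built against a different torch/CUDA than what is installed; a quick way to surface the underlying error (a diagnostic sketch, not from the report):

```python
# Import the compiled extension directly to see the real ImportError
# that vllm's _custom_ops.py swallows into a warning.
try:
    import vllm._C  # noqa: F401
    print("vllm._C imported successfully")
except ImportError as exc:
    print("vllm._C failed to import:", exc)
```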
-
I followed the instructions from https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/llama_cpp_quickstart.html on a bare metal server from the Intel Dev Cloud, specifically this instance:
…
-
Hi,
I'm testing the llama3-70b model with SmoothQuant on a node with 4 x RTX 4090 GPUs. Due to memory constraints, I used the `host_cache_size` parameter to offload the KV cache to the host. Then I hit two issues:…
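`host_cache_size` matches TensorRT-LLM's `KvCacheConfig`; assuming that is the stack here, a minimal sketch of the host-offload setup (field names and units vary between versions, so treat this as illustrative, with a placeholder checkpoint):

```python
# Illustrative only: enabling host (CPU) offload for KV cache blocks via
# host_cache_size, assuming TensorRT-LLM's LLM API.
from tensorrt_llm import LLM
from tensorrt_llm.llmapi import KvCacheConfig

kv_cfg = KvCacheConfig(
    free_gpu_memory_fraction=0.85,
    host_cache_size=32 * 1024**3,  # 32 GiB of host memory for offloaded blocks
)
llm = LLM(
    model="meta-llama/Meta-Llama-3-70B-Instruct",  # placeholder checkpoint
    tensor_parallel_size=4,
    kv_cache_config=kv_cfg,
)
```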
-
### Checklist
- [X] 1. I have searched related issues but cannot get the expected help.
- [X] 2. The bug has not been fixed in the latest version.
- [X] 3. Please note that if the bug-related issue y…
-
### Your current environment
Running in Kubernetes on H100 in vllm/vllm-openai:v0.4.0
### 🐛 Describe the bug
Seems like there have been some weird dependency issues since v0.2.7. We would love to u…
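A diagnostic sketch (not from the report) for auditing what the image actually ships, using only the standard library:

```python
# Print the installed versions of the packages most often involved in
# vLLM dependency conflicts; runs inside any container unchanged.
from importlib.metadata import PackageNotFoundError, version

for pkg in ("vllm", "torch", "transformers", "xformers"):
    try:
        print(f"{pkg}=={version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```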
-
### What happened?
Offloading 31 of the 33 layers of an 8B model produces correct results; with 32 layers, the response is incoherent.
33 or more offloaded layers cause the instruction to be…
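The layer count here corresponds to the GPU-offload setting (`-ngl` in the llama.cpp CLI); a minimal sketch of the same knob through the llama-cpp-python bindings, with a placeholder model path:

```python
# n_gpu_layers mirrors the CLI's -ngl flag: 31 of the model's 33 layers
# go to the GPU here. The model path is a placeholder.
from llama_cpp import Llama

llm = Llama(model_path="./model-8b.Q4_K_M.gguf", n_gpu_layers=31)
print(llm("Say hello.", max_tokens=16)["choices"][0]["text"])
```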
-
### Your current environment
The output of `python collect_env.py`
```text
Collecting environment information...
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch…
```
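A companion check (not from the report) that the `2.3.0+cu121` build above actually sees a GPU at runtime:

```python
# Confirm the installed torch build and its CUDA runtime visibility.
import torch

print("torch:", torch.__version__)
print("built with CUDA:", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())
```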