-
File "/home/qx/.local/lib/python3.10/site-packages/awq/models/base.py", line 231, in quantize
self.quantizer.quantize()
File "/home/qx/.local/lib/python3.10/site-packages/awq/quantize/quantize…
-
### Your current environment
Below is my current Docker Compose configuration:
```yaml
services:
vllm:
image: vllm/vllm-openai:v0.6.4
deploy:
resources:
reservation…
-
As we found out in tenstorrent/pytorch2.0_ttnn#198, several ops produce (1, N) tensors when (N,) tensors are expected.
Affected ops:
- `ceil`
- `floor`
- `gelu`
- `rsqrt`
- `sqrt`
Spared op…
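
To make the shape mismatch described above concrete, here is a minimal sketch in plain Python. It uses nested lists as stand-ins for tensors. `shape_of` and `squeeze_leading_unit_dim` are hypothetical helpers introduced only for illustration; they are not part of ttnn or PyTorch, and this is not the fix adopted in the linked issue.

```python
def shape_of(value):
    """Return the shape of a rectangular nested list as a tuple."""
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        if not value:
            break
        value = value[0]
    return tuple(shape)


def squeeze_leading_unit_dim(value):
    """Drop a leading dimension of size 1, turning (1, N) into (N,).

    Values with any other shape are returned unchanged.
    """
    shape = shape_of(value)
    if len(shape) == 2 and shape[0] == 1:
        return value[0]
    return value


# Stand-ins for an op's expected and observed outputs (assumed data).
expected = [1.0, 2.0, 3.0]    # shape (N,)
observed = [[1.0, 2.0, 3.0]]  # shape (1, N), the mismatch reported above

print(shape_of(expected), shape_of(observed))
print(shape_of(squeeze_leading_unit_dim(observed)))
```

A check of this kind, comparing the output shape against the input shape, could flag an affected op in a test. Whether squeezing is the right remedy depends on where the extra dimension is introduced.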
-
When I try to reproduce the original model results for mistral-7b-v0.2 without `flash-attn`, I get the following error:
```
Traceback (most recent call last):
File "/home/yuanye/long_llm/InfLLM/benchmark/pr…
-
### Search before asking
- [X] I have searched the [issues](https://github.com/eosphoros-ai/DB-GPT/issues?q=is%3Aissue) and found no similar issues.
### Operating system information
MacOS(M1, M2…
-
### Your current environment
```text
k8s 1.31 using vllm-openai:latest
```
### How would you like to use vllm
I am currently running the QWEN model on 1 GPU with the manifest below:
`…
-
### Search before asking
- [X] I have searched the Ultralytics YOLO [issues](https://github.com/ultralytics/ultralytics/issues) and [discussions](https://github.com/ultralytics/ultralytics/discussi…
-
### Reminder
- [x] I have read the README and searched the existing issues.
### System Info
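I want to add trainable parameters in the code, for example a scaling parameter `scale_param` that temporarily scales the model parameters during the forward pass (not a sensible design, but it serves as an example here).
My current approach is, in hiyouga/LLaMA-Factory/src…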
-
### Search before asking
- [X] I have searched the YOLOv8 [issues](https://github.com/ultralytics/ultralytics/issues) and found no similar bug report.
### YOLOv8 Component
_No response_
…
-
### 🚀 The feature, motivation and pitch
It is common to want to deploy multiple vLLM instances on a single machine because the machine has several GPUs (commonly 8). …