-
### 🐛 Describe the bug
The following code generates the compile error below:
```
import code
import time
import warnings
import numpy as np
import torch
from torch.nn.attention.flex_attent…
```
-
**Describe the bug/ 问题描述 (Mandatory / 必填)**
When fine-tuning the Qwen2.5-3B model with LoRA, the first ~10 training steps are fairly fast (1–2 s/step), but training then gradually slows to more than 10 s/step. GPU utilization reaches 100% early on, yet after ~100 steps it sits at 2% for long stretches.
- **Hardware Environment(`Ascend`/`GPU`…
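A quick way to confirm the slowdown is to log wall-clock time per step and compare early steps against later ones. This is a generic, hedged sketch (`step_fn` is a placeholder for the real LoRA train step, not a MindSpore API):

```python
import time

def timed_steps(step_fn, num_steps):
    """Run `step_fn` `num_steps` times and return seconds per step.

    `step_fn` is a placeholder for one training step; wrap the real
    train step there to see at which step the time starts to grow.
    """
    durations = []
    for _ in range(num_steps):
        t0 = time.perf_counter()
        step_fn()
        durations.append(time.perf_counter() - t0)
    return durations

# Example with a dummy step; on the real workload, compare the mean
# of the first ~10 entries with the mean of the last ~10.
times = timed_steps(lambda: sum(range(10_000)), 5)
```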
-
### OS
iOS, iPadOS, macOS
### Description
To make the Range Test more useful, the app should display its results in a graph/table, like what is already done for device or environmental metrics. …
-
Error message:
```
Traceback (most recent call last):
File "./graphgpt/eval/run_graphgpt.py", line 244, in
run_eval(args, args.num_gpus)
File "./graphgpt/eval/run_graphgpt.py", line 98, in run_ev…
```
-
**Description**
CUDA Graph does not work with the TensorRT backend. The model config is as below:
```
platform: "tensorrt_plan"
version_policy: { latest: { num_versions: 2}}
parameters { key: "execution_mode"…
```
-
Hello, author. Is there reference code available for this paper?
-
### 🐛 Describe the bug
With a 2D spatial neighborhood pattern, flash attention is orders of magnitude slower than dense attention:
```
hlc=2
seq_length : 192
flex attention : 0.0015106382369995117 […
```
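For reference, a 2D spatial neighborhood can be expressed as a `mask_mod`-style predicate over flattened token indices. This is a hedged plain-Python sketch, not the reporter's code; `width` and `radius` are assumed names for the grid side length and neighborhood size:

```python
def neighborhood_2d(q_idx, kv_idx, width, radius):
    """True if the kv token falls inside a (2*radius+1)^2 window
    around the query token on a `width`-wide 2D grid.

    Token i is treated as grid cell (i // width, i % width); `width`
    and `radius` are illustrative parameters, not from the report.
    """
    qr, qc = divmod(q_idx, width)
    kr, kc = divmod(kv_idx, width)
    return abs(qr - kr) <= radius and abs(qc - kc) <= radius

# With flex attention, a tensor-vectorized version of this predicate
# would be passed to create_block_mask, and the resulting mask given
# to flex_attention via its block_mask argument.
```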
-
env:
torch.__version__ = 2.0.1+cu118
onnx.__version__ = '1.16.0'
command:
python cosyvoice/bin/export_onnx.py --model_dir $dir
error logs:
/root/miniconda3/envs/cosyvoice/lib/python3.8/site-pac…
-
Run transformer block on device OpenCL, output layer on PTX:
```
python %TORNADO_SDK%\bin\tornado ^
--jvm="-Dtb.device=1:0 -Dol.device=2:0 -DUseVectorAPI=true -Dtornado.device.memory=2GB" ^
--clas…
```
-
## 🚀 Feature
[This paper in ICLR ](https://openreview.net/pdf?id=SJgxrLLKOE) describes a new attention mechanism for graph neural networks that builds off of the original multi-head attention for…
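For context, the original multi-head graph attention (GAT-style) that the paper builds on can be sketched in a few lines of NumPy. This is a hedged illustration with made-up parameter names, not the paper's proposed mechanism:

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gat_head(h, adj, W, a_src, a_dst):
    """One attention head in the original GAT formulation.

    h: (N, F) node features; adj: (N, N) 0/1 adjacency with self-loops;
    W, a_src, a_dst: illustrative learnable parameters.
    """
    z = h @ W                      # project node features
    s, d = z @ a_src, z @ a_dst    # per-node source/destination logits
    e = np.where(adj > 0, leaky_relu(s[:, None] + d[None, :]), -np.inf)
    alpha = softmax(e, axis=1)     # normalize over each node's neighbors
    return alpha @ z

def multi_head_gat(h, adj, params):
    # head outputs are concatenated, as in the original multi-head design
    return np.concatenate([gat_head(h, adj, *p) for p in params], axis=1)

# Tiny usage example on a 4-node path graph with self-loops.
rng = np.random.default_rng(0)
N, F, Fp, heads = 4, 3, 2, 2
h = rng.normal(size=(N, F))
adj = np.eye(N) + np.eye(N, k=1) + np.eye(N, k=-1)
params = [(rng.normal(size=(F, Fp)), rng.normal(size=Fp),
           rng.normal(size=Fp)) for _ in range(heads)]
out = multi_head_gat(h, adj, params)  # shape (N, heads * Fp)
```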
vymao updated 3 years ago