-
### 🐛 Describe the bug
I'm compiling a graph multiple times using inductor. I find that it modifies the graph in place, and one of the graph's outputs changes from a tensor to a list of tensors.
Example code:
…
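A minimal sketch of the double-compilation pattern (hypothetical, since the original example code is elided; `torch._inductor.compile_fx.compile_fx` is an internal entry point, used here only to hand the same captured graph to inductor twice):
```python
import torch
from torch._inductor.compile_fx import compile_fx  # internal inductor entry point

captured = []

def inductor_backend(gm, example_inputs):
    # Stash the dynamo-produced fx.GraphModule before handing it to inductor.
    captured.append(gm)
    return compile_fx(gm, example_inputs)

@torch.compile(backend=inductor_backend)
def f(x):
    return x.sin() + x.cos()

x = torch.randn(8)
f(x)  # first compilation

gm = captured[0]
print(gm.graph)        # graph after the first inductor compile
compile_fx(gm, [x])    # compile the same captured graph a second time
print(gm.graph)        # any in-place mutation of the graph shows up here
```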
-
### 🐛 Describe the bug
As mentioned in this [blog](https://dev-discuss.pytorch.org/t/higher-order-operators-2023-10/1565), HigherOrderOperator does not support graph break inside the input/output fun…
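For context, a minimal sketch of what a graph break inside a HigherOrderOperator body looks like, assuming `torch.cond` as the operator and a `print` call as the source of the graph break (an illustrative example, not taken from the post):
```python
import torch


def true_fn(x):
    # print() forces a graph break, which HigherOrderOperators such as
    # torch.cond do not allow inside their branch functions.
    print("taking the true branch")
    return x.sin()


def false_fn(x):
    return x.cos()


@torch.compile(fullgraph=True)
def f(pred, x):
    return torch.cond(pred, true_fn, false_fn, (x,))


x = torch.randn(4)
# Expected to raise an error because the branch body cannot be captured
# as a single graph.
f(torch.tensor(True), x)
```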
-
The possibility of inserting a watermark into the widget is very interesting, especially when we need to draw the user's attention to a piece of information.
In the example below, we have a sensor th…
-
### 🐛 Describe the bug
The model is converted with dynamo and opset 18, using torch nightly and the most recent onnx and onnxscript.
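A minimal sketch of the export call this likely corresponds to (an assumption; the model and exact arguments are placeholders), using `torch.onnx.export` on the dynamo path with opset 18:
```python
import torch

# Hypothetical stand-in for the model being converted.
model = torch.nn.Sequential(torch.nn.Linear(16, 16), torch.nn.ReLU()).eval()
example_inputs = (torch.randn(2, 16),)

# Dynamo-based exporter; requires a recent torch plus onnx and onnxscript.
onnx_program = torch.onnx.export(
    model,
    example_inputs,
    dynamo=True,
    opset_version=18,
)
onnx_program.save("model.onnx")
```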
# PyTorch ONNX Conversion Report
```
✅ Obtain model graph with `torch…
-
# Summary
We recently landed support for grouped query attention via `enable_gqa` on SDPA; however, this is only enabled on the flash attention backend. This leads to a weird situation where it c…
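A minimal sketch of what enabling GQA on SDPA looks like (illustrative shapes; the specific failure mode is elided above), with more query heads than key/value heads:
```python
import torch
import torch.nn.functional as F

# Grouped-query attention: 8 query heads share 2 key/value heads.
q = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.bfloat16)
k = torch.randn(1, 2, 128, 64, device="cuda", dtype=torch.bfloat16)
v = torch.randn(1, 2, 128, 64, device="cuda", dtype=torch.bfloat16)

# enable_gqa=True broadcasts the 2 KV heads across the 8 query heads.
# As noted above, only the flash attention backend currently supports this.
out = F.scaled_dot_product_attention(q, k, v, enable_gqa=True)
print(out.shape)  # torch.Size([1, 8, 128, 64])
```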
-
I tried exporting the stream conformer model to ONNX format with the parameters below.
```
python3 wenet/bin/export_onnx_gpu.py --config=$model_dir/train.yaml --checkpoint=$model_dir/final.pt --cmvn_fi…
-
## Summary
Repro script
```python
import torch
import torch.nn as nn
import torch.nn.functional as F
q = torch.randn(1, 16, 1, 64, device="cuda", dtype=torch.bfloat16, requires_grad=True)…
-
Hi, I'm working on the attention mechanism for face recognition models. I'm using the IR model as a backbone, but I don't know much about the details of the Grad-CAM implementation: what exactly sh…
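For reference, a minimal Grad-CAM sketch using plain forward/backward hooks; the `resnet18` backbone and `layer4` target layer are placeholders, not the actual face-recognition model:
```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet18  # stand-in for the IR backbone

model = resnet18(weights=None).eval()
activations, gradients = {}, {}

def fwd_hook(_, __, output):
    activations["feat"] = output

def bwd_hook(_, grad_input, grad_output):
    gradients["feat"] = grad_output[0]

# Placeholder target layer; for a real backbone, pick the last conv block.
layer = model.layer4
layer.register_forward_hook(fwd_hook)
layer.register_full_backward_hook(bwd_hook)

x = torch.randn(1, 3, 224, 224, requires_grad=True)
score = model(x)[0].max()        # score to explain (e.g. an identity logit)
score.backward()

# Channel-wise weights from the gradients, then a weighted sum of activations.
weights = gradients["feat"].mean(dim=(2, 3), keepdim=True)
cam = F.relu((weights * activations["feat"]).sum(dim=1, keepdim=True))
cam = F.interpolate(cam, size=x.shape[-2:], mode="bilinear", align_corners=False)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
print(cam.shape)  # (1, 1, 224, 224) heatmap over the input
```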
-
Hi, thanks for creating this package; it helps us run Whisper with TensorRT.
However, we found that this package didn't include a dependency map (usually provided via requirements.txt),
so we run wh…
-
From my understanding, flex attention (using `block_mask`) gets faster when the number of empty blocks is larger. If the inputs (Q, K, V) do not represent sequences, but graphs with local connectivity…
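A minimal sketch of a local-connectivity block mask with flex attention (hypothetical window size and shapes), where most blocks away from the diagonal are empty and can be skipped:
```python
import torch
from torch.nn.attention.flex_attention import flex_attention, create_block_mask

SEQ_LEN, WINDOW = 4096, 128

def local_mask(b, h, q_idx, kv_idx):
    # Each query attends only to keys within a fixed local window, so blocks
    # far from the diagonal are entirely empty and get skipped.
    return (q_idx - kv_idx).abs() <= WINDOW

block_mask = create_block_mask(local_mask, B=None, H=None,
                               Q_LEN=SEQ_LEN, KV_LEN=SEQ_LEN)

q = torch.randn(1, 8, SEQ_LEN, 64, device="cuda", dtype=torch.bfloat16)
k = torch.randn(1, 8, SEQ_LEN, 64, device="cuda", dtype=torch.bfloat16)
v = torch.randn(1, 8, SEQ_LEN, 64, device="cuda", dtype=torch.bfloat16)

out = flex_attention(q, k, v, block_mask=block_mask)
print(out.shape)  # (1, 8, 4096, 64)
```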