-
Repost from the [PyTorch forum](https://discuss.pytorch.org/t/flex-attention-gaps-in-profiler/211917/1)
I have recently been playing with FlexAttention, trying to replace some of my custom Triton …
-
I encountered an issue when attempting to trace a CrossEncoder model using `torch.jit.trace`. The error occurs during tracing when the `forward` method is called. Below is a minimal reproducible…
-
### 🐛 Describe the bug
This issue was raised from tracing Megatron/xlformers, where `torch.distributed.all_reduce` was called in the backward of an `autograd.Function` and then it was rewritten b…
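For context, the pattern in question looks roughly like the following minimal sketch (the `AllReduceGrad` name and the single-tensor signature are illustrative assumptions, not the actual Megatron/xlformers code). The key point is that the collective runs inside `backward`, which is what any tracing or rewriting of the function has to preserve:

```python
import torch
import torch.distributed as dist

class AllReduceGrad(torch.autograd.Function):
    """Identity in forward; all-reduces the gradient in backward."""

    @staticmethod
    def forward(ctx, x):
        return x.clone()

    @staticmethod
    def backward(ctx, grad_output):
        # The collective fires inside backward; guard it so the sketch
        # also runs without an initialized process group.
        if dist.is_available() and dist.is_initialized():
            dist.all_reduce(grad_output)
        return grad_output

x = torch.ones(4, requires_grad=True)
AllReduceGrad.apply(x).sum().backward()
print(x.grad)  # without a process group, the gradient passes through unchanged
```

With a process group initialized, each rank's `x.grad` would instead hold the sum of gradients across ranks.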
-
### 🐛 Describe the bug
After quantizing a ResNet-18 model with PyTorch 2 Export post-training quantization, it is not possible to export the model.
```python
import torch
from torchvision.model…
-
Tested on commit: https://github.com/llvm/llvm-project/commit/6548b6354d1d990e1c98736f5e7c3de876bedc8e
Steps to reproduce:
```shell
mlir-opt test.mlir --gpu-module-to-binary=format=%gpu_compilation_format…
-
### Problem Description
I installed rPD and ran the tracing example following the README.md, but it aborts (fails):
```
root@tw024:/ws/Try_rPD# runTracer.sh python matmult_gpu.py
Creating empty rpd: tra…
```
-
When I try to use `compile_model` with CUDA as the specified device, I encounter the following error. Is there a way to resolve this, or is the `lora.py` code not yet compatible with running on a GPU?…
-
### Summary
On the current `main` (commit 861fb7ef87bf9c20ee7a4c1632e3852681cc8ef4), the single-chip performance of UNet is approx. 329 fps. Running the same test except data parallel on N300 meas…
-
The customer faces an issue with batch inferencing when using the approach proposed in https://github.com/aws-neuron/aws-neuron-sdk/issues/906:
"My output is in `Tuple[List[torch.Tensor]]`, which works well…
-
## Bug Description
The indices are a constant tensor, which gets const-folded into a frozen parameter. The `meta` of the frozen-param node is an empty dict, leading to a converter validation-check failure [here](https:…
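A minimal sketch of the underlying metadata issue (the `Gather` module is hypothetical, not the reported model): freshly created FX nodes carry an empty `meta` dict, and it is only a pass such as `ShapeProp` that fills in `meta["tensor_meta"]`, which is the kind of information a converter's validation check typically expects:

```python
import torch
import torch.fx
from torch.fx.passes.shape_prop import ShapeProp

class Gather(torch.nn.Module):
    def forward(self, x):
        idx = torch.tensor([0, 2])  # constant indices, a const-folding candidate
        return x[idx]

gm = torch.fx.symbolic_trace(Gather())

# Freshly traced nodes have no tensor metadata yet.
print([dict(n.meta) for n in gm.graph.nodes])

# ShapeProp executes the graph with a sample input and populates
# meta["tensor_meta"] on tensor-valued nodes.
ShapeProp(gm).propagate(torch.randn(4))
print([n.meta.get("tensor_meta") is not None for n in gm.graph.nodes])
```

A node created later (e.g. a frozen parameter introduced by const folding) starts from the same empty `meta`, so unless the folding pass copies or recomputes the metadata, downstream validation sees an empty dict.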