-
2024-07-31 20:15:45.650643419 [W:onnxruntime:, transformer_memcpy.cc:74 ApplyImpl] 12 Memcpy nodes are added to the graph main_graph for CUDAExecutionProvider. It might have negative impact on perform…
-
## Improper use of CUDA Graph in TC-GNN
Hello,
I wanted to bring to your attention a potential issue regarding the usage of CUDA Graph in TC-GNN. Upon reviewing the torch [document](https://pyto…
-
I followed the instructions carefully, but when I click the "Live" button, just an orange window appears:
Did the same on my notebook and it worked fine (but with 2-3 FPS haha).
So I installed i…
-
Hi. I'm just wondering: I set CUDA graph to false, then compared against a run without the stable_fast node, and the latter is faster. Why?
This is with CUDA graph set to false:
![image](https://github.com/gameltb/…
-
Hello @yzh119,
Currently, we are using two independent API calls for prefill and decode in a mixed batch setting. This makes defining a cuda graph layout considerably harder. Ideally, if we could d…
-
### 🐛 Describe the bug
Hi there,
We're getting unknown CUDA graph errors with PyTorch 1.13.1. Although the failure is flaky, it has shown up twice, so it might be worth looking into and fixing.
Here i…
-
### Your current environment
The output of `python collect_env.py`
```text
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A…
-
### 🚀 The feature, motivation and pitch
I wish to start some dialogue on reducing the boilerplate for the current CUDA graph API -- specifically, for mixing dynamic and static routines in an eager-is…
-
### 🐛 Describe the bug
`_load_for_executorch` pybinding cannot load joint graph for phi-3-mini-lora because `load_into` is not implemented in `MmapLoader`, and `load_into` is used by `Program`'s `l…
-
Getting a segmentation fault when using the CUDA graph option in the examples. To reproduce, run the `train_mnist_hclt.py` example as is.
```
$ python examples/train_mnist_hclt.py
Compiling 51 Te…