-
### What is your question?
I'm developing a CUDA project using Conan, CMake, and the MSVC build tools. After updating from CUDA 11.x to CUDA 12.6, any project using Conan CMake…
-
Here's a way to trigger a CUDA "operation not permitted" error during graph capture. It would be great to detect that a CUDA graph capture is ongoing before we try to load/compile kernels and err…
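A minimal sketch of the kind of guard being asked for, under stated assumptions: `load_kernel_guarded`, `CaptureInProgressError`, and both callables are hypothetical names, not part of any library. The capture check is passed in as a predicate; in PyTorch, `torch.cuda.is_current_stream_capturing` could serve as that predicate, and at the CUDA runtime level `cudaStreamIsCapturing` reports the same information.

```python
from typing import Callable


class CaptureInProgressError(RuntimeError):
    """Raised when work that is not permitted during graph capture is requested."""


def load_kernel_guarded(
    load_kernel: Callable[[], object],
    is_capturing: Callable[[], bool],
) -> object:
    """Run load_kernel() only if no CUDA graph capture is in progress.

    Both arguments are hypothetical hooks: load_kernel performs the
    load/compile step, is_capturing reports whether a capture is ongoing.
    """
    if is_capturing():
        # Fail early with an actionable message instead of letting the
        # driver report a generic "operation not permitted" error later.
        raise CaptureInProgressError(
            "kernel load/compile requested while a CUDA graph capture is "
            "ongoing; run a warm-up call before starting the capture"
        )
    return load_kernel()
```

Injecting the predicate keeps the guard independent of any particular framework, so the same check can wrap whichever call triggers the lazy load.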
-
### 🐛 Describe the bug
Dynamo creates a graph break around `set_autocast_enabled`, causing `fullgraph=True` mode to fail. Since the `torch.autocast` context manager is supported in Dynamo, its lower-level co…
-
A handy tool provided in Torch that I think would make a great addition to the bindings is [CUDAGraph](https://pytorch.org/blog/accelerating-pytorch-with-cuda-graphs/) support.
You can see the…
-
Hi,
I found that the `unpad_input` function makes CUDA graph capture fail if we have `key_attention_mask`.
https://github.com/HazyResearch/flash-attention/blob/72ad03eaa661f6bf3a14c855316c27fbab4f…
-
> if graph capture is thread local
Graph capture is [initiated on a CUDA stream](https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__STREAM.html#group__CUDART__STREAM_1g793d7…
-
It seems that `Kokkos` tests aren't being shuffled when they run.
I think they should.
Attached is an example that illustrates the need for shuffling. If the test `first` runs first, the test `s…
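To illustrate why shuffling matters, here is a small self-contained sketch, not the attached example: `test_first`, `test_dependent`, `_shared_state`, `run_suite`, and `shuffled` are all hypothetical names. One test silently depends on state left behind by another, so the suite passes in declaration order and only a different order exposes the dependency.

```python
import random
from typing import Callable, Dict, List, Optional

# Stand-in for global state that leaks between tests (hypothetical).
_shared_state: Dict[str, bool] = {}


def test_first() -> bool:
    # Side effect: leaves state behind for whichever test runs next.
    _shared_state["initialized"] = True
    return True


def test_dependent() -> bool:
    # Hidden dependency: passes only if test_first has already run.
    return _shared_state.get("initialized", False)


def run_suite(tests: List[Callable[[], bool]]) -> Dict[str, bool]:
    """Run the tests in the given order from a clean state; map name -> passed."""
    _shared_state.clear()
    return {test.__name__: test() for test in tests}


def shuffled(tests: List[Callable[[], bool]],
             seed: Optional[int] = None) -> List[Callable[[], bool]]:
    """Return a shuffled copy of the test list; a seed makes a failing order replayable."""
    order = list(tests)
    random.Random(seed).shuffle(order)
    return order


if __name__ == "__main__":
    tests = [test_first, test_dependent]
    print("declaration order:", run_suite(tests))
    print("shuffled order:   ", run_suite(shuffled(tests)))
```

Logging the seed used for each shuffled run would let a failing order be reproduced later.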
-
**🚀 Feature**
CUDA-Graph was introduced in CUDA-10.1 to reduce kernel launch overhead. CUDA-Graph matches NNFusion's current design, so it could be easily integrated into cuda_codegen to improve perfo…
-
## 🐛 Bug
![Gibberish output](https://github.com/user-attachments/assets/4f446294-a903-412d-ad98-987d0f04a60a)
## To Reproduce
Steps to reproduce the behavior:
1. Compile
mlc_llm compile /path/to/internl…
-
I'm trying to run inference on a test model with the TensorRT EP in C# with the CUDA graph option enabled.
I couldn't find a canonical example in C#, so I tried to port the [official C++ example](https://onnx…