-
Hi,
I found that the `unpad_input` function makes CUDA graph capture fail when a `key_attention_mask` is provided.
https://github.com/HazyResearch/flash-attention/blob/72ad03eaa661f6bf3a14c855316c27fbab4f…
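The root cause is that unpadding produces tensors whose length depends on the mask's *values*, and CUDA graph capture only supports fixed shapes and addresses. Below is a minimal NumPy sketch of the idea (a toy stand-in, not flash-attention's actual `unpad_input` implementation):

```python
import numpy as np

def unpad_sketch(hidden, mask):
    """Toy stand-in for unpad_input: keep only unmasked tokens.

    The output length equals mask.sum(), i.e. it depends on the values
    in the mask, not just its shape -- exactly the kind of
    data-dependent shape that CUDA graph capture cannot record.
    """
    idx = np.nonzero(mask.reshape(-1))[0]  # data-dependent index set
    return hidden.reshape(-1, hidden.shape[-1])[idx], idx

hidden = np.arange(24, dtype=np.float32).reshape(2, 3, 4)  # (batch, seq, dim)
mask = np.array([[1, 1, 0], [1, 0, 0]])                    # 3 valid tokens
unpadded, idx = unpad_sketch(hidden, mask)
print(unpadded.shape)  # (3, 4) -- the shape changes whenever the mask changes
```

A captured graph replays fixed kernel launches on fixed buffers, so any op whose output size is decided at runtime by tensor contents breaks capture.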
-
I want to use CUDA instead of the CPU to speed up tag inference.
My machine runs Ubuntu 22.04.3 LTS (GNU/Linux 6.5.0-35-generic x86_64) with CUDA 12.2.
I learned from https://onnxruntime.ai/docs/…
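For reference, this is the kind of provider selection the ONNX Runtime docs describe; a minimal sketch (session creation is commented out because it needs `onnxruntime-gpu` installed and a real model file; `model.onnx` is a placeholder):

```python
# Select the CUDA execution provider, falling back to CPU if CUDA
# initialization fails. Requires the onnxruntime-gpu package and
# CUDA 12.x libraries on the library path.
providers = [
    ("CUDAExecutionProvider", {"device_id": 0}),
    "CPUExecutionProvider",  # fallback
]

# import onnxruntime as ort
# sess = ort.InferenceSession("model.onnx", providers=providers)
# print(sess.get_providers())  # confirm CUDAExecutionProvider is active
```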
-
### Bug Description
**Description**:
I've encountered an issue where the `example_soft_body` example in `example.sim` remains in its initial state and does not move when `sim_substep` is set to an o…
-
### Checklist
- [x] 1. I have searched related issues but cannot get the expected help.
- [x] 2. The bug has not been fixed in the latest version.
- [x] 3. Please note that if the bug-related iss…
-
The autotuner is currently tied closely to the CUDA backend: it accepts a number of CUDA-specific parameters and passes them to `do_bench` or `do_bench_cudagraph`, both of which call into many CUDA-spe…
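One way to decouple the two would be to push the device specifics behind a callback the backend supplies. A hypothetical sketch (the names here are ours, not the existing `do_bench` API):

```python
import time

def do_bench_generic(fn, warmup=3, rep=10, synchronize=lambda: None):
    """Hypothetical backend-agnostic timer.

    All device specifics live in the `synchronize` callable supplied by
    the backend (a CUDA backend would pass torch.cuda.synchronize; a CPU
    backend can use the no-op default).
    """
    for _ in range(warmup):
        fn()
    synchronize()
    start = time.perf_counter()
    for _ in range(rep):
        fn()
    synchronize()
    return (time.perf_counter() - start) / rep * 1e3  # ms per call

# On CPU the default no-op synchronize suffices:
ms = do_bench_generic(lambda: sum(range(1000)))
```

The autotuner would then only ever see the generic entry point, and each backend registers its own synchronization (and, where supported, graph-based timing) behind it.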
-
### 🐛 Describe the bug
Running the model for training with CUDA graphs enabled
```
TORCHINDUCTOR_MAX_AUTOTUNE=1 time python benchmarks/dynamo/huggingface.py --accuracy --no-translation-validation -…
-
### 🐛 Describe the bug
I get a `BackendCompilerFailed` error when trying to compile FlexAttention with a block mask.
Here is a minimal example that reproduces the error:
```
import torch…
-
### Prerequisites
- [X] I am running the latest code. Mention the version if possible as well.
- [X] I carefully followed the [README.md](https://github.com/ggerganov/llama.cpp/blob/master/README.md)…
-
### Describe the issue
I'm using onnx-tensorrt.
When I enable `trt_cuda_graph_enable` like this:
![image](https://github.com/microsoft/onnxruntime/assets/67405690/0f239de5-f995-43df-aa8a-805674…
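In code form, the setting shown in the screenshot corresponds to a TensorRT provider option roughly like the following sketch (session creation commented out since it needs `onnxruntime-gpu` and a model; `model.onnx` is a placeholder):

```python
# Enable CUDA graphs in the TensorRT execution provider via provider
# options; the option name is taken from the setting discussed above.
providers = [
    ("TensorrtExecutionProvider", {"trt_cuda_graph_enable": True}),
    "CUDAExecutionProvider",  # fallback for unsupported subgraphs
]

# import onnxruntime as ort
# sess = ort.InferenceSession("model.onnx", providers=providers)
```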
-
### Feature request
In my experiments, I cannot get PyTorch CUDA graphs to work with HF `generate`. CUDA graphs work fine when calling the forward pass of a model, but either due to static input/output s…
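For context, the usual way around CUDA graphs' static input/output requirement is the static-buffer pattern: preallocate fixed tensors, copy each new batch into them in place, and replay the captured graph. A CPU-only NumPy illustration of the idea (the real thing would use `torch.cuda.CUDAGraph` capture and replay):

```python
import numpy as np

# Static-buffer pattern: graph replay requires every tensor address to
# stay fixed, so inputs are copied *into* preallocated buffers rather
# than passed as freshly allocated tensors.
static_in = np.zeros(4, dtype=np.float32)   # captured input buffer
static_out = np.zeros(4, dtype=np.float32)  # captured output buffer

def replay():
    # Stand-in for graph.replay(): reads and writes only the static buffers.
    np.multiply(static_in, 2.0, out=static_out)

for batch in ([1, 2, 3, 4], [5, 6, 7, 8]):
    static_in[:] = batch        # in-place copy; addresses unchanged
    replay()
    print(static_out.tolist())  # [2.0, 4.0, 6.0, 8.0] then [10.0, 12.0, 14.0, 16.0]
```

The difficulty with `generate` is that sequence length grows step by step, so making every iteration fit one set of fixed-shape buffers is exactly the hard part.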