-
The test `WaitAnyHostAndDeviceSemaphoresAndDeviceSignals` from `cuda_graph_semaphore_submission_test` seems to sometimes fail
with
```
12/412 Test #66: iree/hal/drivers/cuda/cts/cuda_graph_sema…
-
Hi, I used Nsight Systems to view the timeline after enabling CUDA Graphs and found that the SpMM kernels in the forward and backward passes were clustered together, which seems to break the logic of the pr…
-
A handy tool provided in Torch that I think would make a great addition to the bindings is the [CUDAGraph](https://pytorch.org/blog/accelerating-pytorch-with-cuda-graphs/) support.
You can see the…
-
### Describe the issue
I am trying to add a custom Triton kernel to ONNX Runtime as an operator. This works, but whenever I call the operator, I get the following CUDA error (Illegal Memory Access):
…
-
### Describe the issue
During CUDA graph capture, ORT triggers `cudaStreamSynchronize`, which is not allowed while a capture is in progress. The call stack looks like the following:
```
libonnxruntime_providers_…
-
Hi,
I've been testing Trilinos and came across a broken kk unit test on H100s with CUDA 12.4. I have not tried to reproduce the broken test standalone, but figured I'd report it. See configuration 1…
-
### Checklist
- [X] 1. I have searched related issues but cannot get the expected help.
- [X] 2. The bug has not been fixed in the latest version.
- [X] 3. Please note that if the bug-related iss…
-
Model: Qwen-14B-Chat (QWen2)
Dataset: https://huggingface.co/datasets/Hello-SimpleAI/HC3-Chinese/blob/main/open_qa.jsonl
Environment: 2 A30 GPUs
Issue 1:
Error: can't init model correctly. Disab…
-
### System Info
I'm trying to deploy Zephyr 7b on a SageMaker endpoint using TGI. However, I noticed that quantization is not applied, and the logs indicate, 'Bitsandbytes doesn't work with CUDA grap…
-
Hello,
I wonder if mmdetection3d could integrate torch.cuda.graph support for performance optimization?