cuda-graph Search Results

1000+ results for cuda-graph

iree-org/iree #17249

Flaky CUDA runtime test cuda_graph_semaphore_submission_test…

The test `WaitAnyHostAndDeviceSemaphoresAndDeviceSignals` from `cuda_graph_semaphore_submission_test` seems to sometimes fail with ``` 12/412 Test #66: iree/hal/drivers/cuda/cts/cuda_graph_sema…

sogartar updated 4 months ago · 1 comment

YukeWang96/TC-GNN_ATC23 #5

Cuda Graph optimization

Hi, I used Nsight Systems to view the timeline after using CUDA graph and found that the spmm kernels in the forward and backward passes were clustered together, which seems to break the logic of the pr…

plant310 updated 11 months ago · 2 comments

LaurentMazare/tch-rs #631

[Question] Cuda Graphs support?

A handy tool provided in Torch that I think would make a great addition to the bindings is the [CUDAGraph](https://pytorch.org/blog/accelerating-pytorch-with-cuda-graphs/) support. You can see the…

finnkauski updated 7 months ago · 5 comments

microsoft/onnxruntime #20885

[Build] CUDA Illegal Memory Access error when using a custom…

### Describe the issue I am trying to add a custom Triton kernel to ONNX Runtime as an operator. This works, but whenever I call the operator, I get the following CUDA error (Illegal Memory Access): …

Numeri updated 1 week ago · 4 comments

microsoft/onnxruntime #15002

CUDA Graph Error - CUDA failure 900: operation not permitted…

### Describe the issue During CUDA graph capture, ORT will trigger cudaStreamSynchronize, which is not allowed in CUDA graph capture. Call stack is like the following: ``` libonnxruntime_providers_…

tianleiwu updated 1 month ago · 6 comments

kokkos/kokkos-kernels #2316

kokkos kernels: broken unit test w/ cuda 12.4 on h100 gpus w…

Hi, I've been testing trilinos and came across a broken kk unit test on h100s w/ cuda 12.4. I have not tried to reproduce the broken test standalone but figured I'd report it. See configuration 1…

vasylivy updated 2 weeks ago · 11 comments

sgl-project/sglang #1264

[Bug] Lower single request speed with mla enabled

### Checklist - [X] 1. I have searched related issues but cannot get the expected help. - [X] 2. The bug has not been fixed in the latest version. - [X] 3. Please note that if the bug-related iss…

halexan updated 2 days ago · 11 comments

vectorch-ai/ScaleLLM #275

[Issue] Qwen-14B-Chat init fail and performance issue.

Model: Qwen-14B-Chat (QWen2) Dataset: https://huggingface.co/datasets/Hello-SimpleAI/HC3-Chinese/blob/main/open_qa.jsonl Environment: 2 A30 GPU Issue 1: Error: can't init model correctly. Disab…

liutongxuan updated 1 month ago · 2 comments

huggingface/text-generation-inference #2467

Quantization Failure with Bitsandbytes on SageMaker TGI Depl…

### System Info I'm trying to deploy Zephyr 7b on a SageMaker endpoint using TGI. However, I noticed that quantization is not applied, and the logs indicate, 'Bitsandbytes doesn't work with CUDA grap…

imadoualid updated 1 week ago · 2 comments

open-mmlab/mmdetection3d #1851

CUDA graph support

Hello, I wonder if mmdetection3d could integrate torch.cuda.graph support for performance optimization?

wuziyou199217 updated 1 year ago · 1 comment
