-
It seems I am experiencing CUDA graph related issues when using the "reduce-overhead" method for compilation. Am I right to assume that this is because CUDA graphs very much expect the same…
-
### Anything you want to discuss about vllm.
In qwen2vl's mrope implementation, vllm decides whether the input positions are for multimodal input with
![image](https://github.com/user-attachments/assets/6dfc96d9-5162-…
-
**Description**
CUDA Graph does not work in the tensorrt backend. The model config is as below:
```
platform: "tensorrt_plan"
version_policy: { latest: { num_versions: 2}}
parameters { key: "execution_mode"…
-
Setting use_cuda_graph=True versus False results in different inference speeds. Why? How do the three cuda_graphs match the 50 denoising steps?
-
### Checklist
- [X] 1. I have searched related issues but cannot get the expected help.
- [ ] 2. The bug has not been fixed in the latest version.
- [ ] 3. Please note that if the bug-related issue y…
-
Would it be possible to make PhysicsML support CUDA graphs? In some cases it can provide a large speed improvement.
The main requirements to be graph compatible are that tensor shapes have to be s…
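
To illustrate the tensor-shape requirement mentioned above, here is a minimal sketch of one common workaround: padding a variable-sized batch up to one of a few fixed sizes, so that each replayed graph always sees the same shape. This is not PhysicsML code. The names `CAPTURE_SIZES`, `pick_capture_size`, and `pad_batch` are hypothetical, and plain Python lists stand in for tensors so the example runs without a GPU.

```python
from bisect import bisect_left

# Hypothetical set of batch sizes for which a graph was captured ahead of time.
# Must be sorted in ascending order for bisect_left to work.
CAPTURE_SIZES = [1, 2, 4, 8, 16]


def pick_capture_size(n, sizes=CAPTURE_SIZES):
    """Return the smallest captured size >= n, or None if n exceeds every size."""
    i = bisect_left(sizes, n)
    return sizes[i] if i < len(sizes) else None


def pad_batch(rows, pad_row, sizes=CAPTURE_SIZES):
    """Pad rows up to a captured size and return (padded_rows, real_count).

    If no captured size is large enough, the rows come back unpadded; a caller
    would treat that case as "run eagerly, without a graph".
    """
    target = pick_capture_size(len(rows), sizes)
    if target is None:
        return list(rows), len(rows)
    padding = [pad_row] * (target - len(rows))
    return list(rows) + padding, len(rows)


if __name__ == "__main__":
    # Three real rows are padded to the next captured size (4).
    padded, real_count = pad_batch([[1, 2], [3, 4], [5, 6]], pad_row=[0, 0])
    print(len(padded), real_count)
```

The caller keeps `real_count` so that outputs belonging to padding rows can be discarded after the graph is replayed. The trade-off is some wasted compute on padding in exchange for a small, fixed number of captured graphs.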
-
### Required prerequisites
- [x] Consult the [security policy](https://github.com/NVIDIA/cuda-quantum/security/policy). If reporting a security vulnerability, do not report the bug using this form. U…
-
## 🐛 Bug
Dynamo may create empty graphs where we do redundant work if we use a normal compilation pipeline. Here's an example of an empty function with just autocast region applied on an empty bloc…
-
### Description
```
Error compiling Cython file:
------------------------------------------------------------
...
cpdef intptr_t graphInstantiate(intptr_t graph) except? 0:
# T…
-
> if graph capture is thread local
Graph capture is [initiated on a Cuda stream](https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__STREAM.html#group__CUDART__STREAM_1g793d7…