-
I am currently getting the same issue in #10.
I have torch 2.5.1, torchrl 0.6.0, tensordict 0.6.0 at the moment. I am running a slightly modified version of the original code. I can run with cudagr…
-
### 🐛 Describe the bug
Enabling both these options causes an error.
Code:
```
import torch
batch_size = 32
seq_length = 50
hidden_size = 768
def test_fn():
    inp = torch.randn(batch_size,…
-
> if graph capture is thread local
Graph capture is [initiated on a CUDA stream](https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__STREAM.html#group__CUDART__STREAM_1g793d7…
-
I'm trying to run inference on a test model with the TensorRT EP in C#, with the CUDA graph option enabled.
I couldn't find a canonical example in C#, so I tried to port the [official C++ example](https://onnx…
-
### What is your question?
I'm developing a CUDA project using Conan, CMake and MSVC build tools. After updating from CUDA 11.x to CUDA 12.6, any project using Conan CMake…
-
Hi, I used Nsight Systems to view the timeline after using CUDA graph and found that the spmm kernels in the forward and backward passes were clustered together, which seems to break the logic of the pr…
-
I'm trying to run the project on another machine that only has a CPU, and this shows up. I changed the config file:
```
parser.add_argument(
    "--cuda",
    action="store_false",
    default=False,
…
-
### 🐛 Describe the bug
The following code generates the compile error below:
```
import code
import time
import warnings
import numpy as np
import torch
from torch.nn.attention.flex_attent…
-
### 🐛 Describe the bug
Similar to https://github.com/pytorch/pytorch/issues/140219 these options also fail.
Code:
```
import torch
batch_size = 32
seq_length = 50
hidden_size = 768
def tes…
-
I compile nccl-tests with the command:
```shell
make MPI=1 MPI_HOME=${NVHPC_ROOT}/comm_libs/12.4/hpcx/hpcx-2.19/ompi NCCL_HOME=${NVHPC_ROOT}/comm_libs/nccl CUDA_HOME=${NVHPC_ROOT}/cuda
```
And run th…