-
**Description**
CUDA Graph does not work in the TensorRT backend. The model config is as below:
```
platform: "tensorrt_plan"
version_policy: { latest: { num_versions: 2}}
parameters { key: "execution_mode"…
-
### What happened?
The server crashes when changing the LoRA scale and using CUDA. To reproduce it:
- Start the server with a model and a LoRA and load layers to CUDA.
- Then, prompt the mode…
-
### Describe the bug
The Graph/RecordReplay/usm_fill.cpp test has been observed to time out in CUDA CI for unrelated changes. For example, see https://github.com/intel/llvm/pull/14985.
```
TIMEOUT…
-
We should consider whether it is possible and desirable to automatically combine kernels into CUDA graphs to reduce the overhead of launching individual kernels.
Here is the relevant documentation:
- http…
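As a rough illustration of the kind of combining described above, here is a minimal sketch of capturing a short sequence of small kernels into one CUDA graph with PyTorch's stream-capture API. This is an assumption-laden example, not the project's actual mechanism: it assumes PyTorch >= 1.10 and a CUDA-capable GPU, and skips gracefully on CPU-only hosts.

```python
# Sketch: capture a few elementwise kernels into a single CUDA graph so that
# one replay() call launches them all, amortizing per-kernel launch overhead.
# Hypothetical demo code; assumes torch with CUDA support is installed.
import torch


def run_graph_demo():
    if not torch.cuda.is_available():
        print("CUDA not available; skipping graph capture demo")
        return None

    static_x = torch.randn(1024, device="cuda")

    # Warm up on a side stream (required before stream capture).
    s = torch.cuda.Stream()
    s.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(s):
        for _ in range(3):
            _ = (static_x.sin() + 1.0).relu()
    torch.cuda.current_stream().wait_stream(s)

    # Capture the three elementwise kernels into one graph.
    g = torch.cuda.CUDAGraph()
    with torch.cuda.graph(g):
        static_y = (static_x.sin() + 1.0).relu()

    # Replaying launches every captured kernel with a single call.
    static_x.copy_(torch.randn(1024, device="cuda"))
    g.replay()
    torch.cuda.synchronize()
    return static_y


result = run_graph_demo()
```

Note the usual caveat with graph capture: input tensors must be "static" (reused buffers updated with `copy_`), since the graph replays fixed memory addresses.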
-
*Note*: If you have a model or program that is not supported yet but should be, please use the program coverage template.
## 🐛 Bug
```py
import thunder
import torch
def func(x):
return…
-
I have an older 1080 Ti, and I found that after upgrading from gsplat v0.1.12 to >=v1.0.0 I can no longer use the gsplat library. I don't know if there are plans to support older GPUs or if there's an…
-
Based on the latest information on CUDA Graphs, follow these rules of thumb:
- always use CUDA Graphs to launch kernels; they will always be at least as fast as not using task graphs,…
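To make the "at least as fast" claim above concrete, here is a hedged micro-benchmark sketch comparing many individual kernel launches against one graph replay, timed with CUDA events. The function name and kernel count are made up for illustration; it assumes PyTorch with a CUDA device and only prints a skip message on CPU-only hosts.

```python
# Sketch: time N eager kernel launches vs. one CUDA graph replay of the same
# N kernels. On launch-overhead-bound workloads the replay is typically faster.
# Hypothetical benchmark; assumes torch with CUDA support is installed.
import torch


def time_launch_overhead(n_kernels: int = 100) -> None:
    if not torch.cuda.is_available():
        print("CUDA not available; skipping benchmark")
        return

    x = torch.randn(256, device="cuda")

    # Warm up on a side stream before capture.
    s = torch.cuda.Stream()
    s.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(s):
        y = x
        for _ in range(n_kernels):
            y = y + 1.0
    torch.cuda.current_stream().wait_stream(s)

    # Capture n_kernels additions into one graph.
    g = torch.cuda.CUDAGraph()
    with torch.cuda.graph(g):
        y = x
        for _ in range(n_kernels):
            y = y + 1.0

    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)

    # Eager: n_kernels separate launches.
    start.record()
    z = x
    for _ in range(n_kernels):
        z = z + 1.0
    end.record()
    torch.cuda.synchronize()
    eager_ms = start.elapsed_time(end)

    # Graph: a single replay launches all captured kernels.
    start.record()
    g.replay()
    end.record()
    torch.cuda.synchronize()
    graph_ms = start.elapsed_time(end)
    print(f"eager: {eager_ms:.3f} ms, graph replay: {graph_ms:.3f} ms")


time_launch_overhead()
```

The gap shrinks as individual kernels get larger, since launch overhead then becomes a smaller fraction of total runtime.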
-
### Describe the bug
On multi-GPU systems, using HIP or CUDA, a process is spawned on all GPUs instead of being spawned on only one of them (see the "To reproduce" section).
This results in a memory leak…
-
Platforms: linux, slow
This test was disabled because it is failing in CI. See [recent examples](https://hud.pytorch.org/flakytest?name=test_cuda_event_created_outside_of_graph_dynamic_shapes&suite=D…
-
TODO list
- [ ] Hide `type_map`
- [ ] Hide toggle grad things
- [ ] Refactoring or from scratch: `scripts/processing_dataset.py` and `train/dataset.py`
- [ ] Implement `universal` keyword for `c…