-
Does TF Serving support CUDA graphs?
-
### 🐛 Describe the bug
I have a function that uses `randperm` and I want to run it in a CUDA graph via `torch.compile` after applying selective activation checkpointing:
```
import torch
from xformers…
-
When I try to run `python tic_tac_toe_alpha_zero.py` I get this error:
```
Exception caught in actor-0: Failed call to cuDeviceGet: CUDA_ERROR_NOT_INITIALIZED: initialization error
actor-0 exiting
…
-
**Environments:**
- os: ubuntu server 22.04 LTS
- gpu: H100*2
- docker-ce: 5:27.1.2
- nvidia-container-toolkit: 1.16.1
- image: styler00dollar/vsgan_tensorrt:latest (08/15/2024)
- commit: …
-
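Hello, and thank you very much for open-sourcing such a great project. When I use the code for multi-node training, I frequently hit `RuntimeError: CUDA error: an illegal memory access was encountered`, and it occurs very randomly. Is this error caused by running out of memory, or by something else?
The detailed error is below; `export CUDA_LAUNCH_BLOCKING=1` was already set.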
Epoch …
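
For reference, the debugging variable named in this report is spelled `CUDA_LAUNCH_BLOCKING`, with underscores, and it only takes effect if it is set before the process initializes CUDA. With it set, kernel launches run synchronously, so an error is reported at the launch that failed rather than at a later API call. The sketch below is one way to guarantee that ordering by setting it in a child process's environment. The helper names `blocking_launch_env` and `run_with_blocking_launches` are hypothetical, and the command shown is a placeholder, since the actual training entry point is not shown in the report.

```python
import os
import subprocess
import sys


def blocking_launch_env(base_env=None):
    """Return a copy of the environment with CUDA_LAUNCH_BLOCKING=1 added.

    The input mapping is copied, so the caller's environment is not modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def run_with_blocking_launches(argv):
    """Run a command in a child process that sees CUDA_LAUNCH_BLOCKING=1.

    Setting the variable in the child's environment ensures it is present
    before any CUDA library is loaded by that process.
    """
    return subprocess.run(argv, env=blocking_launch_env(), check=False)


if __name__ == "__main__":
    # Placeholder command: the child only echoes the variable back.
    # Substitute the real training command, which is an assumption here.
    run_with_blocking_launches(
        [sys.executable, "-c", "import os; print(os.environ['CUDA_LAUNCH_BLOCKING'])"]
    )
```

In multi-node runs, the variable has to reach every worker process on every node, so it is worth checking the launcher's own mechanism for propagating the environment.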
-
### 🐛 Describe the bug
In DeepSpeed workloads, ZeRO parameter offload is implemented via module hooks. We find that in the torch.compile scenario, if any graph breaks happen in the pre/post hook of a …
-
## Bug Report
In #12009, graph filling and fillComplete both slowed down significantly, even though the changes should not have had an effect on those. Here is some data from the 4-rank FE assembly p…
-
**System information**
- OS Platform and Distribution (e.g., Linux Ubuntu 20.04): Linux
- DeepRec version or commit id: be62ec312595b51b74260f96a6c0872ce5f1540c
- Python version: 3.8
- Bazel versi…
-
Great job!
We found that Quest is implemented on a previous version of flashinfer, and some common features are not currently supported:
* bsz > 1
* GQA
* CUDA graph
Is there any plan to update t…
-
**🚀 Feature**
CUDA Graphs were introduced in CUDA 10 to reduce kernel launch overhead. CUDA Graphs match NNFusion's current design, so they could be easily integrated into cuda_codegen to improve perfo…