-
This PR seems to cause:
> CUDA RUNTIME API error: DeviceSetLimit failed with error cudaErrorInvalidValue.
(tested on an H100 device)
_Originally posted by @hfp in https://githu…
-
### Describe the issue
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : CopyTensorAsync is not implemented
### To reproduce
Build from source with CUDA 12.3.
### Urgenc…
-
### Describe the issue
### Description:
I'm encountering a CUDA error when attempting to execute a training process using ONNXRuntime GPU version 1.19.2 on a system with an NVIDIA H800 GPU (Comp…
-
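As the title says: on a machine without CUDA installed, I set USE_CUDA=OFF and BUILD_ALL_EXAMPLES=OFF, but the examples are still built, and compilation fails with a header-not-found error. Details: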
```
In file included from /root/installs/libdeepvac/examples/src/test_resnet_benchmark.cpp:11:
/usr/libtorch…
-
afol-apiserver-72b-1 | (RayWorkerVllm pid=3779) [E ProcessGroupNCCL.cpp:475] [Rank 1] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=16487777, OpType=ALLREDUCE, NumelIn=195911680, Nume…
-
# Motivation
This RFC proposes a design for a series of generic memory-related APIs tailored for stream-based accelerators to help users simplify the runtime code written for different devices…
-
### Describe the issue
I found that the Java dependency of onnxruntime-gpu 1.18.0 does not work properly on CUDA 11. Is there a parameter that can allow it to run correctly on CUDA 11? If not, could …
-
Hi
I want to use nvrtc to compile an sm90 kernel at runtime. The problem is that I don't have the kernel instance on the host, so I can't run to_underlying_arguments to get the kernel params needed to launch the kerne…
-
### Describe the issue
I tried profiling my UNET running on CUDA. While looking at the trace in Perfetto, I saw that `SequentialExecutor::Execute` actually took only a fraction of the whole `model_run` …
-
### Describe the issue
Running a model for N iterations in a single ONNX session is much faster than running the same model in two independent sessions for N/2 iterations each.
¿W…