-
### Your current environment
The output of `python collect_env.py`
```text
Collecting environment information...
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTor…
-
### Your current environment
```
PyTorch version: 2.3.1+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.2 LTS (x86_64)
GCC versio…
-
### Describe the issue
When I run multithreaded inference via onnxruntime (Python), I get an error. My onnx_sessions are all independent and each model file is loaded independently; for multithreaded inference …
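One common pattern for the setup described above is to create each session inside its own thread, so no session object is ever shared. A minimal sketch of that pattern, using a stand-in class in place of `onnxruntime.InferenceSession` (the class, model path, and doubling "inference" are all hypothetical placeholders):

```python
import threading

class FakeSession:
    """Stand-in for onnxruntime.InferenceSession (hypothetical).
    In the real setup, each thread would load its own model file here."""
    def __init__(self, model_path):
        self.model_path = model_path

    def run(self, output_names, feeds):
        # Dummy "inference": just doubles the input.
        return [feeds["x"] * 2]

results = {}

def worker(tid, model_path):
    # One independent session per thread, constructed inside the thread,
    # so sessions are never shared across threads.
    session = FakeSession(model_path)
    results[tid] = session.run(None, {"x": tid})[0]

threads = [threading.Thread(target=worker, args=(i, "model.onnx"))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Each `tid` key is written by exactly one thread, so the shared dict stays consistent without extra locking.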
-
### Misc discussion on performance
I've been running some simple tests on a multi-node parallel pipeline with NCCL. I doubled the bandwidth between the nodes but saw no increase in t/s or throughput.…
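A plausible explanation is that the pipeline is compute-bound rather than bandwidth-bound: if inter-node communication overlaps with compute and already takes less time than the compute per microbatch, faster links change nothing. A back-of-envelope model with illustrative (not measured) numbers:

```python
def tokens_per_s(compute_s, comm_bytes, bandwidth_Bps, tokens_per_microbatch):
    """Toy pipeline model: per-microbatch step time is the slower of
    compute and communication, assuming the two overlap."""
    comm_s = comm_bytes / bandwidth_Bps
    step_s = max(compute_s, comm_s)
    return tokens_per_microbatch / step_s

# Assumed numbers: 20 ms compute per microbatch, 32 MB activations
# transferred between stages, 64 tokens per microbatch.
base = tokens_per_s(0.020, 32e6, 10e9, 64)  # 10 GB/s link
fast = tokens_per_s(0.020, 32e6, 20e9, 64)  # doubled bandwidth

# Communication drops from 3.2 ms to 1.6 ms, but both are under the
# 20 ms compute time, so throughput is unchanged: compute-bound.
```

Under these assumptions, doubling bandwidth only helps once `comm_s` exceeds `compute_s`.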
-
Hi, we've tried to compile the code, which worked, but when we run the code with GPU enabled we get an error.
This is the compilation error log:
```
[ 6%] Building CXX object CMU462/src/CMakeFil…
-
I am trying to accelerate the inference speed of Llama 3 8B on a 4090 using quantization. I noticed this https://github.com/huggingface/optimum-nvidia which should allow using fp8 and have huge speed…
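For intuition on the expected gain: single-stream decoding is typically memory-bandwidth-bound, since every generated token streams all weights from VRAM once. A rough ceiling estimate with assumed numbers (8e9 parameters, ~1.0 TB/s effective bandwidth for a 4090), ignoring KV cache and activations:

```python
params = 8e9          # Llama 3 8B parameter count (approximate)
bandwidth = 1.0e12    # bytes/s, assumed effective 4090 memory bandwidth

def est_tokens_per_s(bytes_per_param):
    """Upper bound on decode speed if weight streaming is the bottleneck."""
    return bandwidth / (params * bytes_per_param)

fp16 = est_tokens_per_s(2.0)  # 2 bytes per weight
fp8 = est_tokens_per_s(1.0)   # 1 byte per weight

# Halving bytes per weight roughly doubles the throughput ceiling.
```

This is only a ceiling; real fp8 speedups depend on kernel support and how much of the runtime is actually spent streaming weights.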
-
CUDA offers a library named CUPTI - the [CUDA Profiling Tools Interface](https://developer.nvidia.com/CUPTI-CTK10_2):
> CUPTI provides a set of APIs targeted at ISVs creating profilers and other pe…
-
## 🚀 Feature
A different method for sharing CUDA Tensors across processes is needed on Jetson platforms.
CUDA unified-addressing-based IPC functionality isn't yet supported on Tegra platforms…
-
### Your current environment
The output of `python collect_env.py`
```text
:128: RuntimeWarning: 'torch.utils.collect_env' found in sys.modules after import of package 'torch.utils', bu…
-
### Describe the issue
During session runs with CUDAExecutionProvider, I noticed that the inference time varies greatly depending on whether or not I use other applications on my computer.
For exam…
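One way to quantify this kind of variance is to record per-call latencies and compare percentiles: a large gap between p95 and p50 points to external interference (e.g. other applications contending for the GPU) rather than a uniformly slow model. A minimal sketch, where the lambda is a dummy workload standing in for `session.run` (an assumption):

```python
import statistics
import time

def measure(run, n=50):
    """Run the callable n times and summarize its latency distribution."""
    latencies = []
    for _ in range(n):
        t0 = time.perf_counter()
        run()
        latencies.append(time.perf_counter() - t0)
    latencies.sort()
    return {
        "p50": latencies[n // 2],
        "p95": latencies[int(n * 0.95)],
        "stdev": statistics.stdev(latencies),
    }

# Dummy CPU workload in place of the real session.run call.
stats = measure(lambda: sum(range(10_000)))
```

Comparing these numbers with other applications open versus closed would make the interference measurable instead of anecdotal.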