-
### Describe the issue
When using SetIntraOpNumThreads(1) and SetIntraOpNumThreads(10) on GPU, the inference times are similar, both around 30 ms. I have already done a warm-up before calculating the…
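For reference, a minimal Python sketch of the setup being compared; the model path, input shape, and iteration counts are placeholders, and `intra_op_num_threads` is the Python-API equivalent of `SetIntraOpNumThreads`:

```python
import time
import numpy as np
import onnxruntime as ort

def timed_run(num_threads: int, model_path: str = "model.onnx") -> float:
    opts = ort.SessionOptions()
    opts.intra_op_num_threads = num_threads  # Python equivalent of SetIntraOpNumThreads
    sess = ort.InferenceSession(model_path, opts, providers=["CUDAExecutionProvider"])

    # Placeholder input; the real model's input name/shape come from the issue author's setup.
    feed = {sess.get_inputs()[0].name: np.random.rand(1, 3, 224, 224).astype(np.float32)}

    for _ in range(10):            # warm-up, as described in the issue
        sess.run(None, feed)

    start = time.perf_counter()
    for _ in range(100):
        sess.run(None, feed)
    return (time.perf_counter() - start) / 100 * 1000.0  # average ms per inference

print(timed_run(1), timed_run(10))
```

Note that `intra_op_num_threads` sizes the CPU operator thread pool; when the session runs on the `CUDAExecutionProvider`, most of the work happens on the GPU, which would explain why the two settings measure almost the same latency.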
-
### System Info
- CPU: INTEL RPL
- GPU Name: NVIDIA RTX 4090
- TensorRT-LLM: tensorrt_llm==0.11.0.dev2024060400
- Container Used: Yes (also reproduced in a Conda environment)
- Driver Version: 555.42.02
…
-
### Your current environment
The output of `python collect_env.py`
```text
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A…
-
### Your current environment
The output of `python collect_env.py`
```text
(pytorch_gpu) ➜ vllm git:(main) ✗ python collect_env.py
Collecting environment information...
WARNING 11-03 12:55:08 _c…
-
### Issue Type
Bug
### Source
binary
### Tensorflow Version
2.6-2.10
### Custom Code
Yes
### OS Platform and Distribution
Linux Fedora 36
### Mobil…
-
I followed your process to convert the engine, and I also trained a model myself, but I ran into this problem with both the original engine and the one I trained:
loading annotations into memory...
…
-
### Your current environment
The output of `python collect_env.py`
```text
Collecting environment information...
PyTorch version: 2.5.0+cu124
Is debug build: False
CUDA used to build PyTorch…
-
# [RFC] Aten Operators in Triton for Multi-backend support
## Abstract
This RFC discusses
1. the benefits and challenges of developing dispatch functions for Aten operators in Triton.
2. a…
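For illustration only (not part of the RFC text), a minimal sketch of what a Triton-backed implementation behind such a dispatch function could look like, using an elementwise add in the shape of `aten::add.Tensor`; the kernel, block size, and wrapper name are assumptions:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # One program instance handles BLOCK_SIZE contiguous elements.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def triton_add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # Candidate dispatch target for an aten::add.Tensor-style op (illustrative name).
    out = torch.empty_like(x)
    n_elements = out.numel()
    grid = lambda meta: (triton.cdiv(n_elements, meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, out, n_elements, BLOCK_SIZE=1024)
    return out

# Usage (tensors must live on the GPU):
# x = torch.randn(4096, device="cuda"); y = torch.randn(4096, device="cuda")
# out = triton_add(x, y)
```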
-
### Your current environment
The output of `python collect_env.py`
```text
Collecting environment information...
WARNING 08-22 15:09:07 _custom_ops.py:14] Failed to import from vllm._C with Mo…
-
### Describe the issue
Hi, I use onnxruntime with IOBinding in Python; our model has 7 inputs. When I use OrtValue.ortvalue_from_numpy() to create the OrtValue, I find that the OrtValue.ptr() of …
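For context, a minimal IOBinding sketch along these lines; the model path, input shapes, and device placement are assumptions, and `data_ptr()` is used here to inspect each OrtValue's buffer address:

```python
import numpy as np
import onnxruntime as ort

# Placeholder model; the real one has 7 inputs per the issue description.
sess = ort.InferenceSession("model.onnx", providers=["CUDAExecutionProvider"])
binding = sess.io_binding()

for inp in sess.get_inputs():
    # Placeholder shape/dtype; the real values depend on the model.
    ov = ort.OrtValue.ortvalue_from_numpy(
        np.zeros((1, 128), dtype=np.float32), "cuda", 0)
    print(inp.name, hex(ov.data_ptr()))       # inspect the buffer address of each OrtValue
    binding.bind_ortvalue_input(inp.name, ov)

for out in sess.get_outputs():
    binding.bind_output(out.name, "cuda", 0)  # let ORT allocate the outputs on the GPU

sess.run_with_iobinding(binding)
results = binding.copy_outputs_to_cpu()
```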