-
### System Info
Hi,
I generated a TensorRT-LLM engine for a LLaMA-based model and see that its performance is much worse than vLLM's.
I did the following:
- compile model with tensorrt llm c…
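A generic way to make such a comparison concrete is a small timing harness that is run identically against both backends. The sketch below is a minimal illustration with a trivial stand-in callable (hypothetical) in place of a real engine call:

```python
import time

def benchmark(run_fn, n_warmup=2, n_iters=5):
    """Time an inference callable and return the mean latency in seconds.

    Warmup iterations are discarded so one-time costs (engine load,
    CUDA context creation, JIT compilation) don't skew the measurement.
    """
    for _ in range(n_warmup):
        run_fn()
    start = time.perf_counter()
    for _ in range(n_iters):
        run_fn()
    return (time.perf_counter() - start) / n_iters

# Hypothetical stand-in for a real engine call:
mean_s = benchmark(lambda: sum(range(10_000)))
print(f"mean latency: {mean_s * 1e3:.3f} ms")
```

For a fair comparison, both engines should see the same batch size, sequence lengths, and sampling settings.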
-
**Is your feature request related to a problem? Please describe.**
NAN
**Describe the solution you'd like**
NAN
**Describe alternatives you've considered**
NAN
**Additional context**
NAN
…
-
Hardware:
Jetson AGX Orin Developer Kit
Software:
JetPack 5.0.1 DP
What works:
Inference works well with FP32
Issue:
Inference does not work with INT8. The following output log can be see…
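For background on what INT8 mode does numerically, here is a minimal pure-Python sketch of symmetric per-tensor INT8 quantization — an illustration of the general scheme that calibration-based INT8 inference relies on, not JetPack- or TensorRT-specific code:

```python
def quantize_int8(values):
    """Symmetric per-tensor INT8 quantization: q = clamp(round(x / scale)).

    The scale maps the largest absolute value onto 127, so outliers
    directly determine the resolution available to small values.
    """
    amax = max(abs(v) for v in values)
    scale = amax / 127.0 if amax else 1.0
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Map quantized integers back to approximate real values."""
    return [v * scale for v in q]

q, s = quantize_int8([2.0, -1.0, 0.5])
approx = dequantize(q, s)
```

If the FP32 path works but INT8 does not, the calibration step (which chooses these scales per tensor) is the usual place to look first.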
-
# Background:
In the performance doc ([performance.md](https://github.com/NVIDIA/TensorRT-LLM/blob/main/docs/source/performance.md)) it is mentioned:
LLaMA-7B, FP16, batch size: 256, input_len: 128, output_len: 128…
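For configurations like this, the throughput arithmetic is straightforward; the sketch below computes generated tokens per second from the batch size, output length, and an assumed (hypothetical) end-to-end batch latency:

```python
def generation_throughput(batch_size, output_len, latency_s):
    """Generated tokens per second for one fully-decoded batch.

    Total generated tokens = batch_size * output_len; dividing by the
    wall-clock latency of the whole batch gives tokens/s.
    """
    return batch_size * output_len / latency_s

# Hypothetical latency of 10 s for the whole batch of 256 requests:
tps = generation_throughput(256, 128, 10.0)
print(f"{tps:.0f} tokens/s")
```

Published numbers are only comparable when batch size, input_len, and output_len all match, since each one scales the token count linearly.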
-
Hi,
I've been experimenting with tinygrad a bit.
I started it with (as far as I recall)
```
AMD=1 ROCM=1 exo --inference-engine=tinygrad
```
I tried to find a workaround using ZLUDA (using LD_LIBRA…
-
### Your current environment
My device: CUDA 11.8
vLLM version: 0.5.5
torch is compatible with my CUDA and vLLM versions
Python 3.10
### 🐛 Describe the bug
My env is ready; **only** it wo…
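When an environment "is ready" but still fails, a quick sanity check is to compare version strings programmatically. The sketch below is a generic helper in plain Python; the example values stand in for what `torch.version.cuda` or `pip show vllm` would report (the threshold versions are hypothetical):

```python
def parse_version(v):
    """Split a dotted version string into an integer tuple: '11.8' -> (11, 8)."""
    return tuple(int(p) for p in v.split(".") if p.isdigit())

def cuda_at_least(installed, required):
    """True when the installed CUDA version meets the required minimum."""
    return parse_version(installed) >= parse_version(required)

# Hypothetical values standing in for the real environment report:
assert cuda_at_least("11.8", "11.7")
assert not cuda_at_least("11.8", "12.1")
```

Comparing tuples rather than raw strings avoids the classic trap where "11.10" sorts before "11.8" lexicographically.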
-
I cloned the source code from this link: https://github.com/mlperf/inference_results_v0.5/tree/master/closed/Intel/code/ssd-small/openvino-windows
There are many LNK2019 unresolved external symbol errors on th…
-
**Is your feature request related to a problem? Please describe.**
The current examples for DeepSpeed inference use the `deepspeed` command line, which internally uses DeepSpeed's launcher modules to initial…
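As a rough illustration (not DeepSpeed's actual implementation), a launcher of this kind essentially sets per-rank environment variables and spawns one worker process per rank. The variable names below follow the common `torch.distributed` convention; the address and port are placeholder values:

```python
import os
import subprocess
import sys

def worker_env(rank, world_size):
    """Environment variables a distributed launcher typically sets per worker."""
    env = dict(os.environ)
    env.update(RANK=str(rank), LOCAL_RANK=str(rank),
               WORLD_SIZE=str(world_size),
               MASTER_ADDR="127.0.0.1", MASTER_PORT="29500")
    return env

def launch_local(script, num_procs=1):
    """Spawn one Python worker per rank and wait for all of them."""
    procs = [subprocess.Popen([sys.executable, script],
                              env=worker_env(rank, num_procs))
             for rank in range(num_procs)]
    return [p.wait() for p in procs]
```

Launching a script directly with `python` skips this step, which is why code that reads `LOCAL_RANK` or similar variables fails outside the launcher.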
-
![image](https://github.com/NVIDIA-AI-IOT/Lidar_AI_Solution/assets/49723499/b00064c0-70f6-449a-8884-66b7cbdfc842)
1. Hello, the picture shows inference using GPU+DLA, but I do not find where DLA is used?
…
-
**Describe the bug**
In DeepSpeed-Chat step 3, a runtime error ("The size of tensor a (4) must match the size of tensor b (8) at non-singleton dimension 0") will be thrown when inferenc…
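For context, this error message comes from the standard broadcasting rule shared by PyTorch and NumPy: trailing dimensions are compared pairwise, and each pair must either match or contain a 1. A minimal pure-Python check of that rule (illustrative, not DeepSpeed code):

```python
def broadcastable(shape_a, shape_b):
    """True when two shapes satisfy the PyTorch/NumPy broadcasting rule.

    Dimensions are compared from the trailing end; each pair must be
    equal or contain a 1. Missing leading dimensions are treated as 1.
    """
    for a, b in zip(reversed(shape_a), reversed(shape_b)):
        if a != b and a != 1 and b != 1:
            return False
    return True

# The failure in the report: size 4 vs size 8 at dimension 0 cannot broadcast.
assert not broadcastable((4,), (8,))
assert broadcastable((4, 1), (4, 8))
```

A size-4 versus size-8 mismatch like this usually means two tensors were built with different batch (or beam) sizes, e.g. a cache populated at one batch size being reused at another.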