-
### System Info
ubuntu 20.04
tensorrt 10.0.1
tensorrt-cu12 10.0.1
tensorrt-cu12-bindings 10.0.1
tensorrt-cu12-libs 10.0.1
tensorrt-llm …
-
### System Info
CPU architecture: x86_64
CPU/Host memory size: 32G
GPU properties: SM86
GPU name: NVIDIA A10
GPU memory size: 24G
Clock frequencies used: 1695MHz
### Libraries
TensorRT-LL…
-
I modified eval.sh as follows:
```
export CUDA_VISIBLE_DEVICES=${1:-1} # default to GPU 1
# model_path=${2:-"meta-llama/Meta-Llama-3-8B-Instruct"} # meta-llama/Meta-Llama-3-8B-Instruct, mistralai/Mistral-7B-Instruct-v0…
```
-
### System Info
- `transformers` version: 4.41.0
- Platform: Linux-5.15.0-67-generic-x86_64-with-glibc2.31
- Python version: 3.10.13
- Huggingface_hub version: 0.23.0
- Safetensors version: 0.4…
-
### System Info
CPU x86_64
GPU NVIDIA L40
TensorRT branch: v0.10.0
CUDA: NVIDIA-SMI 535.161.07 Driver Version: 535.161.07 CUDA Version: 12.4
### Who can help?
@kaiyux
…
-
### System Info
trt_llm 0.11.0.dev2024052800
trt 10.0.1
device A800
code for TensorRT-LLM: latest version on the main branch
### Who can help?
@byshiue
### Information
- [X] The official example s…
-
### System Info
I encountered a trtllm-build issue.
GPU: RTX 3090
I followed the official script for the steps below.
1. I ran the code below after installing the NVIDIA Container Toolkit.
```
docker run -…
```
-
I have successfully converted a Mixtral 8x7B model with tensor parallelism, following this script from the llama example folder:
```
python convert_checkpoint.py --model_dir ./Mixtral-8x7B-v0.1 \
    …
```
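For context, a typical tensor-parallel conversion followed by an engine build in the TensorRT-LLM llama example looks roughly like the following sketch. The output paths, the `tp_size` value, and the `--gemm_plugin` choice here are illustrative assumptions, not taken from the truncated command above:

```shell
# Sketch only: convert the HF checkpoint with 2-way tensor parallelism.
# Paths and tp_size are assumed values for illustration.
python convert_checkpoint.py --model_dir ./Mixtral-8x7B-v0.1 \
    --output_dir ./tllm_checkpoint_mixtral_2gpu \
    --dtype float16 \
    --tp_size 2

# Build TensorRT engines from the converted checkpoint.
trtllm-build --checkpoint_dir ./tllm_checkpoint_mixtral_2gpu \
    --output_dir ./trt_engines/mixtral/tp2 \
    --gemm_plugin float16
```

With `--tp_size 2`, the resulting engines are meant to be run across two GPUs, e.g. via `mpirun -n 2`.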
-
Hi, thanks for your great work on the LLM decoding process. I tested the code and got the expected decoding speedup for Llama-2-7B, but the end-to-end time does not seem to change much. (61s ->…
-
Hi 👋, and thanks for the amazing work. I can't wait to see the developments in the next few weeks and months.
Any plans to work on attention sinks?