-
### System Info
- CPU architecture: x86_64
- GPU properties
- GPU name: NVIDIA A100
- GPU memory size: 40 GB
- Libraries
- TensorRT-LLM branch or tag: v0.9.0
-…
-
I followed the README to build TensorRT-LLM and ran into the issue below; please help me check it. Thank you!
triton/whisper/README.md
The process seems to be killed unexpectedly while converting the encoder che…
-
I'm reading the manual here: https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/llama/README.md
The scripts are quite simple; do they ensure the best performance?
```
python convert_checkpoint.py …
-
### System Info
- GPU Name: NVIDIA GeForce RTX 3080 Ti
- System RAM: 65 GB
### Who can help?
@ncomly-nvidia
@byshiue
### Information
- [ ] The official example scripts
- [X] My own modified scri…
-
Are there corresponding test results for the Qwen series of models?
PS: With the default yaml config, running the qasper dataset immediately runs out of GPU memory (NVIDIA A100-SXM4-80GB).
```
model:
  type: inf-llm
  path: /data/model/open_source_data/Qwen/Qwen1.5-7B-Chat
  block_size: 128
  n_ini…
-
### System Info
- CPU: x86
- GPU: 8 X L40s
- TRT LLM Version: "0.12.0.dev2024070200"
- NVIDIA-SMI 555.42.02 Driver Version: 555.42.02 CUDA Version: 12.5
followed https://nvi…
-
### System Info
While trying to debug the poor quality of outputs from TRT-LLM for Llama 3 70B with tp=4 (compared to vLLM and HF), I ran into the following message when building the bfloat16 engine.
```
[06…
-
I'm trying storywriting with KoboldCpp. At some point the story will get longer than the context and KoboldCpp starts evicting tokens from the beginning, with the (newer) ContextShift feature. Sometim…
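The eviction behavior described above can be pictured as a sliding window over the token stream: a fixed prefix (e.g. the memory/system prompt) is pinned, and once the window fills up, the oldest story tokens are dropped from the front. This is only a minimal illustrative sketch, not KoboldCpp's actual ContextShift implementation; the class and names here are hypothetical.

```python
from collections import deque

class ContextWindow:
    """Sketch of context-shift style eviction: keep a pinned prefix,
    drop the oldest body tokens once the window exceeds max_tokens."""

    def __init__(self, max_tokens, keep_prefix):
        self.max_tokens = max_tokens
        self.prefix = list(keep_prefix)  # pinned, never evicted
        self.body = deque()              # story tokens, oldest first

    def append(self, token):
        self.body.append(token)
        # Evict from the beginning of the story once the window overflows.
        while len(self.prefix) + len(self.body) > self.max_tokens:
            self.body.popleft()

    def tokens(self):
        return self.prefix + list(self.body)

# With an 8-token window and a 1-token pinned prefix, appending
# tokens 0..9 leaves only the last 7 story tokens in the window.
w = ContextWindow(max_tokens=8, keep_prefix=["<sys>"])
for t in range(10):
    w.append(t)
print(w.tokens())  # ["<sys>", 3, 4, 5, 6, 7, 8, 9]
```

The real feature additionally shifts the KV cache so the retained tokens do not need to be reprocessed, which is what makes the eviction cheap.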
-
```bash
python3 QWen1.5_TensorRT-LLM/convert_checkpoint.py --model_dir Qwen1.5-1.8B-Chat --output_dir Qwen1.5-1.8B-Chat-ckpt
trtllm-build --checkpoint_dir ./Qwen1.5-1.8B-Chat-ckpt \
-…
-
### System Info
3090 server
### Who can help?
_No response_
### Information
- [ ] The official example scripts
- [ ] My own modified scripts
### Tasks
- [ ] An officially supported task in the …