Open · geraldstanje opened 5 days ago
Hi, please try the latest version, TensorRT-LLM 0.11+; see the installation tutorial: https://nvidia.github.io/TensorRT-LLM/installation/linux.html. For the latest tensorrt_llm_backend, please refer to https://github.com/triton-inference-server/tensorrtllm_backend?tab=readme-ov-file#build-the-docker-container
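For reference, a rough sketch of the upgrade path those two pages describe (commands paraphrased from the linked docs; check them for the exact, current instructions):

# Install/upgrade TensorRT-LLM from NVIDIA's PyPI index (see the Linux installation page)
sudo apt-get -y install libopenmpi-dev
pip3 install tensorrt_llm -U --pre --extra-index-url https://pypi.nvidia.com

# Build a matching Triton backend container (see the tensorrtllm_backend README section linked above)
git clone https://github.com/triton-inference-server/tensorrtllm_backend.git
cd tensorrtllm_backend
git submodule update --init --recursive
DOCKER_BUILDKIT=1 docker build -t triton_trt_llm -f dockerfile/Dockerfile.trt_llm_backend .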
Hi @hijkzzz, I would like to keep using v0.8.0, as it works fine for other models, e.g. https://huggingface.co/Trendyol/Trendyol-LLM-7b-chat-v1.0
Did you look at the logs?
cc @Barry-Delaney @Tracin @byshiue
System Info
GPU: NVIDIA A10G
CUDA version: 12.3
Driver version: 535.183.01
TensorRT-LLM: v0.8.0
Image: nvcr.io/nvidia/tritonserver:24.02-trtllm-python-py3 (used to build the TensorRT engine and start the Triton Inference Server)
Model: meta-llama/Meta-Llama-Guard-2-8B
OS: Ubuntu
Who can help?
@byshiue @nv-guomingz @hijkzzz
Information

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
debug.txt: debug_trt_llm.txt
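The exact engine-build commands are not included in this issue; for context, a typical v0.8.0 Llama engine build inside the 24.02 container looks roughly like the sketch below (model path, output paths, batch size, and sequence lengths are placeholders):

# Convert the HF checkpoint and build the engine with the v0.8.0 llama example (placeholder paths)
cd TensorRT-LLM/examples/llama
python3 convert_checkpoint.py --model_dir /models/Meta-Llama-Guard-2-8B --output_dir /tmp/llama_guard_ckpt --dtype float16
trtllm-build --checkpoint_dir /tmp/llama_guard_ckpt --output_dir /tensorrt/engines/Meta-Llama-Guard-2-8B --gemm_plugin float16 --max_batch_size 8 --max_input_len 2048 --max_output_len 512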
cat /tensorrt/triton-repos/trt-Meta-Llama-Guard-2-8B/postprocessing/1/model.py
cat /tensorrt/triton-repos/trt-Meta-Llama-Guard-2-8B/postprocessing/config.pbtxt
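Assuming the server was then started with the standard tensorrtllm_backend launch script from inside the same container (the exact command is not shown in this issue), something along these lines:

# Start Triton against the prepared model repository (single GPU)
python3 tensorrtllm_backend/scripts/launch_triton_server.py --world_size 1 --model_repo /tensorrt/triton-repos/trt-Meta-Llama-Guard-2-8B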
Expected behavior
no error
actual behavior
See the attached debug_trt_llm.txt above for the error output.
additional notes