NVIDIA / ChatRTX

A developer reference project for creating Retrieval Augmented Generation (RAG) chatbots on Windows using TensorRT-LLM

Garbage output? #48

Closed ZJLi2013 closed 8 months ago

ZJLi2013 commented 8 months ago

[screenshot: garbled model output]

Hi, RAG team, many thanks for this demo work.

I wonder what's wrong here: why am I getting only meaningless output?

My setup is as follows:

  1. Download the HF checkpoint: llama2-13b-chat-hf
  2. Build the TRT-LLM engine:

         python3 convert_checkpoint.py --model_dir /workspace/llama2/Llama-2-13b-chat-hf/ --output_dir /workspace/llama2/engine --dtype float16 --use_weight_only --weight_only_precision int4

         trtllm-build --checkpoint_dir /workspace/llama2/engine --output_dir /workspace/llama2/engine --gemm_plugin float16 --max_input_len 15360 --max_output_len 1024 --max_batch_size 1



Thanks for helping.
ZJLi2013 commented 8 months ago

Looks like this was due to weight-only int4 quantization. After rebuilding the engine with weight-only int8, the output looks all right now. [screenshot from 2024-03-14]
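For reference, the rebuild described above presumably only changes the quantization precision flag. A sketch of the corrected commands, reusing the paths from the original report and assuming `--weight_only_precision int8` is the int8 variant that was used:

```shell
# Convert the HF checkpoint with weight-only int8 instead of int4
# (paths copied from the original commands; int8 flag value assumed)
python3 convert_checkpoint.py \
    --model_dir /workspace/llama2/Llama-2-13b-chat-hf/ \
    --output_dir /workspace/llama2/engine \
    --dtype float16 \
    --use_weight_only \
    --weight_only_precision int8

# Rebuild the engine with the same settings as before
trtllm-build \
    --checkpoint_dir /workspace/llama2/engine \
    --output_dir /workspace/llama2/engine \
    --gemm_plugin float16 \
    --max_input_len 15360 \
    --max_output_len 1024 \
    --max_batch_size 1
```

Note that int4 weight-only quantization is known to degrade output quality more than int8 on some models, which matches the behavior reported here.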