NVIDIA / ChatRTX

A developer reference project for creating Retrieval Augmented Generation (RAG) chatbots on Windows using TensorRT-LLM

Garbage output? #48

Closed ZJLi2013 closed 8 months ago

ZJLi2013 commented 8 months ago

[screenshot: garbled model output]

Hi, RAG team, many thanks for this demo work.

I wonder what's wrong here: why am I getting only meaningless output?

My setup is as follows:

  1. Download the HF checkpoint: llama2-13b-chat-hf
  2. Build the TRT-LLM engine:

         python3 convert_checkpoint.py --model_dir /workspace/llama2/Llama-2-13b-chat-hf/ --output_dir /workspace/llama2/engine --dtype float16 --use_weight_only --weight_only_precision int4

         trtllm-build --checkpoint_dir /workspace/llama2/engine --output_dir /workspace/llama2/engine --gemm_plugin float16 --max_input_len 15360 --max_output_len 1024 --max_batch_size 1



Thanks for helping.
ZJLi2013 commented 8 months ago

Looks like this was due to weight-only int4 quantization. After rebuilding the engine with weight-only int8, the output looks all right now. [screenshot from 2024-03-14]
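For reference, the rebuild described above presumably only changes the quantization precision flag. A sketch of the corrected commands, reusing the paths from the original report and assuming `--weight_only_precision int8` is the int8 variant that was used:

```shell
# Convert the HF checkpoint with weight-only int8 instead of int4
# (paths copied from the original commands; int8 flag value assumed)
python3 convert_checkpoint.py \
    --model_dir /workspace/llama2/Llama-2-13b-chat-hf/ \
    --output_dir /workspace/llama2/engine \
    --dtype float16 \
    --use_weight_only \
    --weight_only_precision int8

# Rebuild the engine with the same settings as before
trtllm-build \
    --checkpoint_dir /workspace/llama2/engine \
    --output_dir /workspace/llama2/engine \
    --gemm_plugin float16 \
    --max_input_len 15360 \
    --max_output_len 1024 \
    --max_batch_size 1
```

Note that int4 weight-only quantization is known to degrade output quality more than int8 on some models, which matches the behavior reported here.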