ZJLi2013 closed this issue 8 months ago
Hi RAG team, many thanks for this demo work.
I wonder what's wrong here: why am I getting only meaningless output? My setup is as follows:
python3 convert_checkpoint.py --model_dir /workspace/llama2/Llama-2-13b-chat-hf/ --output_dir /workspace/llama2/engine --dtype float16 --use_weight_only --weight_only_precision int4
trtllm-build --checkpoint_dir /workspace/llama2/engine --output_dir /workspace/llama2/engine --gemm_plugin float16 --max_input_len 15360 --max_output_len 1024 --max_batch_size 1
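Thanks for helping.
It looks like this is caused by the int4 weight-only quantization (wo_int4). After rebuilding the engine with int8 weight-only quantization (wo-int8), the output looks all right.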
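For anyone hitting the same problem, here is a rough sketch of the rebuild. The exact commands used for the int8 build were not posted, so this assumes the only change is `--weight_only_precision int8` and that every other flag and path stays as in the original commands. The variable names (`MODEL_DIR`, `ENGINE_DIR`, `PRECISION`, `RUN`) are just for illustration.

```shell
#!/bin/sh
# Paths copied from the original post; adjust them to your environment.
MODEL_DIR=/workspace/llama2/Llama-2-13b-chat-hf/
ENGINE_DIR=/workspace/llama2/engine
# Assumption: int8 instead of int4 is the only change from the original setup.
PRECISION=int8

# Step 1: convert the HF checkpoint with weight-only quantization.
CONVERT_CMD="python3 convert_checkpoint.py --model_dir $MODEL_DIR --output_dir $ENGINE_DIR --dtype float16 --use_weight_only --weight_only_precision $PRECISION"
# Step 2: build the engine from the converted checkpoint (same flags as before).
BUILD_CMD="trtllm-build --checkpoint_dir $ENGINE_DIR --output_dir $ENGINE_DIR --gemm_plugin float16 --max_input_len 15360 --max_output_len 1024 --max_batch_size 1"

# Print both commands for review; they only run when RUN=1 is set.
echo "$CONVERT_CMD"
echo "$BUILD_CMD"
if [ "${RUN:-0}" = "1" ]; then
  $CONVERT_CMD && $BUILD_CMD
fi
```

Running the script as-is only prints the two commands; run it with `RUN=1` from the directory that contains `convert_checkpoint.py` to actually rebuild.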