NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0
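As a minimal illustration of the Python API described above, a sketch along these lines should work with recent releases that expose the high-level `LLM` entry point (the exact import path and constructor arguments have varied across versions, so treat this as an assumption, not a guaranteed recipe):

```python
# Minimal sketch of the high-level API (assumes a recent TensorRT-LLM release
# exporting LLM and SamplingParams from the top-level package).
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2-7B-Instruct")        # builds or loads a TensorRT engine
params = SamplingParams(max_tokens=32, top_k=1)  # greedy-like decoding
for out in llm.generate(["Hello, world"], params):
    print(out.outputs[0].text)
```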

There are differences in the results of Qwen2-7B-Instruct #2032

Closed · skyCreateXian closed this issue 1 month ago

skyCreateXian commented 3 months ago

System Info

- GPU: L20
- TensorRT-LLM: v0.11.0
- transformers: 4.42.0

Who can help?

@ncomly-nvidia @kaiyux

The test prompt is '你好，请介绍一下喜马拉雅山的详细信息' ("Hello, please give a detailed introduction to the Himalayas").

1. transformers

Generation parameters:

```python
generation_config = GenerationConfig(
    top_k=1,
    temperature=1,
    max_length=2048,
    max_new_tokens=80,
    repetition_penalty=1.0,
    early_stopping=True,
    do_sample=True,
    num_beams=1,
    top_p=1,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id,
)
```

transformers result (original Chinese output, truncated at max_new_tokens):

```
喜马拉雅山(Himalayas)是地球上最高的山脉，位于亚洲南部，横跨中国、印度、尼泊尔、不丹、巴基斯坦和阿富汗等国家。以下是关于喜马拉雅山的一些详细信息：

地理位置与范围

喜马拉雅山脉从中国西藏的喜马拉雅山脉开始，向南延伸至印度的喜马拉雅山脉，
```

English translation: "The Himalayas are the highest mountain range on Earth, located in southern Asia and spanning China, India, Nepal, Bhutan, Pakistan, Afghanistan, and other countries. Here is some detailed information about the Himalayas: Geographic location and extent. The Himalayan range begins in the Himalayas of China's Tibet and extends south to the Himalayas of India,"
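For context, a self-contained sketch of this transformers baseline might look like the following; it folds in the input construction from step 3 and assumes the local model path used in the build step below:

```python
# Hedged sketch of the transformers baseline, assuming the model lives at
# /mnt/qwen2/Qwen2-7B-Instruct as in the engine-build step (step 4).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

model_dir = "/mnt/qwen2/Qwen2-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(
    model_dir, torch_dtype=torch.float16, device_map="cuda"
)

messages = [{"role": "user", "content": "你好，请介绍一下喜马拉雅山的详细信息"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
input_ids = tokenizer(text, return_tensors="pt", add_special_tokens=False)["input_ids"]

generation_config = GenerationConfig(
    top_k=1, temperature=1, max_length=2048, max_new_tokens=80,
    repetition_penalty=1.0, early_stopping=True, do_sample=True,
    num_beams=1, top_p=1,
    pad_token_id=tokenizer.pad_token_id, eos_token_id=tokenizer.eos_token_id,
)
out = model.generate(input_ids.to(model.device), generation_config=generation_config)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(out[0][input_ids.shape[1]:], skip_special_tokens=True))
```

Note that `do_sample=True` with `top_k=1` is effectively greedy decoding, which is why the comparison against TensorRT-LLM's `top_k=1` is meaningful at all.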

2. TensorRT-LLM

Generation parameters:

```python
batch_input_ids=input_ids,
max_new_tokens=80,
end_id=tokenizer.eos_token_id,
pad_id=tokenizer.pad_token_id,
top_k=1,
```

TensorRT-LLM result (original Chinese output, truncated at max_new_tokens):

```
你好!喜马拉雅山(Himalayas)是地球上最壮观的山脉之一，位于亚洲南部，横跨中国、印度、尼泊尔、不丹、巴基斯坦和阿富汗等国家。以下是关于喜马拉雅山的一些详细信息：

地理位置与范围

喜马拉雅山脉从中国西藏的喜马拉雅山脉开始，向南延伸至印度的
```

English translation: "Hello! The Himalayas are one of the most spectacular mountain ranges on Earth, located in southern Asia and spanning China, India, Nepal, Bhutan, Pakistan, Afghanistan, and other countries. Here is some detailed information about the Himalayas: Geographic location and extent. The Himalayan range begins in the Himalayas of China's Tibet and extends south to India's"

The divergence starts at the very first tokens: transformers opens with "喜马拉雅山(Himalayas)是地球上最高的山脉" ("the highest mountain range"), while TensorRT-LLM opens with "你好!……最壮观的山脉之一" ("Hello! ... one of the most spectacular mountain ranges").
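For comparison, the TensorRT-LLM side plausibly follows the pattern of the repo's examples/run.py, roughly like this (ModelRunner lives in tensorrt_llm.runtime in the v0.11 era; exact kwargs and return shapes may differ slightly between releases):

```python
# Plausible sketch modeled on examples/run.py from the TensorRT-LLM repo
# (v0.11-era API; ./fp16 is the engine directory built in step 4 below).
import torch
from tensorrt_llm.runtime import ModelRunner

runner = ModelRunner.from_dir(engine_dir="./fp16")
with torch.no_grad():
    output_ids = runner.generate(
        batch_input_ids=[input_ids[0]],   # list of 1-D token tensors
        max_new_tokens=80,
        end_id=tokenizer.eos_token_id,
        pad_id=tokenizer.pad_token_id,
        top_k=1,
    )
# output_ids has shape [batch, num_beams, seq_len]; strip the prompt before decoding.
gen = output_ids[0][0][input_ids.shape[1]:]
print(tokenizer.decode(gen, skip_special_tokens=True))
```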

3. How the input_ids are created

```python
prompt = '你好，请介绍一下喜马拉雅山的详细信息'
messages = [{"role": "user", "content": prompt}]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
input_ids = tokenizer(prompt, truncation=True, return_tensors="pt", add_special_tokens=False)['input_ids']
```
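One sanity check worth running here (my suggestion, not part of the original report) is to confirm that both runtimes really consume identical token IDs before comparing any outputs:

```python
# Suggested sanity check: both frameworks must see byte-identical token IDs,
# otherwise any output comparison is meaningless.
ids = input_ids[0].tolist()
print(f"{len(ids)} prompt tokens, first 10: {ids[:10]}")
# Round-trip decode; with Qwen2's chat template the text should start with <|im_start|>.
print(tokenizer.decode(ids, skip_special_tokens=False)[:80])
```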

4、build Qwen2-7B engine

` python convert_checkpoint.py --model_dir /mnt/qwen2/Qwen2-7B-Instruct \ --output_dir checkpoint \ --dtype float16

trtllm-build --checkpoint_dir ./checkpoint \ --output_dir ./fp16 \ --gemm_plugin float16 `
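Alternatively, the stock example script can drive the built engine directly, which rules out discrepancies introduced by hand-rolled runner code (the paths below are assumptions matching the build step above):

```bash
# Reproduce with the repo's example runner instead of custom code.
python3 examples/run.py \
    --engine_dir ./fp16 \
    --tokenizer_dir /mnt/qwen2/Qwen2-7B-Instruct \
    --max_output_len 80 \
    --input_text "你好，请介绍一下喜马拉雅山的详细信息"
```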


Reproduction

  1. Run transformers and TensorRT-LLM separately on the same input.
  2. Compare the generated tokens for the same prompt; differences will appear (a small comparison helper is sketched below).
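A helper like the following (hypothetical, not from the original report) makes the comparison concrete by reporting the first position where the two generations disagree:

```python
# Hypothetical comparison helper: given the generated token IDs from each
# framework, report where they first diverge.
def first_divergence(hf_tokens: list[int], trt_tokens: list[int]) -> int | None:
    """Return the index of the first differing token, or None if fully aligned."""
    for i, (a, b) in enumerate(zip(hf_tokens, trt_tokens)):
        if a != b:
            return i
    if len(hf_tokens) != len(trt_tokens):
        return min(len(hf_tokens), len(trt_tokens))
    return None

# Example usage with the generated IDs from steps 1 and 2:
# idx = first_divergence(hf_out_ids, trt_out_ids)
# if idx is not None:
#     print(f"first mismatch at generated token {idx}")
```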

Expected behavior

I expect the Qwen2 outputs from TensorRT-LLM and transformers to be perfectly aligned.

Actual behavior

1. There are some differences in the results.
2. Across many test cases, approximately 5-10% of outputs are not fully aligned.

Additional notes

None.

github-actions[bot] commented 2 months ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 15 days.

github-actions[bot] commented 1 month ago

This issue was closed because it has been stalled for 15 days with no activity.