Open · lxning opened 2 months ago
Thanks for reporting this. Will take a look at it today.
I can confirm seeing this issue in djl-inference:0.29.0-tensorrtllm0.11.0-cu124.
Steps to reproduce:
Send a POST request with the stop parameter:
```json
{
  "inputs": "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\nYou are rolling a 12-sided dice twice.\n\nQuestion: Can I win more than once?\n<|eot_id|>\n\n<|start_header_id|>assistant<|end_header_id|> Answer:",
  "parameters": {
    "do_sample": false,
    "details": false,
    "temperature": 0.7,
    "top_p": 0.92,
    "max_new_tokens": 220,
    "stop": ["<|eot_id|>"]
  }
}
```
Note: the model does not stop on "<|eot_id|>", so the stop parameter is needed.
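For anyone reproducing this outside SageMaker, here is a minimal sketch of sending the request with curl. The /invocations route and port 8080 are assumptions based on the docker run command below; adjust for your deployment:

```bash
# Sketch only: save the JSON body above as request.json, then POST it.
# Assumes the container listens on port 8080 and exposes the
# SageMaker-style /invocations route; adjust for your deployment.
curl -s -X POST http://localhost:8080/invocations \
  -H 'Content-Type: application/json' \
  -d @request.json
```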
We fixed the image and released the patched version. @lxning, please try it now.
@pdtgct Could you try with stop_sequences instead of just stop?
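For reference, only the parameters block would change; a sketch (stop_sequences here is the alternative name suggested above, not verified against this image):

```json
"parameters": {
  "do_sample": false,
  "details": false,
  "temperature": 0.7,
  "top_p": 0.92,
  "max_new_tokens": 220,
  "stop_sequences": ["<|eot_id|>"]
}
```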
Thanks, @sindhuvahinis - will try to find some time to confirm.
Description
There are two different behaviors across the LMI trtllm containers when testing the gsm8k dataset via lm_eval_harness on the llama-2-7b model.
Expected Behavior
lm_eval_harness should be able to generate a report when the djl-inference:0.29.0-tensorrtllm0.11.0-cu124 image is used.
Error Message
Error log in djl-inference:0.29.0-tensorrtllm0.11.0-cu124
How to Reproduce?
Steps to reproduce
```bash
aws s3 sync s3://djl-llm/llama-2-7b-hf/ llama-2-7b-hf/
docker run -it --gpus all --shm-size 20g -v /home/ubuntu/trtllm/llama-2-7b:/opt/ml/model -p 8080:8080 763104351884.dkr.ecr.us-east-1.amazonaws.com/djl-inference:0.29.0-tensorrtllm0.11.0-cu124
lm_eval --model local-chat-completions --tasks gsm8k_cot_zeroshot --model_args model=meta-llama/Meta-Llama-2-7B,base_url=http://localhost:8080/v1/chat/completions/model,tokenized_requests=True --limit 10 --apply_chat_template --write_out --log_samples --output_path ~/trtllm/lm_eval/output_llama-2-7b-gsm8k_cot_zeroshot_v11
```
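As a sanity check before running lm_eval, the chat endpoint can be probed directly. A hedged sketch, assuming the container accepts OpenAI-style chat bodies at the same base_url used above:

```bash
# Sketch: verify the server responds before involving the harness.
# Assumes an OpenAI-compatible chat schema; adjust fields if the image differs.
curl -s -X POST http://localhost:8080/v1/chat/completions/model \
  -H 'Content-Type: application/json' \
  -d '{"messages": [{"role": "user", "content": "Say hello"}], "max_tokens": 32}'
```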
What have you tried to solve it?