Can you provide a sample where this happens? We re-ran the program, and the results for the first sample in MT-Bench are shown below.

I did not change the rest of the code and used:

```bash
python -m evaluation.eval_opt_classic \
    --draft-model-path JackFram/llama-68m \
    --base-model-path sharpbai/Llama-2-7b-hf \
    --bench-name mt_bench \
    --answer-file ./mt_classic_opt.jsonl \
    --temperature 0 \
    --nodes 60 \
    --threshold 0.5 \
    --max_depth 10
```
I couldn't use `meta-llama/Llama-2-7b-chat-hf` due to authorization issues, so I used `sharpbai/Llama-2-7b-hf` as the base model instead. Would that be a problem?
I downloaded this version of Llama and found that it always outputs token 13 for any input when using `model.generate()`. I'm guessing this might be due to errors in the model parameters, or the model not being loaded correctly via the `from_pretrained()` method. Please try another version of Llama-2-7b-chat-hf.
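In case it helps, a minimal check along these lines (a sketch using the standard transformers API; the prompt, dtype, and generation settings are just placeholders) should show whether the checkpoint itself is the problem:

```python
# Minimal sanity check (sketch): load the base model alone and see whether
# greedy generation degenerates to token 13 for an arbitrary prompt.
# Model path and prompt are placeholders; adjust dtype/device to your setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "sharpbai/Llama-2-7b-hf"  # the base model in question
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=20, do_sample=False)

# Print only the newly generated token ids; if the checkpoint is broken,
# these will be 13 regardless of the prompt.
new_tokens = out[0, inputs["input_ids"].shape[1]:]
print(new_tokens.tolist())
print(tokenizer.decode(new_tokens))
```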
That’s weird; it works in other experiments, and it also works fine under certain conditions with different datasets. 😶🌫️
Anyway, thanks a lot for your time.
Hi, I tried to evaluate using `evaluation.eval_opt_classic` and found the output weird. I evaluated with:

```bash
python -m evaluation.eval_opt_classic \
    --draft-model-path JackFram/llama-68m \
    --base-model-path sharpbai/Llama-2-7b-hf \
    --bench-name mt_bench \
    --answer-file ./mt_classic_opt.jsonl \
    --temperature 0 \
    --nodes 60 \
    --threshold 0.5 \
    --max_depth 10
```

and printed the verified `best_candidate` in `spforward`. I then found that the best candidate is always [13] (the `print` I added is in the blue box below).
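For context, the debug print is roughly the following (a sketch only; the exact spot inside `spforward` may differ, and `best_candidate` is assumed to hold the verified token ids for the step):

```python
# Rough sketch of the debug print added inside spforward (location approximate).
# best_candidate is assumed to be the list/tensor of verified token ids.
print("verified best_candidate:", best_candidate)
```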