Can you provide a sample where this happens? We re-ran the program, and the results for the first sample in MT-Bench are shown below.

I did not change the rest of the code and used:

```bash
python -m evaluation.eval_opt_classic \
    --draft-model-path JackFram/llama-68m \
    --base-model-path sharpbai/Llama-2-7b-hf \
    --bench-name mt_bench \
    --answer-file ./mt_classic_opt.jsonl \
    --temperature 0 \
    --nodes 60 \
    --threshold 0.5 \
    --max_depth 10
```
I couldn't use `meta-llama/Llama-2-7b-chat-hf` due to authorization issues, so I used `sharpbai/Llama-2-7b-hf` as the base model instead. Would that be a problem?
I downloaded this version of Llama and found that it always outputs token 13 for any input when using `model.generate()`. I'm guessing this might be due to errors in the model parameters, or the model not being loaded correctly via the `from_pretrained()` method. Please try another version of Llama-2-7b-chat-hf.
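In case it helps, a minimal check along these lines (a sketch using the standard transformers API; the prompt, dtype, and generation settings are just placeholders) should show whether the checkpoint itself is the problem:

```python
# Minimal sanity check (sketch): load the base model alone and see whether
# greedy generation degenerates to token 13 for an arbitrary prompt.
# Model path and prompt are placeholders; adjust dtype/device to your setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "sharpbai/Llama-2-7b-hf"  # the base model in question
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=20, do_sample=False)

# Print only the newly generated token ids; if the checkpoint is broken,
# these will be 13 regardless of the prompt.
new_tokens = out[0, inputs["input_ids"].shape[1]:]
print(new_tokens.tolist())
print(tokenizer.decode(new_tokens))
```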
That’s weird; it works in other experiments, and it also works fine under certain conditions with different datasets. 😶🌫️
Anyway, thanks a lot for your time.
Hi, I tried to evaluate using `evaluation.eval_opt_classic` and found the output weird. I evaluated with:

```bash
python -m evaluation.eval_opt_classic \
    --draft-model-path JackFram/llama-68m \
    --base-model-path sharpbai/Llama-2-7b-hf \
    --bench-name mt_bench \
    --answer-file ./mt_classic_opt.jsonl \
    --temperature 0 \
    --nodes 60 \
    --threshold 0.5 \
    --max_depth 10
```

and printed the verified `best_candidate` in `spforward`. I then found that the best candidate is always [13] (the `print` I added is in the blue box below).
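For context, the debug print is roughly the following (a sketch only; the exact spot inside `spforward` may differ, and `best_candidate` is assumed to hold the verified token ids for the step):

```python
# Rough sketch of the debug print added inside spforward (location approximate).
# best_candidate is assumed to be the list/tensor of verified token ids.
print("verified best_candidate:", best_candidate)
```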