fuqichen1998 opened this issue 6 months ago
Hi! Are you using the prompt template as in config/dataset2prompt.json?
We refer to our code here for the llama2 prompt: https://github.com/THUDM/LongBench/blob/main/pred.py#L33
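For readers who don't want to open the link, the wrapping at that line is essentially the following (a minimal sketch; the function name here is illustrative, and the exact code in pred.py may differ):

```python
def build_llama2_chat_prompt(prompt: str) -> str:
    # Wrap the raw task prompt in Llama-2 chat instruction tags,
    # mirroring the llama2 branch of LongBench's pred.py.
    return f"[INST]{prompt}[/INST]"

# Example usage:
wrapped = build_llama2_chat_prompt("Summarize the passage below.")
print(wrapped)  # [INST]Summarize the passage below.[/INST]
```

Without this wrapping, a llama2-chat model is evaluated out of its expected chat format, which can noticeably shift benchmark scores.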
Yes, I was using your pred.py to run the inference and evaluation.
Actually, I also get the same result.
Is the [INST] wrapping necessary for llama2-7b/llama2-13b?
As the title says, my evaluation of Llama2-7B-chat-4k on PassageRetrieval-zh gets 10.12, which is significantly higher than the score in the README (0.5). Could you please share why?