In some cases, results from the Llama 1 (and Llama 2, whose eval setups sometimes differ) papers cannot be replicated by our implementations, because Meta used custom, undisclosed prompts or prepended task descriptions. However, for some tasks, such as TriviaQA, we have successfully reverse engineered working prompt setups. Where we have done so, we should document the setup and add task variants for ease of use.
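As a rough illustration, such a variant could be a small task config that inherits the base task and overrides only the prompt. The sketch below assumes the harness's YAML task schema (`include`, `task`, `doc_to_text`); the base filename, the variant name, and the prompt string are all hypothetical placeholders, not the actual reverse-engineered Llama prompt.

```yaml
# Hypothetical sketch of a prompt variant, assuming the harness's YAML
# task schema. The include path, variant name, and prompt text are
# illustrative placeholders, not the verified Llama setup.
include: triviaqa.yaml       # assumed filename of the base TriviaQA config
task: triviaqa_llama_style   # hypothetical name for the variant task
doc_to_text: "Answer these questions:\n\nQ: {{question}}\nA:"  # illustrative prompt only
```

Shipping a paper-matching prompt as a separate variant, rather than changing the default task, keeps existing baseline numbers comparable while making the replication setup available by name.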