In the WebLINX paper, we found that non-finetuned models do not perform as well as finetuned ones, so it is unlikely that Llama-3-8B-Instruct surpasses Llama-3-8B-Web. However, if you think the results are worth adding, it should be fairly straightforward to run the eval script with minor changes to the config: https://github.com/McGill-NLP/webllama/tree/main/modeling#run-llama-on-evaluation-splits
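For illustration, the change would likely amount to pointing the eval config at the non-finetuned checkpoint. Here is a minimal sketch, assuming a YAML config with a `model.name` field as described in the linked README; the exact key names and file layout may differ:

```yaml
# Hypothetical config edit for the evaluation splits (exact keys may differ;
# see the modeling README linked above). Swap the finetuned checkpoint for
# the non-finetuned instruct model:
model:
  name: meta-llama/Meta-Llama-3-8B-Instruct  # instead of McGill-NLP/Llama-3-8B-Web
```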
I see you compared your model with other models, but what about the base model it was derived from? Thanks