Open ErikBjare opened 10 months ago
I would be interested in this as well. I made some attempts on my own for the Python-7B and Instruct-7B models, but if I use the same code as for Llama-2, the performance is terrible (e.g., 3% and 8% respectively). As a comparison, with the exact same code, Llama-2-chat-7b gives me 11%.
I'm running into the same situation. Even when I use the instructions in "core/prompts.py", pass@1 for codellama-7b is 22.8%, still lower than the number reported in the official documentation by a large margin. Have you fixed this problem?
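One common cause of scores this far below the reported numbers is a prompt-format mismatch: the chat/instruct-tuned variants expect the Llama-2 `[INST] ... [/INST]` chat template, while the base and Python-specialised models should receive the raw completion prompt. A minimal sketch of what that dispatch might look like (the model-name check and the instruction wording are illustrative assumptions, not the official evaluation harness):

```python
# Sketch: choosing a prompt format per CodeLlama variant.
# Assumption: instruct-tuned models use the Llama-2 [INST] template,
# while base/Python models are evaluated as plain code completion.

def build_prompt(problem_prompt: str, model_name: str) -> str:
    """Return an evaluation prompt suited to the given model variant."""
    if "instruct" in model_name.lower():
        # Chat-tuned models: wrap the task in the chat template and
        # ask for code explicitly, since they tend to reply in prose.
        instruction = (
            "Complete the following Python function. "
            "Return only the code.\n\n" + problem_prompt
        )
        return f"[INST] {instruction} [/INST]"
    # Base and Python-specialised models: raw completion prompt.
    return problem_prompt

print(build_prompt("def add(a, b):\n    ", "codellama-7b-instruct"))
```

Feeding the bare HumanEval-style prompt to an instruct model (or wrapping the base model's prompt in chat markup) can easily cost tens of points of pass@1, which would be consistent with the gaps reported above.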
I'm keeping https://github.com/ErikBjare/are-copilots-local-yet up-to-date, and would love to see some codellama numbers given it's now SOTA :)