PeiqinSun closed this issue 2 months ago
Hi, that's also the score we got for the BigCode leaderboard. It's still in a similar range to what's reported in the paper (29.8 vs 32.3). The difference could be due to different post-processing or inference settings.
Thanks for your reply.
These are my commands:
These are my results:
So, where did I go wrong?