Closed tangzhy closed 1 year ago
How do you evaluate BigCode models on the HELM benchmark?
Do you use their crfm-helm tools directly?
If so, could you release your bash commands so the community can rigorously reproduce your results?
Sorry, we don't have the commands. Members of the HELM team ran the evaluation, but I believe they used the default settings, as they did for other code models on reasoning tasks. You could open an issue on their repo.