Closed: loubnabnl closed this issue 1 year ago
Tagging @ytzi and @canders1. Let's do this together. I actually have a little evaluation container in progress here that only supports Racket (never mind why!)
Here is my high-level sketch:
`--network none`
and supports all 18+ languages. (Let's also get some of the other languages in the Stack working.) I think Docker/Podman will allow both of the directories above to be the same directory, if desired.
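A minimal sketch of what that invocation could look like — the image tag, mount path, and entrypoint behavior are all assumptions here, not the actual MultiPL-E setup:

```shell
# Build the evaluation image from the repo's Dockerfile
# (the tag "multipl-e-eval" is hypothetical).
podman build -t multipl-e-eval .

# Run the evaluation sandboxed: no network access, and a single
# host directory mounted for both the generations and the results,
# since Docker/Podman allow those to be the same directory.
podman run --rm --network none \
    -v "$PWD/generations:/data:rw" \
    multipl-e-eval
```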
I think we should do this in a separate repository that omits the analysis scripts, inference code, datasets, and results. BigCode will have its own analysis/inference approach.
Yes, it makes sense to do this in a separate repository (maybe using the original MultiPL-E repo?); we can use this issue to discuss and track progress on the task.
In any case, we will need to add the evaluation metrics to this repo, and then add the Dockerfile to the setup instructions once it's ready.
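For the metrics: HumanEval-style code-generation benchmarks such as MultiPL-E typically report pass@k. A self-contained sketch of the standard unbiased estimator (Chen et al., 2021), in case it helps the discussion — the function name and example numbers are mine, not from this repo:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples
    drawn from n generations (c of which pass the tests) is correct."""
    if n - c < k:
        # Fewer than k failing samples: every size-k draw contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 10 generations per problem, 3 of them pass the unit tests.
print(round(pass_at_k(10, 3, 1), 3))  # → 0.3
```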
This is now in progress here:
https://github.com/nuprl/MultiPL-E/tree/only_code/evaluation
When it is a little more mature, we will upload a container image; for now, you have to build it yourself. The Makefile has commands to build the container and run the tests:
https://github.com/nuprl/MultiPL-E/blob/only_code/evaluation/Makefile#L5
(You can replace `podman` with `docker`; either should work.)
Closing this issue, as MultiPL-E was integrated in https://github.com/bigcode-project/bigcode-evaluation-harness/pull/44 🥳
As part of the integration of the MultiPL-E benchmark, create a Dockerfile/Docker image with all the dependencies required to execute the code generations for the different programming languages.