Closed amitbcp closed 1 year ago
Did you run the execution in a Docker container with all dependencies installed?
I'm going to guess that this is a Node version issue. The MultiPL-E JS benchmarks rely on deepEqual, which requires a fairly recent version of Node. I think the version in Ubuntu 20.04 is too old, but it works in Ubuntu 22.04.
I was trying it on my local setup with all the dependencies. Let me try using Docker. Can you please confirm that the command and hyperparameters shared above are correct? @loubnabnl @arjunguha
Correct. You may get something slightly lower than what is reported in the paper. The original MultiPL-E code (github.com/nuprl/MultiPL-E) uses length 512, but interprets it as len(prompt_tokens) + 512. The evaluation harness, I believe, includes the prompt in the 512 tokens, so you may need to increase it.
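To make the difference concrete, here is a minimal sketch of the two interpretations of the length limit described above. The function names are hypothetical and are not taken from either codebase. The harness behavior is modeled on the "I believe" statement above, so treat it as an assumption rather than confirmed behavior.

```python
# Hypothetical helpers illustrating the two readings of "length 512".
# Neither function is part of MultiPL-E or the evaluation harness.

def new_tokens_multipl_e(prompt_len: int, length: int = 512) -> int:
    """MultiPL-E reading: total budget is prompt_len + length,
    so the model may always generate up to `length` new tokens."""
    total_budget = prompt_len + length
    return total_budget - prompt_len


def new_tokens_harness(prompt_len: int, length: int = 512) -> int:
    """Assumed harness reading: the prompt counts toward `length`,
    so only the remainder is available for generation."""
    return max(0, length - prompt_len)


def harness_length_to_match(prompt_len: int, new_tokens: int = 512) -> int:
    """Length the harness would need so that `new_tokens` can still
    be generated after a prompt of `prompt_len` tokens."""
    return prompt_len + new_tokens


if __name__ == "__main__":
    prompt_len = 150  # example prompt size, chosen for illustration
    print(new_tokens_multipl_e(prompt_len))     # 512
    print(new_tokens_harness(prompt_len))       # 362
    print(harness_length_to_match(prompt_len))  # 662
```

Under these assumptions, a 150-token prompt leaves 362 tokens for the completion in the harness versus 512 in MultiPL-E, which could truncate longer solutions and lower the score slightly.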
I was able to reproduce the results. Thanks
I am running the following:
The result is:
Are there any other parameters that I might be missing?