bigcode-project / bigcode-evaluation-harness

A framework for the evaluation of autoregressive code generation language models.
Apache License 2.0

Humanevalpack-c++ issue #153

Closed Gene-su-GGN closed 5 months ago

Gene-su-GGN commented 1 year ago

Hi,

I always get pass@1 = 0 when running HumanEvalPack (C++). The evaluation output is:

Evaluating generations...
{ "humanevalsynthesize-cpp": { "pass@1": 0.0 } }

However, taking task #2 as an example, the reference and generation JSON files are shown below. If I compile and run them together with a C++ compiler, the test cases pass. I'm not sure why I'm facing this issue only in C++; is there any syntax I should add when running this case?

references.json

#undef NDEBUG
#include<assert.h>
int main(){
    assert (truncate_number(3.5) == 0.5);
    assert (abs(truncate_number(1.33) - 0.33) < 1e-4);
    assert (abs(truncate_number(123.456) - 0.456) < 1e-4);
}

==================================================

generations.json

/*
Given a positive floating point number, it can be decomposed into
and integer part (largest integer smaller than given number) and decimals
(leftover part always smaller than 1).

Return the decimal part of the number.
>>> truncate_number(3.5)
0.5
*/
#include<stdio.h>
#include<math.h>
using namespace std;
float truncate_number(float number){
    return number - floor(number);
}
SivilTaram commented 1 year ago

@Gene-su-GGN The problem may be in the C++ execution environment. Could you separate the code generation and code execution phases, and then see what happens when executing the code? We would recommend using Colab to evaluate the model, since that environment is reproducible.
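
Separating the two phases can be sketched with the harness's own CLI. This assumes the `--generation_only`, `--save_generations_path`, `--load_generations_path`, and `--allow_code_execution` flags as described in the repository README; `<model_name>` is a placeholder, and exact flag names should be checked against `main.py --help`:

```shell
# Phase 1: generate only (no code execution), saving generations to disk.
accelerate launch main.py \
  --model <model_name> \
  --tasks humanevalsynthesize-cpp \
  --generation_only \
  --save_generations \
  --save_generations_path generations_cpp.json

# Phase 2: evaluate the saved generations in a controlled environment
# (e.g. Colab or the project's Docker image), executing the C++ tests.
accelerate launch main.py \
  --model <model_name> \
  --tasks humanevalsynthesize-cpp \
  --load_generations_path generations_cpp.json \
  --allow_code_execution
```

Running phase 2 in an environment with a known-good C++ toolchain isolates whether the pass@1 = 0 result comes from the model's generations or from the execution setup.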