bigcode-project / bigcode-evaluation-harness

A framework for the evaluation of autoregressive code generation language models.
Apache License 2.0

Humanevalpack-c++ issue #153

Closed Gene-su-GGN closed 1 week ago

Gene-su-GGN commented 8 months ago

Hi,

I always get pass@1 = 0 when running HumanEvalPack C++:

Evaluating generations...
{
  "humanevalsynthesize-cpp": {
    "pass@1": 0.0
  },

However, taking instance #2 as an example, the reference and generations JSON files are shown below. If I run them in a C++ compiler, the test cases pass. I'm not sure why I'm facing this issue only in C++; is there any syntax I should add when running this case?

references.json

#undef NDEBUG
#include<assert.h>
int main(){
    assert (truncate_number(3.5) == 0.5);
    assert (abs(truncate_number(1.33) - 0.33) < 1e-4);
    assert (abs(truncate_number(123.456) - 0.456) < 1e-4);
}

==================================================

generations.json

/*
Given a positive floating point number, it can be decomposed into
and integer part (largest integer smaller than given number) and decimals
(leftover part always smaller than 1).

Return the decimal part of the number.
>>> truncate_number(3.5)
0.5
*/
#include<stdio.h>
#include<math.h>
using namespace std;
float truncate_number(float number){
    return number - floor(number);
}
SivilTaram commented 8 months ago

@Gene-su-GGN The problem may be with execution in your C++ environment rather than with the generations themselves. Could you separate the code generation and code execution phases, and then see what happens when the code is executed? We would recommend using Colab to evaluate the model, since that environment is reproducible.