Evaluation scripts not working

awasthiabhijeet commented 3 weeks ago

Thank you for open-sourcing this evaluation benchmark.

I am trying to replicate the inference and evaluation steps as suggested in the repository. Inference scripts work well for me.

However, I am facing the following problems with the evaluation.

I had to manually copy the contents of the greedy_result folder produced during inference into evaluation/judge/solution_folder, because I initially got the following error.

Next, I got the following error. It seems vllm_inference.py adds an extra header line to each jsonl file, and that line needs to be removed.
After removing the header lines from the jsonl files, I then got the following error.
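
For anyone hitting the same thing, here is a minimal sketch of how the header lines could be stripped. It assumes the header is any line that does not parse as a JSON object; `strip_non_json_lines` is a hypothetical helper and is not part of the repository.

```python
import json
from pathlib import Path


def strip_non_json_lines(src: Path, dst: Path) -> int:
    """Copy src to dst, keeping only lines that parse as JSON objects.

    Hypothetical helper: assumes the unwanted header is not valid JSON.
    Returns the number of lines dropped.
    """
    dropped = 0
    with src.open(encoding="utf-8") as fin, dst.open("w", encoding="utf-8") as fout:
        for line in fin:
            stripped = line.strip()
            if not stripped:
                continue  # blank lines are skipped and not counted
            try:
                record = json.loads(stripped)
            except json.JSONDecodeError:
                dropped += 1  # not JSON at all, e.g. a header line
                continue
            if not isinstance(record, dict):
                dropped += 1  # valid JSON, but not a record object
                continue
            fout.write(stripped + "\n")
    return dropped
```

Writing to a separate output file keeps the original inference results untouched in case the assumption about the header is wrong.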

Fixing the above error required changing line 644 of add_template.py to code1=inp["code"][0].
After this change, python3 add_template.py worked for me.
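
A slightly more defensive version of that change might look like the sketch below. It assumes the "code" field can be either a list of generated strings or a single string, which I have not verified against every input file; `first_code` is a hypothetical name and does not exist in add_template.py.

```python
def first_code(inp: dict) -> str:
    """Return the first generated solution from a record.

    Assumption: inp["code"] is either a list of strings or a plain string.
    """
    code = inp["code"]
    if isinstance(code, (list, tuple)):
        if not code:
            raise ValueError("record has an empty 'code' list")
        return code[0]  # take the first sample
    return code  # already a single string
```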
python3 submit_solution.py also works without any warning/error.
Then, I ran the judge with nohup bash run_judge.sh > runlog.out 2>&1.
However, runlog.out remains empty even after an hour of running the script.
Here are the outputs of some of the SQL queries I ran

ps -ef | grep judge gives the following output.

Overall, I think evaluation is not currently working out of the box. It would be very helpful if the evaluation process ran without errors or additional modifications.

Regards, Abhijeet

mkj3085003 commented 3 weeks ago

Thank you for your attention and for pointing out the issues. We are in the process of optimising the whole evaluation process, and your suggestions are very useful. Can you try running the following SQL command to see the status of your solutions in the database, and check whether any results are not 0? I can't tell what the problem is without specific status information. It seems that someone else has encountered this problem before, but I haven't found out why, so I look forward to your reply. Thank you again.

SELECT s.solution_id, s.problem_id, s.result 
FROM solution s 
JOIN problem p ON s.problem_id = p.problem_id 
WHERE s.model_id = 'your_model_id_here' AND s.result != 0;

mkj3085003 commented 3 weeks ago

SELECT s.solution_id, s.problem_id, s.result 
FROM solution s 
JOIN problem p ON s.problem_id = p.problem_id 
WHERE s.model_id = 60 AND s.result!=0;

Here model_id should be an int, not a string. In the previous query I only meant to show that the value needs to be replaced. Sorry for the misunderstanding!

awasthiabhijeet commented 3 weeks ago

Thanks @mkj3085003.

I ran the following SQL. My model_id is 60 (int). All the results seem to be 2.

SELECT s.solution_id, s.problem_id, s.result 
FROM solution s 
JOIN problem p ON s.problem_id = p.problem_id 
WHERE s.model_id = 60 AND s.result!=0;
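
To confirm that every row really has the same status, the rows can be tallied by result code. This is only a sketch: it assumes the rows were fetched as (solution_id, problem_id, result) tuples, and the sample values below are made up for illustration. `tally_results` is a hypothetical helper.

```python
from collections import Counter


def tally_results(rows):
    """Count rows per result code.

    Assumption: each row is a (solution_id, problem_id, result) tuple,
    matching the column order of the SELECT above.
    """
    return Counter(result for _solution_id, _problem_id, result in rows)


# Made-up example rows, not real output from the judge database.
example_rows = [(1, 101, 2), (2, 102, 2), (3, 103, 2)]
counts = tally_results(example_rows)
```

A single key in the resulting counter means all solutions are in the same state.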

mkj3085003 commented 3 weeks ago

Ok, I will check it out and get back to you as soon as possible, thanks!

mkj3085003 commented 3 weeks ago

Sorry, I found that it was due to missing input data, probably because the data folder failed to upload as a large file via LFS. I had forgotten about it. I will sort it out and upload it.

awasthiabhijeet commented 2 weeks ago

Hi @mkj3085003: do you mean copying the HF dataset into the data/ folder?

We already did that before running evaluation scripts.