CodeEditorBench / CodeEditorBench

Apache License 2.0

Evaluation scripts not working #6

Open awasthiabhijeet opened 3 weeks ago

awasthiabhijeet commented 3 weeks ago

Thank you for open-sourcing this evaluation benchmark.

I am trying to replicate the inference and evaluation steps as suggested in the repository. Inference scripts work well for me.

However, I am facing the following problems with the evaluation.

[Four screenshots of the error output were attached here; the images are not preserved.]

Overall, I think evaluation is not currently working out of the box. It would be very helpful if the evaluation process ran without errors or additional modifications.

Regards, Abhijeet

mkj3085003 commented 3 weeks ago

Thank you for your attention and for pointing out the issues. We are in the process of optimising the whole evaluation process, and your suggestions are very useful. Can you run the following SQL command to check the status of the solutions in the database and see whether any results are not 0? I can't tell what the problem is without the specific status information. It seems that someone else has encountered this problem before, but I haven't found out why, so I look forward to your reply. Thank you again.

SELECT s.solution_id, s.problem_id, s.result 
FROM solution s 
JOIN problem p ON s.problem_id = p.problem_id 
WHERE s.model_id = 'your_model_id_here' AND s.result!=0;

mkj3085003 commented 3 weeks ago

SELECT s.solution_id, s.problem_id, s.result 
FROM solution s 
JOIN problem p ON s.problem_id = p.problem_id 
WHERE s.model_id = 60 AND s.result!=0;

Here model_id should be an int, not a string. The quoted value in my previous comment was only a placeholder to show what needs to be replaced. Sorry for the misunderstanding!
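For anyone who wants to run this check from a script, here is a minimal Python sketch of the same query with model_id passed as an integer parameter. It is only an illustration under stated assumptions: it uses an in-memory SQLite database with a made-up schema containing just the columns named in the query, and the sample rows are invented. The judge's real database is probably not SQLite, so the connection call and the placeholder style (`?` here) would need to match whatever driver is actually in use.

```python
import sqlite3
from collections import Counter


def non_zero_results(conn, model_id):
    """Return (solution_id, problem_id, result) rows whose result is not 0."""
    if not isinstance(model_id, int):
        # The thread's point: model_id is an integer, not a quoted string.
        raise TypeError("model_id must be an int, e.g. 60")
    query = (
        "SELECT s.solution_id, s.problem_id, s.result "
        "FROM solution s "
        "JOIN problem p ON s.problem_id = p.problem_id "
        "WHERE s.model_id = ? AND s.result != 0"
    )
    return conn.execute(query, (model_id,)).fetchall()


def build_demo_db():
    """Create a tiny stand-in database (hypothetical schema and rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE problem (problem_id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE solution ("
        "solution_id INTEGER PRIMARY KEY, problem_id INTEGER, "
        "model_id INTEGER, result INTEGER)"
    )
    conn.executemany("INSERT INTO problem VALUES (?)", [(1,), (2,)])
    conn.executemany(
        "INSERT INTO solution VALUES (?, ?, ?, ?)",
        [(1, 1, 60, 2), (2, 2, 60, 2), (3, 1, 60, 0), (4, 1, 61, 4)],
    )
    return conn


conn = build_demo_db()
rows = non_zero_results(conn, 60)
# Count how often each non-zero result code appears for this model.
counts = Counter(result for _, _, result in rows)
print(dict(counts))
```

Grouping by result code makes it easy to see at a glance whether every solution is stuck on the same status.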

awasthiabhijeet commented 3 weeks ago

Thanks @mkj3085003.

I ran the following SQL. My model_id is 60 (int). All the results seem to be 2.

SELECT s.solution_id, s.problem_id, s.result 
FROM solution s 
JOIN problem p ON s.problem_id = p.problem_id 
WHERE s.model_id = 60 AND s.result!=0;

[Screenshot of the query output.]

mkj3085003 commented 3 weeks ago

Ok, I will check it out and get back to you as soon as possible, thanks!

mkj3085003 commented 3 weeks ago

Sorry, I found that this was due to missing input data, probably because the data folder failed to upload as a large file via Git LFS and I forgot about it. I will sort it out and upload it.

awasthiabhijeet commented 2 weeks ago

Hi @mkj3085003: do you mean copying the HF dataset into the data/ folder?

We already did that before running evaluation scripts.

mkj3085003 commented 2 weeks ago

No, it's the processed inputs and outputs, which are not quite the same as the Hugging Face dataset. They are the result of processing the dataset for the OJ. I'll upload them now, sorry for the delay!

mkj3085003 commented 2 weeks ago

I've uploaded data.tar.gz using Git LFS. You can download it and unzip it to judge/data. You then need to create a new log folder under judge (judge/log). I've updated run_judge.sh to add this mkdir command, so you can also just pull the new run_judge.sh. When all the judging is finished, you can run bash stop.sh, which stops the judge and cleans up the run folder, client.pid, and other files that the judge generates temporarily.
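The setup steps above could be scripted roughly as follows. This is a sketch, not the repository's own tooling: `prepare_judge_dir` is a hypothetical helper, and it assumes the archive's contents belong directly inside judge/data. If the archive already contains a top-level data/ folder, extract into judge/ instead. The demo builds a throwaway archive with an invented file name (`1.in`) so the snippet runs without the real download.

```python
import tarfile
import tempfile
from pathlib import Path


def prepare_judge_dir(archive: Path, judge_dir: Path) -> Path:
    """Extract the archive into judge_dir/data and create judge_dir/log."""
    data_dir = judge_dir / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    # The judge expects a log folder next to data.
    (judge_dir / "log").mkdir(exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(data_dir)
    return data_dir


# Demo with a throwaway archive; the file name is made up for illustration.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    sample = tmp / "1.in"
    sample.write_text("42\n")
    archive = tmp / "data.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(sample, arcname="1.in")
    data_dir = prepare_judge_dir(archive, tmp / "judge")
    extracted = sorted(p.name for p in data_dir.iterdir())
    log_exists = (tmp / "judge" / "log").is_dir()

print(extracted, log_exists)
```

With the real download, the call would look like `prepare_judge_dir(Path("data.tar.gz"), Path("evaluation/judge"))`, assuming the judge directory sits at the path shown in the LFS error message later in this thread.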

mkj3085003 commented 2 weeks ago

After the judging process, you can skip step 5 and calculate the metrics first, since computing the code polish metric may take a long time, possibly up to a week. Subsequent runs won't need to recalculate these limits. The scores for "code debug", "code translate", and "code switch" can be computed directly. Please check whether the judge now runs correctly. I look forward to your reply.

sjtubblythe commented 2 weeks ago

Hi @mkj3085003, I'm getting the following error:

Error downloading object: evaluation/judge/data.tar.gz (8ad3efd): Smudge error: Error downloading evaluation/judge/data.tar.gz (8ad3efd4e2a7f1a968a39be20d2c5d90a9b1fe528c4fb130a981b2ec8e3f5235): batch response: This repository is over its data quota. Account responsible for LFS bandwidth should purchase more data packs to restore access.

Is there any other way to access the data? Thanks!

mkj3085003 commented 2 weeks ago

Okay, I will upload the data to Google Drive and share the link, please wait a moment!

mkj3085003 commented 2 weeks ago

https://drive.google.com/file/d/1sC2cksEOmWMGK9k0zF_Rluh62ivEF7Po/view?usp=sharing
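
Since a Git LFS object ID is the SHA-256 hash of the file's contents, the ID in the error message above can be used to sanity-check a copy downloaded from elsewhere. The sketch below assumes the Google Drive file is the same archive that was pushed to LFS, which this thread does not confirm. A mismatch may simply mean the archive was rebuilt before upload, not that the download is corrupt. The local file name `data.tar.gz` is also an assumption.

```python
import hashlib
from pathlib import Path

# Object ID quoted in the LFS error message earlier in this thread.
EXPECTED_OID = "8ad3efd4e2a7f1a968a39be20d2c5d90a9b1fe528c4fb130a981b2ec8e3f5235"


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_lfs_oid(path, expected=EXPECTED_OID):
    """Return True if the file's SHA-256 equals the expected LFS object ID."""
    return sha256_of(path) == expected


archive = Path("data.tar.gz")  # assumed local name of the downloaded file
if archive.exists():
    print("match" if matches_lfs_oid(archive) else "mismatch")
else:
    print("data.tar.gz not found in the current directory")
```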