TusharAggarwalMSR opened this issue 1 month ago
I've addressed the small issue in the add_template.py script at line 711 and made the necessary updates. Please feel free to try out the newly updated code. Thank you for bringing this to our attention, and don't hesitate to let us know if you notice any new issues during runtime. Your feedback is greatly appreciated!
@mkj3085003 I couldn't see the suggested changes in the newly updated code, still facing these issues.
Could you share a screenshot showing the detailed error message?
I've verified this evaluation process from scratch, and the add_template script executes correctly; the results in the image below confirm it. Could you pull the image and code from scratch and try again? Or share more detail about the issue and I can help you out.
As you suggested, I am able to run the add_template file when I recreate the image. I was trying to run evaluation for code_debug and followed the steps given in the README, but the count of unsolved solutions is not decreasing over time. I have tried waiting for a few hours. I am also attaching a screenshot for your reference.
You can check whether result = 0 is set correctly for this model's solutions:
SELECT s.solution_id, s.problem_id, s.result
FROM solution s
JOIN problem p ON s.problem_id = p.problem_id
WHERE s.model_id = 'your_model_id_here' AND s.result = 0;
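The query above can be sketched end to end in Python. This is a minimal illustration using an in-memory SQLite database as a stand-in (the real setup presumably runs against a server database); the table and column names mirror the query, and the sample rows are invented:

```python
import sqlite3

# Stand-in schema mirroring the query above; real result codes and
# table contents will differ in the actual evaluation database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE problem (problem_id INTEGER PRIMARY KEY);
CREATE TABLE solution (
    solution_id INTEGER PRIMARY KEY,
    problem_id  INTEGER REFERENCES problem(problem_id),
    model_id    TEXT,
    result      INTEGER  -- 0 means "not judged yet"
);
INSERT INTO problem VALUES (1), (2);
INSERT INTO solution VALUES (10, 1, 'my_model', 0);  -- still pending
INSERT INTO solution VALUES (11, 2, 'my_model', 4);  -- already judged
""")

# Solutions still waiting to be judged for this model.
pending = conn.execute("""
    SELECT s.solution_id, s.problem_id, s.result
    FROM solution s
    JOIN problem p ON s.problem_id = p.problem_id
    WHERE s.model_id = ? AND s.result = 0
""", ("my_model",)).fetchall()
print(pending)  # → [(10, 1, 0)]
```

If this list never shrinks while the judge is supposedly running, the judged process is the next thing to check.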
Then check whether the judge process is running:
ps aux | grep judged
If it is not running, start the judged process with:
nohup bash run_judge.sh > runlog.out 2>&1 &
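The `ps aux | grep judged` check can also be scripted, e.g. when polling from an evaluation driver. A rough Python equivalent, assuming a POSIX system with `ps` available (the matching is by substring, so it is as loose as the grep above):

```python
import subprocess

def is_process_running(name: str) -> bool:
    # Rough equivalent of `ps aux | grep <name>`: list all processes
    # and look for the name anywhere in each process line.
    out = subprocess.run(["ps", "aux"], capture_output=True, text=True).stdout
    return any(name in line for line in out.splitlines())

print(is_process_running("judged"))
```

If this returns False, start the judge with the `nohup bash run_judge.sh` command above before re-checking the database.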
After starting it, wait a few minutes and check the results again:
SELECT s.solution_id, s.problem_id, s.result
FROM solution s
JOIN problem p ON s.problem_id = p.problem_id
WHERE s.model_id = 'your_model_id_here' AND s.result != 0;
If some rows now have a non-zero result, it means the solutions are being judged.
You can also refer to the database schema and the sample SQL commands in the README to write your own queries.
You can try that and see whether it works. If you only evaluate code_debug, you don't need to recompute the polish time (that is only needed for code_polish and takes a long time to compute the limit); once judging is finished, you can compute the metrics directly.
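Once every solution has a non-zero result, computing a pass-rate-style metric is a couple of COUNT queries. A hypothetical sketch, again on an in-memory SQLite stand-in; the specific result code for "accepted" (4 here) is an assumption for illustration, not the judge's actual convention:

```python
import sqlite3

ACCEPTED = 4  # assumed "accepted" code, illustrative only

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE solution (solution_id INTEGER, model_id TEXT, result INTEGER);
INSERT INTO solution VALUES
    (1, 'my_model', 4),   -- accepted
    (2, 'my_model', 5),   -- some failure code
    (3, 'my_model', 4);   -- accepted
""")

# All judged solutions for this model (result != 0), and the accepted subset.
judged = conn.execute(
    "SELECT COUNT(*) FROM solution WHERE model_id = ? AND result != 0",
    ("my_model",)).fetchone()[0]
accepted = conn.execute(
    "SELECT COUNT(*) FROM solution WHERE model_id = ? AND result = ?",
    ("my_model", ACCEPTED)).fetchone()[0]
print(f"pass rate: {accepted}/{judged}")  # → pass rate: 2/3
```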
I would like to confirm whether your issue has been resolved. We are currently considering optimizing the entire evaluation process, and your suggestions would be very helpful to us. Looking forward to hearing from you.
I am facing the same issues as in #6.