Open ChangranXU opened 4 weeks ago
Our evaluation framework has just been updated, so please adjust the benchmarks slightly to fit the new framework.
Sure, I have made the modification.
Move all yaml files to the path align_anything/configs/evaluation/benchmarks
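For anyone doing the same move by script, here is a minimal sketch. It assumes the benchmark yaml files sit directly under a single source directory; the function name `move_yaml_files` and the source directory are hypothetical and not part of the repository, and only the destination path comes from this thread.

```python
import shutil
from pathlib import Path


def move_yaml_files(src_dir: Path, dest_dir: Path) -> list[Path]:
    """Move every .yaml/.yml file directly under src_dir into dest_dir.

    Hypothetical helper: it refuses to overwrite an existing file so that
    a config already present in dest_dir is never silently replaced.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in sorted(src_dir.iterdir()):
        if path.is_file() and path.suffix in {".yaml", ".yml"}:
            target = dest_dir / path.name
            if target.exists():
                raise FileExistsError(f"refusing to overwrite {target}")
            shutil.move(str(path), str(target))
            moved.append(target)
    return moved
```

Calling it with `Path("align_anything/configs/evaluation/benchmarks")` as `dest_dir` would collect the files in the requested location. Inside a git checkout, `git mv` may be preferable because it stages the rename as well.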
Thanks for your advice.
@ChangranXU It shows here that there are conflicting files in your current PR that have not been resolved.
It is the main file. As required by the documentation, it has to be modified after adding the additional benchmarks.
The conflict arose because I modified the main fork last night and did not update it today. I have corrected it locally and submitted the commit.
Hi @ChangranXU, if you want us to review the PR again, please comment promptly and mention @Reindulger again.
@Reindulger all newly added benchmarks have been validated.
@Reindulger @zmsn-2077 All benchmarks have been validated on all tasks. The only problem is:
C-Eval | https://huggingface.co/datasets/zacharyxxxxcr/ceval-exam | ['accountant', 'advanced_mathematics', 'art_studies', 'basic_medicine', 'business_administration', 'chinese_language_and_literature', 'civil_servant', 'clinical_medicine', 'college_chemistry', 'college_economics', 'college_physics', 'college_programming', 'computer_architecture', 'computer_network', 'discrete_mathematics', 'education_science', 'electrical_engineer', 'environmental_impact_assessment_engineer', 'fire_engineer', 'high_school_biology', 'high_school_chemistry', 'high_school_chinese', 'high_school_geography', 'high_school_history', 'high_school_mathematics', 'high_school_physics', 'high_school_politics', 'ideological_and_moral_cultivation', 'law', 'legal_professional', 'logic', 'mao_zedong_thought', 'marxism', 'metrology_engineer', 'middle_school_biology', 'middle_school_chemistry', 'middle_school_geography', 'middle_school_history', 'middle_school_mathematics', 'middle_school_physics', 'middle_school_politics', 'modern_chinese_history', 'operating_system', 'physician', 'plant_protection', 'probability_and_statistics', 'professional_tour_guide', 'sports_science', 'tax_accountant', 'teacher_qualification', 'urban_and_rural_planner', 'veterinary_medicine'] | test split without label
-- | -- | -- | --
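A quick way to confirm a "test split without label" situation before running an evaluation is to count how many rows carry a non-empty label. The sketch below is only an illustration: the helper name `labeled_fraction` is hypothetical, and the label column name `answer` is an assumption about the dataset schema rather than something stated in this thread.

```python
from typing import Iterable, Mapping


def labeled_fraction(
    rows: Iterable[Mapping[str, object]],
    label_key: str = "answer",  # assumed column name; adjust to the real schema
) -> float:
    """Return the fraction of rows whose label field is present and non-empty."""
    total = 0
    labeled = 0
    for row in rows:
        total += 1
        value = row.get(label_key)
        # Treat a missing key, None, and whitespace-only strings as "no label".
        if value is not None and str(value).strip() != "":
            labeled += 1
    return labeled / total if total else 0.0
```

A result of 0.0 for a split means accuracy cannot be computed on it locally, so validation would have to fall back to a split that does carry labels.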
Description
Add AGIEval, C-Eval, TMMLU, SST-2 benchmarks with vllm.
Motivation and Context
Following the documentation, I added more benchmarks as requested.
Types of changes
What types of changes does your code introduce? Put an x in all the boxes that apply:
Checklist
Go over all the following points, and put an x in all the boxes that apply. If you are unsure about any of these, don't hesitate to ask. We are here to help!