Closed: jasonz5 closed this issue 3 months ago
Thanks for your comment! The CritiqueLLM published by thu-coai is a different version from the critique model we used, so you may have to change the prompt format and the scoring threshold on line 25 of `evaluation/text-eval/get_sum_grade.py`. We also provide the judgement results of the critique model we used in our experiments.
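For reference, a minimal sketch of the kind of change this suggests; the function and variable names and the threshold value below are assumptions for illustration, not the actual contents of `get_sum_grade.py`:

```python
import re

# Hypothetical names; adapt to the actual code around line 25 of
# evaluation/text-eval/get_sum_grade.py.
SCORE_PATTERN = r"\[\[(.*?)\]\]"  # change this if your CritiqueLLM wraps scores differently
THRESHOLD = 5.0                   # hypothetical scoring threshold to tune for your model

def extract_score(response: str):
    """Pull the numeric score out of a critique response, or return None."""
    match = re.search(SCORE_PATTERN, response)
    if match is None:
        return None
    try:
        return float(match.group(1))
    except ValueError:
        return None

def is_correct(response: str) -> bool:
    """Count a judged answer as correct when its score clears the threshold."""
    score = extract_score(response)
    return score is not None and score >= THRESHOLD
```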
Hi, I ran into some problems when running the evaluation part as shown below:

```bash
cd evaluation/text-eval
python grading.py --infer_data ../../inference/results/TableLLM-13b/Infer_wtq.jsonl
```
The responses of the CritiqueLLM seem a little weird. The following are five responses of the CritiqueLLM.

In `get_sum_grade.py`, the fetching pattern is `pattern = r'\[\[(.*?)\]\]'`, which only matches 2 of the 5 responses above. In `grading.py`, I found the critique prompt format as follows:

Is there any new format for the prompt of CritiqueLLM?
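To make the mismatch concrete, here is a small sketch; the five responses below are hypothetical stand-ins (the real ones are not reproduced in this thread), showing how only answers that wrap the score as `[[x]]` are picked up by the pattern:

```python
import re

pattern = r"\[\[(.*?)\]\]"

# Hypothetical stand-ins for the five CritiqueLLM responses; only the
# ones that wrap the score as [[x]] will match the fetching pattern.
responses = [
    "The answer covers all key cells of the table. [[8]]",
    "Rating: 8. The summary is faithful to the table.",
    "I would give this answer a 7 out of 10.",
    "The extraction is partially correct. [[6]]",
    "Score: 6. Some rows are missing.",
]

matched = [r for r in responses if re.search(pattern, r)]
print(f"{len(matched)}/{len(responses)} matched")  # prints "2/5 matched"
```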