YJiangcm / FollowBench

Code for "FollowBench: A Multi-level Fine-grained Constraints Following Benchmark for Large Language Models (ACL 2024)"
https://arxiv.org/abs/2310.20410
Apache License 2.0

about evaluation #10

Open Violettttee opened 2 months ago

Violettttee commented 2 months ago

Apart from the 'example' constraint, are the other constraint_type values evaluated with GPT-4 only? Or are models scored twice on them, using both rule_based and gpt evaluation?

Violettttee commented 2 months ago

Another question: why is "level0" included in the data file? In utils.py, the convert_to_api_input function only adds data whose level is greater than 0, so the 'level = 0' data is filtered out.

YJiangcm commented 1 month ago

Hi, for your first question, the example constraint is evaluated by rule_based, and other constraints are evaluated by both rule_based and gpt.

For your second question, "level0" is used as additional information during gpt's evaluation. Please refer to Figure 4 in our paper.
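
To make the second answer concrete, here is a minimal sketch of how level-0 records could stay in the data file, be skipped when building model inputs, and still be looked up as context at evaluation time. This is not the actual code in utils.py: the field names (`example_id`, `level`, `instruction`), the sample records, and both helper functions are assumptions made up for illustration.

```python
# Hypothetical records; the field names are assumptions, not the real data schema.
records = [
    {"example_id": 1, "level": 0, "instruction": "Write a poem."},
    {"example_id": 1, "level": 1, "instruction": "Write a poem about autumn."},
    {"example_id": 1, "level": 2, "instruction": "Write a four-line poem about autumn."},
]


def select_inference_inputs(records):
    """Keep only records with level > 0, i.e. the ones sent to the model under test."""
    return [r for r in records if r["level"] > 0]


def build_eval_context(records, example_id, level):
    """Pair the level-0 instruction with the instruction at the requested level."""
    by_level = {r["level"]: r for r in records if r["example_id"] == example_id}
    return {
        "level0_instruction": by_level[0]["instruction"],  # extra context for the judge
        "instruction": by_level[level]["instruction"],     # instruction being evaluated
    }


inference_inputs = select_inference_inputs(records)
context = build_eval_context(records, example_id=1, level=2)
```

In this sketch the level-0 record never reaches the model under test, but removing it from the data file would break `build_eval_context`, which is roughly why the file keeps it.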