YJiangcm / FollowBench

Code for "FollowBench: A Multi-level Fine-grained Constraints Following Benchmark for Large Language Models (ACL 2024)"
https://arxiv.org/abs/2310.20410
Apache License 2.0

some questions #3

Closed AccidM closed 4 months ago

AccidM commented 4 months ago

Thank you for proposing this interesting benchmark.

After finishing the Model Inference and LLM-based Evaluation steps, we tried to obtain the results as described in Merge Evaluation and Save Results. However, several problems occurred:

  1. There are a lot of `ERROR:gpt4_based_evaluation` or `You must manually fix the evaluation` messages. Is this caused by a lack of robustness in your prompt, and how can we resolve it? Do we really have to do the judgment manually, given that there are dozens of errors?
  2. What does `Content: error` mean and how can we fix it? It seems different from the problem above.
  3. Why can the Satisfaction Rate values be negative?
YJiangcm commented 4 months ago

Thanks for your interest in our work!

For your first and second questions, the errors you are encountering are due to parsing failures in the function `paring_discriminative_generation(generation, level)` in code/gpt4_based_evaluation.py. This function parses the evaluator's response and outputs the satisfaction rate values. Our original experiments used the "gpt-4-0613" version for evaluation, and the function may have trouble processing the different output formatting of other GPT-4 versions, which leads to these errors. We have since modified the function to make it more robust and handle various formats more effectively. Please try the updated code, which should resolve these parsing errors.

For your third question, the satisfaction rate values can appear negative because `paring_discriminative_generation(generation, level)` is designed to return -1 when an exception occurs. In other words, if the evaluator's response cannot be parsed, the function defaults to a satisfaction rate of -1 to flag the issue.
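
To make the fallback concrete, here is a minimal sketch of such a defensive parser. The function name `parse_satisfaction`, the "Level i: YES/NO" response format, and the returned fraction are assumptions for illustration only; the actual `paring_discriminative_generation` in code/gpt4_based_evaluation.py may extract the values differently.

```python
import re


def parse_satisfaction(generation: str, level: int) -> float:
    """Hypothetical sketch of a defensive parser for an evaluator response.

    Assumes the evaluator states a per-level judgment such as "Level 2: YES"
    (this format is an assumption, not the repository's actual one).
    Returns the fraction of levels judged satisfied, or -1 if parsing fails,
    mirroring the fallback behavior described above.
    """
    try:
        satisfied = 0
        for i in range(1, level + 1):
            # Tolerate minor formatting variations, e.g. "level 3 - no".
            match = re.search(rf"level\s*{i}\s*[:\-]?\s*(yes|no)",
                              generation, re.IGNORECASE)
            if match is None:
                raise ValueError(f"no judgment found for level {i}")
            if match.group(1).lower() == "yes":
                satisfied += 1
        return satisfied / level
    except Exception:
        # Signal a parsing failure instead of crashing the merge step.
        return -1
```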