kkk-an opened 1 month ago
I checked my `gpt4_discriminative_eval_input` and found that the numbers of examples that need to be evaluated by LLMs differ from those reported in your paper:

| Type | My count | Paper |
|---|---|---|
| content | 65 | 50 |
| mixed | 45 | 10 |
| format | 140 | 120 |
| situation | 70 | 55 |

I am quite confused and would appreciate your help. Thank you so much.
I ran the code below and found that the numbers of examples that need LLM evaluation do not match those in your paper.

```python
import json

# Sources whose outputs are checked by rule-based scripts; all other sources need LLM evaluation.
rule_based_source = ["E2E", "WIKIEVENTS", "CONLL2003", "text_editing", "cnn_dailymail", "xsum",
                     "samsum", "gigaword", "arxiv", "BBH_logical", "BBH_time", "self_made_space", "gsm_8k"]

for constraint_type in ["content", "situation", "format", "example", "mixed"]:
    with open(f"./data/{constraint_type}_constraints.json") as f:
        data = json.load(f)
    rule, llm = 0, 0
    for d in data:
        # Skip level-0 entries, as in the original snippet.
        if d["level"] == 0:
            continue
        if d["source"] in rule_based_source:
            rule += 1
        else:
            llm += 1
    print(f"type: {constraint_type}, rule: {rule}, llm: {llm}")
```
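To narrow down where the extra examples come from, it may help to break the LLM-evaluated counts down by source and level. This is only a sketch: it assumes each `*_constraints.json` file is a list of dicts with `level` and `source` keys, as the snippet above implies, and the helper names `load_constraints` and `llm_source_breakdown` are hypothetical.

```python
import json
from collections import Counter

# Same rule-based source list as in the counting snippet.
RULE_BASED_SOURCES = frozenset([
    "E2E", "WIKIEVENTS", "CONLL2003", "text_editing", "cnn_dailymail", "xsum",
    "samsum", "gigaword", "arxiv", "BBH_logical", "BBH_time", "self_made_space", "gsm_8k",
])


def load_constraints(constraint_type, data_dir="./data"):
    """Hypothetical helper: load e.g. ./data/content_constraints.json (assumed layout)."""
    with open(f"{data_dir}/{constraint_type}_constraints.json") as f:
        return json.load(f)


def llm_source_breakdown(data, rule_based_sources=RULE_BASED_SOURCES):
    """Count examples that would need LLM evaluation, grouped by (source, level).

    Level-0 entries are skipped, matching the counting snippet.
    """
    counts = Counter()
    for d in data:
        if d["level"] == 0:
            continue
        if d["source"] not in rule_based_sources:
            counts[(d["source"], d["level"])] += 1
    return counts
```

Calling `llm_source_breakdown(load_constraints("mixed"))` and comparing the per-source totals with the paper might show which sources or levels account for the extra examples.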
Is there a misunderstanding on my part, or a mistake in the paper or the code?
Thanks for your reply.