Closed: alexedin closed this issue 8 months ago
Hi! What model are you using to run this, and could you potentially post some generations (which you can find if you run the model while passing --log_samples and --output_path)?
It may be that your model is just not very good at generation tasks, and so fails to properly terminate the CoT (or just does not use the correct format to give its answer.)
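To make the two failure modes concrete, here is a minimal sketch of what "terminating the CoT" and "using the correct format" mean in practice. It is not the harness's actual implementation: the helper names `truncate_at_stop` and `extract_answer` and the regex are assumptions for illustration, based only on the "So the answer is (X)." pattern in the few-shot prompt.

```python
import re


def truncate_at_stop(text, stop_sequences):
    """Cut text at the earliest occurrence of any non-empty stop sequence.

    Hypothetical helper: empty strings are skipped because they would
    match at position 0 and erase the whole generation.
    """
    cut = len(text)
    for stop in stop_sequences:
        if not stop:
            continue
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


# Assumed answer format, taken from the few-shot examples in the prompt.
ANSWER_RE = re.compile(r"answer is (\([A-Z]\))")


def extract_answer(generation):
    """Return the option such as '(A)' if the expected phrase is present, else None."""
    match = ANSWER_RE.search(generation)
    return match.group(1) if match else None


if __name__ == "__main__":
    well_formed = " It is ironic. So the answer is (A).\n\nQ: Which statement is sarcastic?"
    trimmed = truncate_at_stop(well_formed, ["Q", "\n\n"])
    print(repr(trimmed))
    print(extract_answer(trimmed))
    # A generation that never states "the answer is (X)" yields no answer.
    print(extract_answer(" A"))
```

A generation that ends its reasoning with the expected phrase is trimmed cleanly at the stop sequence and yields an answer; one that skips the phrase yields nothing to compare against the target.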
Thanks for your response! I am using Bloomz 7B. Here are two outputs from the logs, where the model chooses A but is not scored as correct (exact match = 0):
Instance 1
Arguments [['Determine which of two sentences is sarcastic.\n\nAccording to Cambridge University Dictionary, sarcasm is "the use of remarks that clearly mean the opposite of what they say, made in order to hurt someone\'s feelings or to criticize something in a humorous way." Sarcastic sentences often contain satirical or ironic utterances, hyperboles, ambivalent or witty remarks.\n\nQ: Which statement is sarcastic?\nOptions:\n(A) Yes, because having interests and actively researching them is a huge waste\n(B) Yes, because having interests and actively researching them is a huge deal\nA: Let\'s think step by step.\nIf we look at (A), it says that having interests and actively researching them is a huge waste, implying that it is a useless effort. However, we know that having interests and actively researching them is typically not a waste but rather is beneficial to the individual. The presence of such a juxtaposition in (A) suggests that it contains a taste of irony and sarcasm.\nIf we look at (B), it says that having interests and actively researching them is a huge deal, implying that it is an important and consequential effort. This is arguably a neutral and correct statement.\nAbove the above, the sarcastic option is (A). So the answer is (A).\n\nQ: Which statement is sarcastic?\nOptions:\n(A) No one is going to disagree with you on this. Avoiding ad hominem attacks really help your case\n(B) No one is going to disagree with you on this. Ad hominem attacks really help your case\nA: Let\'s think step by step.\nIf we look at (A), it says that avoiding ad hominem attacks really help your case, implying that ad hominem attacks are adverse and injurious. 
Because ad hominem attacks are adressed at a person rather than an idea, it is indeed true that avoiding them is often useful and helpful; so, (A) is a neutral (valid and agreeable) statement.\nIf we look at (B), it says that ad hominem attacks really help your case, implying that ad hominem attacks are a positive thing. However, we stated previously that ad hominem attacks are often not useful or constructive. The speaker in this sentence therefore seems to mean the opposite of what they are saying; so, there appears to have a taste of irony and sarcasm in (B).\nAbove the above, the sarcastic option is (B). So the answer is (B).\n\nQ: Which statement is sarcastic?\nOptions:\n(A) Consistency in the league\'s punishments? What do you think this is supposed to be, politics?\n(B) Consistency in the league\'s punishments? What do you think this is supposed to be, moral?\nA: Let\'s think step by step.\nIf we look at (A), it likens the consistency in the league\'s punishments with that in politics. Because politics or political affairs are often not considered to be consistent or dependable, this sentence appears to be satirical.\nIf we look at (B), it likens the consistency in the league\'s punishments with that in morality. Discussing the consistency of the league\'s punishments in the context of morality, ethics, or law makes sense and does not appear to make a satirical point about anything.\nAbove the above, the sarcastic option is (A). So the answer is (A).\n\nQ: Which statement is sarcastic?\nOptions:\n(A) Hey just be happy then you won\'t be depressed anymore\n(B) Hey just be happy that you won\'t be depressed anymore\nA: Let\'s think step by step.', {'max_gen_toks': 1024, 'until': ['', 'Q', '\n\n'], 'do_sample': False, 'temperature': 0.0}]]
Input {'input': "Which statement is sarcastic?\nOptions:\n(A) Hey just be happy then you won't be depressed anymore\n(B) Hey just be happy that you won't be depressed anymore", 'target': '(A)'}
Response [[' If we look at (A), it says that avoiding ad hominem attacks really help your case, implying that ad hominem attacks are adverse and injurious. Because ad hominem attacks are adressed at a person rather than an idea, it is indeed true that avoiding them is often useful and helpful; so, (A) is a neutral (valid and agreeable) statement. Yes, because having interests and actively researching them is a huge waste']]
Exact match 0
Instance 2
Arguments [['Determine which of two sentences is sarcastic.\n\nAccording to Cambridge University Dictionary, sarcasm is "the use of remarks that clearly mean the opposite of what they say, made in order to hurt someone\'s feelings or to criticize something in a humorous way." Sarcastic sentences often contain satirical or ironic utterances, hyperboles, ambivalent or witty remarks.\n\nQ: Which statement is sarcastic?\nOptions:\n(A) Yes, because having interests and actively researching them is a huge waste\n(B) Yes, because having interests and actively researching them is a huge deal\nA: Let\'s think step by step.\nIf we look at (A), it says that having interests and actively researching them is a huge waste, implying that it is a useless effort. However, we know that having interests and actively researching them is typically not a waste but rather is beneficial to the individual. The presence of such a juxtaposition in (A) suggests that it contains a taste of irony and sarcasm.\nIf we look at (B), it says that having interests and actively researching them is a huge deal, implying that it is an important and consequential effort. This is arguably a neutral and correct statement.\nAbove the above, the sarcastic option is (A). So the answer is (A).\n\nQ: Which statement is sarcastic?\nOptions:\n(A) No one is going to disagree with you on this. Avoiding ad hominem attacks really help your case\n(B) No one is going to disagree with you on this. Ad hominem attacks really help your case\nA: Let\'s think step by step.\nIf we look at (A), it says that avoiding ad hominem attacks really help your case, implying that ad hominem attacks are adverse and injurious. 
Because ad hominem attacks are adressed at a person rather than an idea, it is indeed true that avoiding them is often useful and helpful; so, (A) is a neutral (valid and agreeable) statement.\nIf we look at (B), it says that ad hominem attacks really help your case, implying that ad hominem attacks are a positive thing. However, we stated previously that ad hominem attacks are often not useful or constructive. The speaker in this sentence therefore seems to mean the opposite of what they are saying; so, there appears to have a taste of irony and sarcasm in (B).\nAbove the above, the sarcastic option is (B). So the answer is (B).\n\nQ: Which statement is sarcastic?\nOptions:\n(A) Consistency in the league\'s punishments? What do you think this is supposed to be, politics?\n(B) Consistency in the league\'s punishments? What do you think this is supposed to be, moral?\nA: Let\'s think step by step.\nIf we look at (A), it likens the consistency in the league\'s punishments with that in politics. Because politics or political affairs are often not considered to be consistent or dependable, this sentence appears to be satirical.\nIf we look at (B), it likens the consistency in the league\'s punishments with that in morality. Discussing the consistency of the league\'s punishments in the context of morality, ethics, or law makes sense and does not appear to make a satirical point about anything.\nAbove the above, the sarcastic option is (A). So the answer is (A).\n\nQ: Which statement is sarcastic?\nOptions:\n(A) Hey just be happy then you won\'t be depressed anymore\n(B) Hey just be happy that you won\'t be depressed anymore\nA: Let\'s think step by step.', {'max_gen_toks': 1024, 'until': ['', 'Q', '\n\n'], 'do_sample': False, 'temperature': 0.0}]]
Input {'input': "Which statement is sarcastic?\nOptions:\n(A) Hey just be happy then you won't be depressed anymore\n(B) Hey just be happy that you won't be depressed anymore", 'target': '(A)'}
Response [[' A']]
Exact match 0
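The two instances fail for different reasons, which a quick check can separate. The sketch below is a diagnostic under stated assumptions, not the harness's metric: `exact_match` is assumed to be a plain string comparison after trimming whitespace, and `letter_only_match` is a hypothetical lenient comparison that ignores the parentheses.

```python
def exact_match(prediction, target):
    """Strict comparison after trimming surrounding whitespace (assumed behaviour)."""
    return int(prediction.strip() == target.strip())


def letter_only_match(prediction, target):
    """Diagnostic only: compare after also dropping surrounding parentheses."""
    def normalize(s):
        return s.strip().strip("()").strip()
    return int(normalize(prediction) == normalize(target))


if __name__ == "__main__":
    target = "(A)"
    # Responses copied from the two logged instances (the first is shortened
    # here to its final sentence).
    instance_1 = " Yes, because having interests and actively researching them is a huge waste"
    instance_2 = " A"
    for name, response in [("Instance 1", instance_1), ("Instance 2", instance_2)]:
        print(name, exact_match(response, target), letter_only_match(response, target))
```

Under these assumptions, Instance 2 picks the right letter but in the wrong format, while Instance 1 never produces an answer letter at all and instead repeats text from the prompt.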
Fixed by #1356!
Hi, I am trying to evaluate my model on BBH with and without CoT, but all task results end up being 0.0. I am quite inexperienced, so please keep that in mind when helping me out. Other tasks I've tried do work, however (see output below). Thanks!
BBH Script command: python main.py --model hf-auto --model_args pretrained=[my_path] --tasks=bbh_cot_fewshot_snarks --device cuda --batch_size=4 --output_path=[my_path]