EleutherAI / lm-evaluation-harness

A framework for few-shot evaluation of language models.
https://www.eleuther.ai
MIT License

`mmlu_flan_cot_fewshot` is not properly formatted? #1055

Closed pengzhenghao closed 11 months ago

pengzhenghao commented 11 months ago

Task: mmlu_flan_cot_fewshot

Branch: big-refactor

Script:

lm-eval --model hf --model_args pretrained=meta-llama/Llama-2-7b-chat-hf,dtype=float16 --write_out --limit 2 --tasks mmlu_flan_cot_fewshot --num_fewshot 0

Prompt written out by the run:

The following are multiple choice questions (with answers) about philosophy.

Q: The study of reality in the broadest sense, an inquiry into the elemental nature of the universe and the things in it, is known as _____.
(A) metaphysics (B) epistemology (C) quantum physics (D) axiology
A: Let's think step by step. We refer to Wikipedia articles on philosophy for help. Among the options, only metaphysics studies the nature of reality and existence. The answer is (A).

Q: According to Moore’s “ideal utilitarianism,” the right action is the one that brings about the greatest amount of:
(A) pleasure. (B) happiness. (C) good. (D) virtue.
A: Let's think step by step. We refer to Wikipedia articles on philosophy for help. Moore's "ideal utilitarianism" states that one's actions should maximize intrinsic goods. The answer is (C).

Q: Before Tolstoy's Christian conversion, what was his perspective on the meaning of life?
(A) optimist (B) satisfied (C) nominally religious (D) pessimist
A: Let's think step by step. We refer to Wikipedia articles on philosophy for help. Before his conversion, Tolstoy feels that life was uncertain, which is a pessimist's point of view. The answer is (D).

Q: According to d'Holbach, people always act according to _____.
(A) free choices (B) dictates of the soul (C) necessary natural laws (D) undetermined will
A: Let's think step by step. We refer to Wikipedia articles on philosophy for help. d'Holbach believes that people act according to necessary laws, and it proves nothing about people's free will. The answer is (C).

Q: Psychological egoism is:
(A) an ethical theory about how we ought to behave. (B) a generalization concerning the way people tend to behave. (C) a claim about human nature and the ways people are capable of behaving. (D) none of the above.
A: Let's think step by step. We refer to Wikipedia articles on philosophy for help. Psychological egoism suggests that one behaves based on what makes one feels good, hence it is a claim about human nature and how humans are capable of behaving. The answer is (C).Q: One of the aims of philosophy is to think critically about whether there are good reasons for adopting our beliefs.  Reasons are considered "good reasons" if they are consistent with everyday experience and:
(A) are part of a set of religious, moral, or political beliefs that an individual feels deeply about. (B) are considered good by at least one culture, sub-culture, or individual. (C) cannot be interpreted in different ways by different people or cultures. (D) take into account objections, are acceptable to impartial third parties, and avoid undesirable consequences.
A: Let's think step by step.

As you can see, the real question (Q: One of the aims of philosophy is to think critically about whether there are good reasons for adopting our beliefs. Reasons are considered "good reasons" if they are consistent with everyday experience and:) is glued directly onto the end of the last few-shot example, with no newline in between: "The answer is (C).Q: One of the aims of philosophy ...".
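A quick way to spot this in a written-out prompt is to look for a "Q: " marker that does not start a line. The snippet below is only a rough sketch: `find_glued_questions` is a hypothetical helper, not part of lm-evaluation-harness, and the two sample prompts are shortened stand-ins for the real output.

```python
import re


def find_glued_questions(prompt: str) -> list:
    """Return the offsets of 'Q: ' markers that do not start a line.

    Hypothetical helper for illustration only. It is a heuristic: a
    'Q: ' that legitimately appears mid-sentence would also be reported.
    """
    offsets = []
    for match in re.finditer(r"Q: ", prompt):
        start = match.start()
        # A well-formed question begins the prompt or follows a newline.
        if start > 0 and prompt[start - 1] != "\n":
            offsets.append(start)
    return offsets


# Shortened stand-ins for the written-out prompt.
good = "A: The answer is (C).\n\nQ: Next question?\nA: Let's think step by step."
bad = "A: The answer is (C).Q: Next question?\nA: Let's think step by step."

print(find_glued_questions(good))  # []
print(find_glued_questions(bad))   # [21]
```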

pengzhenghao commented 11 months ago

In the default YAML file the description ends with \n. In this task's YAML file, there is no \n at the end of the description, so the question is appended directly after the last few-shot example.
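To illustrate why the missing trailing newline matters, here is a minimal sketch that assumes the description and the question are joined by plain string concatenation. `build_prompt` is a hypothetical stand-in, not the harness's actual code, and the strings are shortened. The "\n\n" used for the repaired version is also an assumption, chosen to match the blank line between the few-shot examples; the actual fix may use a different separator.

```python
def build_prompt(description: str, question: str) -> str:
    # Hypothetical stand-in for prompt assembly: the two parts are
    # concatenated as-is, so any separator must come from the description.
    return description + question


# Shortened stand-ins for the task description and the real question.
description_without_newline = "Q: Psychological egoism is:\nA: The answer is (C)."
description_with_newline = description_without_newline + "\n\n"
question = "Q: One of the aims of philosophy is:\nA: Let's think step by step."

broken = build_prompt(description_without_newline, question)
fixed = build_prompt(description_with_newline, question)

print(repr(broken))  # the question follows "(C)." on the same line
print(repr(fixed))   # the question starts after a blank line
```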

lintangsutawika commented 11 months ago

Thanks, will make a fix.