Closed RichardScottOZ closed 11 months ago
Hi Richard, again, thanks for your advice. We uploaded some sample geoscience questions to the generation folder path.
Ah, very good! Have you tried anything like few-shot inference here? People would probably like to see that too.
Good question, and we are not showing cases like this. From my perspective, few-shot inference is mostly useful for specific tasks like multiple choice or information extraction, or to stimulate ICL for generation in a specific format (which a 7B model may not be good at). We are currently training and evaluating a larger language model for geoscience, and we have indeed designed some cases to equip our model with more abilities.
Yeah, a "here are a few examples of texts; is this text one of those?" sort of check.
When using your benchmark, what would an example look like? The paper suggests something like this:
{'id': 'apstudy_question_hg', 'question': {'stem': "The umbrella theory explaining the Earth's movement, contact, and flattening of large land plates is known as", 'choices': [{'text': 'the Coriolis effect', 'label': 'A'}, {'text': 'plate tectonics', 'label': 'B'}, {'text': 'hotspots', 'label': 'C'}, {'text': 'the Richter Magnitude Scale', 'label': 'D'}, {'text': 'the subduction zone', 'label': 'E'}]}, 'The answer is?': ''}
Formatting this wrong will probably mean the results will be somewhat different?
As we mentioned in the paper (Sec. 5.1, Objective tasks in GeoBenchmark), we end the prompt with the phrase "The answer is" and calculate the softmax of the next-token probabilities over the letters "A, B, C, D, E".
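The prompt construction described above can be sketched roughly like this (the field names follow the JSON example in this thread; the exact template string is an assumption, not the repo's code):

```python
def build_prompt(item):
    """Format a benchmark item into a prompt ending with 'The answer is'.

    `item` follows the JSON layout shown in this thread: a "question" dict
    with a "stem" and a list of "choices" carrying "label" and "text".
    """
    q = item["question"]
    lines = [q["stem"]]
    for choice in q["choices"]:
        lines.append(f"{choice['label']}. {choice['text']}")
    lines.append("The answer is")  # the phrase the scoring relies on
    return "\n".join(lines)
```

The model is then asked to continue this prompt, and only the probability mass on the choice letters is compared.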
I have run a test with the generic generate function:
[
{
"id": "apstudy_question_hg",
"question": {
"stem": "Because he figured out that sedimentary rock must have been compacted and compressed, over many ages, _______ is known as the father of modern geology.",
"choices": [
{
"text": "Richard Palmer",
"label": "A"
},
{
"text": "James Hutton",
"label": "B"
},
{
"text": "W",
"label": "C"
},
{
"text": "Nicholas Steno",
"label": "D"
},
{
"text": "Aubrey Hough",
"label": "E"
}
]
},
"The Answer is?": ""
}
]
[{"question": {"id": "apstudy_question_hg", "question": {"stem": "Because he figured out that sedimentary rock must have been compacted and compressed, over many ages, _______ is known as the father of modern geology.", "choices": [{"text": "Richard Palmer", "label": "A"}, {"text": "James Hutton", "label": "B"}, {"text": "W", "label": "C"}, {"text": "Nicholas Steno", "label": "D"}, {"text": "Aubrey Hough", "label": "E"}]}, "The Answer is?": ""}, "answer": "The correct answer is: B"}]
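Since the generic generate function returns free-form text like "The correct answer is: B", one way to score such outputs is to pull the letter out with a regex. This parser is an illustration only, matching the answer style seen above, not part of the repo:

```python
import re

def extract_choice(completion):
    """Pull the answer letter (A-E) out of a free-form completion.

    Matches phrasings like "The answer is B" or "The correct answer is: B";
    returns None if no letter is found.
    """
    m = re.search(r"answer is:?\s*\(?([A-E])\)?", completion, re.IGNORECASE)
    return m.group(1) if m else None

extract_choice("The correct answer is: B")  # -> "B"
```

This is looser than the logit-based scoring described in the paper, but useful as a sanity check on generate-style outputs.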
A different test? E.g. if someone wants to adapt it to something non-Llama, like Falcon or something else, to try there.
As mentioned above, here I give you a toy example:
from transformers import AutoTokenizer, AutoModelForCausalLM

# load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("/path/to/your/model", use_fast=True)
model = AutoModelForCausalLM.from_pretrained("/path/to/your/model", device_map="auto")

prompt = "Please select the correct option: The following substances are mainly formed by groundwater metasomatism: () \n A. nodule\n B. Quanhua\n C. silicified wood\n D. halite pseudocrystal\n\nThe answer is: ### Output\n"
input_ids = tokenizer(prompt, return_tensors='pt')
outputs = model(input_ids["input_ids"])  # forward pass; no generation needed
The outputs["logits"] tensor holds the logits of each token in the vocabulary; we can use tokenizer("A")['input_ids'][-1] to get the ids of the candidate letters, and then take the softmax over the logits of just those candidates.
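To finish the toy example, the candidate-restricted softmax can be sketched in plain Python. The logit values and token ids below are dummies standing in for the real outputs["logits"] at the last position and the tokenizer lookups:

```python
import math

def candidate_softmax(logits, candidate_ids):
    """Softmax restricted to the logits of the candidate answer tokens."""
    scores = [logits[i] for i in candidate_ids]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical last-position logits over a tiny vocab; ids 5..9 stand in
# for the token ids of "A".."E" (in practice: tokenizer(letter)['input_ids'][-1]).
logits = [0.1, -2.0, 0.3, 1.0, 0.0, 2.5, 4.0, 0.2, 1.1, -0.5]
probs = candidate_softmax(logits, [5, 6, 7, 8, 9])
pred = "ABCDE"[probs.index(max(probs))]  # highest-probability letter
```

The predicted option is simply the letter whose token gets the largest share of the restricted probability mass.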
P.S. Since these questions are no longer related to the input_ls itself, I suggest closing this issue.
Right, can add that to another one.
Your generate examples include this, but there seems to be no example file?
Maybe include one with a few sample geoscience questions (held out) so people can check whether they get results similar to yours.
'What is the most common igneous rock?' or things like that, perhaps?