davendw49 / k2

Code and datasets for paper "K2: A Foundation Language Model for Geoscience Knowledge Understanding and Utilization" in WSDM-2024
Apache License 2.0
158 stars 14 forks

input_ls.json #9

Closed RichardScottOZ closed 11 months ago

RichardScottOZ commented 11 months ago

Your generate example includes this, but there doesn't seem to be an example file?

Maybe include one with a few sample geoscience questions, so people can see whether they get results similar to yours.

'What is the most common igneous rock?' or things like that perhaps?

davendw49 commented 11 months ago

Hi Richard, again 😄, thanks for your advice. We uploaded some sample geoscience questions to the generation folder.

RichardScottOZ commented 11 months ago

Ah, very good! Have you tried anything like few-shot inference here? People would probably like to see that too.

davendw49 commented 11 months ago

Good question, and we are not currently showing these kinds of cases. From my perspective, we use few-shot inference mostly on specific tasks like multiple choice and information extraction, or to stimulate ICL (in-context learning) for generation in a specific format (which a 7B model may not be good at). We are currently training and evaluating a larger language model for geoscience, and we have indeed designed some cases to equip our model with more abilities.
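
For instance, a few-shot multiple-choice prompt of the kind described above might look like this (an illustrative sketch only; the questions and layout are assumptions, not from the paper):

# Illustrative few-shot prompt layout (assumed format, not the repo's).
few_shot_prompt = (
    "Question: Which rock forms from cooled lava?\n"
    "A. shale  B. basalt  C. limestone  D. marble\n"
    "The answer is: B\n\n"
    "Question: What is the most common igneous rock?\n"
    "A. granite  B. basalt  C. obsidian  D. pumice\n"
    "The answer is:"
)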

RichardScottOZ commented 11 months ago

Yeah, something like a "here are a few examples of texts, is this text one of those?" type of check.

RichardScottOZ commented 11 months ago

When you are using your benchmark, maybe an example? The paper suggests something like this:

{
    "id": "apstudy_question_hg",
    "question": {
        "stem": "The umbrella theory explaining the Earth's movement, contact, and flattening of large land plates is known as",
        "choices": [
            {"text": "the Coriolis effect", "label": "A"},
            {"text": "plate tectonics", "label": "B"},
            {"text": "hotspots", "label": "C"},
            {"text": "the Richter Magnitude Scale", "label": "D"},
            {"text": "the subduction zone", "label": "E"}
        ]
    },
    "The answer is?": ""
}

RichardScottOZ commented 11 months ago

Formatting this wrong will probably mean the results come out somewhat different?

davendw49 commented 11 months ago

As we mentioned in the paper (Sec. 5.1, Objective tasks in GeoBenchmark), we end the prompt with the phrase "The answer is" and calculate the Softmax over the next-token probabilities of the letters "A, B, C, D, E".
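
For illustration, a hypothetical sketch of how a benchmark record like the one above could be flattened into such a prompt (the helper name and exact spacing are assumptions, not the repo's code):

# Hypothetical helper showing the prompt shape described above;
# not the repo's actual prompt builder.
def build_prompt(record):
    q = record["question"]
    lines = [q["stem"]]
    for choice in q["choices"]:
        lines.append(f"{choice['label']}. {choice['text']}")
    lines.append("The answer is")  # the prompt ends with this phrase
    return "\n".join(lines)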

RichardScottOZ commented 11 months ago

I have run a test with the generic generate function:

[
 {
        "id": "apstudy_question_hg",
        "question": {
            "stem": "Because he figured out that sedimentary rock must have been compacted and compressed, over many ages, _______ is known as the father of modern geology.",
            "choices": [
                {
                    "text": "Richard Palmer",
                    "label": "A"
                },
                {
                    "text": "James Hutton",
                    "label": "B"
                },
                {
                    "text": "W",
                    "label": "C"
                },
                {
                    "text": "Nicholas Steno",
                    "label": "D"
                },
                {
                    "text": "Aubrey Hough",
                    "label": "E"
                }
            ]
        },
        "The Answer is?": ""
    }
]
[{"question": {"id": "apstudy_question_hg", "question": {"stem": "Because he figured out that sedimentary rock must have been compacted and compressed, over many ages, _______ is known as the father of modern geology.", "choices": [{"text": "Richard Palmer", "label": "A"}, {"text": "James Hutton", "label": "B"}, {"text": "W", "label": "C"}, {"text": "Nicholas Steno", "label": "D"}, {"text": "Aubrey Hough", "label": "E"}]}, "The Answer is?": ""}, "answer": "The correct answer is: B"}]

A different test?

RichardScottOZ commented 11 months ago

e.g. if someone wants to adapt it to something non-Llama, like Falcon or something else, to try there.

davendw49 commented 11 months ago

As mentioned above, here I give you a toy example:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model (replace the path with your checkpoint).
tokenizer = AutoTokenizer.from_pretrained("/path/to/your/model", use_fast=True)
model = AutoModelForCausalLM.from_pretrained("/path/to/your/model", device_map="auto")

prompt = "Please select the correct option: The following substances are mainly formed by groundwater metasomatism: () \n A. nodule\n B. Quanhua\n C. silicified wood\n D. halite pseudocrystal\n\nThe answer is: ### Output\n"
input_ids = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model(input_ids["input_ids"])

outputs["logits"] contains the logits for each token in the vocabulary, and we can use tokenizer("A")['input_ids'][-1] to get the token id of each candidate letter, then do a softmax over the logits of just those candidate letters.
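
Concretely, that scoring step could be sketched like this (reusing tokenizer and outputs from the toy example above; the candidate list matches the four options in that prompt, everything else is illustrative):

import torch

candidates = ["A", "B", "C", "D"]
# Token id of each candidate letter; take the last id in case the
# tokenizer prepends a BOS or leading-space token.
candidate_ids = [tokenizer(c)["input_ids"][-1] for c in candidates]

# Logits for the token that would follow the prompt: shape (vocab_size,).
next_token_logits = outputs["logits"][0, -1, :]

# Softmax restricted to the candidate letters, then pick the best one.
probs = torch.softmax(next_token_logits[candidate_ids], dim=-1)
print(candidates[int(probs.argmax())])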

P.S. Since these questions are no longer related to input_ls.json itself, I suggest closing this issue.

RichardScottOZ commented 11 months ago

Right, can add that to another one.