EleutherAI / lm-evaluation-harness

A framework for few-shot evaluation of language models.

OpenAI completions model not using OpenAI Completion API properly to extract LogProbs #1967

Open chimezie opened 4 months ago

chimezie commented 4 months ago

The get_result function in lm_eval/models/openai_completions.py does not use the OpenAI Completion API correctly when extracting log probabilities.

Below is the API definition for Logprobs (from the openai Python package):

class Logprobs(BaseModel):
    text_offset: Optional[List[int]] = None

    token_logprobs: Optional[List[float]] = None

    tokens: Optional[List[str]] = None

    top_logprobs: Optional[List[Dict[str, float]]] = None
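
For reference, here is a minimal sketch of how these fields show up on a legacy Completions response. The model name, prompt, and logprobs value are illustrative assumptions, not taken from the harness:

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
resp = client.completions.create(
    model="gpt-3.5-turbo-instruct",   # any Completions-capable model
    prompt="The capital of France is",
    max_tokens=1,
    logprobs=5,
)
lp = resp.choices[0].logprobs
print(lp.tokens)          # List[str]: the sampled token strings
print(lp.token_logprobs)  # List[float]: log probability of each sampled token
print(lp.top_logprobs)    # List[Dict[str, float]]: top alternatives per position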

However, get_result in that module is defined this way:

def get_result(response) -> Tuple[float, bool]:
    is_greedy = True
    logprobs = response.logprobs.token_logprobs
    continuation_logprobs = sum(logprobs)

    for i in range(len(response.logprobs.token_logprobs)):
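        # BUG (as reported below): token_logprobs holds floats, so `token`
        # ends up being a log probability rather than the token string.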
        token = response.logprobs.token_logprobs[i]
        top_tokens = response.logprobs.top_logprobs[i]
        top_token = max(top_tokens.keys(), key=lambda x: top_tokens[x])
        if top_token != token:
            is_greedy = False
            break

    return continuation_logprobs, is_greedy

It appears the function treats response.logprobs.token_logprobs as both a list of log probabilities (it sums its values into continuation_logprobs) and a list of tokens (it assigns each element to a variable named token and compares it against the keys of top_logprobs).

Per the OpenAI type hints, token_logprobs should be a list of floats, and the corresponding token string should be obtained from response.logprobs.tokens (a list of strings) at the same index as each entry in response.logprobs.token_logprobs.
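
For illustration, a minimal sketch of what a corrected comparison could look like under that reading of the type hints (this is an assumption about the intended fix, not the patch actually adopted by the maintainers):

from typing import Tuple


def get_result(response) -> Tuple[float, bool]:
    is_greedy = True
    logprobs = response.logprobs.token_logprobs
    continuation_logprobs = sum(logprobs)

    for i in range(len(response.logprobs.tokens)):
        token = response.logprobs.tokens[i]             # sampled token string
        top_tokens = response.logprobs.top_logprobs[i]  # Dict[str, float]
        top_token = max(top_tokens.keys(), key=lambda x: top_tokens[x])
        if top_token != token:
            is_greedy = False
            break

    return continuation_logprobs, is_greedy

This keeps the log-probability sum unchanged and only moves the token lookup over to response.logprobs.tokens.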

LSinev commented 4 months ago

Is this solved at https://github.com/EleutherAI/lm-evaluation-harness/pull/1919 or is it some other issue?

chimezie commented 4 months ago

> Is this solved at #1919 or is it some other issue?

It is a different issue.