eth-sri / lmql

A language for constraint-guided and efficient LLM programming.
https://lmql.ai
Apache License 2.0

How can I get classification to work with OpenAI models? #322

Closed ZmeiGorynych closed 7 months ago

ZmeiGorynych commented 7 months ago

When I try running

review = "We had a great stay. Hiking in the mountains was fabulous and the food is really good."
prompt = f"""
argmax
    # use prompt statements to pass information to the model
    "Review: {review}"
    "Q: What is the underlying sentiment of this review and why?"
    # template variables like [ANALYSIS] are used to generate text
    "A:[ANALYSIS]" where not "\\n" in ANALYSIS

    # use constrained variable to produce a classification
    "Based on this, the overall sentiment of the message can be considered to be[CLS]"
distribution
   CLS in [" positive", " neutral", " negative"]
"""

out = lmql.run_sync(prompt)

I get:

lmql.runtime.bopenai.openai_api.OpenAIAPILimitationError: The underlying requests to the OpenAI API with model 'gpt-3.5-turbo-instruct' are blocked by OpenAI's API limitations. Please use a different model to leverage this form of querying (e.g. distribution clauses or scoring).

When I specify model=lmql.model("openai/gpt-4-0125-preview") instead, I get:

OpenAIAPIWarning: OpenAI: ('This is a chat model and not supported in the v1/completions endpoint. Did you mean to use v1/chat/completions? (after receiving 0 chunks. Current chunk time: 0.0 Average chunk time: 0.0)', 'Stream duration:', 0.2780332565307617) "<class 'lmql.runtime.bopenai.openai_api.OpenAIStreamError'>"

Is it at all possible to run classification queries against OpenAI models, and if so, how?

lbeurerkellner commented 7 months ago

OpenAI has discontinued support for obtaining logits for prompt tokens, which means LMQL's current implementation of distribution scoring does not work with OpenAI models. Very likely they made this change to prevent model distillation. All of the remaining Chat and Completions models (the text-... models were discontinued in January) block this form of obtaining logits.

Unfortunately, there is not much we can do on our end. There is some potential in doing distribution scoring via logit_bias, but since OpenAI could break this in the same way any day, it is unclear whether an implementation makes sense.
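
For reference, a minimal sketch of what the logit_bias route could look like, assuming each class label encodes to a single token (the labels, prompt, and model name here are illustrative, not anything built into LMQL):

import tiktoken
from openai import OpenAI

client = OpenAI()
enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

# assumes each label is a single token; multi-token labels would
# need a more involved step-by-step scheme
labels = [" positive", " neutral", " negative"]
label_ids = [enc.encode(l)[0] for l in labels]

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content":
        "Review: We had a great stay.\nThe sentiment of this review is"}],
    # force the next token to be one of the label tokens
    logit_bias={str(i): 100 for i in label_ids},
    max_tokens=1,
    # and read off the logprobs of the top candidates
    logprobs=True,
    top_logprobs=5,
)
print(response.choices[0].logprobs.content[0].top_logprobs)

Note that how logit_bias interacts with the returned logprobs is undocumented behavior that OpenAI has adjusted before, which is exactly the kind of breakage that makes this hard to commit to.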

lbeurerkellner commented 7 months ago

I recommend using an OpenAI model for the reasoning, but doing the actual scoring with a local model. Oftentimes, the final scoring step does not need a model as powerful as the ones OpenAI provides.
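
As a sketch of that split, with the model names purely as examples (the defensive result handling is there because run_sync may hand back a single result or a list, depending on the decoder):

import lmql

review = "We had a great stay. Hiking in the mountains was fabulous and the food is really good."

def first(res):
    # run_sync may return a single result or a list of results
    return res[0] if isinstance(res, list) else res

# stage 1: free-text reasoning with an OpenAI chat model
analysis = first(lmql.run_sync(f'''argmax
    "Review: {review}"
    "Q: What is the underlying sentiment of this review and why?"
    "A:[ANALYSIS]" where not "\\n" in ANALYSIS
''', model=lmql.model("openai/gpt-3.5-turbo"))).variables["ANALYSIS"]

# stage 2: distribution scoring with a local model, which still
# exposes the prompt logprobs this needs
result = first(lmql.run_sync(f'''argmax
    "Review: {review}"
    "Analysis: {analysis}"
    "Based on this, the overall sentiment of the message can be considered to be[CLS]"
distribution
    CLS in [" positive", " neutral", " negative"]
''', model=lmql.model("local:gpt2")))

print(result.variables["CLS"])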

ZmeiGorynych commented 7 months ago

"OpenAI has discontinued their support for obtaining logits for prompt tokens."

What do you mean by that? The code below ran fine for me yesterday, and logprobs and top_logprobs are in the API docs right now.

from openai import OpenAI

statement = "I feel good today"
message = """
Return a digit describing the sentiment of the following message:
"{statement}"

Return the digit 1 if the sentiment is positive, 0 if the sentiment is neutral, and 2 if the sentiment is negative.
Return just the one-character digit, nothing else.
""".format(
    statement=statement
)

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-3.5-turbo-0125",
    messages=[
        {"role": "user", "content": message},
    ],
    # these logprobs cover only the *generated* tokens
    logprobs=True,
    top_logprobs=5,
)
print(response.choices[0].message.content)
print(response.choices[0].logprobs.content[0].top_logprobs)

lbeurerkellner commented 7 months ago

For distribution scoring, you don't need the logprobs of the generated tokens, but rather of those in the prompt. In previous versions of their API offering, you could set echo: true and logprobs: true, which allowed you to get logprobs for each token in the prompt as well. This has since been disabled.

from openai import OpenAI

statement = "I feel good today"
client = OpenAI()

# echo=True used to return the prompt tokens together with their
# logprobs, which is what distribution scoring relies on
response = client.completions.create(
    model="gpt-3.5-turbo-instruct",
    prompt=statement,
    logprobs=5,
    echo=True,
)
# -> API: Setting 'echo' and 'logprobs' at the same time is not supported for this model.
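
For comparison, this kind of prompt-token scoring is still straightforward with a local model. A rough transformers sketch (gpt2 purely as a stand-in) that scores each label as a continuation of the prompt:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = 'The sentiment of "I feel good today" is'
prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]

# labels start with a space so prompt + label tokenizes as the
# prompt's tokens followed by the label's tokens
for label in [" positive", " neutral", " negative"]:
    ids = tokenizer(prompt + label, return_tensors="pt").input_ids
    n_label = ids.shape[1] - prompt_len
    with torch.no_grad():
        logits = model(ids).logits
    # logits at position i predict token i+1, so shift by one to get
    # the log-probability of each actual token given its prefix
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
    token_scores = logprobs[torch.arange(ids.shape[1] - 1), ids[0, 1:]]
    # sum the logprobs of just the label tokens
    print(label, token_scores[-n_label:].sum().item())

This is essentially what the old echo + logprobs combination computed server-side.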

ZmeiGorynych commented 7 months ago

Thanks for the explanation!