dottxt-ai / outlines

Structured Text Generation
https://dottxt-ai.github.io/outlines/
Apache License 2.0
9.05k stars 457 forks source link

JSON is not supported by (Azure) OpenAI API #637

Closed younes-io closed 7 months ago

younes-io commented 8 months ago

Describe the issue as clearly as possible:

I need to generate valid JSON as the output of an email classifier, and I need to be 100% sure that the output will always be valid since it'll be the input of an API call. So, I took the JSON example from the docs and tried it using OpenAI, but it failed.

Steps/code to reproduce the bug:

from pydantic import BaseModel

from outlines import models, generate

class User(BaseModel):
    name: str
    last_name: str
    id: int

model = models.openai("gpt-3.5-turbo")
generator = generate.json(model, User)
result = generator("Create a user profile with the fields name, last_name and id")
print(result)

Expected result:

a JSON representing the User object

# User(name="John", last_name="Doe", id=11)

Error message:

NotImplementedError                       Traceback (most recent call last)
Cell In[13], line 14
     10     id: int
     13 model = models.openai("gpt-3.5-turbo")
---> 14 generator = generate.json(model, User)
     15 result = generator("Create a user profile with the fields name, last_name and id")
     16 print(result)

File C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\functools.py:909, in singledispatch.<locals>.wrapper(*args, **kw)
    905 if not args:
    906     raise TypeError(f'{funcname} requires at least '
    907                     '1 positional argument')
--> 909 return dispatch(args[0].__class__)(*args, **kw)

File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\LocalCache\local-packages\Python312\site-packages\outlines\generate\json.py:76, in json_openai(model, schema_object, sampler)
     72 @json.register(OpenAI)
     73 def json_openai(
     74     model, schema_object: Union[str, object, Callable], sampler: Sampler = multinomial()
     75 ):
---> 76     raise NotImplementedError(
     77         "Cannot use JSON Schema-structure generation with an OpenAI model "
     78         + "due to the limitations of the OpenAI API"
     79     )

NotImplementedError: Cannot use JSON Schema-structure generation with an OpenAI model due to the limitations of the OpenAI API

Outlines/Python version information:

Version information

``` $ python -c "from outlines import _version; print(_version.version)" 0.0.28 $ python -c "import sys; print('Python', sys.version)" Python 3.12.2 (tags/v3.12.2:6abddd9, Feb 6 2024, 21:26:36) [MSC v.1937 64 bit (AMD64)] ```

Context for the issue:

I'm building an email classifier for a client, so I need to have a consistent JSON output of the result of classification (a JSON listing the categories and their respective scores); unfortunately, this is not supported due to OpenAI limitations :/

ciliamadani commented 6 months ago

I'm having the same issue, have you figured it out?

surjeet176 commented 6 months ago

i am having the same issue, did anyone figured it out ? this is the code that is am using : `

from outlines import models from outlines import generate modelOutlines = models.openai("gpt-3.5-turbo") schema = """ { "title": "User", "type": "object", "properties": { "name": {"type": "string"}, "last_name": {"type": "string"}, "id": {"type": "integer"} } } """

generator = generate.json(modelOutlines, schema) result = generator( "Create a user profile with the fields name, last_name and id" ) print(result) `

ly0 commented 5 months ago

+1 had the same issue.

nullpointer0xffff commented 5 months ago

+1

evg-kononov commented 4 months ago

+1

ChristianWeyer commented 4 months ago

NotImplementedError: Cannot use JSON Schema-structure generation with an OpenAI model due to the limitations of the OpenAI API

What are you doing these days to make this work with OpenAI @younes-io ?

anishchhaparwal commented 4 months ago

+1

andersbenn commented 4 months ago

+1

lapp0 commented 4 months ago

Unfortunately OpenAI doesn't provide the interfaces necessary to support structured generation.

Outlines works by updating the token probabilities (softmaxed logits) after each decoder pass such that all illegal tokens have a 0% chance of being selected. OpenAI doesn't support direct logits processing.

ddvlamin commented 3 months ago

To better understand how this grammar guiding library works: can't the logit_bias parameter in the openai completions API be used to restrict and guide the output of the model?

https://platform.openai.com/docs/api-reference/completions/create?lang=curl

lapp0 commented 3 months ago

@ddvlamin unfortunately not. logit_bias is constant throughout generation, regardless of what has been generated. You can use this to permanently disable certain tokens while generating, however for structured generation you need to dynamically change which tokens are disabled at each step based on the previously generated tokens.

c0pper commented 3 months ago

So is it safe to say that as of now Outlines library does NOT support OpenAI models?

lapp0 commented 3 months ago

We can do generate.text, but structured generation (generate.regex, generate.json, etc) is not possible with OpenAI's API.

alexcannan commented 2 months ago

The feature is still in beta, but structured outputs should soon be broadly supported by openai. https://platform.openai.com/docs/guides/structured-outputs https://openai.com/index/introducing-structured-outputs-in-the-api/