deepset-ai / haystack

AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
https://haystack.deepset.ai
Apache License 2.0
17.93k stars 1.93k forks

Option to enable structured outputs with OpenAI Generators #8276

Open TuanaCelik opened 3 months ago

TuanaCelik commented 3 months ago

Is your feature request related to a problem? Please describe. OpenAI and many other LLM providers are introducing structured outputs. See this doc

```python
from pydantic import BaseModel

from haystack import Pipeline
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator


class DecomposedQuestions(BaseModel):
    questions: list[str]


splitter_prompt = """
You are a query engine.
You prepare queries that will be sent to a web search component.
Sometimes, these queries are very complex.
You split up complex queries into multiple queries so that you can run multiple searches to find an answer.
When you split a query, you separate the sub-queries with '//'.
If the query is simple, then keep it as it is.

Example 1:
Query: Did Microsoft or Google make more money last year?
Decomposed Questions: DecomposedQuestions(questions=['How much profit did Microsoft make?', 'How much profit did Google make?'])

Example 2:
Query: What is the capital of Germany?
Decomposed Questions: DecomposedQuestions(questions=['What is the capital of Germany?'])

Example 3:
Query: {{question}}
Decomposed Questions:
"""

builder = PromptBuilder(splitter_prompt)
llm = OpenAIGenerator(model="gpt-4o-mini", generation_kwargs={"response_format": DecomposedQuestions})

pipeline = Pipeline()
pipeline.add_component("prompt", builder)
pipeline.add_component("llm", llm)
pipeline.connect("prompt", "llm")
```


Or something similar.
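With a `response_format` like this in place, downstream components would receive a JSON string conforming to the schema. A minimal sketch of consuming such a reply (the reply string here is hypothetical, standing in for what the LLM would return):

```python
import json

# Hypothetical structured reply an LLM would return under the requested
# response_format; in a real pipeline this would come from the generator.
reply = '{"questions": ["How much profit did Microsoft make?", "How much profit did Google make?"]}'

# Because the output is guaranteed to match the schema, parsing is a plain json.loads
decomposed = json.loads(reply)
for q in decomposed["questions"]:
    print(q)
```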

Currently, this results in the following error:

```
/usr/local/lib/python3.10/dist-packages/openai/resources/chat/completions.py in validate_response_format(response_format)
   1414 def validate_response_format(response_format: object) -> None:
   1415     if inspect.isclass(response_format) and issubclass(response_format, pydantic.BaseModel):
-> 1416         raise TypeError(
   1417             "You tried to pass a BaseModel class to chat.completions.create(); You must use beta.chat.completions.parse() instead"
   1418         )

TypeError: You tried to pass a BaseModel class to chat.completions.create(); You must use beta.chat.completions.parse() instead
```
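The error arises because `chat.completions.create()` rejects a `BaseModel` class; only `beta.chat.completions.parse()` accepts one. A possible interim workaround (a sketch under the assumption that `generation_kwargs` is forwarded to `chat.completions.create()`, not an official Haystack API) is to pass the JSON-schema dict form of `response_format`, which `create()` does accept:

```python
# Hand-written JSON schema equivalent of the DecomposedQuestions Pydantic model.
# chat.completions.create() accepts this dict form of response_format, while
# the BaseModel class form requires beta.chat.completions.parse().
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "DecomposedQuestions",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "questions": {"type": "array", "items": {"type": "string"}}
            },
            "required": ["questions"],
            "additionalProperties": False,
        },
    },
}

# Then, hypothetically:
# llm = OpenAIGenerator(model="gpt-4o-mini", generation_kwargs={"response_format": response_format})
```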

dashinja commented 2 months ago

Yes, please implement this : )

thompsondt commented 1 month ago

@TuanaCelik, thank you for proposing this. Did you implement it as a custom component in the near term?

TuanaCelik commented 1 month ago

Hey @dashinja and @thompsondt

Check out the query decomposition article/recipe. It's not an official integration/component, but I sneaked an implementation in there to help out in the meantime:

https://haystack.deepset.ai/blog/query-decomposition

thompsondt commented 1 month ago

I'm going to try implementing this. The decomposition example maps almost 1:1 to the multi-query behavior I'm anticipating.

arubisov commented 1 month ago

Thanks @TuanaCelik! I found this cookbook post of yours (which actually implements the extended OpenAIGenerator) much more helpful than the blog, which skipped the implementation!

https://haystack.deepset.ai/cookbook/query_decomposition

TuanaCelik commented 1 month ago

> Thanks @TuanaCelik! I found this cookbook post of yours (which actually implements the extended OpenAIGenerator) much more helpful than the blog, which skipped the implementation!
>
> https://haystack.deepset.ai/cookbook/query_decomposition

Yep! It's linked from the article too! You can open the colab from there as well :)