Future-House / paper-qa

High accuracy RAG for answering questions from scientific documents with citations
Apache License 2.0
6.44k stars 617 forks source link

JSON Schema #700

Open cclark1e opened 3 days ago

cclark1e commented 3 days ago

Hi all,

Thank you for your excellent work on PaperQA2.

OpenAI provides an option for a JSON schema that the LLM must respond with. (https://openai.com/index/introducing-structured-outputs-in-the-api/)

Is it possible to pass this option through paperQA2 so the final response fits within this schema?

dosubot[bot] commented 3 days ago

Hey @cclark1e, I'm here to help you with any questions or issues you have with PaperQA2. Let's tackle this together!

I couldn't find a direct configuration option in PaperQA2 to specify a JSON schema for the output format. However, there is a reference in the paperqa/docs.py file indicating an intention to specify a JSON schema when many LLM providers support it. The code attempts to parse JSON from a response, which suggests some level of handling for structured outputs [1].

Additionally, there is a test case in the tests/test_paperqa.py file that demonstrates the use of a JSON schema for structuring outputs. This test uses a JSON format to provide a summary, author name, and relevance score [2].

For more detailed implementation or customization, you might need to modify the code to integrate OpenAI's structured output feature directly.

To continue talking to Dosu, mention @dosu.

jamesbraza commented 2 days ago

Hello @cclark1e when you say "final response", what do you mean? Like the string AnswerResponse.session.answer or formatted_answer follows a certain schema?

Can you give an example schema for what you're thinking?