explodinggradients / ragas

Supercharge Your LLM Application Evaluations 🚀
https://docs.ragas.io
Apache License 2.0

[R-302] Support function-calling / json mode / structured generation for testset generation #1532

Open ahgraber opened 2 days ago

ahgraber commented 2 days ago

Describe the Feature
Most service APIs now support enforcing output schemas through function calling, JSON mode, or structured generation. It would be very useful to have an option that uses the service API to enforce schema constraints, rather than hoping that chat-prompt responses follow the expected format.

Why is the feature important for you?
With OpenAI, synthetic generation works flawlessly 99% of the time. With Anthropic or Llama models, I get frequent parse errors, which trigger retries and ultimately fail. This uses a lot of tokens (and therefore $). Concretely, when generating a testset of 100 questions, gpt-4o-mini uses ~660k input tokens and produces ~13k output tokens. When I attempt to generate a testset from the same knowledge graph with Anthropic Claude 3.5 Sonnet, generation fails on parse errors, but I still end up using ~850k input and ~22.5k output tokens due to the retries!
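The retry cost described above can be sketched as a minimal parse-and-retry loop (a sketch assuming Pydantic v2; `QAPair`, `generate_with_retries`, and the stubbed replies are illustrative, not Ragas internals):

```python
import json
from pydantic import BaseModel, ValidationError

class QAPair(BaseModel):
    question: str
    answer: str

def generate_with_retries(call_llm, model_cls, max_retries=2):
    """Parse free-form chat output into `model_cls`, retrying on failure.

    Every failed attempt re-sends the full prompt -- the token cost
    described above. API-enforced schemas would make the retries
    unnecessary.
    """
    last_err = None
    for _ in range(max_retries + 1):
        raw = call_llm()  # each call re-spends the (large) input prompt
        try:
            return model_cls.model_validate_json(raw)
        except ValidationError as err:
            last_err = err
    raise last_err

# Stub LLM: one malformed reply, then valid JSON (stands in for a real call).
replies = iter([
    "Sure! Here is the question...",
    json.dumps({"question": "q", "answer": "a"}),
])
pair = generate_with_retries(lambda: next(replies), QAPair)
print(pair.question)
```

With a model that emits clean JSON the loop exits on the first attempt; with a chatty model every malformed reply doubles the input-token spend, which matches the ~850k-vs-~660k figures above.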

Additional context
Given that most of the responses are parsed with Pydantic, it should be fairly trivial to turn the desired Pydantic object into a JSON schema (hint: openai provides openai.pydantic_function_tool() to convert Pydantic models into the OpenAI-compatible subset of JSON schema).
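That conversion step can be sketched with Pydantic alone (assuming Pydantic v2; `QAPair` is an illustrative model, not a Ragas class):

```python
from pydantic import BaseModel

class QAPair(BaseModel):
    question: str
    answer: str

# Pydantic emits the JSON schema directly; providers with structured-output
# support accept (a subset of) this schema as the enforced response format.
schema = QAPair.model_json_schema()
print(schema["required"])        # fields the API must produce
print(list(schema["properties"]))

# For OpenAI specifically, openai.pydantic_function_tool(QAPair) wraps this
# schema into a ready-made tool definition (requires the openai package).
```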

R-302

jjmachan commented 1 day ago

@ahgraber thanks for the suggestion - we should definitely make that the default for the services that support it

ref: https://python.langchain.com/v0.1/docs/modules/model_io/chat/structured_output/ something on top of this should work
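"Something on top of this" could look like the sketch below: any LangChain chat model exposing `with_structured_output` (e.g. `ChatOpenAI` or `ChatAnthropic` from the linked docs) would plug in. `QAPair` and `generate_testset_item` are hypothetical names for illustration, not Ragas API:

```python
from pydantic import BaseModel

class QAPair(BaseModel):
    question: str
    answer: str

def generate_testset_item(llm, prompt: str) -> QAPair:
    # LangChain chat models expose `with_structured_output`, which routes
    # through the provider's function-calling / JSON-mode support where
    # available, so the reply is schema-enforced rather than prompt-parsed.
    structured_llm = llm.with_structured_output(QAPair)
    return structured_llm.invoke(prompt)

# Usage (assuming langchain-openai is installed and OPENAI_API_KEY is set):
#   from langchain_openai import ChatOpenAI
#   item = generate_testset_item(ChatOpenAI(model="gpt-4o-mini"),
#                                "Write one QA pair about RAG evaluation.")
```

Because the function only depends on the `with_structured_output` interface, it would fall back gracefully per provider: LangChain picks function calling, JSON mode, or tool use depending on what the underlying API supports.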

Also, I'd love to chat with you sometime, Alex, and get more feedback. I've sent you an email to connect. Are you on Discord, btw?

cheers ❤️ Jithin