explodinggradients / ragas

Evaluation framework for your Retrieval Augmented Generation (RAG) pipelines
https://docs.ragas.io
Apache License 2.0

More robust JSON prompting and parsing #761

Open mrtj opened 5 months ago

mrtj commented 5 months ago

Describe the Feature

This feature request is about integrating LangChain's PydanticOutputParser and pydantic models into ragas prompting.

Why is the feature important for you?

The current prompts of the evaluation metrics define the expected answer format only vaguely, giving just a few examples of the expected output. The LLM then has to deduce the format and generate an answer conforming to a schema that is specified only implicitly in the individual metric implementations. Especially when using LLMs other than the default GPT-3.5, this leads to parsing errors, and the metrics cannot be calculated correctly.

Additional context

LangChain has a robust mechanism for instructing models to return a JSON response conforming to a specific schema: the output format is specified with pydantic models, the expected JSON schema is injected into the prompt, and the response is automatically parsed and validated by pydantic. Additionally, a retry mechanism can be added for even more robust JSON parsing.
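
Roughly, the pattern I have in mind looks like the sketch below. The `StatementVerdict` schema and its field names are just illustrations, not the actual ragas output format (and depending on the installed LangChain version, the pydantic imports may need to come from `langchain_core.pydantic_v1`):

```python
from langchain.output_parsers import PydanticOutputParser
from langchain.prompts import PromptTemplate
from pydantic import BaseModel, Field


# Hypothetical schema for a context-recall-style verdict; each ragas
# metric would define its own output model.
class StatementVerdict(BaseModel):
    statement: str = Field(description="the statement being classified")
    attributed: int = Field(description="1 if the statement is supported by the context, else 0")
    reason: str = Field(description="short justification for the verdict")


parser = PydanticOutputParser(pydantic_object=StatementVerdict)

prompt = PromptTemplate(
    template=(
        "Classify whether the statement can be attributed to the context.\n"
        "{format_instructions}\n"
        "context: {context}\nstatement: {statement}\n"
    ),
    input_variables=["context", "statement"],
    # The JSON schema derived from the pydantic model is injected here.
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

# raw = llm.invoke(prompt.format(context=..., statement=...))
# verdict = parser.parse(raw)  # raises OutputParserException on malformed JSON
```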

Considering that Ragas already uses LangChain, implementing this feature would not add new dependencies to the project.
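
As for the retry part, a sketch using LangChain's OutputFixingParser, assuming the `parser` from the snippet above and a generic `llm` chat model instance:

```python
from langchain.output_parsers import OutputFixingParser

# Wraps the pydantic parser; `llm` is any LangChain chat model instance.
fixing_parser = OutputFixingParser.from_llm(parser=parser, llm=llm)

# If `raw` is malformed JSON, the fixing parser re-prompts the LLM with the
# parse error and the offending output, then parses the corrected response.
verdict = fixing_parser.parse(raw)
```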

mrtj commented 5 months ago

As implementing this enhancement would have a certain impact on the codebase, I opened this issue to start a discussion about the idea. If you give the green light to this feature, I would be happy to contribute the implementation.

shahules786 commented 5 months ago

Hey @mrtj, this is certainly something we have on the roadmap. We welcome any help on this.

shahules786 commented 5 months ago

Lemme know if you're working on it and I'll assign it to you; we can also help you with it.

mrtj commented 5 months ago

Hello, I can confirm that I can start working on this issue. As a side note, I'd also like to ask whether there was a particular reason for not using LangChain's prompts in this project? There are a couple of interesting features, for example the few-shot example prompt templates, that might also benefit ragas. Also, a note on progress: I've hacked together a custom prompt for the context recall metric with a JSON output parser, and the results are promising.
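
For reference, here is a toy sketch of the few-shot feature I mean; the examples are made up, not actual ragas prompts:

```python
from langchain.prompts import FewShotPromptTemplate, PromptTemplate

# Made-up examples; a ragas metric would supply its own few-shot examples.
examples = [
    {"question": "Where is the Eiffel Tower?", "answer": '{"answer": "Paris"}'},
    {"question": "Who wrote Hamlet?", "answer": '{"answer": "Shakespeare"}'},
]

# How each example is rendered inside the prompt.
example_prompt = PromptTemplate(
    input_variables=["question", "answer"],
    template="Q: {question}\nA: {answer}",
)

prompt = FewShotPromptTemplate(
    examples=examples,
    example_prompt=example_prompt,
    prefix="Answer each question with a JSON object.",
    suffix="Q: {question}\nA:",
    input_variables=["question"],
)

print(prompt.format(question="What is the capital of Italy?"))
```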

shahules786 commented 5 months ago

Great @mrtj, lemme also check the links you shared. Would love to see the PR :)

fschuh commented 4 months ago

Any love for the dataset generation prompts? Many of those still have poor JSON format specifications, which confuse many OSS LLMs, especially the more chatty ones (such as Llama3).

I see that the PR merged on Apr 1 (https://github.com/explodinggradients/ragas/pull/807) improves the metrics prompts, but the dataset generation prompts are still unchanged. It would be nice to see those prompts revamped as well.