tarun-goyal opened 5 months ago
Iteration-1: Run all existing RAGs (base, subqa, raptor) with different prompts and configurable parameters (models and their settings) to get the desired formatted metadata schema.
This includes a basic evaluation setup where the responses will be checked against the following three metrics (a rough sketch follows after this list):
1. is_json: a boolean flag indicating whether the RAG response is valid JSON.
2. schema_score: a score based on how many of the expected keys (title, authors, organizations, keywords) are present in the RAG response.
3. generation_time: the total time taken by the RAG to produce a response.
The same evaluation setup should be carried forward to all iterations.
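A minimal sketch of what this evaluation setup could look like, assuming the RAG response arrives as a raw string and that `run_rag` is a hypothetical callable wrapping whichever RAG is under test:

```python
import json
import time

EXPECTED_KEYS = {"title", "authors", "organizations", "keywords"}

def evaluate_response(run_rag, query: str) -> dict:
    """Run one RAG call and compute the three metrics described above."""
    start = time.perf_counter()
    raw_response = run_rag(query)  # hypothetical RAG entry point
    generation_time = time.perf_counter() - start

    try:
        parsed = json.loads(raw_response)
        is_json = isinstance(parsed, dict)
    except (json.JSONDecodeError, TypeError):
        parsed, is_json = {}, False

    # Fraction of the expected metadata keys present in the response.
    schema_score = len(EXPECTED_KEYS & parsed.keys()) / len(EXPECTED_KEYS) if is_json else 0.0

    return {
        "is_json": is_json,
        "schema_score": schema_score,
        "generation_time": generation_time,
    }
```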
Iteration-2: Design a RAG using the LangChain metadata tagger, and use it with configurable parameters to get the desired formatted metadata: https://python.langchain.com/v0.2/docs/integrations/document_transformers/openai_metadata_tagger/
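A rough sketch of the iteration-2 approach, following the linked LangChain docs; the schema fields mirror the metadata keys above, and the model name is only a placeholder:

```python
from langchain_community.document_transformers.openai_functions import create_metadata_tagger
from langchain_core.documents import Document
from langchain_openai import ChatOpenAI

# JSON schema for the desired metadata; fields mirror the evaluation keys.
schema = {
    "properties": {
        "title": {"type": "string"},
        "authors": {"type": "array", "items": {"type": "string"}},
        "organizations": {"type": "array", "items": {"type": "string"}},
        "keywords": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["title", "authors"],
}

# Model name and temperature are configurable parameters, not fixed choices.
llm = ChatOpenAI(temperature=0, model="gpt-4o-mini")

tagger = create_metadata_tagger(metadata_schema=schema, llm=llm)
docs = [Document(page_content="<first page of a PDF goes here>")]
tagged_docs = tagger.transform_documents(docs)
print(tagged_docs[0].metadata)  # extracted metadata dict
```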
Iteration-3: Design a RAG using the llama-index OpenAIPydanticProgram to extract metadata in the desired format: https://docs.llamaindex.ai/en/stable/module_guides/indexing/metadata_extraction/
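A similar sketch for iteration-3 with OpenAIPydanticProgram from the linked llama-index docs; the `PdfMetadata` model, prompt, and model name are illustrative assumptions:

```python
from typing import List

from llama_index.llms.openai import OpenAI
from llama_index.program.openai import OpenAIPydanticProgram
from pydantic import BaseModel

class PdfMetadata(BaseModel):
    """Desired output schema; field names match the evaluation keys."""
    title: str
    authors: List[str]
    organizations: List[str]
    keywords: List[str]

program = OpenAIPydanticProgram.from_defaults(
    output_cls=PdfMetadata,
    prompt_template_str="Extract the metadata from this document text:\n{text}",
    llm=OpenAI(model="gpt-4o-mini"),  # configurable model choice
    verbose=False,
)

# The program returns an already-validated PdfMetadata instance.
metadata = program(text="<first page of a PDF goes here>")
print(metadata)
```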
- [ ] Answering Open Chat questions - moving to Sprint 4
Moved this to a new ticket
Metadata Extraction Updates:
Integrated 2 RAGs, meta_llama and meta_lang, with UI compatibility for extracting metadata schemas from PDFs.
time_taken is added to responses, and the pydantic evaluator is done. Exploring another qualitative evaluator for RAG responses.
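The pydantic evaluator could look roughly like this (a sketch, reusing the hypothetical `PdfMetadata` model from above): validation either succeeds or raises with field-level errors that can be counted.

```python
from typing import List

from pydantic import BaseModel, ValidationError

class PdfMetadata(BaseModel):
    title: str
    authors: List[str]
    organizations: List[str]
    keywords: List[str]

def pydantic_evaluate(response: dict) -> dict:
    """Validate a parsed RAG response against the expected schema (pydantic v2 API)."""
    try:
        PdfMetadata.model_validate(response)
        return {"valid": True, "errors": []}
    except ValidationError as exc:
        # One entry per failing field; usable as a finer-grained schema score.
        return {"valid": False, "errors": [e["loc"] for e in exc.errors()]}
```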
An LLM-based scorer for RAG responses and a visualizer are added in the shared iterations.
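A minimal sketch of the LLM-scorer idea, using the OpenAI client directly; the rubric prompt and model name are placeholders, not the actual implementation:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def llm_score(question: str, response: str) -> int:
    """Ask an LLM to grade a RAG response on a 1-5 scale."""
    completion = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; configurable like the other models
        messages=[
            {
                "role": "system",
                "content": "You grade metadata-extraction answers. "
                           "Reply with a single integer from 1 (poor) to 5 (perfect).",
            },
            {"role": "user", "content": f"Question: {question}\nResponse: {response}"},
        ],
    )
    return int(completion.choices[0].message.content.strip())
```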