Re-evaluating retrieval

Gautam-Rajeev commented 10 months ago

Highlights of trying to improve retrieval

Improve embeddings: Create a setup that, for a given set of chunks and question-answer pairs, compares different embedding-retrieval and reranker combinations. Also include OpenAI's text embeddings v3 in the comparison.
Fine-tune embeddings: We have chunks and questions, and we currently create the embeddings dataset simply by treating the first chunk from which the question is answered as the correct chunk to retrieve (scored 1) and all other chunks as incorrect (scored 0). We need to set this up so that chunks with more subtle differences are created and scored more naturally, which gives a better basis for fine-tuning embeddings. We also need to set up code for quickly fine-tuning embeddings and testing the fine-tuned embeddings.
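As a rough illustration of the comparison setup and the current 1/0 labelling described above, here is a minimal sketch. It is not the code from this repo: the names `binary_labels`, `evaluate`, `compare`, and `overlap_retriever` are hypothetical, and a toy word-overlap ranker stands in for real embedding models and rerankers, which would be plugged in as functions with the same signature.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Assumption: a "retriever" is any function that takes a question and the
# chunks and returns chunk indices ranked best-first. An embedding model,
# or an embedding model followed by a reranker, would be wrapped this way.
Retriever = Callable[[str, Sequence[str]], List[int]]


def binary_labels(num_chunks: int, gold_index: int) -> List[int]:
    """Current labelling scheme: the gold chunk scores 1, all others 0."""
    return [1 if i == gold_index else 0 for i in range(num_chunks)]


def evaluate(
    retriever: Retriever,
    chunks: Sequence[str],
    qa_pairs: Sequence[Tuple[str, int]],
    k: int = 3,
) -> Dict[str, float]:
    """Compute recall@k and mean reciprocal rank over (question, gold index) pairs."""
    hits = 0
    reciprocal_rank_sum = 0.0
    for question, gold in qa_pairs:
        ranking = retriever(question, chunks)
        if gold in ranking:
            rank = ranking.index(gold) + 1  # 1-based rank of the gold chunk
            reciprocal_rank_sum += 1.0 / rank
            if rank <= k:
                hits += 1
    n = len(qa_pairs)
    return {"recall@k": hits / n, "mrr": reciprocal_rank_sum / n}


def compare(
    retrievers: Dict[str, Retriever],
    chunks: Sequence[str],
    qa_pairs: Sequence[Tuple[str, int]],
    k: int = 3,
) -> Dict[str, Dict[str, float]]:
    """Run every named retriever on the same data so the metrics are comparable."""
    return {name: evaluate(r, chunks, qa_pairs, k) for name, r in retrievers.items()}


def overlap_retriever(question: str, chunks: Sequence[str]) -> List[int]:
    """Toy stand-in for an embedding model: rank chunks by shared words."""
    q_words = set(question.lower().split())
    scores = [len(q_words & set(c.lower().split())) for c in chunks]
    return sorted(range(len(chunks)), key=lambda i: -scores[i])


def reversed_retriever(question: str, chunks: Sequence[str]) -> List[int]:
    """Deliberately bad baseline: the overlap ranking, worst-first."""
    return list(reversed(overlap_retriever(question, chunks)))


if __name__ == "__main__":
    chunks = [
        "the cat sat on the mat",
        "embeddings map text to vectors",
        "rerankers rescore retrieved chunks",
    ]
    qa_pairs = [
        ("what do embeddings map text to", 1),
        ("what do rerankers rescore", 2),
    ]
    results = compare(
        {"overlap": overlap_retriever, "reversed": reversed_retriever},
        chunks,
        qa_pairs,
        k=1,
    )
    for name, metrics in results.items():
        print(name, metrics)
```

Keeping every candidate behind the same function signature means a new embedding model or reranker can be added to the comparison without touching the metric code. Replacing `binary_labels` with graded scores would be one way to represent the "more subtle differences" mentioned above, though that is only a suggestion.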

Some earlier work done on simpler retrieval testing here

AbhishekRP2002 commented 9 months ago

Hi @GautamR-Samagra, is this still open to the community? I would like to work on this.

Thanks

Gautam-Rajeev commented 9 months ago

@AbhishekRP2002 Not open to the community yet; I picked this up myself. I will let you know if there is anywhere I need help from the community.

Samagra-Development / ai-tools