UW-Madison-DSI / ask-xDD

Retrieval-Augmented Generation (RAG) on 17M full text journal articles.
https://xdd.wisc.edu/
MIT License

Examine hybrid search eval results #72

Closed: JasonLo closed this issue 11 months ago

JasonLo commented 11 months ago

Procedure summary:

  1. Extract capitalized terms at the article level and at the paragraph level
  2. Append the top 10 terms and the top 3 terms to each item's metadata in the vector store
  3. Use the Hackathon test set to compare two new search strategies: article-level and paragraph-level term filtering, each combined with embedding search
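
A minimal sketch of the procedure above, not the project's actual code. It assumes a "capitalized term" is any word starting with an uppercase letter, that the top 10 terms are kept per article and the top 3 per paragraph, and that a query with no capitalized terms skips filtering. The names `top_terms`, `build_items`, and `filter_by_terms` and the metadata field names are hypothetical. The embedding search itself is left out; it would run on whatever the filter returns.

```python
import re
from collections import Counter

# Assumption: a capitalized term is a word that starts with an uppercase letter.
# This also picks up sentence-initial words such as "The".
CAPITALIZED = re.compile(r"\b[A-Z][A-Za-z0-9]*\b")


def top_terms(text, k):
    """Return the k most frequent capitalized terms in text."""
    counts = Counter(CAPITALIZED.findall(text))
    return [term for term, _ in counts.most_common(k)]


def build_items(article_id, paragraphs, article_k=10, paragraph_k=3):
    """Build one record per paragraph, carrying both levels of terms as metadata."""
    article_terms = top_terms(" ".join(paragraphs), article_k)
    return [
        {
            "article_id": article_id,
            "text": paragraph,
            "article_terms": article_terms,
            "paragraph_terms": top_terms(paragraph, paragraph_k),
        }
        for paragraph in paragraphs
    ]


def filter_by_terms(items, query, field):
    """Keep items whose metadata field shares a capitalized term with the query."""
    query_terms = set(CAPITALIZED.findall(query))
    if not query_terms:
        # Assumption: with no capitalized query terms, fall back to all items.
        return list(items)
    return [item for item in items if query_terms & set(item[field])]
```

Calling `filter_by_terms(items, query, "article_terms")` or `filter_by_terms(items, query, "paragraph_terms")` gives the two candidate sets to compare before ranking by embedding similarity.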

Findings:

  1. The new term filtering strategy outperforms the old one in some cases
  2. Paragraph-level term filtering outperforms article-level term filtering in some cases
  3. Missing terms still need to be addressed
    • Use spaCy proper nouns?
    • Words that contain multiple capital letters
    • Allow hyphenated words
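
Two of the ideas above (multiple capital letters, hyphenated words) can be sketched with a plain regex tokenizer. This is an illustration under assumptions, not the implementation used in the repo: the token pattern, the threshold of two capitals, and the name `multi_cap_terms` are all hypothetical.

```python
import re

# Assumption: a token is a run of letters/digits, optionally joined by hyphens,
# so "SARS-CoV-2" stays a single term instead of being split into three.
TOKEN = re.compile(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*")


def multi_cap_terms(text, min_caps=2):
    """Return tokens with at least min_caps uppercase letters, in order of appearance."""
    return [
        token
        for token in TOKEN.findall(text)
        if sum(ch.isupper() for ch in token) >= min_caps
    ]
```

Counting capitals anywhere in the token catches terms like "mRNA" that do not start with an uppercase letter, and it drops ordinary sentence-initial words, which have only one capital.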

[Image: hybrid search ta1_eval]

JasonLo commented 11 months ago

v2: https://github.com/UW-Madison-DSI/askem/issues/65#issuecomment-1735903466

JasonLo commented 11 months ago

Selected MoreThanOneCapStrategy for deployment at the paragraph level.