Future-House / paper-qa

High accuracy RAG for answering questions from scientific documents with citations
Apache License 2.0

Failures to correctly evaluate non-single-letter answers #709

Open jamesbraza opened 2 days ago

jamesbraza commented 2 days ago

For the qa_prompt:

Q: What is my office's zip code?

Options: A) 94107 B) 94106 C) cheesecake D) Insufficient information to answer this question E) -8

And the qa_answer (where the correct answer is A):

the answer is 94106 or 94107

The LLM evaluation says:

The single letter answer is B.

This gets converted to "T", and the result is declared LitQAEvaluation.INCORRECT.

LitQAEvaluation.INCORRECT is actually the right output here, but we arrived at it for the wrong reasons.
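
To make the failure mode concrete, here is a minimal sketch, not the library's actual code. It assumes the "T" comes from taking the first character of the evaluator's response ("The single letter answer is B."), which the issue does not confirm. Both `naive_letter` and `extract_single_letter` are hypothetical names used only for illustration.

```python
import re
from typing import Optional


def naive_letter(evaluator_response: str) -> str:
    """Hypothetical parser: assume the response starts with the answer letter."""
    return evaluator_response.strip()[0]


def extract_single_letter(evaluator_response: str, num_options: int) -> Optional[str]:
    """Hypothetical stricter parser.

    Returns an option letter only if exactly one distinct standalone
    uppercase option letter appears in the response; otherwise None.
    This is conservative: a response using "A" as an article alongside
    another letter would also yield None.
    """
    valid = "".join(chr(ord("A") + i) for i in range(num_options))
    found = set(re.findall(rf"\b([{valid}])\b", evaluator_response))
    return found.pop() if len(found) == 1 else None


response = "The single letter answer is B."
print(naive_letter(response))              # first character of "The ..."
print(extract_single_letter(response, 5))  # the standalone option letter
```

A stricter parser like this would only address the "T" step. It would not address the evaluator choosing B for an answer that names two options, which would still need separate handling (for example, treating multi-option answers as incorrect before any letter is extracted).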