Future-House / paper-qa

High accuracy RAG for answering questions from scientific documents with citations
Apache License 2.0

Elaboration on "how is this different from LlamaIndex" FAQ #155

Open rht opened 1 year ago

rht commented 1 year ago

This question is for pedagogical purposes. I tried to reproduce Paper QA's capability from scratch with LlamaIndex, without bells and whistles, as described in the FAQ:

It's not that different! This is similar to the tree response method in LlamaIndex. I just have included some prompts I find useful, readers that give page numbers/line numbers, and am focused on one task - answering technical questions with cited sources.

But the answers produced by Paper QA are still better than those from the LlamaIndex code I wrote based on that statement:

Are there more components that I am missing? My code:

import os

os.environ["OPENAI_API_KEY"] = "APIKEY"

from llama_index import VectorStoreIndex, SimpleDirectoryReader, ResponseSynthesizer
from llama_index import StorageContext, load_index_from_storage

def make_faiss():
    import faiss
    from llama_index.vector_stores.faiss import FaissVectorStore

    # create faiss index
    d = 1536
    faiss_index = faiss.IndexFlatL2(d)
    # construct vector store
    vector_store = FaissVectorStore(faiss_index)
    return vector_store

vector_store = make_faiss()

documents = SimpleDirectoryReader("docs").load_data()

if os.path.isdir("./storage"):
    # rebuild storage context
    storage_context = StorageContext.from_defaults(
        persist_dir="./storage", vector_store=vector_store
    )
    # load index
    index = load_index_from_storage(storage_context)
else:
    # attach the faiss vector store when building the index; without this,
    # the FaissVectorStore created above is never actually used
    storage_context = StorageContext.from_defaults(vector_store=vector_store)
    index = VectorStoreIndex.from_documents(
        documents, storage_context=storage_context
    )
    index.storage_context.persist()

# configure response synthesizer
# NOTE: this synthesizer is never passed to the query engine below,
# so it currently has no effect
response_synthesizer = ResponseSynthesizer.from_args(
    # response_mode="compact",
    # response_mode="tree_summarize",
    verbose=True,
)

from llama_index import Prompt

# llama_index's default QA prompt, kept here for comparison (unused below)
DEFAULT_TEXT_QA_PROMPT_TMPL = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, "
    "answer the question: {query_str}\n"
)
PAPERQA_PROMPT = (
    "Write an answer "
    "for the question below based on the provided context. "
    "If the context provides insufficient information, "
    'reply "I cannot answer". '
    "For each part of your answer, indicate which sources most support it "
    "via valid citation markers at the end of sentences, like (Example2012). "
    "Answer in an unbiased, comprehensive, and scholarly tone. "
    "If the question is subjective, provide an opinionated answer in the concluding 1-2 sentences. \n\n"
    "{context_str}\n"
    "Question: {query_str}\n"
    "Answer: "
)
qa_template = Prompt(PAPERQA_PROMPT)
query_engine = index.as_query_engine(
    response_mode="tree_summarize",
    verbose=True,
    text_qa_template=qa_template,
)

questions = [
    "How do I configure LDAP authentication in Zulip?",
    "How do I import Slack workspace export into a Zulip organization?",
    "How do I upgrade my Zulip instance to the latest Git version?",
    "How do I create a self-signed SSL certificate for my Zulip server?",
    "How do I delete a Zulip organization?",
    "Can you describe Zulip architecture, as in, its inner working?",
    "can you describe an overview of Zulip architecture?",
    "How do I get an account's username via the Zulip Python API?",
    "How I set up a bot for Zulip?",
    "What framework does Zulip use for its frontend?",
]

for i, question in enumerate(questions):
    print(f"# {i + 1}. {question}")

    print(query_engine.query(question))
    print()
rfuisz commented 1 year ago

You are skipping a step in the middle of the paper-qa implementation. The real paper-qa calls OpenAI multiple times.

rht commented 1 year ago

The real paper-qa calls OpenAI multiple times.

Can these calls be summarized in a few sentences? That should be sufficient for me to build a "babyagi"-style implementation of Paper QA.

whitead commented 1 year ago

Really cool to be testing this! Very nice to see the side-by-side results.

I believe the difference may be the tree_summarize step in LlamaIndex: you need to set the prompt for that step to what is in the paper-qa summarize prompt. I'm not sure if/how it can be customized, though.
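
A hedged sketch of what that customization might look like, staying with the old llama_index API used above. The prompt text is an approximation of paper-qa's summarize prompt, and the summary_template kwarg is an assumption: whether and under what name the tree_summarize prompt is exposed varies across llama_index versions, so check yours.

from llama_index import Prompt

# Approximation of paper-qa's per-chunk summarize prompt: condense each
# chunk to only the facts relevant to the question, with an explicit
# escape hatch so irrelevant chunks can be filtered out
SUMMARIZE_PROMPT = (
    "Summarize the text below to help answer a question. "
    "Do not directly answer the question; instead summarize "
    "to give evidence to help answer the question. "
    'Reply "Not applicable" if the text is irrelevant.\n\n'
    "{context_str}\n\n"
    "Question: {query_str}\n"
    "Relevant information summary: "
)

query_engine = index.as_query_engine(
    response_mode="tree_summarize",
    text_qa_template=qa_template,
    # ASSUMPTION: kwarg name for the tree_summarize prompt; this may not
    # exist under this name in your llama_index version
    summary_template=Prompt(SUMMARIZE_PROMPT),
)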

rfuisz commented 1 year ago

Thanks Andrew for providing some more thorough context :)

Yeah, it's the summarizing step that I think is helpful there. If you feed irrelevant sources into the final prompt, you'll find that GPT sometimes does unexpected things. By using an intermediate filtering step, you can summarize the relevant facts from the citations and weed out the irrelevant ones. Feeding only summaries of the relevant facts into the final prompt substantially improves performance.

Now, there's some added cost to this (you're running several extra API calls), but for many use cases that added performance is worth it.
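
Here is a minimal sketch of that two-stage flow, independent of llama_index. The prompts and model name are illustrative, not paper-qa's exact ones:

from openai import OpenAI

client = OpenAI()

def complete(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

def answer_with_filtering(question: str, chunks: list[str]) -> str:
    # Stage 1: summarize each retrieved chunk against the question and
    # drop the ones the model flags as irrelevant
    summaries = []
    for chunk in chunks:
        summary = complete(
            "Summarize the text below to give evidence for answering the "
            'question. Reply "Not applicable" if it is irrelevant.\n\n'
            f"{chunk}\n\nQuestion: {question}\nSummary: "
        )
        if "not applicable" not in summary.lower():
            summaries.append(summary)
    # Stage 2: answer only from the surviving summaries, so irrelevant
    # sources never reach the final prompt
    context = "\n\n".join(summaries)
    return complete(
        "Write an answer for the question below based on the provided "
        'context. If the context is insufficient, reply "I cannot answer".'
        f"\n\n{context}\n\nQuestion: {question}\nAnswer: "
    )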


jamesbraza commented 1 month ago

Hi @rfuisz, this is now addressed in v5 (released today), here: https://github.com/Future-House/paper-qa/blob/v5.0.0/paperqa/agents/tools.py

We expose our innards as essentially native Python objects, so you can import them into LlamaIndex, LangChain, or whatever, and use them directly. A performance discrepancy should not really exist anymore.
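
For example (a sketch: the ask/Settings entry point follows the v5 README, and the tool class names assume paperqa/agents/tools.py as of v5.0.0):

# High-level entry point
from paperqa import Settings, ask

answer = ask(
    "How do I configure LDAP authentication in Zulip?",
    settings=Settings(paper_directory="docs"),
)

# Or import the individual agent tools and wire them into another
# framework (LlamaIndex, LangChain, ...)
from paperqa.agents.tools import GatherEvidence, GenerateAnswer, PaperSearch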

I am going to leave this open in case you have any other questions. Thanks for digging in deeply here.

rfuisz commented 1 month ago

big congrats on the release!