Future-House / paper-qa

High accuracy RAG for answering questions from scientific documents with citations
Apache License 2.0

Wrong citations when initializing the Docs object with texts_index #206

Closed XiaomeiLi1 closed 2 days ago

XiaomeiLi1 commented 10 months ago
I'm looking for a way to convert LangChain Document objects into a paper-qa Docs object. My current solution is below.

splitter = splitter_type(chunk_size=chunk_size, chunk_overlap=chunk_overlap)
docs_split = splitter.split_documents(docs) # docs is langchain Document object

embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)
vectorstore = FAISS.from_documents(docs_split, embeddings, ids=[doc.metadata["source"] for doc in docs_split])
docs_obj = Docs(texts_index=vectorstore, memory=True)
response = docs_obj.query(question)

With this setup, the answer does not pick up the real citations; it makes them up. Is there a better way to do this? BTW, I don't have the PDF files downloaded. The texts are only stored in the langchain Document objects. Thanks a lot!
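One detail in the snippet above that may be worth checking: it passes `doc.metadata["source"]` as the id of every chunk, so all chunks split from the same document share one id. Whether this is what causes the wrong citations is an assumption and is not confirmed in this thread. The sketch below uses only the standard library; `Chunk`, `duplicate_ids`, and `unique_chunk_ids` are hypothetical names that stand in for the split documents and are not part of langchain or paper-qa.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """Hypothetical stand-in for one split langchain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def duplicate_ids(ids):
    """Return every id that appears more than once, with its count."""
    return {id_: n for id_, n in Counter(ids).items() if n > 1}


def unique_chunk_ids(chunks):
    """Build one id per chunk as '<source>#<n>', counting chunks per source."""
    seen = Counter()
    ids = []
    for chunk in chunks:
        source = chunk.metadata["source"]
        ids.append(f"{source}#{seen[source]}")
        seen[source] += 1
    return ids


# Two chunks from the same source and one chunk from another source.
chunks = [
    Chunk("first part", {"source": "paper_a"}),
    Chunk("second part", {"source": "paper_a"}),
    Chunk("only part", {"source": "paper_b"}),
]

# Ids built the same way as in the snippet above.
source_ids = [c.metadata["source"] for c in chunks]
print(duplicate_ids(source_ids))   # {'paper_a': 2}
print(unique_chunk_ids(chunks))    # ['paper_a#0', 'paper_a#1', 'paper_b#0']
```

If `duplicate_ids` returns a non-empty result for the real `docs_split`, giving each chunk its own id while keeping the source in its metadata would be one thing to try before looking further.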

whitead commented 10 months ago

Strange - do you have some example input/output? Hard to tell what could be going wrong.

nandhaece07 commented 8 months ago

Any guidance on how to integrate with an external vector DB such as chromadb? I can't find any documentation on this.

jamesbraza commented 2 days ago

> Any guidance on how to integrate with an external vector DB such as chromadb? I can't find any documentation on this.

This was asked in https://github.com/Future-House/paper-qa/issues/153; you can find some information there.

Also, we have just released version 5, which more or less rewrites everything. If you are still stuck, please open a new issue using paper-qa>=5.