Open · markokow opened this issue 1 week ago
Hey @markokow, I'm here to help you out with any bugs or questions you have while waiting for a human maintainer. Let's squash those bugs together! 🤖
PaperQA provides an interface for plugging in custom vector stores, allowing you to use different vector stores and embedding models. By default, it uses a NumPy-backed dense vector store (NumpyVectorStore), but you can specify a custom embedding model when adding new papers to the Docs object. Here's an example of how to use a custom embedding model:
```python
from paperqa import Docs, HybridEmbeddingModel, SparseEmbeddingModel, LiteLLMEmbeddingModel

model = HybridEmbeddingModel(
    models=[LiteLLMEmbeddingModel(), SparseEmbeddingModel(ndim=1024)]
)
docs = Docs()
for doc in ("myfile.pdf", "myotherfile.pdf"):
    docs.add(doc, embedding_model=model)
```
This setup lets you swap in different embedding models as needed. However, dedicated support or plugins for external vector databases like Pinecone, Weaviate, or FAISS are not explicitly mentioned in the available documentation [1].
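That said, the `VectorStore` interface is pluggable, so an external index can be wired in by subclassing it. Below is a minimal sketch (not an official integration) of a FAISS-backed store. It assumes the `VectorStore` base class is importable from `paperqa.llms`, that `embedding_model.embed_documents` is awaitable, and that `Docs` accepts a `texts_index` override, as in recent releases; import paths and signatures have moved between versions, so verify against your installed `paperqa/llms.py`:

```python
# A sketch, not a documented paper-qa integration. Assumes the VectorStore
# base class lives in paperqa.llms (as in recent sources); verify import
# paths and method signatures against your installed version.
from typing import Any

import faiss  # pip install faiss-cpu
import numpy as np

from paperqa import Docs
from paperqa.llms import VectorStore


class FaissVectorStore(VectorStore):
    """Hypothetical adapter keeping chunk embeddings in a FAISS index."""

    texts: list[Any] = []  # the embedded text chunks, in insertion order
    _index: Any = None     # faiss.IndexFlatIP, built lazily on first add

    def clear(self) -> None:
        super().clear()  # clears any base-class bookkeeping
        self.texts = []
        self._index = None

    def add_texts_and_embeddings(self, texts) -> None:
        super().add_texts_and_embeddings(texts)  # base-class bookkeeping
        new = list(texts)
        self.texts.extend(new)
        vectors = np.array([t.embedding for t in new], dtype="float32")
        faiss.normalize_L2(vectors)  # so inner product == cosine similarity
        if self._index is None:
            self._index = faiss.IndexFlatIP(vectors.shape[1])
        self._index.add(vectors)

    async def similarity_search(self, query: str, k: int, embedding_model):
        if self._index is None or k <= 0:
            return [], []
        k = min(k, len(self.texts))
        # embed_documents is async in recent versions; adjust if yours differs
        q = np.array(
            (await embedding_model.embed_documents([query]))[0], dtype="float32"
        ).reshape(1, -1)
        faiss.normalize_L2(q)
        scores, ids = self._index.search(q, k)  # FAISS returns (scores, ids)
        return [self.texts[i] for i in ids[0]], scores[0].tolist()


# Point the Docs object at the custom store instead of NumpyVectorStore.
docs = Docs(texts_index=FaissVectorStore())
```

The same shape should work for a Pinecone or Weaviate client: replace the FAISS calls in `add_texts_and_embeddings` and `similarity_search` with upserts and queries against the remote index.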
One of my struggles when using PaperQA is the inference time every time I run queries. Is there a way to import external vector DBs rather than relying solely on NumpyVectorStore? Caching is not an option, since I need the embedded results on a scheduled basis and they are stored in the cloud to save memory.

The documentation is lacking when it comes to providing external embeddings; I would appreciate a working sample use case.
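On the repeated-inference concern: embeddings are computed when documents are added, not at query time, so persisting the populated `Docs` object between scheduled runs avoids re-embedding entirely. A minimal sketch, assuming `Docs` pickles cleanly in your installed version (it has historically been a plain pickleable object); the round-trip to cloud storage is left to whatever client you already use:

```python
# A sketch, assuming the populated Docs object is pickle-serializable in
# your paper-qa version; the cloud storage round-trip (boto3, gcsfs, ...)
# is omitted here.
import pickle

from paperqa import Docs

docs = Docs()
docs.add("myfile.pdf")  # embeddings are computed here, at add time

# After the scheduled embedding job, persist the whole object once...
with open("my_docs.pkl", "wb") as f:
    pickle.dump(docs, f)

# ...and at query time, reload it instead of re-embedding anything.
with open("my_docs.pkl", "rb") as f:
    docs = pickle.load(f)

answer = docs.query("What conclusions does this paper draw?")
print(answer.formatted_answer)
```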