hslee16 opened 5 days ago
Morning @hslee16,
Not sure this is related to this PR, but I figured I'd ask anyway.
For the next big release, we're planning to leverage the langgraph-api, which comes with a Postgres service as a dependency.
The architecture will be:
You mentioned this PR could help with embeddings and retrievers, which is the same direction I'm interested in, especially to support the document Uploader feature (the langgraph-api service needs access to documents uploaded via the service on port 8000).
Questions:
Perhaps a design review would help if we'd like to implement LangChain retrievers backed by PGVector, as proposed by this user on Discord.
@ElishaKay
Thanks for your speedy response!
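How do we save uploaded documents in Postgres?
In LangChain with PGVector, we would do it the same way as shown in the PGVector documentation:
```python
from langchain_postgres import PGVector

# Writes the split/parsed docs and their embeddings into the named collection
vectorstore = PGVector.from_documents(
    docs,
    embeddings,
    collection_name=collection,
    connection=connection,  # SQLAlchemy URL, e.g. "postgresql+psycopg://user:pass@host:5432/db"
)
```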
This persists the split/parsed documents along with their generated embeddings, similar to the following:
How do we retrieve?
This part is a little trickier. Neither the retriever nor the base document object in LangChain appears to have a "fetch all documents" method. Instead, retrievers expose an invoke (or ainvoke) method where users pass a query that triggers a similarity search on the vector store, and only the documents "similar" to that query are returned.
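To make the retrieval limitation concrete, here is a toy, pure-Python sketch of what a similarity-search retriever does conceptually: rank stored chunks by cosine similarity to the query embedding and return only the top k. It is not LangChain's implementation; the store, the 3-dimensional vectors, and the `similarity_search` helper here are hypothetical stand-ins (real embeddings come from an embeddings model and are much larger).

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def similarity_search(store, query_vec, k=2):
    """Return the k stored texts most similar to query_vec, best match first."""
    ranked = sorted(store, key=lambda item: cosine(item[1], query_vec), reverse=True)
    return [text for text, _ in ranked[:k]]

# Hypothetical (text, embedding) pairs standing in for rows in the vector store
store = [
    ("Quarterly revenue grew 12%.", [0.9, 0.1, 0.0]),
    ("The office moved to Berlin.", [0.0, 0.2, 0.9]),
    ("Revenue forecast for next year.", [0.8, 0.3, 0.1]),
]
query = [1.0, 0.2, 0.0]  # pretend embedding of a question about revenue
print(similarity_search(store, query, k=2))
```

The Berlin chunk never comes back for a revenue-related query, which is exactly why a retriever alone can't hand gpt-researcher "all documents". In LangChain the analogous call would be something like `vectorstore.as_retriever(search_kwargs={"k": 2}).invoke("revenue")`.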
My private project uses LangChain with PGVector, so I can already generate the set of "documents" that gpt-researcher uses when ReportSource == local.
My main goal is to leverage existing documents from PDFs (and other document types) that have already been processed by LangChain. Since gpt-researcher already loads local documents through various LangChain loaders, I figured it would be simple enough to bolt on LangChain documents directly.
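A minimal sketch of the "bolt on" idea: an adapter that turns LangChain-style documents into the shape gpt-researcher's local pipeline consumes. It assumes (not confirmed in this thread) that the local loader emits a list of dicts with `raw_content` and `url` keys. The `Document` dataclass below only mirrors the `page_content` and `metadata` fields of `langchain_core.documents.Document` so the sketch runs without LangChain, and `to_research_docs` is a hypothetical name.

```python
from dataclasses import dataclass, field

# Stand-in mirroring the two fields of langchain_core.documents.Document
@dataclass
class Document:
    page_content: str
    metadata: dict = field(default_factory=dict)

def to_research_docs(documents):
    """Hypothetical adapter: map LangChain-style Documents to the
    {"raw_content": ..., "url": ...} dicts assumed for gpt-researcher's local loader."""
    return [
        {
            "raw_content": doc.page_content,
            # Use the loader's "source" metadata as the URL; empty string if absent
            "url": doc.metadata.get("source", ""),
        }
        for doc in documents
        if doc.page_content.strip()  # skip whitespace-only chunks
    ]

docs = [
    Document("Q3 revenue grew 12%.", {"source": "reports/q3.pdf", "page": 2}),
    Document("   ", {"source": "reports/blank.pdf"}),
    Document("Headcount is stable.", {}),
]
print(to_research_docs(docs))
```

Because the documents already exist, no loader or re-embedding step is needed; the adapter only reshapes fields, and any extra metadata (such as `page`) is simply dropped under this assumed format.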
Please let me know how I can help. If you have existing design documents, I'm happy to take a look. Lastly, the Discord link you posted above leads me to a blank channel. Could you give me access?
My Discord username is: hslee16
Many thanks in advance!
Hey @hslee16, thanks for this PR! Could you please also include a tutorial in the PR on how to use this, here: https://github.com/assafelovic/gpt-researcher/blob/master/docs/docs/gpt-researcher/tailored-research.md
@assafelovic can do!
Loving the concept & initiative @hslee16
To access that Discord thread, first accept this invite:
The Discord thread proposed an interesting concept: letting the agent decide which report_source to use.
Motivation
Rather than reading documents from a local directory and using various loaders to generate LangChain Document instances, let's enable the researcher to use LangChain documents directly. This is useful when an existing LangChain backend has already created the embeddings, documents, and retrievers.