District-Administration-Varanasi / document-chatbot


Government Document Chatbot using langchain #3

Open AleenDhar opened 3 months ago

AleenDhar commented 3 months ago

Hi @amit-s19, I am a second-year BTech student, and here is my solution to the problem:

I am using LangChain and gemini-pro to achieve the following result:

Result

[Screenshot: chrome_oAcmNTpTRn]

The steps are as follows:

  1. Using PyPDF2 to read the PDF document.
  2. Using langchain.text_splitter to split the text extracted from the PDF into chunks.
  3. Using GoogleGenerativeAIEmbeddings to create text embeddings.
  4. Using FAISS as the vector store, which can easily be replaced with ChromaDB or Pinecone.
  5. Using a basic prompt template and a question_answering chain to query the PDF data.
  6. Using Streamlit to create a simple user interface.
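
To make steps 2 to 5 concrete, here is a minimal, dependency-free sketch of the retrieval flow. It is not the code from my solution: `split_text`, `embed`, `top_chunks` and `build_prompt` are hypothetical helpers, the word-count "embedding" is only a stand-in for GoogleGenerativeAIEmbeddings, and the sorted list is a stand-in for a FAISS similarity search. The prompt wording is also just an assumption.

```python
import math
import re
from collections import Counter


def split_text(text, chunk_size=200, chunk_overlap=50):
    """Split text into fixed-size character chunks that overlap (step 2)."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a real model, step 3)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top_chunks(question, chunks, k=2):
    """Return the k chunks most similar to the question (stand-in for step 4)."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


# Assumed prompt wording; the real template may differ.
PROMPT_TEMPLATE = (
    "Answer the question using only the context below. "
    "If the answer is not in the context, say so.\n\n"
    "Context:\n{context}\n\nQuestion: {question}\nAnswer:"
)


def build_prompt(question, chunks, k=2):
    """Fill the prompt template with the retrieved chunks (step 5)."""
    context = "\n---\n".join(top_chunks(question, chunks, k))
    return PROMPT_TEMPLATE.format(context=context, question=question)


if __name__ == "__main__":
    document = (
        "Property tax must be paid at the municipal office. "
        "Birth certificates are issued by the registrar. "
        "Ration card applications need proof of address."
    )
    chunks = split_text(document, chunk_size=60, chunk_overlap=15)
    print(build_prompt("Where do I pay property tax?", chunks, k=1))
```

The resulting prompt string is what would be sent to the LLM. Swapping the vector store or the embedding model only changes `embed` and `top_chunks`, which is why replacing FAISS with another store is easy.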

I am using gemini-pro because it supports multiple languages. Later, we can replace it with our own model fine-tuned on Hindi or any other language.

We can also use gemini-pro-vision to read the images inside the PDF and convert them into text, which can then be used for question answering.
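
A rough sketch of how that image text could be merged into the page text before splitting and embedding. This is only an assumption about the wiring: `page_to_text` is a hypothetical helper, and `describe_image` is a caller-supplied function that would wrap the vision-model call. The `fake_describe_image` placeholder below does not call any model.

```python
from typing import Callable, Iterable, List


def page_to_text(
    page_text: str,
    images: Iterable[bytes],
    describe_image: Callable[[bytes], str],
) -> str:
    """Merge a page's extracted text with text recovered from its images."""
    parts: List[str] = [page_text.strip()] if page_text.strip() else []
    for index, image in enumerate(images, start=1):
        description = describe_image(image).strip()
        if description:
            # Label image-derived text so it can be told apart later.
            parts.append(f"[Image {index}] {description}")
    return "\n".join(parts)


def fake_describe_image(image: bytes) -> str:
    """Placeholder for a vision-model call; it only reports the byte count."""
    return f"image of {len(image)} bytes"


if __name__ == "__main__":
    print(page_to_text("Notice to residents", [b"\x89PNG"], fake_describe_image))
```

The merged string can then go through the same split, embed and retrieve steps as ordinary page text.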