alejandro-ao / langchain-ask-pdf

An AI app that lets you upload a PDF and ask questions about it. It uses OpenAI's LLMs to generate the answers.

Feature: Autoload a PDF file by URL #1

Open KazeroG opened 1 year ago

KazeroG commented 1 year ago

Automatically load the PDF from a URL instead of uploading a local file.

With this feature, the file-upload input is removed.
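The PDF now comes from a URL rather than an upload, so a server error page or login redirect could reach `PdfReader` and fail deep inside the parser. It is worth checking that the download actually returned a PDF before parsing it. Below is a minimal sketch using only the standard library (`urllib.request` instead of `requests`, so it needs no third-party package). `fetch_pdf_bytes`, `looks_like_pdf` and the User-Agent string are hypothetical names. Searching the first 1024 bytes for the `%PDF-` marker is an assumption based on how lenient PDF readers usually are.

```python
from urllib.request import Request, urlopen

PDF_MAGIC = b"%PDF-"  # every PDF file starts with this header marker


def looks_like_pdf(data: bytes, window: int = 1024) -> bool:
    """Return True if the PDF header marker appears near the start of data.

    Assumption: the marker normally sits at byte 0, but some files carry
    leading junk, so we search the first `window` bytes.
    """
    return PDF_MAGIC in data[:window]


def fetch_pdf_bytes(url: str, timeout: float = 30.0) -> bytes:
    """Download url and return its body, raising ValueError if it is not a PDF."""
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses
    req = Request(url, headers={"User-Agent": "ask-your-pdf/0.1"})  # hypothetical UA
    with urlopen(req, timeout=timeout) as resp:
        data = resp.read()
    if not looks_like_pdf(data):
        raise ValueError(f"{url} did not return a PDF")
    return data
```

In the app this would replace the download lines with `pdf = BytesIO(fetch_pdf_bytes(url))`.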

Explanation:

The code (app.py):

from io import BytesIO
import requests
import streamlit as st
from PyPDF2 import PdfReader
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.chains.question_answering import load_qa_chain
from langchain.llms import OpenAI
from langchain.callbacks import get_openai_callback

def main():
    st.set_page_config(page_title="Ask your PDF")
    st.header("Ask your PDF 💬")

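    # load the PDF file from a URL (replace with the PDF you want to query)
    url = 'https://www.example.com/example.pdf'
    pdf_response = requests.get(url, timeout=30)
    pdf_response.raise_for_status()  # stop on 404/500 instead of parsing an error page
    pdf = BytesIO(pdf_response.content)

    # extract the text
    pdf_reader = PdfReader(pdf)
    text = ""
    for page in pdf_reader.pages:
        # guard against pages with no extractable text (e.g. scanned images)
        text += page.extract_text() or ""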

    # split into chunks
    text_splitter = CharacterTextSplitter(
        separator="\n",
        chunk_size=1000,
        chunk_overlap=200,
        length_function=len
    )
    chunks = text_splitter.split_text(text)

    # create embeddings (OpenAIEmbeddings reads the key from the OPENAI_API_KEY environment variable)
    embeddings = OpenAIEmbeddings()
    knowledge_base = FAISS.from_texts(chunks, embeddings)

    # show user input
    user_question = st.text_input("Ask a question about your PDF:")
    if user_question:
        docs = knowledge_base.similarity_search(user_question)

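        llm = OpenAI()
        # "stuff" puts all retrieved chunks into a single prompt
        chain = load_qa_chain(llm, chain_type="stuff")
        with get_openai_callback() as cb:
            response = chain.run(input_documents=docs, question=user_question)
            print(cb)  # token usage and cost go to the terminal, not the page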

        st.write(response)

if __name__ == '__main__':
    main()
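In the code above, `chunk_size=1000` and `chunk_overlap=200` mean each chunk is about 1000 characters (measured with `length_function=len`). Consecutive chunks share some text, so a sentence cut at a boundary still appears whole in one of them. `CharacterTextSplitter` actually splits on the separator (`"\n"`) first and then merges the pieces, so its boundaries differ, and a single line longer than 1000 characters can still produce an oversized chunk. The fixed-window splitter below is a simplified stand-in that only illustrates the size/overlap arithmetic. `sliding_window_chunks` is a hypothetical name, not a LangChain function.

```python
def sliding_window_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Simplified illustration only: unlike LangChain's CharacterTextSplitter,
    this ignores separators and cuts at exact character offsets.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    start = 0
    while True:
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window reached the end
            break
        start += step
    return chunks


chunks = sliding_window_chunks("a" * 2500)
print([len(c) for c in chunks])  # [1000, 1000, 900]
```

With a 200-character overlap, each window advances 800 characters, so a 2500-character text yields three chunks.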
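`FAISS.from_texts` embeds every chunk and stores the vectors in an index. `similarity_search` then embeds the question the same way and returns the chunks whose vectors are closest. As far as I know, LangChain's FAISS wrapper defaults to a flat (exhaustive) index with Euclidean (L2) distance and returns `k=4` results. Treat those defaults as assumptions to check against your installed version. The brute-force search below shows the idea with made-up 3-dimensional vectors in place of real OpenAI embeddings. `top_k_l2` is a hypothetical name.

```python
import math


def top_k_l2(query: list[float], vectors: list[list[float]], k: int = 4) -> list[int]:
    """Return indices of the k vectors nearest to query by Euclidean distance.

    Exhaustive search (the same idea as a flat FAISS index), in pure Python.
    """
    distances = [(math.dist(query, v), i) for i, v in enumerate(vectors)]
    distances.sort()  # smallest distance first; ties broken by index
    return [i for _, i in distances[:k]]


# Toy "embeddings" for five chunks (real OpenAI embeddings have far more dimensions)
chunk_vectors = [
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.8, 0.2, 0.1],
    [0.0, 0.0, 1.0],
    [0.1, 0.9, 0.2],
]
question_vector = [1.0, 0.0, 0.0]
print(top_k_l2(question_vector, chunk_vectors, k=2))  # [0, 2]
```

The retrieved chunks are what the "stuff" chain then places into the prompt together with the question.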

Run & Test

streamlit run app.py
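One thing to watch when running it: Streamlit reruns the whole script on every interaction. As written, the PDF is downloaded and every chunk is re-embedded (a paid OpenAI call) each time a question is submitted. In the app, decorating a loader function with Streamlit's `@st.cache_resource` (available in recent Streamlit releases) keeps the built index across reruns. The sketch below shows the same memoize-by-URL idea with the standard library's `functools.lru_cache`, so it runs without Streamlit or an API key. `build_knowledge_base` and the `build_calls` counter are hypothetical stand-ins for the download, split and embed steps.

```python
from functools import lru_cache

build_calls = 0  # counts how often the expensive step really runs


@lru_cache(maxsize=8)
def build_knowledge_base(url: str) -> tuple[str, ...]:
    """Hypothetical stand-in for: download PDF, extract text, split, embed.

    Returns a tuple so the cached value is immutable.
    """
    global build_calls
    build_calls += 1
    return (f"chunks for {url}",)


# Simulate three reruns for the same URL, then one for a different URL
for _ in range(3):
    build_knowledge_base("https://www.example.com/example.pdf")
build_knowledge_base("https://www.example.com/other.pdf")
print(build_calls)  # 2
```

A plain `lru_cache` defined inside `app.py` would not survive Streamlit reruns, because the script is re-executed and the function redefined. That is why the Streamlit decorator is the one to use in the app itself.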