amitgupta4407 / All_About_PDF

A complete website where you can chat with a PDF, extract metadata, text, links, and images, and a lot more. Check my blog for more details: https://medium.com/@amit.2503719/allaboutpdf-tool-for-data-extraction-and-talking-to-pdf-using-chatpdf-feature-f2daea15a59c
https://amitgupta4407-all-about-pdf-app-dmn92l.streamlit.app/
MIT License
28 stars, 11 forks
chatpdf gpt langchain pypdf2 python streamlit

AllAboutPDF 📄

AllAboutPDF is a web-based application for working with PDF files. With this app, you can perform a variety of PDF-related tasks, such as reading metadata and extracting images, text, annotations, and more. 🔨 One feature that sets AllAboutPDF apart from other online PDF apps is ChatPDF. It lets users interact with their PDF files using OpenAI and LangChain's natural language processing technology, so they can find the information they need quickly and complete tasks more efficiently.

Live Project Link 🚀

The live version of the app is hosted on Streamlit Sharing and can be accessed at the following URL: https://amitgupta4407-all-about-pdf-app-dmn92l.streamlit.app/

Overview 📋

AllAboutPDF is built using the Python programming language 🐍 and the Streamlit framework. The app uses the PyPDF2 library to perform various PDF-related tasks, such as parsing PDFs and extracting relevant information from them. The app also uses the OpenAI and LangChain APIs to enable the "ChatPDF" feature.
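As an illustration of the kind of extraction described above, here is a minimal sketch of reading metadata and text with PyPDF2. It is not the app's actual code: the function names `summarize_reader` and `summarize_file` are hypothetical, and the sketch assumes PyPDF2 2.x or later, where `PdfReader`, `reader.metadata`, `reader.pages`, and `page.extract_text()` are available.

```python
from typing import Any, Dict


def summarize_reader(reader: Any, max_chars: int = 200) -> Dict[str, Any]:
    """Collect metadata, page count, and a text preview from a PdfReader-like object.

    `reader` only needs a `metadata` mapping (or None) and a `pages` sequence
    whose items provide `extract_text()`, which is what PyPDF2's PdfReader exposes.
    """
    # metadata is None when the PDF has no document information dictionary
    metadata = reader.metadata or {}
    # extract_text() can return an empty result for image-only pages
    page_texts = [(page.extract_text() or "") for page in reader.pages]
    text = "\n".join(page_texts)
    return {
        "metadata": {str(key): str(value) for key, value in dict(metadata).items()},
        "num_pages": len(reader.pages),
        "preview": text[:max_chars],
    }


def summarize_file(path: str) -> Dict[str, Any]:
    """Hypothetical convenience wrapper: open a PDF from disk and summarize it."""
    from PyPDF2 import PdfReader  # imported here so the helper above has no hard dependency

    return summarize_reader(PdfReader(path))
```

Calling `summarize_file("report.pdf")` (with a file name of your choice) returns a dictionary with keys such as `/Title` or `/Author` under `"metadata"` when the PDF provides them.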

When a user uploads a PDF file to the app, the app performs the requested task (e.g. merging PDFs), and then generates a new PDF file that the user can download.
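The merge example mentioned above could look roughly like the following sketch. It is an assumption about how such a task can be done with PyPDF2 2.x or later (`PdfWriter.add_page` and `PdfWriter.write`), not the app's own implementation, and the names `append_all_pages` and `merge_files` are hypothetical.

```python
from typing import Any, Iterable, List


def append_all_pages(readers: Iterable[Any], writer: Any) -> int:
    """Copy every page of every reader into the writer, in order.

    Returns the number of pages added. Works with any objects that expose
    `pages` (readers) and `add_page(page)` (writer), as PyPDF2 does.
    """
    added = 0
    for reader in readers:
        for page in reader.pages:
            writer.add_page(page)
            added += 1
    return added


def merge_files(paths: List[str], out_path: str) -> int:
    """Hypothetical wrapper: merge the PDFs at `paths` into one file at `out_path`."""
    from PyPDF2 import PdfReader, PdfWriter  # assumes PyPDF2 2.x or later

    writer = PdfWriter()
    added = append_all_pages((PdfReader(path) for path in paths), writer)
    with open(out_path, "wb") as handle:
        writer.write(handle)
    return added
```

In a Streamlit app, the resulting file could then be offered to the user as a download.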

Installation ⚙️

To install, clone this repository and install the requirements:

pip install -r requirements.txt

Usage 🏃

streamlit run app.py
streamlit run FileQueryHub.py

Motivation 💡

The motivation behind AllAboutPDF was to create a simple, user-friendly tool for working with PDF files. While there are many PDF-related tools available online, many of them are complex and difficult to use. AllAboutPDF aims to provide an easy-to-use alternative that anyone can use, regardless of technical expertise, and to make data extraction a piece of cake.

Problem Solved ✅

PDF files are a ubiquitous file format used for sharing documents across platforms and devices. However, working with PDF files can often be a tedious and time-consuming process. AllAboutPDF aims to solve this problem by providing a simple, user-friendly tool for working with PDF files.

Tech Stack 🛠️

AllAboutPDF is built using the following technologies: Python, Streamlit, PyPDF2, LangChain, and the OpenAI API.

Challenges Faced 🤔

📚 Selecting the most suitable libraries for the project. We settled on Python, Streamlit, PyPDF2, and LangChain.

🌟 Developing a unique feature that distinguishes AllAboutPDF from other online PDF apps. Our ChatPDF feature lets users interact with their PDF files using OpenAI and LangChain's natural language processing technology.

💰 Optimizing the cost of preparing the knowledge base for ChatPDF by tuning the chunk size and the overlap between consecutive chunks.
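To make the chunk size and overlap trade-off concrete, here is a minimal pure-Python sketch. It is not LangChain's text splitter and not the app's code: `split_with_overlap` and `embedded_characters` are hypothetical helpers, and character count is used only as a rough stand-in for embedding cost.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int, overlap: int) -> List[str]:
    """Split text into fixed-size character chunks that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
        start += step
    return chunks


def embedded_characters(chunks: List[str]) -> int:
    """Total characters sent for embedding, a rough proxy for cost."""
    return sum(len(chunk) for chunk in chunks)


sample = "a" * 1000
no_overlap = split_with_overlap(sample, chunk_size=200, overlap=0)
with_overlap = split_with_overlap(sample, chunk_size=200, overlap=50)
print(len(no_overlap), embedded_characters(no_overlap))      # 5 1000
print(len(with_overlap), embedded_characters(with_overlap))  # 7 1300
```

In this toy case, an overlap of 50 characters on 200-character chunks raises the embedded text from 1000 to 1300 characters. Overlap keeps context across chunk boundaries but increases the amount of text to embed.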

Future Plans 🔮

We have several future plans for AllAboutPDF, including:

If you have any feedback or suggestions for how we can improve AllAboutPDF, please don't hesitate to get in touch!

Screenshots for [ https://s3.amazonaws.com/static.nomic.ai/gpt4all/2023_GPT4All_Technical_Report.pdf ] (six images)

Links

 Ask_Book_Questions_Workflow_Ext