localGPT-Vision is an end-to-end vision-based Retrieval-Augmented Generation (RAG) system. It allows users to upload and index documents (PDFs and images), ask questions about the content, and receive responses along with relevant document snippets. Retrieval is performed with the ColQwen or ColPali models, and the retrieved pages are passed to a Vision Language Model (VLM) to generate the response. The code currently supports several VLMs, including Google Gemini, OpenAI GPT-4, and models served via Groq.
The project is built on top of the Byaldi library.
localGPT-Vision is built as an end-to-end vision-based RAG system. The architecture comprises two main components:
1. Visual Document Retrieval with ColQwen and ColPali
2. Response Generation with Vision Language Models
This architecture eliminates the need for complex text-extraction pipelines and provides a more holistic understanding of documents by taking their visual elements into account. Unlike traditional RAG systems, it requires no chunking strategy, embedding-model selection, or separate retrieval strategy.
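As a concrete illustration of the retrieval side, the sketch below indexes and searches a folder of documents with the Byaldi library and a ColPali checkpoint. The checkpoint name, index name, and options are assumptions for illustration, not necessarily the settings localGPT-Vision uses internally.

```python
# Minimal sketch of vision-based indexing and retrieval with Byaldi.
# The checkpoint name, index name, and paths are illustrative assumptions.
from byaldi import RAGMultiModalModel

# Load a ColPali-style retriever (weights are downloaded on first use).
retriever = RAGMultiModalModel.from_pretrained("vidore/colpali-v1.2")

# Index a folder of PDFs/images; pages are embedded as images, so no text chunking is needed.
retriever.index(
    input_path="uploaded_documents/",
    index_name="demo_index",
    store_collection_with_index=True,  # keep page images so they can be sent to a VLM
    overwrite=True,
)

# Retrieve the pages most relevant to a question; these pages then go to the VLM.
results = retriever.search("What does the architecture diagram show?", k=3)
for r in results:
    print(r.doc_id, r.page_num, r.score)
```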
Follow these steps to set up and run the application on your local machine.
1. Clone the Repository

```bash
git clone https://github.com/PromtEngineer/localGPT-Vision.git
cd localGPT-Vision
```

2. Create a Conda Environment

```bash
conda create -n localgpt-vision python=3.10
conda activate localgpt-vision
```

3a. Install Dependencies

```bash
pip install -r requirements.txt
```

3b. Install Transformers from Hugging Face (development version)

```bash
pip uninstall transformers
pip install git+https://github.com/huggingface/transformers
```
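Optionally, you can confirm that the source build is active; versions installed from GitHub typically carry a `.dev` suffix:

```python
# Sanity check: a version string ending in ".dev0" usually indicates a source install.
import transformers

print(transformers.__version__)
```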
4. Set Environment Variables

Set your API keys for Google Gemini, OpenAI GPT-4, and Groq:

```bash
export GENAI_API_KEY='your_genai_api_key'
export OPENAI_API_KEY='your_openai_api_key'
export GROQ_API_KEY='your_groq_api_key'
```

On Windows Command Prompt:

```cmd
set GENAI_API_KEY=your_genai_api_key
set OPENAI_API_KEY=your_openai_api_key
set GROQ_API_KEY=your_groq_api_key
```
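The application presumably reads these keys from the environment at runtime; before launching it, you can check that they are visible to Python (the variable names are taken from the commands above):

```python
# Check that the API keys exported above are visible to Python.
import os

for key in ("GENAI_API_KEY", "OPENAI_API_KEY", "GROQ_API_KEY"):
    print(f"{key}: {'set' if os.environ.get(key) else 'MISSING'}")
```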
5. Run the Application

```bash
python app.py
```

6. Access the Application

Open your web browser and navigate to:

http://localhost:5050/
```
localGPT-Vision/
├── app.py
├── logger.py
├── models/
│   ├── indexer.py
│   ├── retriever.py
│   ├── responder.py
│   ├── model_loader.py
│   └── converters.py
├── sessions/
├── templates/
│   ├── base.html
│   ├── chat.html
│   ├── settings.html
│   └── index.html
├── static/
│   ├── css/
│   │   └── style.css
│   ├── js/
│   │   └── script.js
│   └── images/
├── uploaded_documents/
├── byaldi_indices/
├── requirements.txt
├── .gitignore
└── README.md
```
- `app.py`: Main Flask application.
- `logger.py`: Configures application logging.
- `models/`: Contains modules for indexing, retrieving, and responding (sketched after this list).
- `templates/`: HTML templates for rendering views.
- `static/`: Static files like CSS and JavaScript.
- `sessions/`: Stores session data.
- `uploaded_documents/`: Stores uploaded documents.
- `.byaldi/`: Stores the indexes created by Byaldi.
- `requirements.txt`: Python dependencies.
- `.gitignore`: Files and directories to be ignored by Git.
- `README.md`: Project documentation.
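To make the division of labour in `models/` concrete, here is a rough, hypothetical sketch of an index/retrieve/respond round trip. The imports, function names, and arguments below are illustrative assumptions, not the repository's actual API.

```python
# Hypothetical glue code showing how the modules in models/ could fit together.
# The imports, function names, and arguments are illustrative assumptions only.
from models.indexer import index_documents        # builds the Byaldi/ColPali index
from models.retriever import retrieve_documents   # searches the index for a query
from models.responder import generate_response    # sends retrieved pages to a VLM

index_documents("uploaded_documents/", index_name="session_abc")
pages = retrieve_documents("Summarize the uploaded report", index_name="session_abc", k=3)
answer = generate_response(query="Summarize the uploaded report", images=pages, model="gemini")
print(answer)
```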
```mermaid
graph TD
A[User] -->|Uploads Documents| B(Flask App)
B -->|Saves Files| C[uploaded_documents/]
B -->|Converts and Indexes with ColPali| D[Indexing Module]
D -->|Creates Visual Embeddings| E[byaldi_indices/]
A -->|Asks Question| B
B -->|Embeds Query and Retrieves Pages| F[Retrieval Module]
F -->|Retrieves Relevant Pages| E
F -->|Passes Pages to| G[Vision Language Model]
G -->|Generates Response| B
B -->|Displays Response| A
B -->|Saves Session Data| H[sessions/]
subgraph Backend
B
D
F
G
end
subgraph Storage
C
E
H
end
```
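The same flow can be expressed as a minimal Flask sketch. Route names, helper logic, and request fields are assumptions for illustration; the real `app.py` may be organized differently.

```python
# Minimal Flask sketch of the workflow shown in the diagram above.
# Route names, helpers, and request fields are illustrative assumptions.
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/upload", methods=["POST"])
def upload():
    # Save the file under uploaded_documents/ and build the visual index
    # (e.g. with Byaldi/ColPali), storing it under byaldi_indices/.
    return jsonify({"status": "indexed"})

@app.route("/chat", methods=["POST"])
def chat():
    question = request.json.get("question", "")
    # Retrieve the most relevant page images, pass them with the question
    # to the configured VLM, and return the generated answer.
    return jsonify({"answer": f"(response to: {question})"})

if __name__ == "__main__":
    app.run(port=5050)
```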
Contributions are welcome! Please follow these steps:

1. Create a new branch:

```bash
git checkout -b feature-name
```

2. Commit your changes:

```bash
git commit -am 'Add new feature'
```

3. Push the branch:

```bash
git push origin feature-name
```

4. Open a pull request.