UppuluriKalyani / ML-Nexus

ML Nexus is an open-source collection of machine learning projects, covering topics like neural networks, computer vision, and NLP. Whether you're a beginner or expert, contribute, collaborate, and grow together in the world of AI. Join us to shape the future of machine learning!
https://discord.gg/n2D4RqnU
MIT License

Feature request: Multimodal VQA using BLIP model architecture #461

Closed: Panchadip-128 closed this issue 28 minutes ago

Panchadip-128 commented 2 hours ago

Is your feature request related to a problem? Please describe.
A tool that reads an image using ML algorithms (the BLIP model) and implements VQA, answering questions about the image based on user prompts, deployed through Gradio.

Describe the solution you'd like
This repository will contain an implementation of a Visual Question Answering (VQA) model built using the BLIP (Bootstrapping Language-Image Pre-training) framework. The model understands image content and answers questions about the provided image based on user prompts, deployed through a Gradio web application.

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Approach to be followed (optional)

Visual Question Answering (VQA) Model

This repository will contain an implementation of a Visual Question Answering (VQA) model built using the BLIP (Bootstrapping Language-Image Pre-training) framework. The model is designed to understand image content and answer questions related to the provided images. The VQA tool utilizes machine learning algorithms to read and interpret images and generate responses to user prompts.
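As a reference point, below is a minimal BLIP VQA inference sketch, assuming the Hugging Face transformers implementation of BLIP; the Salesforce/blip-vqa-base checkpoint, the image path, and the question are illustrative rather than taken from the issue.

```python
# Minimal BLIP VQA inference sketch (assumed implementation, not the issue author's code).
from PIL import Image
from transformers import BlipProcessor, BlipForQuestionAnswering

# Load a public BLIP VQA checkpoint (assumption: Salesforce/blip-vqa-base).
processor = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
model = BlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-base")

image = Image.open("cat.jpg").convert("RGB")   # hypothetical example image
question = "What animal is in the picture?"

# Encode the image-question pair and generate an answer.
inputs = processor(image, question, return_tensors="pt")
output_ids = model.generate(**inputs)
answer = processor.decode(output_ids[0], skip_special_tokens=True)
print(answer)  # e.g. "cat"
```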

Features

Requirements

To run this project, you will need the following:
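The issue does not list the dependencies explicitly; assuming a Hugging Face transformers plus Gradio implementation, a plausible requirements.txt could look like this:

```
torch
transformers
gradio
Pillow
```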

pip install -r requirements.txt

Installation

Clone this repository:

git clone https://github.com/yourusername/vqa-blip.git
cd vqa-blip

Install the required libraries:

pip install -r requirements.txt

Usage

Start the Gradio interface:

python app.py

Open the web application in your browser at http://localhost:7860.

Upload an image and enter your question in the provided fields.

Click the "Submit" button to get an answer based on the image content.
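To make the usage steps concrete, here is a minimal sketch of what app.py might look like, assuming Gradio and the Hugging Face transformers BLIP VQA checkpoint; the function and interface names are illustrative, not the issue author's actual implementation.

```python
# Sketch of a Gradio app wrapping BLIP VQA (assumed names and checkpoint).
import gradio as gr
from transformers import BlipProcessor, BlipForQuestionAnswering

processor = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
model = BlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-base")

def answer_question(image, question):
    # `image` arrives as a PIL image because the input below uses type="pil".
    inputs = processor(image, question, return_tensors="pt")
    output_ids = model.generate(**inputs)
    return processor.decode(output_ids[0], skip_special_tokens=True)

demo = gr.Interface(
    fn=answer_question,
    inputs=[gr.Image(type="pil"), gr.Textbox(label="Question")],
    outputs=gr.Textbox(label="Answer"),
    title="BLIP Visual Question Answering",
)

if __name__ == "__main__":
    demo.launch()  # serves on http://localhost:7860 by default
```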

Example

Here's how you can interact with the application:

Upload an image of a cat. Ask, "What animal is in the picture?" The model will respond with "A cat."

Model Training

The VQA model is based on the BLIP framework, which leverages both image and text data for training. For detailed information on how to train the model, refer to the BLIP documentation.

Contributing

Contributions are welcome! If you have suggestions for improvements or find bugs, please open an issue or submit a pull request.

License

This project is licensed under the MIT License. See the LICENSE file for more information.

Additional context
Add any other context or screenshots about the feature request here.

github-actions[bot] commented 2 hours ago

Thanks for creating the issue in ML-Nexus! 🎉 Before you start working on your PR, please make sure to:

github-actions[bot] commented 2 hours ago

Thanks for raising this issue! However, we believe a similar issue already exists. Kindly go through all the open issues and ask to be assigned to that issue.

github-actions[bot] commented 28 minutes ago

Hello @Panchadip-128! Your issue #461 has been closed. Thank you for your contribution!