UppuluriKalyani / ML-Nexus

ML Nexus is an open-source collection of machine learning projects, covering topics like neural networks, computer vision, and NLP. Whether you're a beginner or expert, contribute, collaborate, and grow together in the world of AI. Join us to shape the future of machine learning!
https://discord.gg/n2D4RqnU
MIT License

Feature request: Multimodal VQA using BLIP model architecture #461

Closed: Panchadip-128 closed this issue 28 minutes ago

Panchadip-128 commented 2 hours ago

Is your feature request related to a problem? Please describe.
A tool that reads an image using ML algorithms (the BLIP model) and implements VQA, answering questions about the image based on user prompts, deployed through Gradio.

Describe the solution you'd like
This repository will contain an implementation of a Visual Question Answering (VQA) model built using the BLIP (Bootstrapping Language-Image Pre-training) framework. The model understands image content and answers questions about the provided image based on user prompts, deployed through a Gradio web application.

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Approach to be followed (optional)

Visual Question Answering (VQA) Model

This repository will contain an implementation of a Visual Question Answering (VQA) model built using the BLIP (Bootstrapping Language-Image Pre-training) framework. The model is designed to understand image content and answer questions related to the provided images. The VQA tool utilizes machine learning algorithms to read and interpret images and generate responses to user prompts.
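As a reference point, below is a minimal BLIP VQA inference sketch, assuming the Hugging Face transformers implementation of BLIP; the Salesforce/blip-vqa-base checkpoint, the image path, and the question are illustrative rather than taken from the issue.

```python
# Minimal BLIP VQA inference sketch (assumed implementation, not the issue author's code).
from PIL import Image
from transformers import BlipProcessor, BlipForQuestionAnswering

# Load a public BLIP VQA checkpoint (assumption: Salesforce/blip-vqa-base).
processor = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
model = BlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-base")

image = Image.open("cat.jpg").convert("RGB")   # hypothetical example image
question = "What animal is in the picture?"

# Encode the image-question pair and generate an answer.
inputs = processor(image, question, return_tensors="pt")
output_ids = model.generate(**inputs)
answer = processor.decode(output_ids[0], skip_special_tokens=True)
print(answer)  # e.g. "cat"
```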

Features

Requirements

To run this project, you will need the following:
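The issue does not list the dependencies explicitly; assuming a Hugging Face transformers plus Gradio implementation, a plausible requirements.txt could look like this:

```
torch
transformers
gradio
Pillow
```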

pip install -r requirements.txt

Installation

Clone this repository:

git clone https://github.com/yourusername/vqa-blip.git
cd vqa-blip

Install the required libraries:

pip install -r requirements.txt

Usage

Start the Gradio interface:

python app.py

Open the web application in your browser at http://localhost:7860.

Upload an image and enter your question in the provided fields.

Click the "Submit" button to get an answer based on the image content.
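To make the usage steps concrete, here is a minimal sketch of what app.py might look like, assuming Gradio and the Hugging Face transformers BLIP VQA checkpoint; the function and interface names are illustrative, not the issue author's actual implementation.

```python
# Sketch of a Gradio app wrapping BLIP VQA (assumed names and checkpoint).
import gradio as gr
from transformers import BlipProcessor, BlipForQuestionAnswering

processor = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
model = BlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-base")

def answer_question(image, question):
    # `image` arrives as a PIL image because the input below uses type="pil".
    inputs = processor(image, question, return_tensors="pt")
    output_ids = model.generate(**inputs)
    return processor.decode(output_ids[0], skip_special_tokens=True)

demo = gr.Interface(
    fn=answer_question,
    inputs=[gr.Image(type="pil"), gr.Textbox(label="Question")],
    outputs=gr.Textbox(label="Answer"),
    title="BLIP Visual Question Answering",
)

if __name__ == "__main__":
    demo.launch()  # serves on http://localhost:7860 by default
```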

Example

Here's how you can interact with the application:

Upload an image of a cat. Ask, "What animal is in the picture?" The model will respond with "A cat."

Model Training

The VQA model is based on the BLIP framework, which leverages both image and text data for training. For detailed information on how to train the model, refer to the BLIP documentation.

Contributing

Contributions are welcome! If you have suggestions for improvements or find bugs, please open an issue or submit a pull request.

License

This project is licensed under the MIT License. See the LICENSE file for more information.

Additional context
Add any other context or screenshots about the feature request here.

github-actions[bot] commented 2 hours ago

Thanks for creating the issue in ML-Nexus! 🎉 Before you start working on your PR, please make sure to:

github-actions[bot] commented 2 hours ago

Thanks for raising this issue! However, we believe a similar issue already exists. Kindly go through all the open issues and ask to be assigned to that issue.

github-actions[bot] commented 28 minutes ago

Hello @Panchadip-128! Your issue #461 has been closed. Thank you for your contribution!