goblin776655 opened this issue 1 year ago
Great idea!!!
Use one of these extensions. If you need to unload the previous model, you can use supermerger's unload model button.
- https://github.com/p1atdev/stable-diffusion-webui-blip2-captioner
- https://github.com/Tps-F/sd-webui-blip2
I was working on this, but I am unable to understand some of the code structure. I will figure it out and try to add this feature, but if possible, could you give me a brief overview of the overall structure?
@ArjunDevSingla I hope this helps 👇
Here's a brief summary of @p1atdev stable-diffusion-webui-blip2-captioner/blip2.py:
This Python code defines a `BLIP2` class that is used to generate captions for images with a pre-trained model. The code uses the PyTorch library and relies on a separate module called `lavis.models`.

- Imports `torch`, `typing`, `PIL.Image`, and `lavis.models`.
- Defines a `BLIP2` class with an `__init__` method that takes a `model_type` argument:
  a. Determines the device (GPU or CPU) for running the model based on the availability of CUDA.
  b. Loads the pre-trained model and preprocessors using the `load_model_and_preprocess` function from `lavis.models`.
- Defines a `generate_caption` method on the `BLIP2` class with several parameters, including the input image and options for controlling the caption generation process:
  a. Preprocesses the input image using the visual preprocessor and moves it to the appropriate device (GPU or CPU).
  b. Generates captions using the pre-trained model and the given parameters for beam search, nucleus sampling, maximum and minimum caption length, and repetition penalty.
  c. Returns the generated captions.
- Defines an `unload` method on the `BLIP2` class that frees up memory by deleting the model and preprocessors and clearing the GPU cache.

The code provides an interface for loading a pre-trained model, generating captions for images, and then unloading the model to free up resources.
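To make the structure concrete, here is a minimal, hypothetical sketch of what such a wrapper class could look like. The names, default values, and `lavis` call signatures below are assumptions based on the summary, not the extension's exact code (imports are deferred into the methods purely so the sketch can be read and defined without `lavis`/`torch` installed):

```python
from typing import List


class BLIP2:
    def __init__(self, model_type: str):
        # Deferred imports: this is a sketch, so the class can be defined
        # without lavis/torch present. The real code imports at module level.
        import torch
        from lavis.models import load_model_and_preprocess

        # a. Pick the GPU if CUDA is available, otherwise fall back to CPU.
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        # b. Load the pre-trained model and its image preprocessors.
        self.model, self.vis_processors, _ = load_model_and_preprocess(
            name="blip2_opt", model_type=model_type, is_eval=True, device=self.device
        )

    def generate_caption(
        self,
        image,                              # a PIL.Image
        use_nucleus_sampling: bool = False,
        num_beams: int = 3,
        max_length: int = 30,
        min_length: int = 10,
        repetition_penalty: float = 1.0,
    ) -> List[str]:
        # a. Preprocess the image and move the tensor to the model's device.
        image_tensor = self.vis_processors["eval"](image).unsqueeze(0).to(self.device)
        # b./c. Generate and return captions with the given decoding parameters.
        return self.model.generate(
            {"image": image_tensor},
            use_nucleus_sampling=use_nucleus_sampling,
            num_beams=num_beams,
            max_length=max_length,
            min_length=min_length,
            repetition_penalty=repetition_penalty,
        )

    def unload(self):
        import gc
        import torch

        # Drop references and clear the CUDA cache to free memory.
        del self.model
        del self.vis_processors
        gc.collect()
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
```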
Here is a brief summary of @p1atdev stable-diffusion-webui-blip2-captioner/scripts/main.py:

This script is a Python program for generating captions for images using `BLIP2`. It provides both single-image captioning and batch-image captioning functionalities, and uses the Gradio library to create a user interface for easy interaction.

- Imports `os`, `pathlib`, `torch`, `gradio`, and `PIL`.
- Sets `ImageFile.LOAD_TRUNCATED_IMAGES` to `True` to allow loading of truncated images.
- Imports `script_callbacks` from the `modules` package.
- Imports the `BLIP2` class from the `blip2` module.
- Creates an empty dictionary `captioners` to store loaded models.
- Defines a list `model_list` containing the names of available models ("coco" and "pretrain").
- Defines a list `sampling_methods` containing the names of available sampling methods ("Nucleus" and "Top-K").
- Defines a function `model_check` that checks whether a model is already loaded, and loads it if it is not in the `captioners` dictionary.
- Defines a function `unload_models` that unloads all the models in the `captioners` dictionary and clears the GPU cache.
- Defines a function `generate_caption` that takes an image and various caption-generation parameters and returns a generated caption for the image.
- Defines a function `generate_caption_for_single_image` that takes an image and caption-generation parameters and returns a caption for the image.
- Defines a function `create_caption_file` that takes a caption and an output file path and writes the caption to a file at the specified path.
- Defines a function `batch_captioning` that takes input and output directories, a caption file extension, and caption-generation parameters, generates captions for all the images in the input directory, and saves them to the output directory.
- Defines a function `on_ui_tabs` that creates the Gradio user interface with two tabs: "Single" for single-image captioning and "Batch" for batch-image captioning. The interface includes various input elements, such as image upload, text boxes, dropdowns, sliders, and buttons.
- Registers the `on_ui_tabs` function with the `script_callbacks` module using its `on_ui_tabs` method.
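The model-cache and caption-file parts of that flow can be sketched in a few lines. This is a hypothetical, stdlib-only illustration of the pattern the summary describes: the `loader` argument stands in for the real `BLIP2(...)` constructor, and the real `unload_models` would also call each model's `unload` and clear the GPU cache:

```python
from pathlib import Path

captioners: dict = {}  # model name -> loaded model, mirroring the script's cache


def model_check(name: str, loader=lambda name: object()):
    # Load the model only if it is not already in the cache,
    # so repeated captioning calls reuse the same instance.
    if name not in captioners:
        captioners[name] = loader(name)
    return captioners[name]


def unload_models():
    # Release every cached model. The real script would also call each
    # model's unload() and torch.cuda.empty_cache() here.
    captioners.clear()


def create_caption_file(caption: str, output_path: str):
    # Write the caption to a text file next to the image,
    # creating parent directories if needed.
    path = Path(output_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(caption, encoding="utf-8")
```

Caching by name is what lets the "coco" and "pretrain" variants coexist without reloading on every request, at the cost of VRAM until `unload_models` is called.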
Is there an existing issue for this?
What would your feature do?
It will use BLIP2 models to generate text descriptions of images.
Proposed workflow
Additional information
No response