EnragedAntelope / ComfyUI-Doubutsu-Describer

Comfy node for using Doubutsu VLM for image descriptions
10 stars 2 forks source link

Doubutsu Image Describer for ComfyUI

This custom node for ComfyUI allows you to use the Doubutsu small VLM model to describe images. Credit and further information on Doubutsu: https://huggingface.co/qresearch/doubutsu-2b-pt-756

image

Installation

  1. Clone this repository into your ComfyUI's custom_nodes directory: git clone https://github.com/EnragedAntelope/comfyui-doubutsu-describer.git
  2. Install the required dependencies: pip install -r requirements.txt
  3. Download the model files:
    • Create a models directory in the root of this repository (ComfyUI\custom_nodes\ComfyUI-Doubutsu-Describer).
    • Download the model files for "qresearch/doubutsu-2b-pt-756" from Hugging Face and place them in models/qresearch/doubutsu-2b-pt-756/.
    • Download the adapter files for "qresearch/doubutsu-2b-lora-756-docci" and place them in models/qresearch/doubutsu-2b-lora-756-docci/.

You can download these files manually from the Hugging Face website or use the Hugging Face CLI:

Open a command prompt, navigate to your ComfyUI\custom_nodes\ComfyUI-Doubutsu-Describer directory, then execute:

huggingface-cli download qresearch/doubutsu-2b-pt-756 --local-dir models/qresearch/doubutsu-2b-pt-756

huggingface-cli download qresearch/doubutsu-2b-lora-756-docci --local-dir models/qresearch/doubutsu-2b-lora-756-docci

  1. Restart ComfyUI

Usage

After installation, you'll find a new node called "Doubutsu Image Describer" in the "image/text" category. Connect an image to its input, and it will generate a description based on the provided question.

Parameters

License

[Apache 2.0]