fireicewolf / llava-caption-cli

A Python-based CLI tool for tagging images with llava models.
Apache License 2.0

llava caption cli



I made this repo because I wanted to caption images across platforms: my old MacBook Pro, my Windows gaming PC, or a Docker-based Linux cloud server (such as Google Colab).

But I didn't want to install a huge web UI just for this small job, and some cloud services are unfriendly to Gradio-based UIs.

So this repo was born.

Model source

Hugging Face hosts the original models; the ModelScope copies are pure forks of them (because Hugging Face is blocked in some places).

| Model | HuggingFace Link | ModelScope Link |
|---|---|---|
| llava-v1.6-34B-gguf | HuggingFace | ModelScope |
| ggml_llava-v1.5-13b | HuggingFace | ModelScope |
| ggml_llava-v1.5-7b | HuggingFace | ModelScope |


TODO: make a simple UI with Jupyter widgets (once my chronic laziness is cured 😊)


Python 3.10 works fine.

Open a shell terminal and follow the steps below:

# Clone this repo
git clone
cd llava-caption-cli

# Create a Python venv, then activate it
python -m venv .venv
source .venv/bin/activate  # on Windows: .venv\Scripts\activate

# Install dependencies
# Base dependencies; models for inference will be downloaded via the Python requests library.
pip install -U -r requirements.txt

# If you want to download or cache models via the huggingface hub, install this.
pip install -U -r huggingface-requirements.txt

# If you want to download or cache models via the modelscope hub, install this.
pip install -U -r modelscope-requirements.txt

Take note

This project uses llama-cpp-python as its base library, and it needs to be compiled.
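
A failed or skipped build usually only shows up when the tool is first run, so a quick check beforehand can save time. The snippet below is a minimal sketch and not part of this repo: `is_module_available` is a hypothetical helper name, and it relies only on the fact that llama-cpp-python is imported as `llama_cpp`. It confirms that the package can be found, not that the compiled backend actually loads.

```python
import importlib.util


def is_module_available(module_name: str) -> bool:
    """Return True if the import system can locate `module_name`."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "llama_cpp" is the import name of the llama-cpp-python package.
    if is_module_available("llama_cpp"):
        print("llama-cpp-python is installed.")
    else:
        print("llama-cpp-python is missing; install (compile) it first.")
```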

Simple usage

Make sure your Python venv has been activated first!

python your_datasets_path

To run with more options, view the help by running the command below, or see the options listed further down.

python -h


Advanced options

`data_path`
Path to the data.

`--recursive`
Include all supported image formats in the input dataset path and its sub-paths.

`config`
Config JSON for llava models; default is "default.json".

`--use_cpu`
Use the CPU for inference.

`--gpus N`
Number of GPUs used for inference; default is 1.

`--split_in_gpus weights`
Weights for splitting the model across multiple GPUs, e.g. "0.5, 0.5" to balance across 2 GPUs.

`--n_ctx TEXT CONTEXT`
Text context size; set it larger if your image is large. Default is 2048.

`--model_name MODEL_NAME`
Model name for inference; default is "llava-v1.6-34b.Q4_K_M" (please check configs/default.json).

`--model_site MODEL_SITE`
Site the model is downloaded from (huggingface or modelscope); default is huggingface.

`--models_save_path MODEL_SAVE_PATH`
Path where models are saved; default is models (under the project folder).

`--download_method SDK`
Download models via sdk or url; default is sdk. If the huggingface hub or modelscope SDK is not installed, or the download fails, the tool automatically retries with a URL download.

`--use_sdk_cache`
Use the huggingface or modelscope SDK cache to store models; this option needs huggingface_hub or the modelscope SDK installed. If enabled, `--models_save_path` is ignored.

`--custom_model_path CUSTOM_MODEL_PATH`
`--custom_mmproj_path CUSTOM_MMPROJ_PATH`
These two arguments need to be used together. They let you use a model you already have.

`--custom_caption_save_path CUSTOM_CAPTION_SAVE_PATH`
Save caption files to a custom path instead of next to the images (their directory structure is kept).

`--log_level LOG_LEVEL`
Log level for the terminal console and log file; default is `INFO` (`DEBUG`, `INFO`, `WARNING`, `ERROR`, `CRITICAL`).

`--save_logs`
Save logs to a file; the log is saved at the same level as `data_dir_path`.

`--caption_extension CAPTION_EXTENSION`
Caption file extension; default is `.txt`.

`--not_overwrite`
Do not overwrite a caption file if it already exists.

`--system_message SYSTEM_MESSAGE`
System message for the llava model.

`--user_prompt USER_PROMPT`
User prompt for captioning.

`--temperature TEMPERATURE`
Temperature for the llava model; default is 0.4.

`--max_tokens MAX_TOKENS`
Max tokens for output.

`--verbose`
llama-cpp-python verbose mode.
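
To make the interaction between a few of these options concrete, here is a minimal sketch in Python. It is not the repo's actual code: `build_parser`, `parse_split_weights`, `validate` and `caption_path_for` are hypothetical names, only a handful of the options are modelled, and the rule that `--split_in_gpus` needs one weight per GPU is an assumption read from the descriptions.

```python
import argparse
from pathlib import Path
from typing import List, Optional


def parse_split_weights(text: str) -> List[float]:
    """Turn a string such as "0.5, 0.5" into [0.5, 0.5]."""
    return [float(part) for part in text.split(",") if part.strip()]


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser covering a subset of the options, with the
    # defaults stated in the option descriptions.
    parser = argparse.ArgumentParser(description="Sketch of the caption CLI options")
    parser.add_argument("data_path", type=Path)
    parser.add_argument("--recursive", action="store_true")
    parser.add_argument("--gpus", type=int, default=1)
    parser.add_argument("--split_in_gpus", type=parse_split_weights, default=None)
    parser.add_argument("--n_ctx", type=int, default=2048)
    parser.add_argument("--custom_model_path", type=Path, default=None)
    parser.add_argument("--custom_mmproj_path", type=Path, default=None)
    parser.add_argument("--custom_caption_save_path", type=Path, default=None)
    parser.add_argument("--caption_extension", default=".txt")
    parser.add_argument("--temperature", type=float, default=0.4)
    return parser


def validate(args: argparse.Namespace) -> None:
    # The two custom paths are described as needing to be used together.
    if (args.custom_model_path is None) != (args.custom_mmproj_path is None):
        raise ValueError("--custom_model_path and --custom_mmproj_path must be used together")
    # Assumption (not confirmed by the source): one weight per GPU.
    if args.split_in_gpus is not None and len(args.split_in_gpus) != args.gpus:
        raise ValueError("--split_in_gpus needs one weight per GPU")


def caption_path_for(image: Path, data_path: Path,
                     save_root: Optional[Path], extension: str) -> Path:
    """Caption location; with a custom root, the sub-directory layout is kept."""
    if save_root is None:
        return image.with_suffix(extension)
    return (save_root / image.relative_to(data_path)).with_suffix(extension)
```

Under this sketch, an image at `data/sub/cat.png` captioned with `--custom_caption_save_path out` would get its caption at `out/sub/cat.txt`, while without that option the caption would sit next to the image as `data/sub/cat.txt`.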


Based on llama-cpp-python

Without their work (👏👏), this repo wouldn't exist.