MNeMoNiCuZ / joy-caption-batch

A batch captioning tool for joy_caption
MIT License

[Feature request] Use a different huggingface model for llama, not requiring sign-in #8

Closed MNeMoNiCuZ closed 1 week ago

kalle07 commented 1 week ago

e.g. llava-v1.5, llava-v1.6, or llava-llama-3-8b

Do we perhaps need the mmproj file?

Let us use it like in kobold: https://github.com/LostRuins/koboldcpp

MNeMoNiCuZ commented 1 week ago

> e.g. llava-v1.5, llava-v1.6, or llava-llama-3-8b
>
> Do we perhaps need the mmproj file?
>
> Let us use it like in kobold: https://github.com/LostRuins/koboldcpp

Umm, I don't quite understand what you are saying.

But yes, being able to change the CLIP and LLM is the intent of this feature request.

I don't understand what you linked with kobold.

MNeMoNiCuZ commented 1 week ago

Added support for two models; it can easily be configured. One is for low-VRAM use, and one is for high.

kalle07 commented 1 week ago

kobold is usually an LLM GUI, but it has the ability to load any model together with the mmproj file to recognize images.

So now we can choose models without any Hugging Face account?

MNeMoNiCuZ commented 1 week ago

> kobold is usually an LLM GUI, but it has the ability to load any model together with the mmproj file to recognize images.
>
> So now we can choose models without any Hugging Face account?

I see, okay. I don't know how to do that setup, but if you want to add it, feel free!

You can change to any huggingface model you wish. This was always the case, but I exposed it a bit higher up now:

```python
# Clip path
CLIP_PATH = "google/siglip-so400m-patch14-384"

# Model paths based on VRAM usage
if LOW_VRAM_MODE:
    MODEL_PATH = "unsloth/llama-3-8b-bnb-4bit"
else:
    MODEL_PATH = "unsloth/Meta-Llama-3.1-8B"
```

I also added a low VRAM model option that can be used for faster speeds or lower VRAM GPUs.
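To illustrate the selection logic, here is a minimal sketch. The `select_model_path` helper is hypothetical (the repository uses a plain `if`/`else` as shown above); the model IDs mirror the config snippet, and swapping in any other Hugging Face checkpoint would just mean changing these strings:

```python
# Mirrors the config snippet above; not the repository's actual code.
CLIP_PATH = "google/siglip-so400m-patch14-384"

def select_model_path(low_vram_mode: bool) -> str:
    """Pick the Llama checkpoint based on available VRAM.

    Hypothetical helper: the real script sets MODEL_PATH with a
    module-level if/else instead of a function.
    """
    if low_vram_mode:
        # 4-bit bitsandbytes-quantized checkpoint: smaller VRAM
        # footprint at some cost in output quality
        return "unsloth/llama-3-8b-bnb-4bit"
    # Full-precision checkpoint for GPUs with more VRAM
    return "unsloth/Meta-Llama-3.1-8B"

print(select_model_path(True))   # low-VRAM model
print(select_model_path(False))  # high-VRAM model
```

Both `unsloth` checkpoints are public mirrors, which is what avoids the gated-access sign-in required by the official `meta-llama` repositories.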