fpgaminer / joycaption

JoyCaption is an image captioning Visual Language Model (VLM) being built from the ground up as a free, open, and uncensored model for the community to use in training Diffusion models.
Apache License 2.0

Difference #1

Open webmastermario opened 1 week ago

webmastermario commented 1 week ago

Hello,

I installed it and tried it, but there is quite a difference from JoyCaption Alpha One and Alpha Two, which are more accurate. This one sometimes gives false descriptions and also does not act according to the prompt: for example, I entered length "short" and it will not recognize that, writing 3 sentences instead.

Can you fix that?

fpgaminer commented 1 week ago

Hello! I'm sorry that Alpha Two is not working well for you.

This one gives false descriptions sometimes

That can happen with either model, they are unfortunately imperfect. Though one thing I'll point out with Alpha Two specifically: it's very sensitive to the prompt. I recommend sticking to the exact prompts it was trained on, which are listed in the README.md. The HF spaces demo is also useful for seeing what prompts Alpha Two understands.
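For illustration, a prompt can be assembled from a template with the same shape as the trained ones. The template strings below are stand-ins, not the canonical list; take the exact strings from the README:

```python
# Hypothetical templates in the shape of the trained prompts; the canonical
# strings are listed in the README and should be preferred verbatim.
TEMPLATES = {
    "formal": "Write a {length} descriptive caption for this image in a formal tone.",
    "casual": "Write a {length} descriptive caption for this image in a casual tone.",
}

VALID_LENGTHS = {"very short", "short", "medium-length", "long", "very long"}

def build_prompt(style: str, length: str) -> str:
    """Fill a template; reject length keywords the model was not trained on."""
    if length not in VALID_LENGTHS:
        raise ValueError(f"unknown length keyword: {length!r}")
    return TEMPLATES[style].format(length=length)

print(build_prompt("formal", "short"))
# -> Write a short descriptive caption for this image in a formal tone.
```

Sticking to a fixed set of templates like this avoids accidentally drifting into phrasings the model has never seen.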

also not acting according the prompt like i entered length: "short" and it will not recognize and write 3 sentences..

Here's a breakdown of how the model understands lengths:

    def length_keyword(n_words: int) -> str:
        # Bucket a caption's word count into the length keyword used in prompts.
        if n_words < 20:
            return 'very short'
        elif n_words < 40:
            return 'short'
        elif n_words < 60:
            return 'medium-length'
        elif n_words < 100:
            return 'long'
        else:
            return 'very long'

So "short" can be anywhere up to 40 words. The model should be very good at following those guidelines; if you notice it isn't, double-check that the prompt is correct.
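To sanity-check a run, you can count the words in a returned caption and see which bucket the count lands in. A minimal sketch, restating the thresholds above so it runs on its own:

```python
def length_bucket(n_words: int) -> str:
    # Same thresholds the model was trained with.
    if n_words < 20:
        return 'very short'
    elif n_words < 40:
        return 'short'
    elif n_words < 60:
        return 'medium-length'
    elif n_words < 100:
        return 'long'
    else:
        return 'very long'

caption = ("A tabby cat sleeps curled up on a sunlit windowsill, "
           "its tail tucked over its nose.")
print(length_bucket(len(caption.split())))  # 16 words -> very short
```

Note the buckets are based on word counts, not sentence counts, so a 3-sentence caption can still legitimately be "short" if it stays under 40 words.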

webmastermario commented 1 week ago

Thanks for making this clear.

Can you add an option for Orenguteng/Llama-3.1-8B-Lexi-Uncensored-V2? Or what model is it using?

fpgaminer commented 1 week ago

Alpha Two uses a LoRA on top of Llama 3.1 8B Instruct. The version downloaded from https://huggingface.co/fancyfeast/llama-joycaption-alpha-two-hf-llava just has the LoRA merged in, to make it easier to use.

I'm not sure how well swapping in a different language model would work, but as long as it's Llama 3.1 8B based it might. You can grab Alpha Two's LoRA from here: https://huggingface.co/fancyfeast/llama-joycaption-alpha-two-hf-llava and try applying it on top of Orenguteng/Llama-3.1-8B-Lexi-Uncensored-V2.
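A sketch of what applying the adapter could look like with peft, assuming the adapter weights are available separately. The `LORA_PATH` below is a placeholder, not a real repo id, and compatibility with a different base model is untested:

```python
# Untested sketch: merge a LoRA adapter into a different Llama 3.1 8B base.
# LORA_PATH is a placeholder; point it at wherever the adapter files live.
BASE_MODEL = "Orenguteng/Llama-3.1-8B-Lexi-Uncensored-V2"
LORA_PATH = "path/to/joycaption-alpha-two-lora"

def load_merged_model(base_id: str = BASE_MODEL, lora_path: str = LORA_PATH):
    # Imports kept inside the function so the file parses without the heavy deps.
    from transformers import AutoModelForCausalLM
    from peft import PeftModel

    base = AutoModelForCausalLM.from_pretrained(base_id)
    model = PeftModel.from_pretrained(base, lora_path)
    return model.merge_and_unload()  # bake the adapter weights into the base

if __name__ == "__main__":
    merged = load_merged_model()
    merged.save_pretrained("llama-lexi-joycaption-merged")
```

Since the adapter was trained against Llama 3.1 8B Instruct, a base model fine-tuned away from Instruct may interact unpredictably with the merged weights.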