Nerogar / OneTrainer

OneTrainer is a one-stop solution for all your stable diffusion training needs.
GNU Affero General Public License v3.0
1.82k stars 154 forks

[Feat]: Blip 3 support? #328

Open ppbrown opened 5 months ago

ppbrown commented 5 months ago

Describe your use-case.

I just discovered the auto caption feature of OneTrainer. Good idea! Except the variety of available models is a bit limited. Any chance of BLIP 3 support?

CogVLM would be super nice too! But I recognize that is somewhat of a niche customer set, due to requiring 16 GB(?) minimum. That being said, if you would like to see how to do that one, you can see "easy mode" usage at https://github.com/ppbrown/cogvlm-utils

What would you like to see as a solution?

More captioning models, please.

Have you considered alternatives? List them here.

No response

C0nsumption commented 5 months ago

I've been experimenting with these in my free time. If I'm able to find the time later this week, I'd be willing to start working on such a pull request. But they can be pretty heavy GPU-wise. Phi 3 was pretty good too, but very censored. It refused to respond to anything it found NSFW.

Sil3ntKn1ght commented 5 months ago

Oh, that could be addressed with my feature request: the ability to add a new line from a txt file. With my training I have keywords that I have to copy and paste onto the captions wd14 created,

i.e. acrylic painting, vivid colours, scenery, brush marks, etc.

An "add new line from imported txt" option would be handy, by the sound of it, for SFW captions where you want NSFW, as you could import and add on from a single txt file.

C0nsumption commented 5 months ago

Agreed. But then it would have to be BLIP 3. BLIP 3 will hallucinate the best it can to describe the image. Phi 3 has a strong safety filter built into the weights, so it will flat out refuse. Something like:

"I'm not comfortable responding to the request" or some shit like that.

I'll dig into the captioning section of OT next week and start doing some work. I may push the scripts to a side repo and then just integrate. This way the features can at least be used beforehand. Have to see the demand vs. time constraints.

C0nsumption commented 5 months ago

An update for anyone interested in this as a feature:

Got comfortable with the BLIP 3 setup and have looked through the OneTrainer source. Will implement soon; just looking into CogVLM because that was asked for as well. I'm looking into CogVLM2 specifically, but if you want 1.0 please ping me here so that I know.

Sil3ntKn1ght commented 5 months ago

I noticed a bug: on a fresh install, BLIP 2 will error if you use it first. You need to use BLIP first, otherwise the files needed to run BLIP 2 are missing.

Also, I have a text file with text like (oil painting, brush marks, wild colors, thick palette knife style, vivid), etc., that I copy and paste into each of my paintings' text files, currently 109 of them. I'd love an option to just select that txt file, so that when I use wd14 to do the captions it also injects the text from my txt file. That would save me having to copy and paste repeatedly after running wd14, BLIP, etc. Might sound silly.

It would also be helpful for a character, where I am using this method as well. Hopefully that's understandable; I'm dyslexic, with grammar and spelling issues, plus autism. I appreciate anything that makes creating the txt for each image in my dataset super easy. I love OneTrainer because of its ease of use.
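
Until something like this exists in the UI, the request above can be approximated with a small standalone script. The following is only a sketch under assumptions, not a OneTrainer feature: the function name `append_keywords`, the folder layout (one `.txt` caption per image in a single folder), and the comma-separated keyword file are all hypothetical choices made for illustration.

```python
from pathlib import Path


def append_keywords(caption_dir: Path, keywords_file: Path) -> int:
    """Append keywords from keywords_file to every .txt caption in caption_dir.

    Hypothetical helper: assumes comma-separated captions (wd14 style) and a
    keyword file whose entries are separated by commas and/or newlines.
    Returns the number of caption files that were changed.
    """
    raw = keywords_file.read_text(encoding="utf-8").replace("\n", ",")
    keywords = [k.strip() for k in raw.split(",") if k.strip()]

    changed = 0
    for caption_path in sorted(caption_dir.glob("*.txt")):
        # Skip the keyword file itself if it lives in the same folder.
        if caption_path.resolve() == keywords_file.resolve():
            continue
        text = caption_path.read_text(encoding="utf-8")
        tags = [t.strip() for t in text.split(",") if t.strip()]
        # Only add keywords that are not already present, so the script
        # can be re-run safely after captioning more images.
        missing = [k for k in keywords if k not in tags]
        if missing:
            caption_path.write_text(", ".join(tags + missing) + "\n", encoding="utf-8")
            changed += 1
    return changed
```

Run it after wd14 or BLIP has written the captions, for example `append_keywords(Path("dataset"), Path("style_keywords.txt"))`, where both paths are placeholders. Because existing tags are left in place and only missing keywords are appended, running it twice does not duplicate anything.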

O-J1 commented 1 month ago

We strongly, strongly recommend using Taggui. There is too much work to do in other areas for now. PRs are welcome.