Nerogar / OneTrainer

OneTrainer is a one-stop solution for all your stable diffusion training needs.
GNU Affero General Public License v3.0

[Feat]: Is this feature available: custom Max Token Length #430

Open ja1496 opened 1 month ago

ja1496 commented 1 month ago

Describe your use-case.

I want to customize the maximum token length, because some of the captions for my images are too long. Could this feature be added?

What would you like to see as a solution?

Something like kohya_ss (sd-scripts), which already supports longer token lengths.

Have you considered alternatives? List them here.

No response

keclee commented 1 month ago

This would be great. There was a whitepaper on DALL·E 3 where they used descriptive synthetic captions to improve the model. The default of 75 tokens is quite limiting in this regard.

celll1 commented 1 week ago

I have created code in this fork (https://github.com/celll1/OneTrainer/tree/dev) that supports token lengths of up to three chunks of 75 tokens for the Text Encoder. It has been confirmed to work with SDXL LoRA.

Please note that the handling of BOS/EOS tokens differs from the sd-scripts implementation. I am not confident whether an attention mask should be applied.
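For readers unfamiliar with the chunking approach, here is a minimal, hypothetical sketch of the idea (not the fork's actual code): a prompt longer than CLIP's 77-token window is split into 75-token chunks, each wrapped in its own BOS/EOS pair and padded, so the text encoder can be run once per chunk and the hidden states concatenated afterwards. The token ids and the 3-chunk cap follow CLIP-L conventions and the limit mentioned above.

```python
# Hypothetical sketch of per-chunk tokenization for long prompts.
# CLIP-L special token ids; PAD equals EOS for this tokenizer.
BOS, EOS, PAD = 49406, 49407, 49407
CHUNK = 75        # usable tokens per 77-token encoder window
MAX_CHUNKS = 3    # cap from the fork: 75 tokens x 3 chunks

def chunk_token_ids(ids, max_chunks=MAX_CHUNKS):
    """Split raw token ids (without BOS/EOS) into padded 77-token windows.

    Each window is [BOS] + up-to-75 tokens + [EOS] + padding, so every
    chunk is a valid standalone input for the CLIP text encoder.
    """
    ids = ids[: CHUNK * max_chunks]                 # truncate past the cap
    chunks = []
    for start in range(0, max(len(ids), 1), CHUNK):
        body = ids[start : start + CHUNK]
        chunk = [BOS] + body + [EOS]
        chunk += [PAD] * (77 - len(chunk))          # pad to full window
        chunks.append(chunk)
    return chunks
```

Each chunk would then be encoded separately and the per-chunk hidden states concatenated along the sequence dimension (e.g. `torch.cat([...], dim=1)`) before being fed to the UNet. Whether BOS/EOS are repeated per chunk or only at the ends is exactly where implementations like sd-scripts differ, as noted above.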

O-J1 commented 1 week ago

> I have created code in this fork (https://github.com/celll1/OneTrainer/tree/dev) that supports token lengths of up to (75 tokens x) 3 chunks for the Text Encoder. It has been confirmed to work with SDXL LoRA.
>
> Please note that the handling of BOS/EOS tokens differs in the implementation of sd-scripts. I am not confident about whether or not an attention mask should be applied.

Hop on the Discord and ask Nerogar. Once you've discussed it with him and he's reviewed it (and it goes behind a flag/checkbox), I am confident he would accept a PR.