ja1496 opened 1 month ago
This would be great. There was a whitepaper on DALL·E 3 where they used descriptive synthetic captions to improve the model. The default 75-token limit is quite restrictive in this regard.
I have created code in this fork (https://github.com/celll1/OneTrainer/tree/dev) that supports token lengths of up to 3 chunks of 75 tokens each for the Text Encoder. It has been confirmed to work with SDXL LoRA.
Please note that the handling of BOS/EOS tokens differs from the sd-scripts implementation. I am not confident whether or not an attention mask should be applied.
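For context, the usual way around CLIP's 77-token window (as done in sd-scripts and, per the comment above, the linked fork) is to split the prompt's token IDs into 75-token chunks, wrap each chunk with BOS/EOS, encode each chunk separately, and concatenate the resulting embeddings. A minimal sketch of the chunking step follows; the function name, the 3-chunk cap, and the exact padding behavior are illustrative assumptions, not the fork's actual code (the comment above notes that BOS/EOS handling differs between implementations):

```python
# Hedged sketch: split a long token-ID sequence into 75-token chunks,
# each wrapped with BOS/EOS and padded to CLIP's 77-token window.
# 49406/49407 are the standard CLIP BOS/EOS IDs; padding with EOS is
# one common convention, but implementations differ on this point.

def chunk_prompt_tokens(token_ids, chunk_size=75, max_chunks=3,
                        bos=49406, eos=49407, pad=49407):
    """Return a list of (chunk_size + 2)-length chunks, at most max_chunks,
    each shaped like a normal CLIP input: [BOS] tokens... [EOS] [PAD]..."""
    chunks = []
    # max(len, 1) ensures an empty prompt still yields one (empty) chunk
    for i in range(0, max(len(token_ids), 1), chunk_size):
        if len(chunks) >= max_chunks:
            break  # matches the "75 tokens x 3 chunks" cap described above
        body = token_ids[i:i + chunk_size]
        chunk = [bos] + body + [eos]
        chunk += [pad] * (chunk_size + 2 - len(chunk))
        chunks.append(chunk)
    return chunks
```

Each chunk would then be passed through the text encoder independently, and the per-chunk hidden states concatenated along the sequence dimension before being fed to the U-Net, which accepts a variable-length context. Whether an attention mask should cover the padding positions is exactly the open question raised above.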
Hop on the Discord and ask Nerogar. Once you've discussed it with him and he's reviewed it (and it goes behind a flag/checkbox), I am confident he would accept a PR.
Describe your use-case.
I want to customize the maximum token length because some of the prompts corresponding to my images are too long. Could this feature be added?
What would you like to see as a solution?
kohya_ss
Have you considered alternatives? List them here.
No response