orcinus opened this issue 2 months ago
Will test with "include": false too in a few hours - need to finish current finetune run first.
Include should not be changed for SDXL. It's only useful for models where one of the text encoders is optional, like SD3. I've never seen anything suggesting that disabled text encoder training doesn't work, and you can easily see that in the code by checking whether the text encoder is even loaded into VRAM.
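A minimal sketch of that kind of check, assuming you can grab the text encoder module out of the pipeline or trainer (the `text_encoder` handle here is hypothetical, not an actual OneTrainer API):

```python
import torch

def report_text_encoder_state(text_encoder: torch.nn.Module) -> None:
    # `text_encoder` is assumed to be the loaded CLIP text model,
    # e.g. a transformers CLIPTextModel pulled out of the pipeline.
    params = list(text_encoder.parameters())
    devices = sorted({str(p.device) for p in params})
    trainable = any(p.requires_grad for p in params)
    print(f"devices: {devices}, any trainable params: {trainable}")

# With training disabled you'd expect trainable to be False
# (and, depending on the trainer, the module may never be moved to the GPU at all).
```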
Is it possible you changed the dtype of the text encoders, and therefore see slight differences in inference?
@Nerogar yeah, I checked the code after I posted this.
@hameerabbasi entirely possible, I just realized that this morning, but haven't had a chance to test it completely yet (another test run is in progress, will check after)... Omitting dtype from the TE config still produces the same result, but I still have output_dtype set to float32, so that's likely the cause.
For what it's worth, the differences in inference weren't slight, they were pretty significant (pose, style, background, everything) and got larger the further into the training I tested, which is why it genuinely seemed like a training difference rather than just a small change caused by dtype.
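For context on what a storage dtype change alone can and can't do to the weights, a tiny sanity check in plain PyTorch (illustrative only, nothing OneTrainer-specific):

```python
import torch

# A dtype round trip by itself cannot produce large weight changes.
w_fp16 = torch.randn(1000).half()

# fp16 -> fp32 -> fp16 is exact: every fp16 value is representable in fp32.
print(torch.equal(w_fp16, w_fp16.float().half()))  # True

# fp32 -> fp16 does round, but the error stays tiny for weights of this scale.
w_fp32 = torch.randn(1000)
print((w_fp32 - w_fp32.half().float()).abs().max())  # on the order of 1e-3 at most
```

Large, growing divergence in outputs is consistent with the suspicion above that something is actually training, rather than with a storage-dtype artifact.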
Okay. Fine-tuning the original HF repo SDXL safetensors:
- output_dtype set to FLOAT16
- train_dtype set to FLOAT32
- unet weight_dtype set to FLOAT32
- TE1 and TE2 weight_dtype set explicitly to FLOAT16, train: false, include: false
The output .safetensors still produces different inference results with its CLIP vs. the original SDXL CLIP.
What am I doing wrong here?
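One thing worth checking is what dtype the TE tensors were actually written out in. A minimal sketch of inspecting the saved checkpoint (the path and key filters are assumptions; adjust them to the file's real key layout):

```python
from collections import Counter
from safetensors import safe_open

path = "finetuned_sdxl.safetensors"  # placeholder path

dtypes = Counter()
with safe_open(path, framework="pt") as f:
    for key in f.keys():
        # Key naming varies between single-file and diffusers-style checkpoints,
        # so this filter is just an illustrative guess.
        if "text_model" in key or "text_encoder" in key:
            dtypes[str(f.get_tensor(key).dtype)] += 1

print(dtypes)  # e.g. Counter({'torch.float16': N}) if the TEs really went out as FP16
```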
Can you provide some inference examples?
In about a week or two. I'm away on a trip, and despite leaving a VPN tunnel open to my ML gear back home, the friggin WireGuard server died after I left -_-
> output_dtype set to FLOAT16
This actually might be the problem -- try FLOAT32.
But the original model is FLOAT16. The objective is to:
@orcinus bump for followup
Ahh, sorry, I got back and went straight into complete chaos at work. Completely dropped the ball on this.
Will put something together as soon as possible.
What happened?
Configuring TEs as follows:
... the TEs do not get frozen during the finetune, and get trained regardless. This is easily verifiable by diffing the TEs from the original model vs. the finetuned one, or just by hooking up the original CLIP vs. the trained one in ComfyUI and comparing inference results (they'll be different).
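For reference, a minimal sketch of that kind of diff (file names are placeholders, and the prefix filter is an assumption about the key layout):

```python
from safetensors.torch import load_file

original = load_file("sdxl_base_original.safetensors")   # placeholder paths
finetuned = load_file("sdxl_finetuned.safetensors")

changed = 0
for key, tensor in original.items():
    if key not in finetuned:
        continue
    # Only compare text-encoder tensors; adjust the filter to the real key names.
    if "text_model" not in key and "text_encoder" not in key:
        continue
    diff = (tensor.float() - finetuned[key].float()).abs().max().item()
    if diff > 0:
        changed += 1
        print(f"{key}: max abs diff {diff:.6g}")

print(f"{changed} text-encoder tensors differ")
```

If the TEs were truly frozen, every diff here should come out as exactly zero, even if the storage dtype changed from FP16 to FP32 (that cast is lossless).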
Assuming this is my mistake and I've configured OneTrainer wrong - i.e. I also need to specify the "include": false parameter - why is that the case, and why are there two parameters for this?
What did you expect would happen?
TEs should be frozen and remain unchanged during finetune.
Relevant log output
Output of pip freeze