bmaltais / kohya_ss

Apache License 2.0

Important (Timestep Issue) #2797

Open DriveHabits opened 2 months ago

DriveHabits commented 2 months ago

I noticed something quite odd, but it seemed to fix my issue. I was doing style training.

  1. You had a typo for flux_shift in the timestep sampling in a previous commit, which you then fixed.

  2. I did a training run while that typo was present and noticed my style was working. I'm not sure what the behavior was with the typo in place (I don't know if Flux was even doing timestep sampling at all), but I had better results with it.

Now for the question: is it possible to implement an option where we can disable timestep sampling, or something similar to the typo version?

Thank you!

DarkViewAI commented 2 months ago

I actually noticed this as well. It seems the typo version had better results.

bmaltais commented 2 months ago

Make sure to use the same seed for the training. This would help maintain training results consistency. The training seed can have a similar effect on the end result as the seed has on prompt outputs.
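To illustrate why a fixed seed matters: random draws made during training (noise, sampled timesteps) come from a seeded generator, so the same seed reproduces the same sequence. This is a minimal sketch that uses Python's `random` module as a stand-in for the trainer's RNG; `sample_timesteps` is a hypothetical helper, not part of kohya_ss or sd-scripts.

```python
import random


def sample_timesteps(seed: int, n: int) -> list[float]:
    """Draw n illustrative timesteps in [0, 1) from a generator seeded with `seed`.

    Hypothetical helper: Python's random module stands in for the trainer's RNG.
    """
    rng = random.Random(seed)  # independent generator, so global random state is untouched
    return [rng.random() for _ in range(n)]


# Same seed -> identical draws; a different seed -> a different sequence.
print(sample_timesteps(42, 3) == sample_timesteps(42, 3))  # True
```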

DriveHabits commented 2 months ago

> Make sure to use the same seed for the training. This would help maintain training results consistency. The training seed can have a similar effect on the end result as the seed has on prompt outputs.

Yep, I made sure everything was the same. I've run 40+ trainings to confirm.

DriveHabits commented 2 months ago

I used a training seed of 42 for all of them. I wonder what the typo did differently that made the results better.

DriveHabits commented 2 months ago

I loaded up the working config; in the TOML it shows up with the typo "flux_shift ".

I tried a training run with it, and it works great.

Then I tried one with the actual fixed commit, and the training results are not even close.

bmaltais commented 2 months ago

I believe the default value is used when the one specified is not valid… So it was probably defaulting to “shift” instead.

bmaltais commented 2 months ago

Actually, the default is “sigma”

DriveHabits commented 2 months ago

@bmaltais Even when it's passed through the TOML, it defaults to sigma?

bmaltais commented 2 months ago

I suspect so

DriveHabits commented 2 months ago

> I suspect so

@bmaltais I tried sigma with the same settings, and the results are quite bad.

bmaltais commented 2 months ago

Hmm… this is strange, as sd-scripts will use that if nothing is specified… but maybe if a “bad” value is specified in a TOML config, it defaults to something else? Maybe @kohya-ss could tell us what happens in that case.

DriveHabits commented 2 months ago

@bmaltais Here is the config.toml with the typo it had: https://huggingface.co/DriveHabits/Test/blob/main/config_lora-20240901-142242.toml

kohya-ss commented 2 months ago

If flux_shift (with an extra space) is specified, it will be sigma. If you train with sigma, you should get the same condition.

If something is different with the same seed, it may be due to other script changes, for example --network_train_unet_only now works.
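The fallback kohya-ss describes can be modeled as exact string matching against the known modes. This is a simplified, hypothetical sketch, not sd-scripts' actual code: it assumes the explicitly handled modes are uniform, sigmoid, shift and flux_shift, and that any other value (including sigma itself) lands in the final sigma branch.

```python
# Hypothetical, simplified model of the fallback; not sd-scripts' actual code.
EXPLICIT_MODES = ("uniform", "sigmoid", "shift", "flux_shift")


def resolve_timestep_sampling(value: str) -> str:
    """Return the branch a timestep_sampling value would end up in.

    Comparison is exact, so a value with a stray trailing space matches
    no explicit mode and falls through to the final (sigma) branch.
    """
    if value in EXPLICIT_MODES:
        return value
    return "sigma"


print(resolve_timestep_sampling("flux_shift"))   # flux_shift
print(resolve_timestep_sampling("flux_shift "))  # sigma
```

Under this model, training with the typo and training with sigma explicitly take the same branch, which is why the remaining differences would have to come from other script changes.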

DarkViewAI commented 2 months ago

> If flux_shift (with an extra space) is specified, it will be sigma. If you train with sigma, you should get the same condition.
>
> If something is different with the same seed, it may be due to other script changes, for example --network_train_unet_only now works.

And what about the discrete flow shift? When I used flux_shift with the extra space, I used 3.1582. Would I still use the same value, or does sigma use something else?

DarkViewAI commented 2 months ago

@kohya-ss thanks

kohya-ss commented 2 months ago

> And what about the discrete flow shift? When I used flux_shift with the extra space, I used 3.1582. Would I still use the same value, or does sigma use something else?

Discrete flow shift is ignored when timestep_sampling is anything other than shift, so it is ignored with sigma or flux_shift.
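A sketch of what "ignored" means here, assuming the SD3-style shift formula t' = s * t / (1 + (s - 1) * t) for shift mode. Both function names are hypothetical, and the real sigma and flux_shift branches compute their own timesteps; this only models whether discrete_flow_shift is consulted at all.

```python
def shift_timestep(t: float, shift: float) -> float:
    """SD3-style shift (assumed formula): t' = shift * t / (1 + (shift - 1) * t)."""
    return (t * shift) / (1.0 + (shift - 1.0) * t)


def apply_discrete_flow_shift(t: float, mode: str, discrete_flow_shift: float) -> float:
    """Consult discrete_flow_shift only in "shift" mode; return t unchanged otherwise.

    Hypothetical helper: the real sigma and flux_shift branches compute their own
    timesteps, which this sketch does not model.
    """
    if mode == "shift":
        return shift_timestep(t, discrete_flow_shift)
    return t


for mode in ("shift", "sigma", "flux_shift"):
    print(mode, round(apply_discrete_flow_shift(0.5, mode, 3.1582), 2))
```

With the 3.1582 value from the question, a mid-range t = 0.5 would be pushed to roughly 0.76 in shift mode and left at 0.5 in the other two modes.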

DarkViewAI commented 2 months ago

> > And what about the discrete flow shift? When I used flux_shift with the extra space, I used 3.1582. Would I still use the same value, or does sigma use something else?
>
> Discrete flow shift is ignored when timestep_sampling is anything other than shift, so it is ignored with sigma or flux_shift.

Thank you, I will try it out and see if I get something similar with sigma.