Nerogar / OneTrainer

OneTrainer is a one-stop solution for all your stable diffusion training needs.

Tried to use the same settings I found in Kohya but training failed epically #116

Closed · FurkanGozukara closed this 5 months ago

FurkanGozukara commented 6 months ago

[4 screenshots attached]

Nerogar commented 5 months ago

This might be an aspect ratio bucketing problem. How many images did you train on, and are they square or rectangular? OneTrainer automatically generates buckets based on some predefined aspect ratios. You can find the code here: https://github.com/Nerogar/mgds/blob/master/src/mgds/pipelineModules/AspectBucketing.py

SDXL was originally trained on different resolutions. Maybe training it only for a few steps on these different buckets causes issues. To test this, you could try disabling aspect ratio bucketing. Or you could try to change the __create_buckets function to return the original buckets instead.
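For illustration, here is a rough sketch of the idea rather than the actual code in AspectBucketing.py: each image is assigned to the bucket whose aspect ratio is closest to its own. The fixed bucket list below is made up for the example and is not the official SDXL resolution list.

```python
def nearest_bucket(width: int, height: int, buckets: list[tuple[int, int]]) -> tuple[int, int]:
    """Return the bucket whose aspect ratio is closest to the image's."""
    image_ar = width / height
    return min(buckets, key=lambda b: abs(b[0] / b[1] - image_ar))

# Hypothetical fixed bucket list (example values only, not SDXL's official list):
FIXED_BUCKETS = [(1024, 1024), (1152, 896), (896, 1152), (1216, 832), (832, 1216)]

print(nearest_bucket(832, 1216, FIXED_BUCKETS))  # -> (832, 1216)
```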

FurkanGozukara commented 5 months ago

> This might be an aspect ratio bucketing problem. How many images did you train on, and are they square or rectangular? OneTrainer automatically generates buckets based on some predefined aspect ratios. You can find the code here: https://github.com/Nerogar/mgds/blob/master/src/mgds/pipelineModules/AspectBucketing.py
>
> SDXL was originally trained on different resolutions. Maybe training it only for a few steps on these different buckets causes issues. To test this, you could try disabling aspect ratio bucketing. Or you could try to change the __create_buckets function to return the original buckets instead.

All images are 1024x1024. I will test without bucketing, let's see.

[screenshot attached]

Just trained without EMA, same result. Actually, without EMA it looks even worse :)

[screenshot attached]

Nerogar commented 5 months ago

If all images are square already, aspect ratio bucketing is likely not the issue.

You set the weights of the entire model to float16, but the mixed precision data type to bfloat16. Trainable weights need to be float32, and SDXL is pretty large, so setting the weights to float32 only works in some special cases. If you train only the UNet and text encoder 1, it should barely fit. The train data type (which is used for mixed precision) can then be set to float16.

FurkanGozukara commented 5 months ago

> If all images are square already, aspect ratio bucketing is likely not the issue.
>
> You set the weights of the entire model to float16, but the mixed precision data type to bfloat16. Trainable weights need to be float32, and SDXL is pretty large, so setting the weights to float32 only works in some special cases. If you train only the UNet and text encoder 1, it should barely fit. The train data type (which is used for mixed precision) can then be set to float16.

Yes, I have a question regarding that.

Can you give more info about this panel?

I thought the weight type we set there is just the base model's weight type. Does it affect training? Thank you.

[screenshot attached]

FurkanGozukara commented 5 months ago

Started testing like this right now. We can't set a different VAE file? It has to use the embedded one?

[screenshot attached]

Nerogar commented 5 months ago

> Can you give more info about this panel?

Sure. On the "Model" tab, you can set most of the data types that are used either during training, or when saving the model.

"Weight Data Type" is the overall data type used for the weights during training. The other types override that data type for specific parts of the model (like UNet or VAE). Trainable weights should always be set to float32. Other weights can be set to either float16 or bfloat16 to reduce VRAM.

For example, if you only train the UNet, you can set the UNet to float32 and both text encoders to float16. The SDXL VAE doesn't like float16; it produces NaN results. So it should always be set to either float32 or bfloat16.

The Output Data Type is the type used when saving the output model. Float16 probably works in most situations. BFloat16 can reduce quality a bit.
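As a rough illustration of these per-component overrides, here is a sketch using diffusers rather than OneTrainer's internals; the checkpoint path is a placeholder.

```python
import torch
from diffusers import StableDiffusionXLPipeline

# Load the base model (path is a placeholder), then cast each part separately.
pipe = StableDiffusionXLPipeline.from_single_file("sd_xl_base_1.0.safetensors")

pipe.unet.to(torch.float32)            # trainable part: keep float32
pipe.text_encoder.to(torch.float16)    # frozen text encoder 1: half precision saves VRAM
pipe.text_encoder_2.to(torch.float16)  # frozen text encoder 2: half precision saves VRAM
pipe.vae.to(torch.bfloat16)            # the SDXL VAE produces NaNs in float16
```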

> We can't set a different VAE file? It has to use the embedded one?

At the moment, yes. I'm planning to add an additional input for that, similar to all the inputs for the Würstchen model. But that might take a while; I'm busy with other things right now. One simple way to change the VAE at the moment is to load a model in diffusers format. There, each model part is saved in its own folder and can easily be replaced.

Btw, can you share your optimizer settings (the three dots on the training tab)?

FurkanGozukara commented 5 months ago

@Nerogar thank you for the answers, sorry for the late reply.

Here are the Adafactor settings:

[screenshot attached]

In Kohya I am doing full BF16 training; it uses around 17 GB VRAM. Here is one example result I get:

[example result image]

And I just did a new training with the settings below.

[screenshot attached]

Here is the result from OneTrainer:

[screenshot attached]

FurkanGozukara commented 5 months ago

OK, I found out that the Adafactor settings in Kohya are different from what I used:

scale and relative are false.

Testing again. I don't know about the other parameters, since in Kohya we only set weight decay on top of the defaults and don't set other settings such as EPS, beta, decay rate, and clip threshold.

FurkanGozukara commented 5 months ago

OK, those 2 changes made a huge difference,

but at 60 epochs it is still very undertrained.

I will do more epochs and test later.

[screenshot attached]

Nerogar commented 5 months ago

Looks like you already found the problem. In any case, here is a bit more information: if "Scale parameter" and "Relative step" are enabled, the optimizer doesn't use the set learning rate. Instead, it only uses that as the initial learning rate and adjusts it automatically. Those settings were broken until now; I just pushed a fix. So the two options you have are these:

  1. Set "Scale parameter" and "Relative step" to False.
  2. Set "Scale parameter" and "Relative step" to True; in this case you should use the Adafactor learning rate scheduler to automatically calculate the learning rate.
FurkanGozukara commented 5 months ago

> Looks like you already found the problem. In any case, here is a bit more information: if "Scale parameter" and "Relative step" are enabled, the optimizer doesn't use the set learning rate. Instead, it only uses that as the initial learning rate and adjusts it automatically. Those settings were broken until now; I just pushed a fix. So the two options you have are these:
>
> 1. Set "Scale parameter" and "Relative step" to False.
> 2. Set "Scale parameter" and "Relative step" to True; in this case you should use the Adafactor learning rate scheduler to automatically calculate the learning rate.

Correct, setting them to false made a huge difference. After longer training I can give more feedback, but I feel like the model lost a lot of information and is still undertrained.

In Kohya I use regularization images as well, ground truth ones. I suppose you don't have a DreamBooth method for that?

Also, OneTrainer uses 22 GB VRAM while Kohya uses 17 GB.

Nerogar commented 5 months ago

> In Kohya I use regularization images as well, ground truth ones. I suppose you don't have a DreamBooth method for that?

There is no specific setting for regularization images. But you can create a second concept, add your regularization images, then set the repeats for that concept really low. Something like 0.01, depending on the number of images you have.

> Also, OneTrainer uses 22 GB VRAM while Kohya uses 17 GB.

That is really strange. I'm doing a run with the same settings right now, and it uses less than 15GB.

Here are my settings: [2 screenshots attached]

The only obvious difference I see is the EMA option. But if you use EMA on CPU, it should not change VRAM usage.

FurkanGozukara commented 5 months ago

Now that is really weird, I mean the VRAM difference.

Are you on Windows 10? I am on Windows 10 with Python 3.10.11.

What would be the effect of a 0.01 repeat? I mean, how does it work as a second concept?

FurkanGozukara commented 5 months ago

By the way, I am doing fine-tuning, not LoRA.

Nerogar commented 5 months ago

> Are you on Windows 10? I am on Windows 10 with Python 3.10.11.

I'm using Windows 10 with Python 3.10.8.

> What would be the effect of a 0.01 repeat? I mean, how does it work as a second concept?

With this, each epoch only trains on 1% of the images from that concept. The images are randomly selected each epoch, but don't repeat until every image has been trained on once. An example (see also the sketch after the list):

4 images with repeats set to 0.5:

  • epoch 1: images 1 and 3
  • epoch 2: images 2 and 4
  • epoch 3: images 3 and 4
  • epoch 4: images 1 and 2
  • etc.
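A small sketch of that behaviour (just the idea, not OneTrainer's actual data pipeline):

```python
import random

def fractional_repeat_epochs(images, repeats, num_epochs):
    """Yield the image subset used in each epoch for repeats < 1.

    With repeats=0.5 and 4 images, every epoch uses 2 images, and no image
    repeats until all images have been used once.
    """
    per_epoch = max(1, round(len(images) * repeats))
    queue = []
    for _ in range(num_epochs):
        if len(queue) < per_epoch:
            fresh = [img for img in images if img not in queue]
            random.shuffle(fresh)
            queue += fresh
        epoch_images, queue = queue[:per_epoch], queue[per_epoch:]
        yield epoch_images

for epoch, subset in enumerate(fractional_repeat_epochs([1, 2, 3, 4], 0.5, 4), start=1):
    print(f"epoch {epoch}: images {subset}")
```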

> By the way, I am doing fine-tuning, not LoRA.

I did the same. LoRA rank 16 with the same settings uses a bit under 9 GB.

It's really hard to debug this, but maybe you can start by copying my settings exactly.

FurkanGozukara commented 5 months ago

@Nerogar yes, I am testing yours; same result. Here is my config, which still uses the whole available VRAM.

I may try a different PyTorch and xformers, perhaps.

python scripts/train.py --training-method="FINE_TUNE" --model-type="STABLE_DIFFUSION_XL_10_BASE" --debug-dir="debug" --workspace-dir="G:/OneTrainer/workspace" --cache-dir="G:/OneTrainer/cache" --base-model-name="F:/0 models/sd_xl_base_1.0.safetensors" --weight-dtype="BFLOAT_16" --output-dtype="BFLOAT_16" --output-model-format="SAFETENSORS" --output-model-destination="G:/automatic1111/stable-diffusion-webui/models/Stable-diffusion/one_train" --gradient-checkpointing --concept-file-name="training_concepts/concepts.json" --latent-caching --clear-cache-before-training --learning-rate-scheduler="CONSTANT" --learning-rate="1e-05" --learning-rate-warmup-steps="200" --learning-rate-cycles="1" --epochs="150" --batch-size="1" --gradient-accumulation-steps="1" --ema="OFF" --ema-decay="0.999" --ema-update-step-interval="5" --train-device="cuda" --temp-device="cpu" --train-dtype="BFLOAT_16" --resolution="1024" --attention-mechanism="XFORMERS" --align-prop-probability="0.1" --align-prop-loss="AESTHETIC" --align-prop-weight="0.01" --align-prop-steps="20" --align-prop-truncate-steps="0.5" --align-prop-cfg-scale="7.0" --mse-strength="1.0" --mae-strength="0.0" --loss-scaler="NONE" --learning-rate-scaler="NONE" --train-unet --train-unet-epochs="100000" --unet-learning-rate="1e-05" --offset-noise-weight="0.0" --perturbation-noise-weight="0.0" --max-noising-strength="1.0" --unet-weight-dtype="BFLOAT_16" --train-prior --train-prior-epochs="10000" --prior-weight-dtype="NONE" --train-text-encoder --train-text-encoder-epochs="150" --text-encoder-learning-rate="3e-06" --text-encoder-layer-skip="0" --text-encoder-weight-dtype="BFLOAT_16" --train-text-encoder-2-epochs="30" --text-encoder-2-layer-skip="0" --text-encoder-2-weight-dtype="BFLOAT_16" --vae-weight-dtype="BFLOAT_16" --effnet-encoder-model-name="" --effnet-encoder-weight-dtype="NONE" --decoder-model-name="" --decoder-weight-dtype="NONE" --decoder-text-encoder-weight-dtype="NONE" --decoder-vqgan-weight-dtype="NONE" --unmasked-probability="0.1" --unmasked-weight="0.1" --token-count="1" --initial-embedding-text="*" --embedding-weight-dtype="FLOAT_32" --lora-model-name="" --lora-rank="16" --lora-alpha="1.0" --lora-weight-dtype="FLOAT_32" --optimizer="ADAFACTOR" --optimizer-adam-w-mode="False" --optimizer-amsgrad="False" --optimizer-beta1="0.9" --optimizer-beta2="0.999" --optimizer-bias-correction="False" --optimizer-block-wise="False" --optimizer-capturable="False" --optimizer-centered="False" --optimizer-clip-threshold="1.0" --optimizer-decay-rate="-0.8" --optimizer-decouple="False" --optimizer-differentiable="False" --optimizer-eps="1e-30" --optimizer-eps2="0.001" --optimizer-foreach="False" --optimizer-fsdp-in-use="False" --optimizer-fused="True" --optimizer-is-paged="False" --optimizer-maximize="False" --optimizer-nesterov="False" --optimizer-no-prox="False" --optimizer-relative-step="False" --optimizer-safeguard-warmup="False" --optimizer-scale-parameter="False" --optimizer-use-bias-correction="False" --optimizer-use-triton="False" --optimizer-warmup-init="False" --optimizer-weight-decay="0.01" --sample-definition-file-name="training_samples/samples.json" --sample-after="10" --sample-after-unit="NEVER" --sample-image-format="JPG" --samples-to-tensorboard --non-ema-sampling --backup-after="30" --backup-after-unit="NEVER" --rolling-backup-count="3" --save-after="30" --save-after-unit="EPOCH"

Nerogar commented 5 months ago

You set the beta1 of Adafactor to 0.9. Setting it to "None" reduces VRAM a lot. I'm not familiar with Kohya, so I don't know what settings are used there. There is currently a bug in the optimizer UI that doesn't set the beta1 to the default value "None" if you click the "load defaults" button.

Nerogar commented 5 months ago

> There is currently a bug in the optimizer UI that doesn't set the beta1 to the default value "None" if you click the "load defaults" button.

This is now fixed. The default of "None" is correctly restored when switching optimizers or clicking the "load defaults" button.

FurkanGozukara commented 5 months ago

> There is currently a bug in the optimizer UI that doesn't set the beta1 to the default value "None" if you click the "load defaults" button.
>
> This is now fixed. The default of "None" is correctly restored when switching optimizers or clicking the "load defaults" button.

Wow, it made such a dramatic effect.

What does it do? It now uses around 13 GB VRAM.

Nerogar commented 5 months ago

Adafactor has an option to calculate EMA weights itself. If you set the beta1 to something other than None, it will create a full copy of the model in VRAM. I'm not exactly sure how these EMA weights are used. It's probably explained in the original paper: https://arxiv.org/abs/1804.04235
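As a rough way to see why that matters for VRAM (the parameter count and the state dtype below are assumptions for illustration, not measured values):

```python
def beta1_state_overhead_gib(trainable_params: int, bytes_per_value: int) -> float:
    """Extra optimizer memory when Adafactor keeps a full first-moment buffer.

    With beta1 set, one extra value is stored per trainable weight; the dtype
    of that state depends on the implementation.
    """
    return trainable_params * bytes_per_value / 1024**3

# Illustrative numbers only: ~2.6e9 trainable parameters, 4-byte (float32) state.
print(f"{beta1_state_overhead_gib(2_600_000_000, 4):.1f} GiB")  # roughly 9.7 GiB
```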

FurkanGozukara commented 5 months ago

> Adafactor has an option to calculate EMA weights itself. If you set the beta1 to something other than None, it will create a full copy of the model in VRAM. I'm not exactly sure how these EMA weights are used. It's probably explained in the original paper: https://arxiv.org/abs/1804.04235

Thanks a lot.

Started a full training; will go to the gym. I hope I can get better results than Kohya.

It currently uses less VRAM than Kohya:

not using xformers, using EMA on CPU, full bf16 training.

It uses around 15-15.5 GB.

Speed is a little slow but OK, 2.86 seconds/it.

Also, do you suggest keeping the EMA update step interval at 5, as set by default in the GUI?

FurkanGozukara commented 5 months ago

Wow, even EMA on GPU works without xformers.

It uses around 20 GB and speed is great:

1.72 seconds/it.

Nerogar commented 5 months ago

> Also, do you suggest keeping the EMA update step interval at 5, as set by default in the GUI?

The quality will be best if you set it to 1, but that will also be a bit slower. Especially if EMA is set to CPU. On GPU, the difference is very small, so you can safely set it to 1.

FurkanGozukara commented 5 months ago

> Also, do you suggest keeping the EMA update step interval at 5, as set by default in the GUI?
>
> The quality will be best if you set it to 1, but that will also be a bit slower. Especially if EMA is set to CPU. On GPU, the difference is very small, so you can safely set it to 1.

  • For GPU I would recommend 1
  • For CPU I would recommend setting it as low as you can tolerate

Awesome, thank you.

I did set it to 1.

1.75 seconds/it. Now time to train and see the results.

FurkanGozukara commented 5 months ago

270 epochs and it is still super undertrained.

Some other parameter must still be different from Kohya.

I wonder how I could debug this.

In Kohya we don't set these:

[3 screenshots attached]

Nerogar commented 5 months ago

All of those are the default optimizer parameters. Unless Kohya added some other hard coded values, they should be the same. But there are probably some other differences.

For example EMA. I just looked through Kohya's code for a few minutes and couldn't find any mention of that. Is it even supported? And if it is, maybe its implementation is different?

Then there is the different regularization images implementation. Does Kohya add a special loss weighting to those images? You can configure that in each concept.

Timestep based loss weighting (like min-snr-gamma or debiased loss) is another difference. OneTrainer doesn't support those yet.

Those are just a few examples, but there are probably more.

FurkanGozukara commented 5 months ago

@Nerogar true, they lack EMA. That is one very big advantage of OneTrainer.

Even when I don't use reg images, Kohya still works great with much better resemblance.

I don't use Debiased Estimation loss or min-snr-gamma.

Looks like I have to do more learning rate tests.

The sad part is that I do such batch testing on RunPod, and the GUI will not work there.

So I will generate the command on my computer and test on RunPod.

The parameters I use with Kohya:

--max_data_loader_n_workers="0" --learning_rate_te1="3e-06" --learning_rate_te2="0.0" --learning_rate="1e-05" --lr_scheduler="constant" --train_batch_size="1" --max_train_steps="6000" --save_every_n_epochs="1" --mixed_precision="bf16" --save_precision="bf16" --cache_latents --cache_latents_to_disk --optimizer_type="Adafactor" --optimizer_args scale_parameter=False relative_step=False warmup_init=False weight_decay=0.01 --max_data_loader_n_workers="0" --bucket_reso_steps=64 --save_every_n_steps="901" --gradient_checkpointing --bucket_no_upscale --noise_offset=0.0 --max_grad_norm=0.0 --no_half_vae --train_text_encoder

Nerogar commented 5 months ago

> true, they lack EMA. That is one very big advantage of OneTrainer.

That could already make a big difference. Did you use EMA in this last run? EMA is usually only needed if you train many different things at the same time. For single concept runs like this, it could slow progress down a lot, because it always retains some parts of the original model. Your "ohwx" token for example will train normally, but at every step it will be averaged with the previous values, slowing down progress.

One suggestion to test this: On the samples tab you can specify samples that are generated every x minutes, steps or epochs (also configurable). They will sample the EMA and non-EMA versions of the model, so you can compare them. If you save regular backups, or at least one backup before the final save, you can then extract the EMA and the non-EMA versions of the model later. But to do that you will need to manually move some files and use the model conversion tool on the tools tab.

FurkanGozukara commented 5 months ago

> > true, they lack EMA. That is one very big advantage of OneTrainer.
>
> That could already make a big difference. Did you use EMA in this last run? EMA is usually only needed if you train many different things at the same time. For single concept runs like this, it could slow progress down a lot, because it always retains some parts of the original model. Your "ohwx" token for example will train normally, but at every step it will be averaged with the previous values, slowing down progress.
>
> One suggestion to test this: On the samples tab you can specify samples that are generated every x minutes, steps or epochs (also configurable). They will sample the EMA and non-EMA versions of the model, so you can compare them. If you save regular backups, or at least one backup before the final save, you can then extract the EMA and the non-EMA versions of the model later. But to do that you will need to manually move some files and use the model conversion tool on the tools tab.

Thanks, will do this. EMA hugely improved SD 1.5 training back in the day, when using the DreamBooth extension of Automatic1111.

FurkanGozukara commented 5 months ago

There is a dramatic difference between EMA and no EMA, I can't make much sense of it.

[screenshot attached]

Tr1dae commented 5 months ago

commenting because I REALLY want to see this thread come to a solution.

@FurkanGozukara I subbed to your patreon and your Kohya settings have given me amazing results. I've been trying my best to replicate it in OneTrainer because the GUI is just so much nicer to work with.

It excites me a great deal that you've been trying to figure out good OneTrainer settings to match your Kohya ones.

There's not much I can add besides saying that I too have tried copying the Kohya settings over and have yet to get such perfect results.

Nerogar commented 5 months ago

> There is a dramatic difference between EMA and no EMA, I can't make much sense of it.

The preview is pretty small, so I can't see it clearly. Is the no-EMA sample closer to your expected result? The difference between EMA and non-EMA looks correct. In the beginning, EMA moves quickly, following the non-EMA weights. After some steps, the decay value increases, slowing down the updates to the model. If you want to speed up the EMA training, you can decrease the EMA decay setting. A setting of 0.99 means that at each step, the EMA model is calculated as (old_ema * 0.99) + (non_ema * 0.01).
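In code form, one EMA step looks roughly like this (a simplified sketch; as described above, the decay is also ramped up over the first steps):

```python
import torch

@torch.no_grad()
def ema_update(ema_params, train_params, decay: float = 0.99) -> None:
    """new_ema = decay * old_ema + (1 - decay) * current_weights"""
    for ema_p, p in zip(ema_params, train_params):
        ema_p.mul_(decay).add_(p, alpha=1.0 - decay)
```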

FurkanGozukara commented 5 months ago

> commenting because I REALLY want to see this thread come to a solution.
>
> @FurkanGozukara I subbed to your patreon and your Kohya settings have given me amazing results. I've been trying my best to replicate it in OneTrainer because the GUI is just so much nicer to work with.
>
> It excites me a great deal that you've been trying to figure out good OneTrainer settings to match your Kohya ones.
>
> There's not much I can add besides saying that I too have tried copying the Kohya settings over and have yet to get such perfect results.

Thank you so much. Yes, I am still very far behind Kohya, trying to figure out the differences.

FurkanGozukara commented 5 months ago

> > There is a dramatic difference between EMA and no EMA, I can't make much sense of it.
>
> The preview is pretty small, so I can't see it clearly. Is the no-EMA sample closer to your expected result? The difference between EMA and non-EMA looks correct. In the beginning, EMA moves quickly, following the non-EMA weights. After some steps, the decay value increases, slowing down the updates to the model. If you want to speed up the EMA training, you can decrease the EMA decay setting. A setting of 0.99 means that at each step, the EMA model is calculated as (old_ema * 0.99) + (non_ema * 0.01).

The effect is like that.

Yesterday I tried a much higher learning rate, 5 times bigger: 5e-5.

Trained up to 500 epochs, which is huge. I will try an even higher LR with EMA, but finding an accurate LR takes a huge amount of time. I really need to run multiple trainings on RunPod.

Non-EMA is cooked, but in a very weird way.

EMA is not cooked, but the similarity is still off. If you add an option to save both the EMA and non-EMA safetensors, that would be amazing for comparison.

Here is the EMA 480-epoch result. Used ADetailer as well. The LR is 5 times bigger than what I use in Kohya.

[screenshot attached]

Here are the non-EMA samples generated during training:

[3 sample images attached]

FurkanGozukara commented 5 months ago

The thing I noticed is that even though I do more epochs, the EMA model changes too little, as you said @Nerogar, even though I am using a very high LR for SDXL.

So if I can find good parameters with EMA, I think we might get better results than non-EMA. What do you think?

Here is the checkpoint comparison:

[image: xyz_grid-0000-1924626384]

hameerabbasi commented 5 months ago

> Here are the non-EMA samples generated during training:
>
> [3 sample images attached]

Looks like aspect ratio bucketing is missing here.

FurkanGozukara commented 5 months ago

@hameerabbasi all the training set images are 1024x1024 pixels, as I posted in the messages above.

Nerogar commented 5 months ago

I can't really speak about the quality of the results. I don't have much experience with that type of training. But from a technical point of view, there should be absolutely no difference between the training of OneTrainer and Kohya. They are very different in how they manage the training data, UI, VRAM etc. But the actual training loop is exactly the same. So in theory, the same parameters should work with both applications. There has to be some kind of difference elsewhere. It's also entirely possible there is still a bug somewhere in OneTrainer that degrades results.

One very strange thing I notice in your non-EMA samples: some images look pretty clear, then the next one is very warped or doesn't even contain a human anymore. For example, epoch 335 looks pretty good, then the next sample at 340 is completely broken.

Also, samples don't seem to converge. They jump around a lot. 290 is a closeup photo, 295 a full body photo, then 300 looks similar to 290 again. To me this suggests there is something very unstable happening.

As a test, can you set the text encoder 1 data type to float32 instead of bfloat16? It will increase memory consumption, but might lead to more stable training.

FurkanGozukara commented 5 months ago

@Nerogar I agree, I should get the same, but the results so far are really, really different.

I do full BF16 training in Kohya without xformers.

It uses around 17 GB. I will test text encoder fp32 as well, and will not use EMA. Let's see if I get similar results.

FurkanGozukara commented 5 months ago

And I did another training with a higher LR:

1e-4

[4 screenshots attached]

FurkanGozukara commented 5 months ago

Finally got really good results, almost as good as Kohya.

Didn't use EMA, since it requires a whole new level of LR testing.

Currently I used the same settings as Kohya.

You can download and see the full resolution.

new_me is Kohya with reg images; the other ones are OneTrainer.

[image: xyz_grid-0000-167590038]

Nerogar commented 5 months ago

Those results look pretty nice. But I see a lot of overfitting in the background and clothing. Masked training can reduce that a bit, if you set the parameters right. But that might also require some changes to other parameters.

FurkanGozukara commented 5 months ago

> Those results look pretty nice. But I see a lot of overfitting in the background and clothing. Masked training can reduce that a bit, if you set the parameters right. But that might also require some changes to other parameters.

Can you give me some hints on how to try that?

FurkanGozukara commented 5 months ago

Tested an anime prompt; OneTrainer is more accurate.

[comparison image attached]

Nerogar commented 5 months ago

I've never tried training a single person with SDXL, but here are some hints that should be a good starting point:

  1. Create masks using the dataset tool on the tools tab. You can do that on basically any PC, it doesn't need a powerful GPU. Just enable mask editing and draw onto the image. Press enter to save the mask, arrow keys to switch to the next/previous image. Only mask the head and a small area around the head. This will generate additional images in the dataset folder.
  2. On the train tab, enable masked training.

For the parameters, I would suggest trying the defaults first. You might need to increase the unmasked weight or unmasked probability a bit, if the result doesn't look like a human anymore. Those two options will increase the training impact of the non-masked parts of the image, so they can also lead to a bit more overfitting.

You can try the "normalize masked area loss" option if you want, but it might not be needed in your case. It normalizes the loss, so it's independent from the size of the area you masked. But since most of your images are similar, it won't do a lot.

Setting the max noising strength to 0.8 or 0.85 can reduce overfitting again. This setting stops the training from fully noising the image. It is needed to stop the model from learning your exact image composition from the training images.
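To make the unmasked-weight and normalization idea concrete, here is a rough sketch of such a masked loss (not OneTrainer's exact implementation):

```python
import torch

def masked_mse_loss(pred: torch.Tensor, target: torch.Tensor, mask: torch.Tensor,
                    unmasked_weight: float = 0.1, normalize: bool = False) -> torch.Tensor:
    """mask is 1.0 inside the masked region (the head) and 0.0 elsewhere."""
    weights = mask + unmasked_weight * (1.0 - mask)
    loss = (pred - target) ** 2 * weights
    if normalize:
        # "Normalize masked area loss": make the loss independent of the mask size.
        return loss.sum() / weights.sum().clamp(min=1e-8)
    return loss.mean()
```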

FurkanGozukara commented 5 months ago

Yesterday I tested the default SDXL preset.

Still need a lot of experimentation to make it better, then I should make a full tutorial.

@Nerogar meanwhile, if you could make this change it would be awesome: https://github.com/Nerogar/OneTrainer/issues/115#issuecomment-1878464825

We give an output folder path and an output name; checkpoints are all saved there in an "epoch + given name" format.

The prompt used is "photo of ohwx man wearing an expensive suit in a studio".

ADetailer is used as well.

[image: comparison]

Nerogar commented 5 months ago

One note about the preset you shared. If you changed any of the optimizer settings, they are not loaded from that preset. This is something I want to change in the future. But at the moment, they are saved in a different location.

FurkanGozukara commented 5 months ago

> One note about the preset you shared. If you changed any of the optimizer settings, they are not loaded from that preset. This is something I want to change in the future. But at the moment, they are saved in a different location.

I did it like this:

I loaded your preset, then only copy-pasted my folders.

So wouldn't that be the default?

Nerogar commented 5 months ago

Optimizer settings are saved in `training_user_settings/optimizer_prefs.json`. If you include this file, everything should work.

FurkanGozukara commented 5 months ago

> Optimizer settings are saved in `training_user_settings/optimizer_prefs.json`. If you include this file, everything should work.

When the preset is loaded, it looks like this:

[screenshot attached]