After upgrading and rerunning the setup, Flux fine-tuning slowed from 14s/it to 164s/it

Windows 10, RTX 3090 24GB, running Kohya locally for Flux DreamBooth. I followed the SECourses video to configure Kohya. The last time I used Kohya was Sept 1, 2024; it is now Oct 1, 2024. First I ran Kohya without updating: the performance was 14s/it. Then I cancelled training, ran git pull, reran the setup, and configured accelerate. Now the performance is 130-160s/it.

I didn't touch most of the parameters; most of the config comes from a pre-made config file. What am I doing wrong?
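
To put the slowdown in perspective, here is a rough estimate of the total run time at each speed. It is only a sketch: the step count is the max_train_steps value from my config (3300), the speeds are the ones I observed, and `total_hours` is just a helper name I made up. It works out to roughly 13 hours before the update versus about 150 hours at 164s/it.

```python
# Back-of-the-envelope estimate of total training time at each observed speed.
# MAX_TRAIN_STEPS is max_train_steps from my config; the speeds are what I measured.

def total_hours(steps: int, seconds_per_it: float) -> float:
    """Wall-clock hours needed for `steps` iterations at `seconds_per_it` each."""
    return steps * seconds_per_it / 3600.0


MAX_TRAIN_STEPS = 3300  # 220 images * 1 repeat * 15 epochs, batch size 1

if __name__ == "__main__":
    observed = [
        ("before update", 14),
        ("after update, best case", 130),
        ("after update, worst case", 164),
    ]
    for label, seconds_per_it in observed:
        hours = total_hours(MAX_TRAIN_STEPS, seconds_per_it)
        print(f"{label}: {hours:.1f} h")
```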

(venv) PS D:\Kohya_GUI_Flux_Installer_21\kohya_ss> git status
On branch sd3-flux.1
Your branch is up to date with 'origin/sd3-flux.1'.

adaptive_noise_scale = 0
ae = "D:/ComfyUI_windows_portable/ComfyUI/models/vae/ae.safetensors"
blocks_to_swap = 0
bucket_no_upscale = true
bucket_reso_steps = 64
cache_latents = true
cache_latents_to_disk = true
cache_text_encoder_outputs = true
cache_text_encoder_outputs_to_disk = true
caption_dropout_every_n_epochs = 0
caption_dropout_rate = 0
caption_extension = ".txt"
clip_l = "D:/ComfyUI_windows_portable/ComfyUI/models/clip/clip_l.safetensors"
cpu_offload_checkpointing = true
discrete_flow_shift = 3.1582
double_blocks_to_swap = 5
dynamo_backend = "no"
epoch = 15
full_bf16 = true
fused_backward_pass = true
gradient_accumulation_steps = 1
gradient_checkpointing = true
guidance_scale = 1
huber_c = 0.1
huber_schedule = "snr"
keep_tokens = 0
learning_rate = 4e-6
learning_rate_te = 0
logging_dir = "D:/Kohya_GUI_Flux_Installer_21/train_4\\log"
loss_type = "l2"
lr_scheduler = "constant"
lr_scheduler_args = []
lr_scheduler_num_cycles = 1
lr_scheduler_power = 1
lr_warmup_steps = 0
max_bucket_reso = 2048
max_data_loader_n_workers = 0
max_timestep = 1000
max_token_length = 75
max_train_steps = 3300
mem_eff_save = true
min_bucket_reso = 256
mixed_precision = "bf16"
model_prediction_type = "raw"
multires_noise_discount = 0.3
multires_noise_iterations = 0
noise_offset = 0
noise_offset_type = "Original"
optimizer_args = [ "scale_parameter=False", "relative_step=False", "warmup_init=False", "weight_decay=0.01",]
optimizer_type = "Adafactor"
output_dir = "D:/Kohya_GUI_Flux_Installer_21/train_4\\model"
output_name = "alr_p"
persistent_data_loader_workers = 0
pretrained_model_name_or_path = "D:/Kohya_GUI_Flux_Installer_21/train_3/model/Alternate_reality_0-000005.safetensors"
prior_loss_weight = 1
resolution = "1024,1024"
resume = "D:/Kohya_GUI_Flux_Installer_21/train_3/model/Alternate_reality_0-000005-state"
sample_prompts = "D:/Kohya_GUI_Flux_Installer_21/train_4\\model\\sample/prompt.txt"
sample_sampler = "euler_a"
save_every_n_epochs = 3
save_model_as = "safetensors"
save_precision = "fp16"
save_state = true
save_state_on_train_end = true
sdpa = true
seed = 1
single_blocks_to_swap = 0
t5xxl = "D:/ComfyUI_windows_portable/ComfyUI/models/clip/t5xxl_fp16.safetensors"
t5xxl_max_token_length = 512
timestep_sampling = "sigmoid"
train_batch_size = 1
train_blocks = "all"
train_data_dir = "D:/Kohya_GUI_Flux_Installer_21/train_4\\img"
vae_batch_size = 4
wandb_run_name = "alr_p"
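
For anyone comparing configs, this is a small sketch of how the generated TOML could be sanity-checked before launching. The list of required keys is my assumption about which model paths flux_train.py needs (the four path entries in the config above), and `check_model_paths` is a hypothetical helper, not part of kohya_ss.

```python
import os
import sys

try:
    import tomllib as toml_parser  # standard library in Python 3.11+
except ModuleNotFoundError:
    import toml as toml_parser  # third-party `toml` package (present in my venv)

# Assumption: these are the model file paths the Flux trainer expects.
REQUIRED_PATH_KEYS = ["pretrained_model_name_or_path", "ae", "clip_l", "t5xxl"]


def check_model_paths(config_text: str) -> dict:
    """Return {key: problem} for each required key that is missing or not on disk."""
    config = toml_parser.loads(config_text)
    problems = {}
    for key in REQUIRED_PATH_KEYS:
        value = config.get(key)
        if not value:
            problems[key] = "missing from config"
        elif not os.path.exists(value):
            problems[key] = f"file not found: {value}"
    return problems


# Usage: python check_config.py path/to/config_dreambooth-....toml
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as f:
        for key, problem in check_model_paths(f.read()).items():
            print(f"{key}: {problem}")
```

An empty result means all four entries are set and point at existing files; it says nothing about whether the files are the right models.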

(venv) PS D:\Kohya_GUI_Flux_Installer_21\kohya_ss> pip list
Package                      Version                 Editable project location
---------------------------- ----------------------- --------------------------------------------------
absl-py                      2.1.0
accelerate                   0.33.0
aiofiles                     23.2.1
aiohappyeyeballs             2.4.3
aiohttp                      3.10.8
aiosignal                    1.3.1
altair                       4.2.2
annotated-types              0.7.0
antlr4-python3-runtime       4.9.3
anyio                        4.6.0
appdirs                      1.4.4
astunparse                   1.6.3
async-timeout                4.0.3
attrs                        24.2.0
bitsandbytes                 0.44.0
certifi                      2024.8.30
charset-normalizer           3.3.2
click                        8.1.7
colorama                     0.4.6
coloredlogs                  15.0.1
contourpy                    1.3.0
cycler                       0.12.1
dadaptation                  3.2
diffusers                    0.25.0
docker-pycreds               0.4.0
easygui                      0.98.3
einops                       0.7.0
entrypoints                  0.4
exceptiongroup               1.2.2
fairscale                    0.4.13
fastapi                      0.112.4
ffmpy                        0.4.0
filelock                     3.16.1
flatbuffers                  24.3.25
fonttools                    4.54.1
frozenlist                   1.4.1
fsspec                       2024.9.0
ftfy                         6.1.1
gast                         0.6.0
gitdb                        4.0.11
GitPython                    3.1.43
google-pasta                 0.2.0
gradio                       4.43.0
gradio_client                1.3.0
grpcio                       1.66.2
h11                          0.14.0
h5py                         3.12.1
httpcore                     1.0.5
httpx                        0.27.2
huggingface-hub              0.24.5
humanfriendly                10.0
idna                         3.10
imagesize                    1.4.1
importlib_metadata           8.5.0
importlib_resources          6.4.5
invisible-watermark          0.2.0
Jinja2                       3.1.4
jsonschema                   4.23.0
jsonschema-specifications    2023.12.1
keras                        3.5.0
kiwisolver                   1.4.7
libclang                     18.1.1
library                      0.0.0                   d:\kohya_gui_flux_installer_21\kohya_ss\sd-scripts
lightning-utilities          0.11.7
lion-pytorch                 0.0.6
lycoris-lora                 2.2.0.post3
Markdown                     3.7
markdown-it-py               3.0.0
MarkupSafe                   2.1.5
matplotlib                   3.9.2
mdurl                        0.1.2
ml-dtypes                    0.4.1
mpmath                       1.3.0
multidict                    6.1.0
namex                        0.0.8
networkx                     3.3
numpy                        1.26.4
omegaconf                    2.3.0
onnx                         1.16.1
onnxruntime-gpu              1.17.1
open-clip-torch              2.20.0
opencv-python                4.10.0.84
opt_einsum                   3.4.0
optree                       0.12.1
orjson                       3.10.7
packaging                    24.1
pandas                       2.2.3
pathtools                    0.1.2
pillow                       10.4.0
pip                          23.0.1
platformdirs                 4.3.6
prodigyopt                   1.0
protobuf                     3.20.3
psutil                       6.0.0
pydantic                     2.9.2
pydantic_core                2.23.4
pydub                        0.25.1
Pygments                     2.18.0
pyparsing                    3.1.4
pyreadline3                  3.5.4
python-dateutil              2.9.0.post0
python-multipart             0.0.12
pytorch-lightning            1.9.0
pytz                         2024.2
PyWavelets                   1.7.0
PyYAML                       6.0.2
referencing                  0.35.1
regex                        2024.9.11
requests                     2.32.3
rich                         13.8.1
rpds-py                      0.20.0
ruff                         0.6.8
safetensors                  0.4.4
schedulefree                 1.2.7
scipy                        1.11.4
semantic-version             2.10.0
sentencepiece                0.2.0
sentry-sdk                   2.14.0
setproctitle                 1.3.3
setuptools                   65.5.0
shellingham                  1.5.4
six                          1.16.0
smmap                        5.0.1
sniffio                      1.3.1
starlette                    0.38.6
sympy                        1.13.1
tensorboard                  2.17.1
tensorboard-data-server      0.7.2
tensorflow                   2.17.0
tensorflow-intel             2.17.0
tensorflow-io-gcs-filesystem 0.31.0
termcolor                    2.4.0
timm                         0.6.12
tk                           0.1.0
tokenizers                   0.19.1
toml                         0.10.2
tomlkit                      0.12.0
toolz                        0.12.1
torch                        2.4.1+cu124
torchaudio                   2.5.0.dev20240930+cu124
torchmetrics                 1.4.2
torchvision                  0.19.1+cu124
tqdm                         4.66.5
transformers                 4.44.2
typer                        0.12.5
typing_extensions            4.12.2
tzdata                       2024.2
urllib3                      2.2.3
uvicorn                      0.31.0
voluptuous                   0.13.1
wandb                        0.18.0
wcwidth                      0.2.13
websockets                   12.0
Werkzeug                     3.0.4
wheel                        0.44.0
wrapt                        1.16.0
xformers                     0.0.28.post1
yarl                         1.13.1
zipp                         3.20.2

[notice] A new release of pip is available: 23.0.1 -> 24.2
[notice] To update, run: python.exe -m pip install --upgrade pip

(venv) PS D:\Kohya_GUI_Flux_Installer_21\kohya_ss> nvidia-smi
Tue Oct  1 05:35:09 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.81                 Driver Version: 560.81         CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                  Driver-Model | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3090      WDDM  |   00000000:02:00.0 Off |                  N/A |
| 73%   63C    P2            204W /  370W |   24271MiB /  24576MiB |    100%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce RTX 3060      WDDM  |   00000000:03:00.0  On |                  N/A |
|  0%   59C    P8             23W /  170W |     973MiB /  12288MiB |      3%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
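
In the table above the 3090 shows 24271MiB of 24576MiB in use (about 98.8%). I don't know whether that is related to the slowdown, but to track it during a run I could poll nvidia-smi in CSV mode. A minimal sketch, assuming nvidia-smi is on PATH; `parse_memory` is a helper name I made up.

```python
import shutil
import subprocess

# nvidia-smi's CSV query mode; nounits drops the "MiB" suffix.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_memory(csv_text: str) -> list:
    """Parse CSV rows into (index, name, used_mib, total_mib, percent_used) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, name, used, total = [field.strip() for field in line.split(",")]
        used_mib, total_mib = int(used), int(total)
        percent = 100.0 * used_mib / total_mib
        rows.append((int(index), name, used_mib, total_mib, percent))
    return rows


if __name__ == "__main__" and shutil.which("nvidia-smi"):
    try:
        output = subprocess.run(
            QUERY, capture_output=True, text=True, check=True
        ).stdout
        for index, name, used, total, percent in parse_memory(output):
            print(f"GPU {index} ({name}): {used}/{total} MiB ({percent:.1f}%)")
    except (subprocess.CalledProcessError, ValueError) as err:
        print(f"could not read GPU memory: {err}")
```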

05:11:49-505054 INFO     headless: False
05:11:49-571035 INFO     Using shell=True when running external commands...
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
05:12:18-344000 INFO     Loading config...
05:12:23-768624 INFO     Start training Dreambooth...
05:12:23-770609 INFO     Validating lr scheduler arguments...
05:12:23-771608 INFO     Validating optimizer arguments...
05:12:23-772608 INFO     Validating D:/Kohya_GUI_Flux_Installer_21/train_4\log existence and writability...
                         SUCCESS
05:12:23-774626 INFO     Validating D:/Kohya_GUI_Flux_Installer_21/train_4\model existence and writability...
                         SUCCESS
05:12:23-776608 INFO     Validating
                         D:/Kohya_GUI_Flux_Installer_21/train_3/model/Alternate_reality_0-000005.safetensor
                         s existence... SUCCESS
05:12:23-777627 INFO     Validating
                         D:/Kohya_GUI_Flux_Installer_21/train_3/model/Alternate_reality_0-000005-state
                         existence... SUCCESS
05:12:23-779835 INFO     Validating D:/Kohya_GUI_Flux_Installer_21/train_4\img existence... SUCCESS
05:12:23-780608 INFO     Folder 1_ohwx woman: 1 repeats found
05:12:23-783608 INFO     Folder 1_ohwx woman: 220 images found
05:12:23-784607 INFO     Folder 1_ohwx woman: 220 * 1 = 220 steps
05:12:23-786609 INFO     Regularization factor: 1
05:12:23-787608 INFO     Total steps: 220
05:12:23-789609 INFO     Train batch size: 1
05:12:23-790608 INFO     Gradient accumulation steps: 1
05:12:23-791608 INFO     Epoch: 15
05:12:23-792608 INFO     max_train_steps (220 / 1 / 1 * 15 * 1) = 3300
05:12:23-794610 INFO     lr_warmup_steps = 0
05:12:23-799630 WARNING  Here is the trainer command as a reference. It will not be executed:

D:\Kohya_GUI_Flux_Installer_21\kohya_ss\venv\Scripts\accelerate.EXE launch --dynamo_backend no --dynamo_mode default --gpu_ids 0 --mixed_precision bf16 --num_processes 1 --num_machines 1 --num_cpu_threads_per_process 2 D:/Kohya_GUI_Flux_Installer_21/kohya_ss/sd-scripts/flux_train.py --config_file D:/Kohya_GUI_Flux_Installer_21/train_4\model/config_dreambooth-20241001-051223.toml

05:12:23-801630 INFO     Showing toml config file:
                         D:/Kohya_GUI_Flux_Installer_21/train_4\model/config_dreambooth-20241001-051223.toml

05:12:23-803628 INFO     adaptive_noise_scale = 0
                         ae = "D:/ComfyUI_windows_portable/ComfyUI/models/vae/ae.safetensors"
                         blocks_to_swap = 0
                         bucket_no_upscale = true
                         bucket_reso_steps = 64
                         cache_latents = true
                         cache_latents_to_disk = true
                         cache_text_encoder_outputs = true
                         cache_text_encoder_outputs_to_disk = true
                         caption_dropout_every_n_epochs = 0
                         caption_dropout_rate = 0
                         caption_extension = ".txt"
                         clip_l = "D:/ComfyUI_windows_portable/ComfyUI/models/clip/clip_l.safetensors"
                         cpu_offload_checkpointing = true
                         discrete_flow_shift = 3.1582
                         double_blocks_to_swap = 5
                         dynamo_backend = "no"
                         epoch = 15
                         full_bf16 = true
                         fused_backward_pass = true
                         gradient_accumulation_steps = 1
                         gradient_checkpointing = true
                         guidance_scale = 1
                         huber_c = 0.1
                         huber_schedule = "snr"
                         keep_tokens = 0
                         learning_rate = 4e-6
                         learning_rate_te = 0
                         logging_dir = "D:/Kohya_GUI_Flux_Installer_21/train_4\\log"
                         loss_type = "l2"
                         lr_scheduler = "constant"
                         lr_scheduler_args = []
                         lr_scheduler_num_cycles = 1
                         lr_scheduler_power = 1
                         lr_warmup_steps = 0
                         max_bucket_reso = 2048
                         max_data_loader_n_workers = 0
                         max_timestep = 1000
                         max_token_length = 75
                         max_train_steps = 3300
                         mem_eff_save = true
                         min_bucket_reso = 256
                         mixed_precision = "bf16"
                         model_prediction_type = "raw"
                         multires_noise_discount = 0.3
                         multires_noise_iterations = 0
                         noise_offset = 0
                         noise_offset_type = "Original"
                         optimizer_args = [ "scale_parameter=False", "relative_step=False", "warmup_init=False",
                         "weight_decay=0.01",]
                         optimizer_type = "Adafactor"
                         output_dir = "D:/Kohya_GUI_Flux_Installer_21/train_4\\model"
                         output_name = "alr_p"
                         persistent_data_loader_workers = 0
                         pretrained_model_name_or_path =
                         "D:/Kohya_GUI_Flux_Installer_21/train_3/model/Alternate_reality_0-000005.safetenso
                         rs"
                         prior_loss_weight = 1
                         resolution = "1024,1024"
                         resume =
                         "D:/Kohya_GUI_Flux_Installer_21/train_3/model/Alternate_reality_0-000005-state"
                         sample_prompts = "D:/Kohya_GUI_Flux_Installer_21/train_4\\model\\sample/prompt.txt"
                         sample_sampler = "euler_a"
                         save_every_n_epochs = 3
                         save_model_as = "safetensors"
                         save_precision = "fp16"
                         save_state = true
                         save_state_on_train_end = true
                         sdpa = true
                         seed = 1
                         single_blocks_to_swap = 0
                         t5xxl_max_token_length = 512
                         timestep_sampling = "sigmoid"
                         train_batch_size = 1
                         train_blocks = "all"
                         train_data_dir = "D:/Kohya_GUI_Flux_Installer_21/train_4\\img"
                         vae_batch_size = 4
                         wandb_run_name = "alr_p"

05:12:23-814643 INFO     end of toml config file:
                         D:/Kohya_GUI_Flux_Installer_21/train_4\model/config_dreambooth-20241001-051223.toml
05:12:26-356066 INFO     Start training Dreambooth...
05:12:26-358065 INFO     Validating lr scheduler arguments...
05:12:26-360065 INFO     Validating optimizer arguments...
05:12:26-362065 INFO     Validating D:/Kohya_GUI_Flux_Installer_21/train_4\log existence and writability...
                         SUCCESS
05:12:26-363064 INFO     Validating D:/Kohya_GUI_Flux_Installer_21/train_4\model existence and writability...
                         SUCCESS
05:12:26-366065 INFO     Validating
                         D:/Kohya_GUI_Flux_Installer_21/train_3/model/Alternate_reality_0-000005.safetensor
                         s existence... SUCCESS
05:12:26-368064 INFO     Validating
                         D:/Kohya_GUI_Flux_Installer_21/train_3/model/Alternate_reality_0-000005-state
                         existence... SUCCESS
05:12:26-370064 INFO     Validating D:/Kohya_GUI_Flux_Installer_21/train_4\img existence... SUCCESS
05:12:26-372065 INFO     Folder 1_ohwx woman: 1 repeats found
05:12:26-377064 INFO     Folder 1_ohwx woman: 220 images found
05:12:26-379064 INFO     Folder 1_ohwx woman: 220 * 1 = 220 steps
05:12:26-381064 INFO     Regularization factor: 1
05:12:26-382064 INFO     Total steps: 220
05:12:26-384065 INFO     Train batch size: 1
05:12:26-386065 INFO     Gradient accumulation steps: 1
05:12:26-388065 INFO     Epoch: 15
05:12:26-389064 INFO     max_train_steps (220 / 1 / 1 * 15 * 1) = 3300
05:12:26-392065 INFO     lr_warmup_steps = 0
05:12:26-399065 INFO     Saving training config to
                         D:/Kohya_GUI_Flux_Installer_21/train_4\model\alr_p_20241001-051226.json...
05:12:26-404065 INFO     Executing command: D:\Kohya_GUI_Flux_Installer_21\kohya_ss\venv\Scripts\accelerate.EXE launch
                         --dynamo_backend no --dynamo_mode default --gpu_ids 0 --mixed_precision bf16 --num_processes 1
                         --num_machines 1 --num_cpu_threads_per_process 2
                         D:/Kohya_GUI_Flux_Installer_21/kohya_ss/sd-scripts/flux_train.py --config_file
                         D:/Kohya_GUI_Flux_Installer_21/train_4\model/config_dreambooth-20241001-051226.toml
D:\Kohya_GUI_Flux_Installer_21\kohya_ss\venv\lib\site-packages\diffusers\utils\outputs.py:63: FutureWarning: `torch.utils._pytree._register_pytree_node` is deprecated. Please use `torch.utils._pytree.register_pytree_node` instead.
  torch.utils._pytree._register_pytree_node(
D:\Kohya_GUI_Flux_Installer_21\kohya_ss\venv\lib\site-packages\diffusers\utils\outputs.py:63: FutureWarning: `torch.utils._pytree._register_pytree_node` is deprecated. Please use `torch.utils._pytree.register_pytree_node` instead.
  torch.utils._pytree._register_pytree_node(
2024-10-01 05:12:51 INFO     Loading settings from D:/Kohya_GUI_Flux_Installer_21/train_4\model/config_dreambooth-20241001-051226.toml...                                                train_util.py:4328
                    INFO     D:/Kohya_GUI_Flux_Installer_21/train_4\model/config_dreambooth-20241001-051226                                                                              train_util.py:4347
2024-10-01 05:12:51 INFO     Using DreamBooth method.                                                                                                                                           flux_train.py:103
                    INFO     prepare images.                                                                                                                                                   train_util.py:1872
                    INFO     get image size from name of cache files                                                                                                                           train_util.py:1810
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 220/220 [00:00<00:00, 399.19it/s]
2024-10-01 05:12:52 INFO     set image size from cache files: 220/220                                                                                                                          train_util.py:1817
                    INFO     found directory D:\Kohya_GUI_Flux_Installer_21\train_4\img\1_ohwx woman contains 220 image files                                                            train_util.py:1819
                    WARNING  No caption file found for 220 images. Training will continue without captions for these images. If class token exists, it will be used. /                         train_util.py:1850
                             220枚の画像にキャプションファイルが見つかりませんでした。これらの画像についてはキャプションなしで学習を続行します。class tokenが存在する場合はそれを使います。
                    WARNING  D:\Kohya_GUI_Flux_Installer_21\train_4\img\1_ohwx woman\1413--flux1-dev-952361136.jpg                                                                       train_util.py:1857
                    WARNING  D:\Kohya_GUI_Flux_Installer_21\train_4\img\1_ohwx woman\1610--xl_stoiq_duchaiten_00001_-298273446.jpg                                                       train_util.py:1857
                    WARNING  D:\Kohya_GUI_Flux_Installer_21\train_4\img\1_ohwx woman\1639--xl_stoiq_duchaiten_00001_-1106092064-1.jpg                                                    train_util.py:1857
                    WARNING  D:\Kohya_GUI_Flux_Installer_21\train_4\img\1_ohwx woman\1639--xl_stoiq_duchaiten_00001_-1106092064.jpg                                                      train_util.py:1857
                    WARNING  D:\Kohya_GUI_Flux_Installer_21\train_4\img\1_ohwx woman\1639--xl_stoiq_duchaiten_00001_-1106092065-1.jpg                                                    train_util.py:1857
                    WARNING  D:\Kohya_GUI_Flux_Installer_21\train_4\img\1_ohwx woman\1641--xl_stoiq_duchaiten_00001_-1106092067-1.jpg... and 215 more                                    train_util.py:1855
                    INFO     220 train images with repeating.                                                                                                                                  train_util.py:1913
                    INFO     0 reg images.                                                                                                                                                     train_util.py:1916
                    WARNING  no regularization images / 正則化画像が見つかりませんでした                                                                                                       train_util.py:1921
                    INFO     [Dataset 0]                                                                                                                                                       config_util.py:570
                               batch_size: 1
                               resolution: (1024, 1024)
                               enable_bucket: False
                               network_multiplier: 1.0

                               [Subset 0 of Dataset 0]
                                 image_dir: "D:\Kohya_GUI_Flux_Installer_21\train_4\img\1_ohwx woman"
                                 image_count: 220
                                 num_repeats: 1
                                 shuffle_caption: False
                                 keep_tokens: 0
                                 keep_tokens_separator:
                                 caption_separator: ,
                                 secondary_separator: None
                                 enable_wildcard: False
                                 caption_dropout_rate: 0
                                 caption_dropout_every_n_epoches: 0
                                 caption_tag_dropout_rate: 0.0
                                 caption_prefix: None
                                 caption_suffix: None
                                 color_aug: False
                                 flip_aug: False
                                 face_crop_aug_range: None
                                 random_crop: False
                                 token_warmup_min: 1,
                                 token_warmup_step: 0,
                                 alpha_mask: False,
                                 is_reg: False
                                 class_tokens: ohwx woman
                                 caption_extension: .txt

                    INFO     [Dataset 0]                                                                                                                                                       config_util.py:576
                    INFO     loading image sizes.                                                                                                                                               train_util.py:909
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 220/220 [00:00<?, ?it/s]
                    INFO     prepare dataset                                                                                                                                                    train_util.py:917
                    INFO     prepare accelerator                                                                                                                                                flux_train.py:173
accelerator device: cuda
                    INFO     Building AutoEncoder                                                                                                                                                flux_utils.py:62
                    INFO     Loading state dict from D:/ComfyUI_windows_portable/ComfyUI/models/vae/ae.safetensors                                                                               flux_utils.py:66
                    INFO     Loaded AE: <All keys matched successfully>                                                                                                                          flux_utils.py:69
2024-10-01 05:12:54 INFO     [Dataset 0]                                                                                                                                                       train_util.py:2396
                    INFO     caching latents with caching strategy.                                                                                                                            train_util.py:1017
                    INFO     checking cache validity...                                                                                                                                        train_util.py:1044
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 220/220 [00:00<00:00, 569.28it/s]
                    INFO     no latents to cache                                                                                                                                               train_util.py:1087
D:\Kohya_GUI_Flux_Installer_21\kohya_ss\venv\lib\site-packages\transformers\tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be depracted in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
  warnings.warn(
You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
2024-10-01 05:12:56 INFO     Building CLIP                                                                                                                                                       flux_utils.py:74
                    INFO     Loading state dict from D:/ComfyUI_windows_portable/ComfyUI/models/clip/clip_l.safetensors                                                                         flux_utils.py:167
                    INFO     Loaded CLIP: <All keys matched successfully>                                                                                                                       flux_utils.py:170
                    INFO     Loading state dict from None                                                                                                                                       flux_utils.py:215
Traceback (most recent call last):
  File "D:\Kohya_GUI_Flux_Installer_21\kohya_ss\sd-scripts\flux_train.py", line 993, in <module>
    train(args)
  File "D:\Kohya_GUI_Flux_Installer_21\kohya_ss\sd-scripts\flux_train.py", line 211, in train
    t5xxl = flux_utils.load_t5xxl(args.t5xxl, weight_dtype, "cpu", args.disable_mmap_load_safetensors)
  File "D:\Kohya_GUI_Flux_Installer_21\kohya_ss\sd-scripts\library\flux_utils.py", line 216, in load_t5xxl
    sd = load_safetensors(ckpt_path, device=str(device), disable_mmap=disable_mmap, dtype=dtype)
  File "D:\Kohya_GUI_Flux_Installer_21\kohya_ss\sd-scripts\library\flux_utils.py", line 39, in load_safetensors
    return load_file(path)  # prevent device invalid Error
  File "D:\Kohya_GUI_Flux_Installer_21\kohya_ss\venv\lib\site-packages\safetensors\torch.py", line 313, in load_file
    with safe_open(filename, framework="pt", device=device) as f:
TypeError: argument 'filename': expected str, bytes or os.PathLike object, not NoneType
Traceback (most recent call last):
  File "C:\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "D:\Kohya_GUI_Flux_Installer_21\kohya_ss\venv\Scripts\accelerate.EXE\__main__.py", line 7, in <module>
  File "D:\Kohya_GUI_Flux_Installer_21\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 48, in main
    args.func(args)
  File "D:\Kohya_GUI_Flux_Installer_21\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1106, in launch_command
    simple_launcher(args)
  File "D:\Kohya_GUI_Flux_Installer_21\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 704, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['D:\\Kohya_GUI_Flux_Installer_21\\kohya_ss\\venv\\Scripts\\python.exe', 'D:/Kohya_GUI_Flux_Installer_21/kohya_ss/sd-scripts/flux_train.py', '--config_file', 'D:/Kohya_GUI_Flux_Installer_21/train_4\\model/config_dreambooth-20241001-051226.toml']' returned non-zero exit status 1.
05:12:59-047872 INFO     Training has ended.
05:14:10-841036 INFO     Save...
05:14:15-063648 INFO     Start training Dreambooth...
05:14:15-065647 INFO     Validating lr scheduler arguments...
05:14:15-066648 INFO     Validating optimizer arguments...
05:14:15-067644 INFO     Validating D:/Kohya_GUI_Flux_Installer_21/train_4\log existence and writability... SUCCESS
05:14:15-070646 INFO     Validating D:/Kohya_GUI_Flux_Installer_21/train_4\model existence and writability... SUCCESS
05:14:15-072647 INFO     Validating D:/Kohya_GUI_Flux_Installer_21/train_3/model/Alternate_reality_0-000005.safetensors existence... SUCCESS
05:14:15-073647 INFO     Validating D:/Kohya_GUI_Flux_Installer_21/train_3/model/Alternate_reality_0-000005-state existence... SUCCESS
05:14:15-075647 INFO     Validating D:/Kohya_GUI_Flux_Installer_21/train_4\img existence... SUCCESS
05:14:15-077644 INFO     Folder 1_ohwx woman: 1 repeats found
05:14:15-079647 INFO     Folder 1_ohwx woman: 220 images found
05:14:15-081647 INFO     Folder 1_ohwx woman: 220 * 1 = 220 steps
05:14:15-082647 INFO     Regularization factor: 1
05:14:15-083647 INFO     Total steps: 220
05:14:15-084646 INFO     Train batch size: 1
05:14:15-086645 INFO     Gradient accumulation steps: 1
05:14:15-087647 INFO     Epoch: 15
05:14:15-088647 INFO     max_train_steps (220 / 1 / 1 * 15 * 1) = 3300
05:14:15-090647 INFO     lr_warmup_steps = 0
05:14:15-095659 INFO     Saving training config to D:/Kohya_GUI_Flux_Installer_21/train_4\model\alr_p_20241001-051415.json...
05:14:15-098647 INFO     Executing command: D:\Kohya_GUI_Flux_Installer_21\kohya_ss\venv\Scripts\accelerate.EXE launch --dynamo_backend no --dynamo_mode default --gpu_ids 0 --mixed_precision bf16
                         --num_processes 1 --num_machines 1 --num_cpu_threads_per_process 2 D:/Kohya_GUI_Flux_Installer_21/kohya_ss/sd-scripts/flux_train.py --config_file
                         D:/Kohya_GUI_Flux_Installer_21/train_4\model/config_dreambooth-20241001-051415.toml
D:\Kohya_GUI_Flux_Installer_21\kohya_ss\venv\lib\site-packages\diffusers\utils\outputs.py:63: FutureWarning: `torch.utils._pytree._register_pytree_node` is deprecated. Please use `torch.utils._pytree.register_pytree_node` instead.
  torch.utils._pytree._register_pytree_node(
D:\Kohya_GUI_Flux_Installer_21\kohya_ss\venv\lib\site-packages\diffusers\utils\outputs.py:63: FutureWarning: `torch.utils._pytree._register_pytree_node` is deprecated. Please use `torch.utils._pytree.register_pytree_node` instead.
  torch.utils._pytree._register_pytree_node(
2024-10-01 05:14:39 INFO     Loading settings from D:/Kohya_GUI_Flux_Installer_21/train_4\model/config_dreambooth-20241001-051415.toml...                                                train_util.py:4328
                    INFO     D:/Kohya_GUI_Flux_Installer_21/train_4\model/config_dreambooth-20241001-051415                                                                              train_util.py:4347
2024-10-01 05:14:39 INFO     Using DreamBooth method.                                                                                                                                           flux_train.py:103
                    INFO     prepare images.                                                                                                                                                   train_util.py:1872
                    INFO     get image size from name of cache files                                                                                                                           train_util.py:1810
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 220/220 [00:00<00:00, 405.83it/s]
                    INFO     set image size from cache files: 220/220                                                                                                                          train_util.py:1817
                    INFO     found directory D:\Kohya_GUI_Flux_Installer_21\train_4\img\1_ohwx woman contains 220 image files                                                            train_util.py:1819
                    WARNING  No caption file found for 220 images. Training will continue without captions for these images. If class token exists, it will be used. /                         train_util.py:1850
                             220枚の画像にキャプションファイルが見つかりませんでした。これらの画像についてはキャプションなしで学習を続行します。class tokenが存在する場合はそれを使います。
                    WARNING  D:\Kohya_GUI_Flux_Installer_21\train_4\img\1_ohwx woman\1413--flux1-dev-952361136.jpg                                                                       train_util.py:1857
                    WARNING  D:\Kohya_GUI_Flux_Installer_21\train_4\img\1_ohwx woman\1610--xl_stoiq_duchaiten_00001_-298273446.jpg                                                       train_util.py:1857
                    WARNING  D:\Kohya_GUI_Flux_Installer_21\train_4\img\1_ohwx woman\1639--xl_stoiq_duchaiten_00001_-1106092064-1.jpg                                                    train_util.py:1857
                    WARNING  D:\Kohya_GUI_Flux_Installer_21\train_4\img\1_ohwx woman\1639--xl_stoiq_duchaiten_00001_-1106092064.jpg                                                      train_util.py:1857
                    WARNING  D:\Kohya_GUI_Flux_Installer_21\train_4\img\1_ohwx woman\1639--xl_stoiq_duchaiten_00001_-1106092065-1.jpg                                                    train_util.py:1857
                    WARNING  D:\Kohya_GUI_Flux_Installer_21\train_4\img\1_ohwx woman\1641--xl_stoiq_duchaiten_00001_-1106092067-1.jpg... and 215 more                                    train_util.py:1855
                    INFO     220 train images with repeating.                                                                                                                                  train_util.py:1913
                    INFO     0 reg images.                                                                                                                                                     train_util.py:1916
                    WARNING  no regularization images / 正則化画像が見つかりませんでした                                                                                                       train_util.py:1921
                    INFO     [Dataset 0]                                                                                                                                                       config_util.py:570
                               batch_size: 1
                               resolution: (1024, 1024)
                               enable_bucket: False
                               network_multiplier: 1.0

                               [Subset 0 of Dataset 0]
                                 image_dir: "D:\Kohya_GUI_Flux_Installer_21\train_4\img\1_ohwx woman"
                                 image_count: 220
                                 num_repeats: 1
                                 shuffle_caption: False
                                 keep_tokens: 0
                                 keep_tokens_separator:
                                 caption_separator: ,
                                 secondary_separator: None
                                 enable_wildcard: False
                                 caption_dropout_rate: 0
                                 caption_dropout_every_n_epoches: 0
                                 caption_tag_dropout_rate: 0.0
                                 caption_prefix: None
                                 caption_suffix: None
                                 color_aug: False
                                 flip_aug: False
                                 face_crop_aug_range: None
                                 random_crop: False
                                 token_warmup_min: 1,
                                 token_warmup_step: 0,
                                 alpha_mask: False,
                                 is_reg: False
                                 class_tokens: ohwx woman
                                 caption_extension: .txt

                    INFO     [Dataset 0]                                                                                                                                                       config_util.py:576
                    INFO     loading image sizes.                                                                                                                                               train_util.py:909
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 220/220 [00:00<00:00, 220383.78it/s]
                    INFO     prepare dataset                                                                                                                                                    train_util.py:917
                    INFO     prepare accelerator                                                                                                                                                flux_train.py:173
accelerator device: cuda
                    INFO     Building AutoEncoder                                                                                                                                                flux_utils.py:62
2024-10-01 05:14:40 INFO     Loading state dict from D:/ComfyUI_windows_portable/ComfyUI/models/vae/ae.safetensors                                                                               flux_utils.py:66
                    INFO     Loaded AE: <All keys matched successfully>                                                                                                                          flux_utils.py:69
                    INFO     [Dataset 0]                                                                                                                                                       train_util.py:2396
                    INFO     caching latents with caching strategy.                                                                                                                            train_util.py:1017
                    INFO     checking cache validity...                                                                                                                                        train_util.py:1044
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 220/220 [00:00<00:00, 2050.71it/s]
                    INFO     no latents to cache                                                                                                                                               train_util.py:1087
D:\Kohya_GUI_Flux_Installer_21\kohya_ss\venv\lib\site-packages\transformers\tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be depracted in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
  warnings.warn(
You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
2024-10-01 05:14:41 INFO     Building CLIP                                                                                                                                                       flux_utils.py:74
                    INFO     Loading state dict from D:/ComfyUI_windows_portable/ComfyUI/models/clip/clip_l.safetensors                                                                         flux_utils.py:167
                    INFO     Loaded CLIP: <All keys matched successfully>                                                                                                                       flux_utils.py:170
                    INFO     Loading state dict from D:/ComfyUI_windows_portable/ComfyUI/models/clip/t5xxl_fp16.safetensors                                                                     flux_utils.py:215
2024-10-01 05:14:42 INFO     Loaded T5xxl: <All keys matched successfully>                                                                                                                      flux_utils.py:218
2024-10-01 05:15:10 INFO     [Dataset 0]                                                                                                                                                       train_util.py:2417
                    INFO     caching Text Encoder outputs with caching strategy.                                                                                                               train_util.py:1179
                    INFO     checking cache validity...                                                                                                                                        train_util.py:1185
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 220/220 [00:00<00:00, 563.95it/s]
2024-10-01 05:15:11 INFO     no Text Encoder outputs to cache                                                                                                                                  train_util.py:1207
                    INFO     cache Text Encoder outputs for sample prompt: D:/Kohya_GUI_Flux_Installer_21/train_4\model\sample/prompt.txt                                                 flux_train.py:237
                    INFO     Building Flux model dev                                                                                                                                             flux_utils.py:45
                    INFO     Loading state dict from D:/Kohya_GUI_Flux_Installer_21/train_3/model/Alternate_reality_0-000005.safetensors                                            flux_utils.py:52
                    INFO     Loaded Flux: <All keys matched successfully>                                                                                                                        flux_utils.py:55
FLUX: Gradient checkpointing enabled. CPU offload: True
number of trainable parameters: 11901408320
prepare optimizer, data loader etc.
                    INFO     use Adafactor optimizer | {'scale_parameter': False, 'relative_step': False, 'warmup_init': False, 'weight_decay': 0.01}                                          train_util.py:4641
                    WARNING  because max_grad_norm is set, clip_grad_norm is enabled. consider set to 0 /                                                                                      train_util.py:4669
                             max_grad_normが設定されているためclip_grad_normが有効になります。0に設定して無効にしたほうがいいかもしれません
                    WARNING  constant_with_warmup will be good / スケジューラはconstant_with_warmupが良いかもしれません                                                                        train_util.py:4673
enable full bf16 training.
2024-10-01 05:17:43 INFO     resume training from local state: D:/Kohya_GUI_Flux_Installer_21/train_3/model/Alternate_reality_0-000005-state                                      train_util.py:4362
                    INFO     Loading states from D:/Kohya_GUI_Flux_Installer_21/train_3/model/Alternate_reality_0-000005-state                                                   accelerator.py:3085
2024-10-01 05:18:55 INFO     All model weights loaded successfully                                                                                                                           checkpointing.py:214
D:\Kohya_GUI_Flux_Installer_21\kohya_ss\venv\lib\site-packages\accelerate\checkpointing.py:220: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  optimizer_state = torch.load(input_optimizer_file, map_location=map_location)
                    INFO     All optimizer states loaded successfully                                                                                                                        checkpointing.py:222
D:\Kohya_GUI_Flux_Installer_21\kohya_ss\venv\lib\site-packages\accelerate\checkpointing.py:228: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  scheduler.load_state_dict(torch.load(input_scheduler_file))
                    INFO     All scheduler states loaded successfully                                                                                                                        checkpointing.py:229
                    INFO     All dataloader sampler states loaded successfully                                                                                                               checkpointing.py:241
D:\Kohya_GUI_Flux_Installer_21\kohya_ss\venv\lib\site-packages\accelerate\checkpointing.py:251: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  states = torch.load(input_dir.joinpath(f"{RNG_STATE_NAME}_{process_index}.pkl"))
                    INFO     All random states loaded successfully                                                                                                                           checkpointing.py:262
                    INFO     Loading in 0 custom states                                                                                                                                       accelerator.py:3170
running training / 学習開始
  num examples / サンプル数: 220
  num batches per epoch / 1epochのバッチ数: 220
  num epochs / epoch数: 15
  batch size per device / バッチサイズ: 1
  gradient accumulation steps / 勾配を合計するステップ数 = 1
  total optimization steps / 学習ステップ数: 3300
steps:   0%|                                                                                                                                                                           | 0/3300 [00:00<?, ?it/s]
epoch 1/15
2024-10-01 05:18:56 INFO     epoch is incremented. current_epoch: 0, epoch: 1                                                                                                                   train_util.py:701
D:\Kohya_GUI_Flux_Installer_21\kohya_ss\sd-scripts\library\flux_models.py:449: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:555.)
  x = torch.nn.functional.scaled_dot_product_attention(q, k, v, attn_mask=attn_mask)
D:\Kohya_GUI_Flux_Installer_21\kohya_ss\venv\lib\site-packages\torch\utils\checkpoint.py:1399: FutureWarning: `torch.cpu.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cpu', args...)` instead.
  with device_autocast_ctx, torch.cpu.amp.autocast(**cpu_autocast_kwargs), recompute_context:  # type: ignore[attr-defined]
steps:   0%|▍                                                                                                                                             | 10/3300 [21:25<117:29:44, 128.57s/it, avr_loss=0.47]
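
For scale, here is a rough sketch of what the last progress line implies for the whole run. It only redoes the arithmetic from numbers already in this report: 3300 total steps, 10 steps done, 128.57 s/it now, and 14 s/it before the update. The function and variable names are just illustrative and are not part of kohya.

```python
# Rough ETA arithmetic based on the figures reported in this issue.
# All names here are illustrative; nothing below is kohya_ss code.

def eta_hours(steps_remaining: int, seconds_per_step: float) -> float:
    """Estimated remaining wall-clock time in hours."""
    return steps_remaining * seconds_per_step / 3600.0


TOTAL_STEPS = 3300        # 220 images * 1 repeat * 15 epochs / batch size 1
STEPS_DONE = 10           # from the last progress line
SLOW_S_PER_IT = 128.57    # after git pull + setup rerun
FAST_S_PER_IT = 14.0      # before updating (as observed earlier)

remaining = TOTAL_STEPS - STEPS_DONE
slow_eta = eta_hours(remaining, SLOW_S_PER_IT)
fast_eta = eta_hours(remaining, FAST_S_PER_IT)
slowdown = SLOW_S_PER_IT / FAST_S_PER_IT

print(f"ETA after update:  {slow_eta:.1f} h")
print(f"ETA before update: {fast_eta:.1f} h")
print(f"Slowdown factor:   {slowdown:.1f}x")
```

The slow figure agrees with the roughly 117 hours remaining shown in the progress bar, against about 13 hours at the old speed, so the same dataset and step count takes about 9 times longer.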
