Incorrect calculation of “Max train steps”.

When the number of epochs is set but 'Max train epoch' and 'Max train steps' are both left at 0 (meaning no override), the GUI calculates 'Max train steps' automatically using the formula: Number of images * Number of epochs / Batch size. This formula does not account for bucketing: any bucket whose image count is not a multiple of batch_size ends with a smaller final batch, and that partial batch still counts as a full step. sd-scripts does take bucketing into account and computes the correct number of steps per epoch. However, because the GUI passes its own 'Max train steps' value, training stops early and the actual number of epochs is lower than the number specified in the GUI.

For example, here is the analysis printed by the GUI before it calls sd-scripts and sets "max_train_steps":

Folder 1_diamel_xl: 1 repeats found
Folder 1_diamel_xl: 63 images found
Folder 1_diamel_xl: 63 * 1 = 63 steps
Regularization factor: 1
Train batch size: 4
Gradient accumulation steps: 1
Epoch: 25
max_train_steps (63 / 4 / 1 * 25 * 1) = 394
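
The arithmetic in the last log line can be reproduced with a short sketch. The function name `gui_max_train_steps` is hypothetical and not taken from the GUI source. The rounding up is an assumption, chosen because 63 / 4 * 25 = 393.75 is reported as 394.

```python
import math


def gui_max_train_steps(total_images, batch_size, grad_accum_steps, epochs, reg_factor=1):
    """Naive step estimate that ignores bucketing.

    Hypothetical helper mirroring the formula in the log above;
    rounding up is assumed from the reported value of 394.
    """
    return math.ceil(total_images / batch_size / grad_accum_steps * epochs * reg_factor)


# Values from the GUI log: 63 images, batch size 4, 1 accumulation step, 25 epochs.
print(gui_max_train_steps(63, 4, 1, 25))  # 394
```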

Information about buckets from sd-scripts:

bucket 0: resolution (704, 1344), count: 2      // 1 step
bucket 1: resolution (768, 1280), count: 3      // 1 step
bucket 2: resolution (832, 1216), count: 22     // 6 steps
bucket 3: resolution (896, 1152), count: 17     // 5 steps
bucket 4: resolution (960, 1088), count: 5      // 2 steps
bucket 5: resolution (1024, 1024), count: 9     // 3 steps
bucket 6: resolution (1088, 960), count: 2      // 1 step
bucket 7: resolution (1152, 896), count: 2      // 1 step
bucket 8: resolution (1216, 832), count: 1      // 1 step

Summing the per-bucket steps gives 21 steps per epoch, which is confirmed by further output in the console:

  num train images * repeats / 学習画像の数×繰り返し回数: 63
  num reg images / 正則化画像の数: 0
  num batches per epoch / 1epochのバッチ数: 21
  num epochs / epoch数: 19
  batch size per device / バッチサイズ: 4
  gradient accumulation steps / 勾配を合計するステップ数 = 1
  total optimization steps / 学習ステップ数: 394

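As a result, sd-scripts reports num epochs: 19 instead of the 25 requested. At 21 steps per epoch, 394 steps cover only about 18.8 epochs (reported as 19), whereas 25 full epochs would need 21 * 25 = 525 steps.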
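
The bucket arithmetic above can be sketched as follows. This is a minimal illustration, not the sd-scripts implementation: the bucket counts are copied from the log, gradient accumulation is assumed to be 1, and the helper name `steps_per_epoch` is hypothetical. Rounding the epoch count up is an assumption, consistent with the reported value of 19.

```python
import math

# Image counts per bucket, copied from the sd-scripts log above.
bucket_counts = [2, 3, 22, 17, 5, 9, 2, 2, 1]
batch_size = 4
requested_epochs = 25
gui_steps = 394  # 'Max train steps' value passed by the GUI


def steps_per_epoch(counts, batch_size):
    """Each bucket is batched separately, so a partial last batch costs a full step."""
    return sum(math.ceil(count / batch_size) for count in counts)


per_epoch = steps_per_epoch(bucket_counts, batch_size)
steps_needed = per_epoch * requested_epochs
# Assumption: the epoch count is rounded up, matching the 19 shown in the log.
actual_epochs = math.ceil(gui_steps / per_epoch)

print(per_epoch)      # 21
print(steps_needed)   # 525
print(actual_epochs)  # 19
```

Under these assumptions, a bucket-aware value of 525 for 'Max train steps' would let all 25 epochs run.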

bmaltais / kohya_ss

Incorrect calculation of “Max train steps”. #2965