iqddd opened 4 hours ago
As a workaround, set 'Max train epoch' equal to 'Epoch' and set 'Max train steps' to an extremely large value (for example, 999999). sd-scripts then overrides the step count based on the epoch count, as its console output shows:
override steps. steps for 25 epochs is / 指定エポックまでのステップ数: 525
enable full fp16 training.
running training / 学習開始
num train images * repeats / 学習画像の数×繰り返し回数: 63
num reg images / 正則化画像の数: 0
num batches per epoch / 1epochのバッチ数: 21
num epochs / epoch数: 25
batch size per device / バッチサイズ: 4
gradient accumulation steps / 勾配を合計するステップ数 = 1
total optimization steps / 学習ステップ数: 525
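The 525 steps in this output equal the 21 batches per epoch times the 25 epochs. The Python sketch below reproduces that arithmetic. It assumes that sd-scripts computes the override as epochs × ceil(batches per epoch / gradient accumulation steps), which is what the "override steps" line suggests. `override_steps` is a hypothetical helper, not an sd-scripts function.

```python
import math


def override_steps(max_train_epochs: int, batches_per_epoch: int,
                   grad_accum_steps: int = 1) -> int:
    # Assumed behavior: when max_train_epochs is set, the step budget is
    # derived from the real (bucket-aware) number of batches per epoch,
    # replacing whatever max_train_steps value was passed in.
    steps_per_epoch = math.ceil(batches_per_epoch / grad_accum_steps)
    return max_train_epochs * steps_per_epoch


# Values from the console output: 21 batches per epoch, 25 epochs,
# gradient accumulation steps = 1.
print(override_steps(25, 21))  # prints 525
```

This explains why the huge 'Max train steps' value is harmless. Under the assumption above, it only keeps the GUI from computing its own step count, and the epoch override then replaces it.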
When the number of epochs is set but 'Max train epoch' and 'Max train steps' are both left at 0 (meaning no override), the GUI calculates 'Max train steps' itself with this formula: number of images * number of epochs / batch size. The formula ignores bucketing. Images are grouped into buckets by resolution and batched per bucket, so any bucket whose image count is not a multiple of batch_size ends with a smaller, partial batch. sd-scripts does take bucketing into account and computes the correct (higher) number of steps per epoch. However, because the GUI passes its lower 'max_train_steps' to sd-scripts, training stops before reaching the number of epochs specified in the GUI.

For example, here is the GUI's analysis before it calls sd-scripts and sets "max_train_steps".
Information about buckets from sd-scripts:
Summing the batches over all buckets gives 21 steps per epoch, which is confirmed by further console output:
num epochs / epoch: 19 (instead of the 25 set in the GUI)
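The mismatch can be reproduced with a small Python sketch. The actual bucket sizes are not listed in this issue, so the ones below are hypothetical. They were chosen only so that they add up to 63 images and produce 21 batches at batch size 4. Rounding the GUI estimate up is also an assumption, but 393 or 394 steps both lead to 19 epochs. The function names are illustrative and do not come from kohya_ss or sd-scripts.

```python
import math


def batches_per_epoch(bucket_sizes: list[int], batch_size: int) -> int:
    # Each bucket is batched separately, so a bucket whose size is not a
    # multiple of batch_size contributes one extra, partial batch.
    return sum(math.ceil(n / batch_size) for n in bucket_sizes)


def gui_max_train_steps(num_images: int, epochs: int, batch_size: int) -> int:
    # Bucket-unaware estimate: images * epochs / batch_size
    # (rounding up here is an assumption).
    return math.ceil(num_images * epochs / batch_size)


def actual_epochs(max_train_steps: int, steps_per_epoch: int) -> int:
    # Training stops when the step budget runs out, so the last epoch
    # may be cut short; count it as a (partial) epoch.
    return math.ceil(max_train_steps / steps_per_epoch)


# Hypothetical buckets: 63 images in total, 21 batches at batch size 4.
buckets = [13, 10, 9, 9, 7, 6, 5, 3, 1]
batch_size, epochs = 4, 25

steps_per_epoch = batches_per_epoch(buckets, batch_size)
gui_steps = gui_max_train_steps(sum(buckets), epochs, batch_size)
print(steps_per_epoch, gui_steps, actual_epochs(gui_steps, steps_per_epoch))
# prints: 21 394 19
```

A bucket-aware estimate would instead multiply the 21 steps per epoch by the 25 epochs. That gives the same 525 steps that sd-scripts reports in the workaround.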