a-l-e-x-d-s-9 opened this issue 1 year ago (status: Open)
Describe the bug

I'm training locally on an 8 GB VRAM card. To make training faster, I changed gradient_accumulation_steps from 1 to 16 in the settings, and I also ran "accelerate config" and changed the value there. But now when I run the script with these settings:

accelerate launch --mixed_precision="fp16" train_dreambooth.py \
  --pretrained_model_name_or_path="$MODEL_NAME" \
  --instance_data_dir="$INSTANCE_DIR" \
  --output_dir="$OUTPUT_DIR" \
  --instance_prompt="audra miller" \
  --resolution=512 \
  --train_batch_size=1 \
  --sample_batch_size=1 \
  --gradient_accumulation_steps=16 \
  --gradient_checkpointing \
  --learning_rate=4e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=2800 \
  --save_interval 300 \
  --save_min_steps 1000

the script doesn't stop at 2800 steps; it keeps running past 3600 steps, and I had to terminate it manually. The checkpoint was generated fine.

Reproduction

Set gradient_accumulation_steps=16. The step number equals the number of training images multiplied by 100. Training does not stop at the defined number of steps.

Logs

Steps: 3663it [44:35, 1.85it/s, loss=0.145, lr=4e-6] [2023-02-07 12:41:03,231] [INFO] [timer.py:197:stop] 0/3664, RunningAvgSamplesPerSec=1.3890020784507544, CurrSamplesPerSec=0.2603498963207169, MemAllocated=1.67GB, MaxMemAllocated=4.91GB
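One plausible mechanism for the overrun, sketched below as a standalone illustration (this is not the actual train_dreambooth.py code, and the epoch size of 1000 micro-batches is an arbitrary assumption): under gradient accumulation, --max_train_steps counts optimizer steps, while a progress-bar/iteration counter that ticks once per micro-batch will display a much larger number. If the loop's stopping condition and the displayed counter refer to different quantities, the bar can run far past the configured limit before training actually stops.

```python
# Hypothetical sketch of step counting under gradient accumulation.
# max_train_steps limits *optimizer* steps; the iteration counter
# (what a tqdm bar typically shows) ticks once per *micro-batch*.
gradient_accumulation_steps = 16
max_train_steps = 2800

global_step = 0          # optimizer steps (what max_train_steps limits)
micro_batches_seen = 0   # what the progress bar displays

done = False
while not done:  # loop over epochs
    for _ in range(1000):  # assumed micro-batches per epoch (dummy value)
        micro_batches_seen += 1
        # One optimizer update per full accumulation cycle.
        if micro_batches_seen % gradient_accumulation_steps == 0:
            global_step += 1
        if global_step >= max_train_steps:
            done = True
            break

print(global_step)         # 2800
print(micro_batches_seen)  # 44800 (= 2800 * 16)
```

In other words, a bar that stops only when the optimizer-step counter hits the limit will display gradient_accumulation_steps times more iterations than max_train_steps; conversely, a loop that breaks on the micro-batch counter would stop 16x too early. Whether the reported overrun comes from this mismatch, or from accumulation being applied in both the accelerate config and the script flag, would need confirmation in the script itself.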
System Info
diffusers version: 0.12.1