My Issue:
No matter what value I set for the --save_steps parameter, a checkpoint is always saved every 500 steps.
No matter what value I set for the --save_total_limit parameter, every checkpoint is kept, one per 500 steps. Kaggle's output directory has a storage limit, so I want older checkpoints to be deleted as new ones are saved.
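The cleanup expected from --save_total_limit can be sketched as follows. This is only an illustration of the desired behavior, not the training script's own logic: it assumes checkpoints are written as checkpoint-<step> directories inside the output directory (the naming used by the Hugging Face Trainer), and prune_checkpoints is a hypothetical helper name.

```python
import re
import shutil
from pathlib import Path


def prune_checkpoints(output_dir, keep_last=2):
    """Delete all but the `keep_last` highest-step checkpoint directories.

    Hypothetical helper: assumes directories are named checkpoint-<step>.
    Returns the names of the directories that were removed, oldest first.
    """
    pattern = re.compile(r"^checkpoint-(\d+)$")
    found = []
    for path in Path(output_dir).iterdir():
        match = pattern.match(path.name)
        if path.is_dir() and match:
            found.append((int(match.group(1)), path))

    # Sort ascending by step so the oldest checkpoints come first.
    found.sort(key=lambda item: item[0])

    # Everything except the newest `keep_last` entries is stale.
    stale = found[:-keep_last] if keep_last > 0 else found

    removed = []
    for _, path in stale:
        shutil.rmtree(path)
        removed.append(path.name)
    return removed
```

Calling prune_checkpoints("/kaggle/working/output", keep_last=2) between training runs (the path here is only an example) would leave the two most recent checkpoints in place. It is a manual stopgap for the storage limit and does not explain why the flag itself is ignored.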
Notebook Cell for Training bge-m3 on Kaggle Notebook with 2 T4 GPUs: