When batch_size is not exactly divisible by num_step_per_communication in the BlueFog optimizers, each epoch ends with some data whose communication is deferred to the next epoch. As a result, the model may not have fully used the entire dataset by the end of the last epoch. Mixing data usage between adjacent epochs is also undesirable.
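The carry-over can be illustrated with a small piece of arithmetic. This is only a sketch in plain Python, not BlueFog code: it assumes the optimizer counts local steps with a counter that persists across epochs and communicates once every `steps_per_comm` steps. The names `leftover_steps`, `usable_steps`, `steps_per_epoch`, and `steps_per_comm` are hypothetical and chosen for illustration.

```python
def leftover_steps(steps_per_epoch, steps_per_comm, num_epochs):
    """Return, for each epoch, how many local steps are still waiting
    for a communication round when that epoch ends.

    Assumption: the step counter is NOT reset at epoch boundaries.
    """
    pending = 0
    history = []
    for _ in range(num_epochs):
        # Steps carried over from the previous epoch join this epoch's steps.
        pending = (pending + steps_per_epoch) % steps_per_comm
        history.append(pending)
    return history


def usable_steps(steps_per_epoch, steps_per_comm):
    """Largest number of steps per epoch that leaves nothing pending."""
    return steps_per_epoch - steps_per_epoch % steps_per_comm


if __name__ == "__main__":
    print(leftover_steps(10, 4, 3))
    print(usable_steps(10, 4))
```

With 10 steps per epoch and communication every 4 steps, the first epoch ends with 2 steps not yet communicated, and those are only communicated together with the first 2 steps of the second epoch. Under the same assumptions, one possible workaround is to truncate or pad each epoch to a multiple of the communication interval so that nothing is left pending at an epoch boundary.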