Bluefog-Lib / bluefog

Distributed and decentralized training framework for PyTorch over graph
https://bluefog-lib.github.io/bluefog/
Apache License 2.0
291 stars 71 forks

Left-over data for BlueFog optimizers using num_step_per_communication #65

Open hanbinhu opened 3 years ago

hanbinhu commented 3 years ago

When the number of batches in an epoch is not evenly divisible by num_step_per_communication in the BlueFog optimizers, the last few batches of each epoch are left over: their updates are not communicated until the next epoch. As a result, the model may not have fully used the entire dataset at the end of the last epoch, because the final left-over steps are never communicated. Mixing data from adjacent epochs within a single communication round is also undesirable.
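To make the left-over effect concrete, here is a minimal sketch in plain Python. It does not call any BlueFog API; the function name `simulate_leftover` and its parameters are hypothetical, and it assumes the optimizer keeps a step counter that is not reset at epoch boundaries and communicates once every num_step_per_communication local steps.

```python
def simulate_leftover(batches_per_epoch, num_steps_per_communication, num_epochs):
    """Return, for each epoch, how many local steps are still
    uncommunicated when that epoch ends.

    Assumption: the step counter carries over across epochs and is
    reset only when a communication round is triggered.
    """
    pending = 0  # local steps taken since the last communication
    leftovers = []
    for _ in range(num_epochs):
        for _ in range(batches_per_epoch):
            pending += 1
            if pending == num_steps_per_communication:
                pending = 0  # a communication round would happen here
        leftovers.append(pending)
    return leftovers


# 10 batches per epoch, communicating every 4 steps, for 3 epochs
print(simulate_leftover(10, 4, 3))  # → [2, 0, 2]
```

In this example, two steps from epoch 1 are only communicated together with the first two steps of epoch 2, and the last two steps of epoch 3 are never communicated at all. When the number of batches per epoch is a multiple of the communication interval (for instance 12 and 4), every entry is 0 and the problem does not arise.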