google-research / xmcgan_image_generation


Are training steps related to the batch size? #4

Closed: fortunechen closed this issue 2 years ago

fortunechen commented 2 years ago

Hi,

Thanks for open-sourcing the code!

I noticed that the number of training steps per epoch does not depend on the batch size. In line 343:

steps_per_epoch = num_train_examples // (jax.local_device_count() * config.d_step_per_g_step)

maybe it should be:

steps_per_epoch = num_train_examples // (jax.local_device_count() * config.d_step_per_g_step * config.batch_size)
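A quick illustration of the difference with made-up numbers (the device count and batch size below are placeholders, not values taken from the repo's configs):

```python
# Illustrative numbers only; stand-ins for jax.local_device_count() and the config values.
num_train_examples = 1000
num_devices = 8
d_step_per_g_step = 2
batch_size = 8

# Current formula (line 343): batch_size is not part of the divisor.
steps_per_epoch_current = num_train_examples // (num_devices * d_step_per_g_step)
# -> 62 steps per epoch, regardless of batch_size (1000 // 16)

# Suggested formula: also divide by the batch size, so larger batches
# mean fewer steps per epoch.
steps_per_epoch_suggested = num_train_examples // (num_devices * d_step_per_g_step * batch_size)
# -> 7 steps per epoch (1000 // 128)
```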

kohjingyu commented 2 years ago

Thanks, you are correct. However, we don't use steps_per_epoch in the training loop (we only use num_train_steps rather than counting epochs), so this does not affect training.

jayati-naik commented 2 years ago

Hi,

But num_train_steps is calculated based on steps_per_epoch:

https://github.com/google-research/xmcgan_image_generation/blob/22a7ef2914787904949fe1fc3f5e560f1e75db29/xmcgan/train_utils.py#L345

In my case, with 1000 training examples and 200 validation examples, if I set batch_size = 8 and d_step_per_g_step = 2, then num_train_steps = 2500, and training fails around step 1600 with an OutOfRangeError at the step below. https://github.com/google-research/xmcgan_image_generation/blob/22a7ef2914787904949fe1fc3f5e560f1e75db29/xmcgan/train_utils.py#L421

Am I missing something here?
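For reference, this is the rough arithmetic I have in mind. It assumes a single local device and that the input pipeline does not repeat indefinitely; both assumptions are guesses on my part:

```python
# Back-of-the-envelope arithmetic only; assumes one local device and a
# non-repeating input iterator. Either assumption may be wrong.
num_train_examples = 1000
batch_size = 8
d_step_per_g_step = 2
num_devices = 1  # stand-in for jax.local_device_count()

# steps_per_epoch as currently computed (batch_size not in the divisor):
steps_per_epoch = num_train_examples // (num_devices * d_step_per_g_step)  # 500
# with my num_epochs setting this works out to num_train_steps = 2500.

# Batches a single pass over the training set can actually provide:
batches_per_pass = num_train_examples // batch_size  # 125

# The loop therefore requests far more steps than the data can serve,
# which would explain the OutOfRangeError partway through training.
print(steps_per_epoch, batches_per_pass)
```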