aharley / simple_bev

A Simple Baseline for BEV Perception

Fig. 4 Effect of batch size #35

Closed · hopef closed this issue 12 months ago

hopef commented 1 year ago

Thanks for your excellent work. I have a question about Figure 4 in the paper. I was surprised to see that batch size has such a significant effect; this runs counter to common intuition. If every run uses the same 25,000 iterations, then different batch sizes imply vastly different numbers of passes over the data. Doesn't that make the comparison unfair? I would like to understand how this experiment was performed.

For example, if BEVFormer runs 25,000 iterations at batch size 1, it will have trained for only about 1 epoch on the nuScenes dataset, while Simple-BEV running 25,000 iterations at batch size 40 will have trained for roughly 36 epochs. Right?
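For concreteness, a minimal sketch of that arithmetic (assuming the ~28,000 nuScenes training samples quoted later in this thread; `epochs_seen` is just an illustrative helper, not anything from the repo):

```python
NUM_TRAIN_SAMPLES = 28_000  # approximate nuScenes train-set size

def epochs_seen(iterations: int, batch_size: int) -> float:
    """Passes over the training set implied by a fixed iteration budget."""
    return iterations * batch_size / NUM_TRAIN_SAMPLES

print(epochs_seen(25_000, 1))   # ~0.9  (batch 1: not even a full epoch)
print(epochs_seen(25_000, 40))  # ~35.7 (batch 40: dozens of epochs)
```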

aharley commented 1 year ago

Great question. For the lower batch sizes we trained for 2x, 4x, and 8x the base number of iterations (25k) and picked the best result. So it does indeed seem to be the batch size itself, not just the number of iterations, that produces the effect.
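In pseudocode, that protocol looks roughly like the following (a sketch only; `train_and_eval` is a hypothetical stand-in for the repo's actual training and validation entry point):

```python
BASE_ITERS = 25_000

def train_and_eval(batch_size: int, max_iters: int) -> float:
    """Hypothetical stand-in: train for max_iters and return a validation score."""
    raise NotImplementedError("plug in the actual Simple-BEV training loop")

def best_over_budgets(batch_size: int, multipliers=(1, 2, 4, 8)) -> float:
    """Train at several iteration budgets and keep the best score,
    so that small-batch runs are not handicapped by a fixed budget."""
    return max(train_and_eval(batch_size, m * BASE_ITERS) for m in multipliers)
```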

hopef commented 12 months ago

The nuScenes dataset has about 28,000 training samples, so batch size 1 for 8 x 25,000 = 200,000 iterations is only about 7 epochs. Could 7 epochs simply be too few? Might it take more epochs (e.g., 70) to converge stably, and is the comparison only meaningful once training has stabilized?
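One way to set up such an epoch-matched comparison would be to fix the number of epochs and derive the iteration budget per batch size (a hypothetical sketch; the 70-epoch target is just the figure floated above):

```python
NUM_TRAIN_SAMPLES = 28_000  # approximate nuScenes train-set size
TARGET_EPOCHS = 70          # hypothetical stabilization point from above

# Equalize data exposure: every batch size sees the same number of epochs.
for batch_size in (1, 2, 4, 8, 16, 40):
    iters = round(TARGET_EPOCHS * NUM_TRAIN_SAMPLES / batch_size)
    print(f"batch_size={batch_size:>2}: train for {iters:,} iterations")
```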

aharley commented 12 months ago

That sounds like an interesting question. I think you should try it! If you could refine our claim about batch size into a claim about data exposure (i.e., "number of epochs" being the critical factor), I think that would be a very useful contribution.

hopef commented 12 months ago

Thanks for your reply! I will try it.