google-research / tuning_playbook

A playbook for systematically maximizing the performance of deep learning models.

`training throughput` may not be equal to `time per step` #29

Closed SimLif closed 5 months ago

SimLif commented 1 year ago

One of the sections says that training throughput is equivalent to time per step. I am not sure this is right. Suppose we compare two batch sizes, 64 and 128: if the time per step is 1 in both cases, the training throughput still differs, because the larger batch processes twice as many examples in the same time. Training throughput therefore reflects the effect of batch size, while time per step alone does not.

jondeuce commented 1 year ago

They are equivalent by definition:

training throughput = (# examples processed per second)

The right-hand side is equal to (batch size) / (time per step). Rearranging this equation gives:

time per step = (batch size) / (training throughput)

I think the confusion is that "equivalent" does not mean "equal" in this context. Rather, given a particular batch size, knowing the throughput is equivalent information to knowing the step time, in the sense that knowing one allows you to compute the other.
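
To make the relationship concrete, here is a minimal Python sketch of the two formulas above. The function names are hypothetical, and the step time of 1 second is an assumption chosen to match the 64 vs. 128 example from the original question (which gives "1" without a unit).

```python
def throughput_from_step_time(batch_size: int, step_time_s: float) -> float:
    """Training throughput in examples per second."""
    return batch_size / step_time_s


def step_time_from_throughput(batch_size: int, examples_per_s: float) -> float:
    """Time per step in seconds, recovered from throughput."""
    return batch_size / examples_per_s


# Assumed step time: 1 second for both batch sizes.
for batch_size in (64, 128):
    tput = throughput_from_step_time(batch_size, 1.0)
    recovered = step_time_from_throughput(batch_size, tput)
    print(f"batch_size={batch_size}: {tput:.0f} examples/s, step time {recovered:.1f} s")
```

With the batch size fixed, either quantity determines the other. Across different batch sizes, the same step time corresponds to different throughputs, which is the case raised in the question.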

varungodbole commented 5 months ago

Hey @SimLif, you're right that your wording may be more precise here. In general, training throughput is one of the more useful metrics to look at when choosing a batch size.