SimLif closed this issue 5 months ago
They are equivalent by definition:
training throughput = (# examples processed per second)
The right-hand side is equal to (batch size) / (time per step). Rearranging this equation gives:
time per step = (batch size) / (training throughput)
I think the confusion is that "equivalent" does not mean "equal" in this context. Rather, given a particular batch size, knowing the throughput is equivalent information to knowing the step time, in the sense that knowing one allows you to compute the other.
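As a minimal sketch of that conversion: the function names below are hypothetical, and the step time is assumed to be measured in seconds. For a fixed batch size, each quantity can be recovered from the other.

```python
def throughput_from_step_time(batch_size: int, time_per_step: float) -> float:
    """Examples processed per second, given seconds per training step."""
    return batch_size / time_per_step


def step_time_from_throughput(batch_size: int, throughput: float) -> float:
    """Seconds per training step, given examples processed per second."""
    return batch_size / throughput


# Illustrative values only: two batch sizes with the same 1-second step time.
for batch_size in (64, 128):
    throughput = throughput_from_step_time(batch_size, time_per_step=1.0)
    recovered = step_time_from_throughput(batch_size, throughput)
    print(batch_size, throughput, recovered)
```

With the same step time, the two batch sizes give different throughputs (64 vs. 128 examples per second), yet in each case the step time is recovered exactly from the throughput once the batch size is known.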
Hey @SimLif, you're right that your language might be more precise here. In general, training throughput is one of the more useful metrics to look at when choosing a batch size.
One of the sections mentions that training throughput is equivalent to time per step. I have a doubt here. Suppose there are two batch sizes, 64 and 128. Then training throughput does not have the same value for both, even when time per step is 1 in both cases. And obviously, training throughput is a better reflection of batch size.