Closed · vince62s closed this issue 1 year ago
Whether or not to include padding tokens when computing throughput is an interesting question. We choose to include padding so that tokens per second (tok/sec) reflects the total number of indices processed by the GPU/CPU. For throughput metrics that aren't affected by padding, we can look at sentences per second (s/sec) and updates per second (u/sec).
Hmmm, not even. A batch can contain 25 very long sentences or 1000 very short ones, so sentences per second is not very useful either. u/sec is meaningful if and only if the batching method is similar. You can say "I use 5000-token batches" but in reality average only 3000 or 3500 real tokens per batch, which is not the same depending on how you bucket the batches: both the speed and the actual amount of data processed will differ. In the end you may have to run more updates to get through the same amount of data. (Just my 50 cents on trying to compare apples to apples.)
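To illustrate the point above, here is a small sketch (not sockeye's actual batching code; the greedy packing heuristic, the token budget, and the sentence lengths are all assumptions for illustration) showing how the same nominal token budget can hide very different amounts of real, non-padding data depending on how batches are bucketed:

```python
import random

def batch_stats(lengths, token_budget=5000):
    """Greedily pack sentences into batches padded to the longest
    sentence in each batch; return (padded_tokens, real_tokens).
    This is a toy heuristic, not sockeye's bucketing."""
    batches, current = [], []
    for n in lengths:
        trial = current + [n]
        # Flush the batch if adding this sentence would exceed the budget
        # once everything is padded to the longest sentence in the batch.
        if current and len(trial) * max(trial) > token_budget:
            batches.append(current)
            current = [n]
        else:
            current = trial
    if current:
        batches.append(current)
    padded = sum(len(b) * max(b) for b in batches)
    real = sum(lengths)
    return padded, real

random.seed(0)
lengths = [random.randint(5, 100) for _ in range(1000)]

# Shuffled input mixes long and short sentences, so padding is heavy;
# length-sorted (bucketed) input pads far less for the same budget.
pad_shuffled, real = batch_stats(lengths)
pad_sorted, _ = batch_stats(sorted(lengths))
print(f"real tokens:            {real}")
print(f"padded tokens, shuffled: {pad_shuffled}")
print(f"padded tokens, sorted:   {pad_sorted}")
```

With bucketing, "5000-token batches" carry close to 5000 real tokens; without it, a large fraction of each batch is padding, so tok/sec including padding overstates the real data throughput.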
Thanks again for your feedback.
Hi,
Should the number of tokens in a batch be counted on the source or the target side, excluding padding? https://github.com/awslabs/sockeye/blob/main/sockeye/data_io.py#L1948 If this is the number used to calculate throughput, it might be really off.
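For concreteness, a minimal sketch of the distinction being asked about (the `PAD_ID` value and the batch layout are assumptions for illustration, not sockeye's actual representation): counting every index in a padded batch versus counting only the real tokens gives two different numerators for tok/sec.

```python
# Hypothetical padded batch of token ids; PAD_ID = 0 is an assumption.
PAD_ID = 0

batch = [
    [12, 7, 431, 2, PAD_ID, PAD_ID],     # real length 4
    [99, 3, 2, PAD_ID, PAD_ID, PAD_ID],  # real length 3
    [5, 8, 1, 44, 17, 2],                # real length 6
]

# Including padding: every index the GPU/CPU actually processes.
padded_tokens = sum(len(row) for row in batch)

# Excluding padding: only the real tokens in the data.
real_tokens = sum(tok != PAD_ID for row in batch for tok in row)

print(f"padded: {padded_tokens}, real: {real_tokens}")
print(f"padding overhead: {padded_tokens / real_tokens:.2f}x")
```

Dividing elapsed time into `padded_tokens` measures hardware utilization; dividing it into `real_tokens` measures data throughput. The two can differ substantially, which is why it matters which count the linked code reports.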