Closed · vince62s closed this issue 3 months ago

When scoring a large file (say, >100K records), why does it start with a high throughput (e.g., 50 it/s) and then, after a few tens of thousands of records, drop significantly (to less than half)?

Thanks
Could it be the same as https://github.com/Unbabel/COMET/issues/158?
Training throughput is influenced by many factors, but during inference batches are sorted by length to minimize padding. As a result, the longest sequences end up in the last batches, which contain many more tokens per batch than the first ones, so iterations per second drop toward the end of the run.
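As a rough illustration, here is a minimal sketch of length-sorted batching (not COMET's actual implementation; `samples` and `tokenize` are hypothetical stand-ins):

```python
def make_length_sorted_batches(samples, batch_size, tokenize):
    """Group samples of similar length together to minimize padding."""
    # Sort indices by token count so each batch holds similar-length inputs.
    order = sorted(range(len(samples)), key=lambda i: len(tokenize(samples[i])))
    # Slice the sorted indices into fixed-size batches.
    batches = [order[i:i + batch_size] for i in range(0, len(order), batch_size)]
    # `order` lets the caller restore scores to the original input order.
    return batches, order
```

Early batches hold the shortest sequences (few tokens, so iterations are fast), while the final batches hold the longest ones (many tokens, so iterations are slow). The total work is unchanged; only its distribution across the progress bar shifts, which is why it/s looks high at first and then falls.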
You can check the difference by setting `length_batching` to `False`.
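For reference, a small usage sketch assuming the standard COMET Python API (the model name and data below are illustrative):

```python
from comet import download_model, load_from_checkpoint

model_path = download_model("Unbabel/wmt22-comet-da")
model = load_from_checkpoint(model_path)

data = [
    {"src": "Bonjour le monde", "mt": "Hello world", "ref": "Hello world"},
    # ... more records ...
]

# length_batching=False processes samples in file order: it/s stays roughly
# flat, but total runtime usually increases because of extra padding.
output = model.predict(data, batch_size=8, gpus=1, length_batching=False)
print(output.system_score)
```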