Closed yuvalkirstain closed 2 years ago
I am willing to work on this but I'd like to know how urgent it is :)
@SeanNaren any comment about this? :)
🚀 Feature
Provide an option to batch examples by a maximum number of tokens rather than by a fixed number of examples.
Motivation
Motivation In NLP, the memory consumption of each example is determined by the number of tokens that it possesses. Some examples might be very long (and require a lot of memory), while others might be short (and require little memory). Thus, the batch size should be determined by the number of tokens rather than the number of examples (even if the implementation is somewhat trickier). Imagine an extreme case that there is a single example with 2k tokens, while the rest have 10 tokens. If the max memory consumption is 2k tokens, we will use a batch size of 1. Thus, each epoch will take x200 longer!
Pitch
Let us train language models more efficiently!
Additional context
Fairseq implements this feature, but I like PyTorch Lightning :)