Describe the regression
The new code is much slower than the old code as the sequence length increases.
To Reproduce
The code was tested on an A800-80GB server equipped with NVLink.
The configuration is as follows; only --seq-length and --max-position-embeddings are changed.
Environment (please complete the following information):
Proposed fix
Change the get_batch function back to the original implementation.
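
One way to check the proposed fix is to time both versions of the batch-loading step at several sequence lengths and compare the ratios. The following is only a minimal sketch: `time_call`, `compare`, and the two `*_stub` functions are hypothetical names introduced here, and the stubs are placeholders that do not reflect the real get_batch code or the actual cause of the slowdown. To use it, replace the stubs with callables that wrap the old and new get_batch.

```python
import time
from typing import Callable, Dict, List


def time_call(fn: Callable[[int], object], seq_length: int, repeats: int = 5) -> float:
    """Return the best wall-clock time (seconds) of fn(seq_length) over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(seq_length)
        # Note: for GPU work, synchronize the device (e.g. torch.cuda.synchronize())
        # inside fn before it returns, otherwise only the kernel launch is timed.
        best = min(best, time.perf_counter() - start)
    return best


def compare(
    old_fn: Callable[[int], object],
    new_fn: Callable[[int], object],
    seq_lengths: List[int],
    repeats: int = 5,
) -> Dict[int, float]:
    """Map each sequence length to the slowdown ratio new_time / old_time."""
    ratios: Dict[int, float] = {}
    for n in seq_lengths:
        old_t = time_call(old_fn, n, repeats)
        new_t = time_call(new_fn, n, repeats)
        ratios[n] = new_t / old_t if old_t > 0 else float("inf")
    return ratios


# Hypothetical placeholders: swap in wrappers around the old and new get_batch.
def old_get_batch_stub(seq_length: int) -> list:
    return [0] * seq_length


def new_get_batch_stub(seq_length: int) -> list:
    return [i for i in range(seq_length)]


if __name__ == "__main__":
    for n, ratio in compare(old_get_batch_stub, new_get_batch_stub, [2048, 4096, 8192]).items():
        print(f"seq_length={n}: new/old time ratio = {ratio:.2f}")
```

If the regression is in get_batch, the ratio should grow with the sequence length for the new implementation and stay close to 1 once the original implementation is restored.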