This introduces a max length per batch at inference.
By default, max_length was set to 250 tokens.
It is now min(max_length, batch_max_len * max_length_ratio) + 5.
Say max_length_ratio is set to 1.25 (large enough for most European languages).
If a batch has examples of length 5 to 10, the max_length will be min(250, 10 x 1.25) + 5 = 17.5, truncated to 17.
The "5" offset prevents stopping too early on very short sequences, and max_length_ratio makes it possible to stop before max_length (which might be too high), preventing hallucinations and repetitions.
To disable the feature, set max_length_ratio to zero.
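The rule above can be sketched in a few lines of Python. This is a minimal illustration, not the actual implementation: the function name is hypothetical, and only the parameter names (max_length, max_length_ratio) come from the description.

```python
def effective_max_length(batch_src_lengths, max_length=250, max_length_ratio=1.25):
    """Illustrative sketch of the per-batch decoding length cap described above."""
    if max_length_ratio == 0:
        # Feature disabled: fall back to the global max_length cap.
        return max_length
    batch_max_len = max(batch_src_lengths)
    # min(max_length, batch_max_len * max_length_ratio) + 5, truncated to an int.
    return int(min(max_length, batch_max_len * max_length_ratio) + 5)

# Example from the description: source lengths 5..10 with ratio 1.25.
print(effective_max_length(range(5, 11)))  # 17
```

With very long batches the min() clamps the cap back to max_length, so the global limit still holds.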