Closed jonatanklosko closed 4 months ago
When streaming tokens, we don't care about the generation final output. This PR changes the result to a zeroed tensor with shape {batch_size} (so it can be split by the serving).
{batch_size}
When streaming tokens, we don't care about the generation final output. This PR changes the result to a zeroed tensor with shape
{batch_size}
(so it can be split by the serving).