elixir-nx / bumblebee

Pre-trained Neural Network models in Axon (+ 🤗 Models integration)
Apache License 2.0

Reduce the output of generation loop when streaming #337

Closed jonatanklosko closed 4 months ago

jonatanklosko commented 4 months ago

When streaming tokens, we don't need the final output of the generation loop. This PR changes that output to a zeroed tensor with shape {batch_size}, so that the serving can split it.
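As a rough illustration of why a tensor with shape {batch_size} is enough, here is a minimal sketch using plain Nx. It is not Bumblebee's code: the `StreamingPlaceholder` module and its `placeholder/1` and `split/2` functions are hypothetical names made up for this example, and `split/2` is only an assumed stand-in for how a serving might hand each caller its part of a batched result.

```elixir
Mix.install([{:nx, "~> 0.7"}])

defmodule StreamingPlaceholder do
  # Hypothetical helper, not part of Bumblebee: builds a zeroed
  # placeholder with one entry per batch item.
  def placeholder(batch_size) do
    Nx.broadcast(0, {batch_size})
  end

  # Hypothetical stand-in for the serving's splitting step: cut the
  # batched tensor along the first axis, one chunk per request size.
  def split(tensor, sizes) do
    {chunks, _offset} =
      Enum.map_reduce(sizes, 0, fn size, offset ->
        {Nx.slice_along_axis(tensor, offset, size, axis: 0), offset + size}
      end)

    chunks
  end
end

# Two requests batched together: one with 3 inputs, one with 1.
result = StreamingPlaceholder.placeholder(4)
[first, second] = StreamingPlaceholder.split(result, [3, 1])

IO.inspect(Nx.shape(first))  # {3}
IO.inspect(Nx.shape(second)) # {1}
```

The point of the sketch is that the placeholder keeps the batch axis, so a split along that axis still works, while the tensor itself holds no generated tokens.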