elixir-nx / bumblebee

Pre-trained Neural Network models in Axon (+ 🤗 Models integration)
Apache License 2.0

Return only new text from text generation #302

Closed: jonatanklosko closed this 9 months ago

jonatanklosko commented 9 months ago

Closes #247.

This changes the text generation serving to return only the newly generated text (without the prompt). This is consistent with streaming. Also, encoder-decoder models like BART already do not return the input text, since for them the input serves as "context" rather than as a "prompt" to complete.

This is a small breaking change, but since the next release will be 0.5.0, I think that's acceptable.

Initially I thought about adding an option like :return_full_text, but handling the leading space generically would require another tokenizer pass over the input, followed by a prefix replacement (that's what hf/transformers does). I don't think this is worth it: the end user knows which model they are working with, so they can easily prepend the prompt themselves, adding a space or not as needed. We can revisit the option if there is an actual use case, but usually it's the new text that users care about.
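To illustrate the leading-space issue, here is a minimal Python sketch. It assumes a hypothetical toy SentencePiece-style detokenizer (`decode`, where "▁" marks a word boundary and the leading space of the decoded sequence is dropped); it is not Bumblebee's or hf/transformers' actual code, and the helper names are made up for illustration. It contrasts decoding only the generated tokens with the "decode everything, then strip the decoded prompt prefix" approach, and shows the caller joining the prompt manually.

```python
# Hypothetical toy detokenizer mimicking SentencePiece-style tokens, where
# "▁" marks a word boundary. Real tokenizers differ; this is only a sketch.
def decode(tokens: list[str]) -> str:
    text = "".join(tokens).replace("▁", " ")
    # Like many SentencePiece decoders, drop the leading space of the sequence.
    return text[1:] if text.startswith(" ") else text


def new_text_only(prompt_tokens: list[str], new_tokens: list[str]) -> str:
    """Decode only the generated tokens (the boundary space is lost)."""
    return decode(new_tokens)


def new_text_by_prefix(prompt_tokens: list[str], new_tokens: list[str]) -> str:
    """Prefix-replace approach: decode the full sequence, decode the prompt
    again (the extra tokenizer pass), and cut the prompt text off the front,
    which keeps the boundary space."""
    full = decode(prompt_tokens + new_tokens)
    prompt_text = decode(prompt_tokens)
    return full[len(prompt_text):]


prompt = "Hello world"
prompt_tokens = ["▁Hello", "▁world"]
new_tokens = ["▁again", "!"]

print(repr(new_text_only(prompt_tokens, new_tokens)))       # 'again!'
print(repr(new_text_by_prefix(prompt_tokens, new_tokens)))  # ' again!'

# The caller knows the model, so they can join the prompt themselves:
print(prompt + " " + new_text_only(prompt_tokens, new_tokens))  # Hello world again!
```

With only the new text returned, the caller decides how to join it to the prompt (here by adding a space), instead of the library paying for a second decode pass on every request.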