Hi! Thank you for publishing your model weights and code. I'm wondering whether it's possible to run inference token by token. The paper describes the model as autoregressive, and although I understand it is autoregressive over its outputs rather than over the text input, I still hope I'm missing something. My use case is streaming audio for LLM output, e.g. from ChatGPT.
So, is there a way to run inference with only the first few words of a sentence, feeding the rest of the text input as it arrives from the LLM? And if so, could you show me how?
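To make the use case concrete, here is a minimal sketch of what I have in mind: buffer the LLM's token stream and hand each complete sentence to the model as soon as it is available. `synthesize` below is just a stand-in for whatever inference entry point your code exposes, not a real API from your repo.

```python
import re

def synthesize(text):
    # Placeholder for the actual model call (hypothetical, not your API).
    return f"<audio for: {text}>"

def stream_tts(token_stream):
    """Buffer incoming LLM tokens and synthesize sentence by sentence."""
    buffer = ""
    for token in token_stream:
        buffer += token
        # Flush as soon as a sentence-ending punctuation mark appears.
        match = re.search(r"(.+?[.!?])\s*(.*)", buffer, re.DOTALL)
        if match:
            sentence, buffer = match.group(1).strip(), match.group(2)
            yield synthesize(sentence)
    # Flush whatever is left when the stream ends.
    if buffer.strip():
        yield synthesize(buffer.strip())

# Tokens as they might arrive from an LLM:
tokens = ["Hello", " there", ".", " How", " are", " you", "?"]
chunks = list(stream_tts(tokens))
print(chunks)
```

This sentence-level chunking would already cut latency a lot, but ideally I'd like to start synthesis before even the first sentence is complete, which is what my question above is about.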