aws-samples / serverless-pdf-chat

LLM-powered document chat using Amazon Bedrock and AWS Serverless
https://aws.amazon.com/blogs/compute/building-a-serverless-document-chat-with-aws-lambda-and-amazon-bedrock/
MIT No Attribution

Speed of output #25

Closed — mohsiniqbal368 closed this issue 9 months ago

mohsiniqbal368 commented 9 months ago

Is it possible to make it output text like ChatGPT does, i.e. start printing the answer as soon as the model begins generating it? Currently it keeps waiting for an extended time and then prints the entire output at once. ChatGPT's style is to start outputting the answer text and keep adding words while the user is reading. I am not sure whether the model produces each output token individually or generates the entire output in one single processing step and returns the whole text.

pbv0 commented 9 months ago

Hi, it is possible to implement this, but you would need to rewrite part of the application.

Amazon Bedrock can stream the results back to the Lambda function instead of waiting for the full response: https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModelWithResponseStream.html
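As a minimal sketch of what that could look like in the Lambda function (the model ID and request body format here are assumptions for a Claude-family model; adjust them to the model you actually use):

```python
import json
import boto3

bedrock = boto3.client("bedrock-runtime")

def stream_answer(prompt: str):
    """Yield partial completions from Bedrock as they arrive."""
    response = bedrock.invoke_model_with_response_stream(
        modelId="anthropic.claude-v2",  # assumption: swap in your model
        body=json.dumps({
            "prompt": f"\n\nHuman: {prompt}\n\nAssistant:",
            "max_tokens_to_sample": 1024,
        }),
    )
    # The response body is an event stream; each event carries a JSON
    # chunk with a partial completion instead of the full answer.
    for event in response["body"]:
        chunk = json.loads(event["chunk"]["bytes"])
        yield chunk.get("completion", "")
```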

You could use WebSockets to stream the result back to the user, for example using AWS AppSync: https://serverlessland.com/patterns/appsync-bedrock-subscriptions-cdk
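For illustration, here is a hypothetical sketch of the push side using an API Gateway WebSocket API instead of AppSync (the `connection_id` and `endpoint_url` would come from the WebSocket `$connect` route; both are assumptions, not part of this repo):

```python
import boto3

def push_chunks(connection_id: str, endpoint_url: str, chunks):
    """Forward each streamed chunk to the connected client immediately."""
    gateway = boto3.client("apigatewaymanagementapi", endpoint_url=endpoint_url)
    for chunk in chunks:
        # Sending each partial completion as it arrives lets the frontend
        # render the answer word by word, ChatGPT-style.
        gateway.post_to_connection(
            ConnectionId=connection_id,
            Data=chunk.encode("utf-8"),
        )
```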

This blog post covers it in more detail: https://aws.amazon.com/blogs/mobile/connecting-applications-to-generative-ai-presents-new-challenges/

pbv0 commented 9 months ago

Closing this for now, please reopen if anything is unclear.