Azure-Samples / chat-with-your-data-solution-accelerator

A Solution Accelerator for the RAG pattern running in Azure, using Azure AI Search for retrieval and Azure OpenAI large language models to power ChatGPT-style and Q&A experiences. This includes most common requirements and best practices.
https://azure.microsoft.com/products/search
MIT License

Use async framework, not Flask #40

Open pamelafox opened 6 months ago

pamelafox commented 6 months ago

Motivation

See my blog post here: http://blog.pamelafox.org/2023/09/best-practices-for-openai-chat-apps.html

We ported the other sample to Quart, as it was a more 1:1 mapping from Flask, but the more popular async framework is FastAPI.

Such a change would enable users to handle more requests with fewer resources / lower SKUs.

How would you feel if this feature request was implemented?

efficient

Requirements

Tasks

To be filled in by the engineer picking up the issue

pamelafox commented 6 months ago

Note that this may affect deployability if you're relying on App Service's auto-detection of the app startup command, as it doesn't yet detect any async frameworks.
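If auto-detection doesn't pick up the async app, a startup command can be set explicitly on App Service. A minimal sketch, assuming a Quart or FastAPI app object named `app` in `app.py` and the `uvicorn` package installed (the module path and worker count are illustrative, not from this repo):

```shell
# Run the ASGI app under gunicorn using uvicorn's worker class.
# 4 workers is a placeholder; tune to the SKU's core count.
gunicorn app:app \
  --worker-class uvicorn.workers.UvicornWorker \
  --workers 4 \
  --bind 0.0.0.0:8000
```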

gmndrg commented 6 months ago

Thank you so much, @pamelafox. We'll triage this change and prioritize it for deployment accordingly.

ashikns commented 5 months ago

Question: the Dockerfiles use uWSGI; wouldn't that already support parallelism, at least in terms of parallel users?

pamelafox commented 5 months ago

@ashikns Yes, you can run multiple workers/threads for a WSGI app with gunicorn on a multi-core machine. However, if all those workers are tied up waiting for the results of an API call, they can't respond to new user requests. With async calls and frameworks, a worker can service new requests while awaiting results, so you can serve more users with fewer cores.
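The difference can be sketched with stdlib asyncio alone: while one "request" is awaiting a simulated API call, the same worker's event loop services the others, so ten requests finish in roughly one call's latency rather than ten. Names here are illustrative, not from the repo:

```python
import asyncio
import time

async def handle_request(request_id: int) -> str:
    # Simulate an API call that spends ~0.2s waiting on the network.
    # While this coroutine is awaiting, the event loop runs other requests.
    await asyncio.sleep(0.2)
    return f"response-{request_id}"

async def serve_concurrently(n: int) -> list[str]:
    # A single async worker handles n requests concurrently.
    return await asyncio.gather(*(handle_request(i) for i in range(n)))

start = time.perf_counter()
results = asyncio.run(serve_concurrently(10))
elapsed = time.perf_counter() - start
# All 10 responses arrive in roughly 0.2s total, not 10 x 0.2s.
```

A blocking WSGI worker in the same situation would need ten workers (or ten sequential 0.2s waits) to produce the same ten responses.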