pamelafox opened 6 months ago
Note that this may affect deployability if you're relying on App Service's auto detection for app startup script, as it doesn't yet detect any async frameworks.
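When auto-detection doesn't recognize the framework, a workaround is to set an explicit startup command on the App Service. A hedged sketch (the module path `main:app` and worker count are placeholders, not from this repo):

```shell
# Hypothetical App Service startup command for an ASGI app:
# run gunicorn with uvicorn's worker class so async frameworks work.
gunicorn --worker-class uvicorn.workers.UvicornWorker --workers 2 --bind 0.0.0.0:8000 main:app
```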
Thank you so much, @pamelafox! We'll triage this change and prioritize it for deployment accordingly.
Question: The Docker files use uWSGI; wouldn't that already support parallelism, at least in terms of parallel users?
@ashikns Yes, you can have multiple workers/threads when running a WSGI app using gunicorn on a multi-core machine. However, if all those workers are tied up waiting for the results of an API call, they can't respond to new user requests. With async calls and frameworks, a worker can service a new request while waiting for the results. You can serve more users with fewer cores.
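To illustrate the difference, here's a minimal asyncio sketch (not from the sample itself): a single event loop handles two requests that each spend 0.1s waiting on a simulated upstream API call, so total wall time stays close to 0.1s rather than 0.2s. A sync worker would have to process them back to back.

```python
import asyncio
import time

async def handle_request(request_id: int) -> str:
    # Simulate waiting on a slow upstream call (e.g. an OpenAI completion).
    # While this coroutine is suspended, the event loop can serve other requests.
    await asyncio.sleep(0.1)
    return f"response-{request_id}"

async def main() -> float:
    start = time.perf_counter()
    # One "worker" (a single event loop) services both requests concurrently.
    results = await asyncio.gather(*(handle_request(i) for i in range(2)))
    elapsed = time.perf_counter() - start
    print(results, f"{elapsed:.2f}s")
    return elapsed

elapsed = asyncio.run(main())
```

This is the same property that lets an async framework like Quart or FastAPI keep accepting requests while earlier ones await I/O.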
Motivation
See my blog post here: http://blog.pamelafox.org/2023/09/best-practices-for-openai-chat-apps.html
We ported the other sample to Quart, as it was a more 1:1 mapping from Flask, but the more popular async framework is FastAPI.
Such a change will enable users to handle more requests with fewer resources / lower SKUs.
How would you feel if this feature request was implemented?
Requirements
Tasks
To be filled in by the engineer picking up the issue