I am thrilled to have discovered this project - it has been tremendously helpful for handling a high volume of API requests. However, it seems that the serving mode of openai-manager does not currently implement the `ChatCompletion` feature? That is the invocation method I primarily need.
In addition, I have built a simple Flask reverse-proxy app for user access control and per-user usage control of API calls. I would therefore like to modify my code to use openai-manager directly within this Flask app, so that it load-balances across multiple APIs.
Since I am not very familiar with asynchronous programming in Python, I have a couple of questions:
In `serving.py`, does `GLOBAL_MANAGER` control only the list of tasks in a single submission, or all requests across multiple submissions? In other words, can the current serving implementation properly handle multiple concurrent requests from a single source?
Is it feasible to call the submission function directly with `asyncio.run()` inside a Flask app running with multi-threading enabled?

Thank you in advance for your time and help.
Thanks for your interest! And yes, `ChatCompletion` is currently only available via the Python package.
For your questions:
I would recommend using a message queue backed by Redis for your use case, since this project only considers requests from ONE source; the Flask app can then funnel all user requests through a single consumer that talks to openai-manager.
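A minimal sketch of that pattern, assuming a local Redis instance; the queue name, payload shape, and result handling below are illustrative assumptions, not part of openai-manager:

```python
import json

import redis

r = redis.Redis(host="localhost", port=6379, db=0)

# Producer side (inside the Flask app): enqueue one job per user request.
def enqueue_request(user_id: str, prompt: str) -> None:
    payload = json.dumps({"user_id": user_id, "prompt": prompt})
    r.rpush("openai_requests", payload)  # hypothetical queue name

# Consumer side: a single worker process that owns openai-manager,
# so the manager still only ever sees requests from ONE source.
def consume_forever() -> None:
    while True:
        _, raw = r.blpop("openai_requests")  # blocks until a job arrives
        job = json.loads(raw)
        # Hand job["prompt"] to openai-manager here, then store the result
        # (e.g. back in Redis under a per-request key) for the Flask app.
```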
I am not sure the current Flask design allows calling external async functions. But yes, the simplest (though not the most elegant) workaround is to start a separate process for openai-manager.
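As a rough illustration of that workaround, the Flask app can forward requests to the openai-manager process over plain blocking HTTP, which keeps asyncio out of the threaded Flask views entirely. The endpoint URL and JSON shape here are hypothetical placeholders for however that process is actually exposed:

```python
import requests
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical address of the separate openai-manager process.
MANAGER_URL = "http://localhost:8000/completions"

@app.route("/proxy", methods=["POST"])
def proxy():
    # A plain blocking HTTP call: no event loop is created in the view,
    # so a multi-threaded Flask server handles concurrency without asyncio.
    resp = requests.post(MANAGER_URL, json=request.get_json(), timeout=120)
    return jsonify(resp.json()), resp.status_code
```

For what it's worth, `asyncio.run()` itself can be called from a plain worker thread (it creates and tears down a fresh event loop on every call), but any state tied to that loop will not persist across calls, which is another reason the separate process tends to be the safer route.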