I am thrilled to have discovered this project - it has been tremendously helpful for handling a high volume of API requests. However, it seems that the serving mode of openai-manager does not currently implement the `ChatCompletion` feature? That is the invocation method I primarily need.
In addition, I have built a simple Flask reverse-proxy app for user access control and per-user usage control of API calls. I would therefore like to modify my code to use openai-manager directly within this Flask app, so that it load-balances across multiple APIs.
Since I am not very familiar with asynchronous programming in Python, I have a couple of questions:
In `serving.py`, does `GLOBAL_MANAGER` control only the list of tasks in a single submission, or all requests across multiple submissions? In other words, can the current serving implementation properly handle multiple concurrent requests from a single source?
Is it feasible to call the submission function directly with `asyncio.run()` inside a Flask app running with multi-threading enabled?

Thank you in advance for your time and help.
Thanks for your interest! And yes, `ChatCompletion` is currently only available via the Python package.
For your questions:
I would recommend using a message queue backed by Redis for your use case, since this project only considers requests from ONE source; the Flask app can then funnel all user requests through a single consumer that talks to openai-manager.
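A minimal sketch of that pattern, assuming a local Redis instance; the queue name, payload shape, and result handling below are illustrative assumptions, not part of openai-manager:

```python
import json

import redis

r = redis.Redis(host="localhost", port=6379, db=0)

# Producer side (inside the Flask app): enqueue one job per user request.
def enqueue_request(user_id: str, prompt: str) -> None:
    payload = json.dumps({"user_id": user_id, "prompt": prompt})
    r.rpush("openai_requests", payload)  # hypothetical queue name

# Consumer side: a single worker process that owns openai-manager,
# so the manager still only ever sees requests from ONE source.
def consume_forever() -> None:
    while True:
        _, raw = r.blpop("openai_requests")  # blocks until a job arrives
        job = json.loads(raw)
        # Hand job["prompt"] to openai-manager here, then store the result
        # (e.g. back in Redis under a per-request key) for the Flask app.
```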
I am not sure the current Flask design allows calling external async functions. But yes, the simplest (though not the most elegant) workaround is to start a separate process for openai-manager.
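As a rough illustration of that workaround, the Flask app can forward requests to the openai-manager process over plain blocking HTTP, which keeps asyncio out of the threaded Flask views entirely. The endpoint URL and JSON shape here are hypothetical placeholders for however that process is actually exposed:

```python
import requests
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical address of the separate openai-manager process.
MANAGER_URL = "http://localhost:8000/completions"

@app.route("/proxy", methods=["POST"])
def proxy():
    # A plain blocking HTTP call: no event loop is created in the view,
    # so a multi-threaded Flask server handles concurrency without asyncio.
    resp = requests.post(MANAGER_URL, json=request.get_json(), timeout=120)
    return jsonify(resp.json()), resp.status_code
```

For what it's worth, `asyncio.run()` itself can be called from a plain worker thread (it creates and tears down a fresh event loop on every call), but any state tied to that loop will not persist across calls, which is another reason the separate process tends to be the safer route.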