qdrddr closed this issue 5 months ago.
@qdrddr thanks a lot for adding `AI_SERVICE_WORKERS`! For the latest setup, I suppose this is the recommended setup for wren-ai-service if you would like multiple workers:

- if `AI_SERVICE_WORKERS` = 1, then 1 pod = 1 uvicorn worker
- if `AI_SERVICE_WORKERS` = N > 1, then the setup becomes N pods, each pod still with only 1 uvicorn worker

Also, if you would like multiple `AI_SERVICE_WORKERS`, then we actually need a Redis pod acting as session storage to hold "shared variables" among the multiple pods of wren-ai-service. You can see the corresponding docker-compose.yaml here: https://github.com/Canner/WrenAI/blob/main/wren-ai-service/docker/docker-compose.yml

Another proposal for supporting multiple AI service workers is to implement proper load balancing on k8s, so that requests from the same user all go to the same pod. However, I suppose this is more complex to implement.

What are your thoughts here?
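For concreteness, here is a minimal sketch of the Redis-as-session-storage idea, assuming the redis-py client; the key layout and the `save_status`/`get_status` helpers are illustrative, not the actual wren-ai-service code:

```python
# Minimal sketch: task status shared via Redis so any pod can answer a poll.
# Assumes the redis-py client; key names and the payload shape are
# illustrative, not the actual wren-ai-service schema.
import json
import redis

r = redis.Redis(host="redis", port=6379, decode_responses=True)

def save_status(query_id: str, status: str, detail: dict | None = None) -> None:
    # Any worker pod can write the latest intermediate state here.
    r.set(
        f"task:{query_id}",
        json.dumps({"status": status, "detail": detail or {}}),
        ex=3600,  # expire after an hour so stale tasks don't accumulate
    )

def get_status(query_id: str) -> dict | None:
    # Any worker pod can serve the client's polling request.
    raw = r.get(f"task:{query_id}")
    return json.loads(raw) if raw else None
```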
Is there an easier way for the SERVICE_WORKERS to share variables by communicating with each other directly? Redis makes this a bit complicated, and it sounds like shooting a fly with a cannon: if you want this production-ready, you need Redis running as a cluster, and a Redis cluster is complicated.
And what variables do these workers share; why do they need to share anything in the first place? I'm not saying "you should not do that", I'm saying think twice or even three times whether you really want to go with that. @cyyeh
@qdrddr
For example, in the asking task, there are states that the AI service needs to keep track of.
Basically, AI tasks are long-running tasks, and there are intermediate states we define so that users can easily understand what the system is doing right now.
If we must share variables among AI service workers, then what is your recommendation?
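For illustration, the intermediate states being described might be modeled like this (a hypothetical sketch; the actual state names in wren-ai-service may differ):

```python
# Hypothetical sketch of the intermediate states for a long-running "ask"
# task; the real wren-ai-service state names may differ.
from enum import Enum

class AskStatus(str, Enum):
    UNDERSTANDING = "understanding"  # parsing the user's question
    SEARCHING = "searching"          # retrieving relevant schema/context
    GENERATING = "generating"        # producing the SQL/answer
    FINISHED = "finished"
    FAILED = "failed"
```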
@qdrddr I've got an idea: maybe we do not need Redis if we implement server-sent events! The reason we need to share state among servers is that clients poll to get the current task status; with SSE, the server itself automatically pushes the latest task status to the client. Then we don't need Redis anymore!
However, I still need to think through the pros and cons of each approach.
Reference: https://sysid.github.io/server-sent-events/
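A minimal sketch of what the SSE variant could look like with FastAPI; the endpoint path and the in-process `TASKS` store are illustrative, not the actual service code:

```python
# Minimal SSE sketch with FastAPI. Assumes the task status lives in the
# same process that runs the task; endpoint name and TASKS are hypothetical.
import asyncio
import json
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()
TASKS: dict[str, dict] = {}  # in-process store; fine when the pod that runs the task also streams the events

@app.get("/ask/{query_id}/events")
async def ask_events(query_id: str):
    async def stream():
        while True:
            status = TASKS.get(query_id, {"status": "unknown"})
            # SSE wire format: "data: <payload>\n\n"
            yield f"data: {json.dumps(status)}\n\n"
            if status.get("status") in ("finished", "failed"):
                break
            await asyncio.sleep(1)
    return StreamingResponse(stream(), media_type="text/event-stream")
```

Note that SSE only removes the need for shared state if the connection streaming the events is held by the same pod that executes the task; otherwise a shared store or sticky sessions would still be required.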
I wonder why you need the workers to be synchronized in the first place. Typically, workers in the k8s sense are stateless and do not need any synchronization; that is the preferable way for workers to function: they are given a task, and that task is executed without any synchronization.
@cyyeh anyway, regardless of the design you proceed with, we need this PR to be merged, because your container requires the `AI_SERVICE_WORKERS` environment variable to be present and will not start without it.
Also, in this PR I fixed another issue I spotted: I corrected the default value of the `WREN_ENGINE_ENDPOINT` variable, which also needs fixing for WrenAI to function in k8s. Would you be so kind as to merge this, since the worker architecture is out of the scope of this PR?
If you still need to share variables between workers in some way, then in k8s I would suggest considering a ConfigMap as one way of doing it. It lets you rely on the k8s distributed etcd cluster replicas for data protection, and a single ConfigMap can be attached to multiple pods. The ConfigMap is visible in each worker pod as a folder of files, where each file name represents a variable name and its content represents the variable's value. Workers can read the variables from the mounted files and modify them through the k8s API (the mounted volume itself is read-only), so if, for example, worker-A modifies variable1, then when worker-B reads that same variable from its ConfigMap folder, it will see the new value. Yes, worker-B needs to re-read the variable, so you'll need some other mechanism to trigger the re-read. For development with docker-compose you don't have such a mechanism; instead you can simply map a shared folder into all the workers, simulating the ConfigMap mount. But most of the time you don't even need multiple workers when running docker-compose locally.
This is applicable when only a few variables are stored this way; it is not suitable for sharing large data, as you cannot store megabytes in a ConfigMap. A ConfigMap is suitable for tens or hundreds of variables, but no more than that. @cyyeh
Alternatively, you may consider the "cloud-native" way of creating k8s Custom Resources.
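As a rough sketch of the mounted-ConfigMap idea, assuming the official `kubernetes` Python client; the ConfigMap name, mount path, and namespace below are hypothetical:

```python
# Sketch of the ConfigMap-as-shared-variables idea, assuming the official
# `kubernetes` Python client and a ConfigMap named "wren-ai-shared" mounted
# at /etc/wren-shared (both names are hypothetical).
from pathlib import Path
from kubernetes import client, config

MOUNT = Path("/etc/wren-shared")

def read_var(name: str) -> str:
    # Each key in the ConfigMap appears as a file in the mounted folder;
    # kubelet refreshes the files after the ConfigMap object is updated.
    return (MOUNT / name).read_text()

def write_var(name: str, value: str) -> None:
    # The mounted volume is read-only, so writes go through the k8s API.
    config.load_incluster_config()
    client.CoreV1Api().patch_namespaced_config_map(
        name="wren-ai-shared",
        namespace="default",
        body={"data": {name: value}},
    )
```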
@qdrddr sure, William will review your PR.
I am just raising this issue first to discuss with you in case we need multiple wren-ai-service pods in the future.
As I said, the tasks in wren-ai-service are long-running processes, and currently the client (browser) uses polling to get the current status of a task. However, if we are using multiple wren-ai-service pods, then we need to make sure the client can get the latest and correct status regardless of the implementation details.
So that's why I am discussing several candidate solutions with you:
- using Redis as session storage
- using sticky sessions from the load balancer
- using server-sent events
Perhaps you should split a task into subtasks and save them, let's say in Postgres, before handing them to the workers. Once you have split the tasks, you can simply assign them to workers, so there is no need for any worker-to-worker sync in the first place. If a worker dies, you simply re-assign the task to a new worker. See the sketch below. @cyyeh
If that's not feasible, then consider ConfigMaps as one of the alternatives.
If the amount of shared data/variables is low, I would not use Redis.
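A sketch of that assign-up-front pattern in Postgres, assuming psycopg2 and a hypothetical `tasks` table with `id`, `payload`, and `status` columns; `SKIP LOCKED` lets many workers pull from the same queue without coordinating with each other:

```python
# Sketch of the "assign tasks up front, no worker-to-worker sync" idea,
# assuming psycopg2 and a hypothetical tasks(id, payload, status) table.
import psycopg2

conn = psycopg2.connect("dbname=wren user=wren")  # hypothetical DSN

def claim_next_task():
    # SKIP LOCKED skips rows another worker has already locked, so workers
    # never block each other; a crashed worker's task is re-claimed once
    # its status is reset to 'pending'.
    with conn, conn.cursor() as cur:
        cur.execute(
            """
            UPDATE tasks SET status = 'running'
            WHERE id = (
                SELECT id FROM tasks
                WHERE status = 'pending'
                ORDER BY id
                FOR UPDATE SKIP LOCKED
                LIMIT 1
            )
            RETURNING id, payload
            """
        )
        return cur.fetchone()  # None when the queue is empty
```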
Thanks for your suggestion. We'll think about it in terms of implementation complexity, architecture complexity, etc.
Would you be so kind as to merge this PR? It is needed for WrenAI to work in k8s; without it, WrenAI will not be able to start. @cyyeh
How about I help you ping William on Discord? He is responsible for the k8s deployment stuff; sorry, I can't merge it directly myself.
And we are on holiday here, haha; it's a 3-day holiday.
AI_SERVICE_WORKERS is required for deploy-wren-ai-service.yaml, and wren-engine-svc is the correct DNS name in k8s.