Open QWQyyy opened 1 year ago
This is my Helm chart values.yaml file:
@style95 I would like to ask how to configure concurrency correctly. At present, whenever we use the -c
parameter to set an action's concurrency greater than 1, a large number of errors occur.
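For reference, intra-container concurrency is set per action with the wsk CLI's `--concurrency` flag (the `-c` shorthand above); the action name and file here are only placeholders:

```
# Allow up to 10 concurrent activations per container for this action.
# Requires a runtime with concurrency support, e.g. the Node.js runtime.
wsk action update myAction action.js --kind nodejs:16 --concurrency 10

# Verify the limit that was applied
wsk action get myAction --summary
```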
The performance is highly related to the duration of your action. Let's say, your action takes 1 second to finish and you have 200 containers. Then your TPS will be just 200 TPS. If your action completes in 100ms, your TPS will be 2000 TPS.
And you said your machines have 32GB of memory, but you configured around 65GB of memory for runtime containers. That exceeds the limit of your machines and may cause OOM. Since core components like the invokers need to run as well, you should configure less than 32GB for runtime containers. Also take a look at the duration of your activations.
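For what it's worth, in the openwhisk-deploy-kube chart the runtime container pool size lives under `whisk.containerPool.userMemory`; a sketch of a value that leaves headroom on a 32GB node (the exact number is illustrative, not a recommendation):

```yaml
# values.yaml (openwhisk-deploy-kube) -- illustrative number only
whisk:
  containerPool:
    # Total memory the invoker may hand out to user containers.
    # Keep it well below node capacity so the invoker, controller,
    # Kafka, etc. still fit, e.g. ~20GB on a 32GB node.
    userMemory: "20480m"
```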
You can refer to this guide for intra-concurrency. There are still some known issues with it though.
Sorry, I have 3 nodes, so I actually have 32 x 3 = 96GB of memory available. In K8s, the 2 worker nodes have 64GB, and we configured the container pool as 48GB. But the results still look bad: although each action executes in only about 100ms, my k6 test is very unstable with poor performance. With 300 VUs I get only 200 TPS. We also see some internal errors:
But I appreciate your comments, and I will study the guide in the link carefully. I think I should switch to Node.js functions to support concurrent access, thanks! In the future I will pay more attention to how to improve the TPS of OpenWhisk and solve some of the existing problems, which will be very helpful to both academia and industry.
OK, so you are running only one invoker. Since the invoker communicates with the runtime containers, it's generally better to have more than one invoker in terms of performance. And the userMemory config applies per invoker, so if you add more invokers, you need to decrease the value accordingly.
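To make the per-invoker arithmetic concrete: since userMemory applies to each invoker pod, the cluster-wide pool is invokers x userMemory. A hedged sketch for a 48GB total pool split over two invokers (the replica-count key and its location vary by chart version, so treat the key names as illustrative):

```yaml
# Illustrative only -- key names depend on your chart version.
invoker:
  containerFactory:
    kubernetes:
      replicaCount: 2        # two invoker pods
whisk:
  containerPool:
    userMemory: "24576m"     # 24GB per invoker => 48GB cluster-wide pool
```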
And I can see the "internal error" in the activations. Do you have any logs regarding that? It also seems the error has changed from "developer error" to "internal error".
On the whole, most hot starts succeed, but there is a small number of consecutive internal errors:
It seems that the container failed to start or scheduling failed. I also seem to have disabled the activation log:
I used kubectl to check the pod's logs, and it turns out that the node is overloaded. In fact, I set the kubelet to allow 400 pods per node, but scheduling still cannot complete.
The error messages are:

Pod event for failed pod 'wskowdev-invoker-00-1481-prewarm-nodejs16' null: 0/3 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 Too many pods.

Pod event for failed pod 'wskowdev-invoker-00-1484-prewarm-nodejs16' null: Successfully assigned openwhisk/wskowdev-invoker-00-1484-prewarm-nodejs16 to work2

and failed starting prewarm, removed
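For context, the "2 Too many pods" message means the scheduler hit the kubelet's per-node pod limit on two of the three nodes. Raising it is done in the kubelet's configuration file, which must be applied on every worker node followed by a kubelet restart; since the error still reports "Too many pods", it may be worth verifying the setting actually took effect on those nodes. A sketch of the relevant fragment:

```yaml
# /var/lib/kubelet/config.yaml (KubeletConfiguration)
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
maxPods: 400   # default is 110; the scheduler rejects pods beyond this
```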
So this is not an OpenWhisk issue. Your K8S cluster does not have enough resources.
Thank you, I need to update my cluster!
Our hardware is a 3-node K8s cluster with 12 vCPUs and 32GB RAM per node. My problem is that whenever I try to configure container concurrency, for example setting the -c parameter to 2, 5, 10, 50, etc., it fails; OpenWhisk only seems to work correctly when the concurrency is 1. Specifically, function calls fail directly with prewarm container errors, as follows:

The test program also shows a large number of timeouts, and the same test runs normally when -c is set to 1, so I don't know how to set up concurrency. I configured values.yaml according to the suggestion you gave in the previous issue, as follows:

At the same time, I also found an obvious problem: when the number of prewarmed containers reaches 200, it is difficult to scale further, and an error that the API gateway resource cannot be found appears. This is in the previous issue. My current difficulty is that it is hard to make OpenWhisk cope with a heavy load. The data I currently get is only 200 TPS, which is a very poor value. My function is very simple: a python3 runtime function that remotely calls a Redis database. Could you give me some advice?