WhatsApp / WhatsApp-Business-API-Setup-Scripts

The scripts related to setting up WhatsApp business API
MIT License

HPA and limits #12

Closed w0oh0o closed 4 years ago

w0oh0o commented 4 years ago

Hello guys,

I've got some questions about those yamls. 1) Why are there HPA examples, since HPA by itself doesn't really work because we also have to scale up the shards, right? Or am I missing something?
2) What are the recommendations for CPU limits? I was thinking about 750m CPU / 400Mi memory for coreapp and 750m CPU / 400Mi memory for web. Does that make sense? I've noticed that under stress the pods may reach 900m / 600Mi (coreapp) and 850m / 400Mi (web), with no HPA on either core or web.
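In manifest form, what I have in mind is roughly this (a minimal sketch; the Deployment name, labels and image are just placeholders, and requests set equal to the limits is my assumption):

```yaml
# Sketch only: the values I'm considering, applied to a coreapp Deployment.
# Names, labels and image are placeholders; requests == limits is an assumption.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: whatsapp-coreapp
spec:
  replicas: 2
  selector:
    matchLabels:
      app: whatsapp-coreapp
  template:
    metadata:
      labels:
        app: whatsapp-coreapp
    spec:
      containers:
        - name: coreapp
          image: coreapp:placeholder   # substitute the real coreapp image/tag
          resources:
            requests:
              cpu: "750m"
              memory: "400Mi"
            limits:
              cpu: "750m"
              memory: "400Mi"
```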

Thank you,

thiagocmoraes commented 4 years ago

Hey there, can you say what HPA means? Regarding the shards, you're correct that you need to scale them manually in the current solution.

Hardware requirements are listed in https://developers.facebook.com/docs/whatsapp/faq#faq_379475409145572

We have some additional experiments about load on https://developers.facebook.com/docs/whatsapp/tips-and-tricks/send-message-performance

stephan-higuti-movile commented 4 years ago

Hello Thiago,

HPA means Horizontal Pod Autoscaler; it's a Kubernetes object. I had already read that page, but the Facebook docs suggest running those containers as instances. I don't really understand why there's no proper documentation on how to run those images in Kubernetes. :(
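To illustrate what I mean, a minimal HPA object targeting a coreapp Deployment would look roughly like this (names and thresholds are only illustrative, not taken from the repo's yamls):

```yaml
# Minimal HPA sketch (illustrative name/target/thresholds). Note that this
# only scales the number of coreapp pods; the WhatsApp shard count still has
# to be raised separately, which is exactly the problem discussed above.
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: whatsapp-coreapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: whatsapp-coreapp
  minReplicas: 2
  maxReplicas: 6
  targetCPUUtilizationPercentage: 70
```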

Thank you

Stephan

mengyiyuan commented 4 years ago

@stephan-higuti-movile Have you checked our doc on k8s here? https://developers.facebook.com/docs/whatsapp/installation/dev-multiconnect-minikube

What k8s platform are you running with in production?

w0oh0o commented 4 years ago

Hello @mengyiyuan , thank you for your time. We've been running on GKE and AKS. I did check those pages, but they are more geared toward a dev environment. For production we did struggle, but I think we've found the best request and limit values. I just got confused because I saw the HPA for coreapps and wanted to know if there was a better way to scale up coreapps. Thanks, mate.

mengyiyuan commented 4 years ago

Hi Stephan,

I am curious to know what request and limit values you are using now for the containers. I am planning to work on a production doc, so your contribution would be really valuable to me.

Currently k8s only helps to scale the number of coreapp containers running, but it doesn't automatically scale the messaging throughput. For that, you need to make an additional request to the set-shard API: https://developers.facebook.com/docs/whatsapp/multiconnect_mc/#setup

w0oh0o commented 4 years ago

Hello @mengyiyuan ,

I've run some stress tests to check how the application performs under load. The best setup I've found for the way we use it is:

For single instance: I'd strongly recommend applying anti-affinity rules between the web and coreapp of the same customer. For both webs and cores: requests of 200Mi memory / 10m CPU and limits of 1200Mi memory / 800m CPU (see the sketch below).
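As a sketch, the web Deployment for one customer ends up looking roughly like this (the customer/role labels, names and image are just examples from my own naming, and the coreapp container gets the same resource values):

```yaml
# Sketch of the single-instance setup described above: anti-affinity keeps
# this customer's web pod off the node running the same customer's coreapp.
# Labels, names and image are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: customer-a-web
spec:
  replicas: 1
  selector:
    matchLabels:
      customer: customer-a
      role: web
  template:
    metadata:
      labels:
        customer: customer-a
        role: web
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  customer: customer-a
                  role: coreapp
              topologyKey: kubernetes.io/hostname
      containers:
        - name: web
          image: web:placeholder   # substitute the real web image/tag
          resources:
            requests:
              memory: "200Mi"
              cpu: "10m"
            limits:
              memory: "1200Mi"
              cpu: "800m"
```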

For multiconnect, I've noticed that the best way to distribute the load was to change the web controller to a DaemonSet (instead of a Deployment or StatefulSet), so the webserver traffic stays roughly the same across all nodes. Resources: requests of 500Mi memory / 200m CPU and limits of 1200Mi memory / 500m CPU (see the sketch below).
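Roughly, the web controller for multiconnect then becomes something like this (again a sketch; names, labels and image are placeholders):

```yaml
# Sketch of the multiconnect web controller as a DaemonSet, so one web pod
# runs on every node and traffic is spread evenly. Placeholders throughout.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: whatsapp-web
spec:
  selector:
    matchLabels:
      app: whatsapp-web
  template:
    metadata:
      labels:
        app: whatsapp-web
    spec:
      containers:
        - name: web
          image: web:placeholder   # substitute the real web image/tag
          resources:
            requests:
              memory: "500Mi"
              cpu: "200m"
            limits:
              memory: "1200Mi"
              cpu: "500m"
```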

For multi I decided to give fewer resources since there are more pods. You can see I chose to keep the memory limits high: k8s behaves differently depending on which limit you hit. If you reach the CPU limit, k8s throttles the CPU; if you reach the memory limit, k8s just kills the pod, so that's the limit we should really be careful with. Those are the limits we are using in production, and they are working and performing really well. I guess the values will differ from case to case; the most important thing is to set the limits at all. And yes, scaling the coreapps doesn't work as cleanly as it should. I hope this extra scale-out step gets removed one day, to make all our lives easier, hahaha!

Please, feel free to contact me!

Thank you,

Stephan

mengyiyuan commented 4 years ago

Hi Stephan,

This is greatly valuable, thank you so much for sharing it. A few follow-up questions:

w0oh0o commented 4 years ago

Hello mengyiyuan,

Is the purpose of the anti-affinity rules between the web and coreapp to keep high availability? No, the idea is to distribute the load better: a heavy load from a customer that has both their web and their coreapp on the same node could put that node at risk. But yes, we could use the same concept to have more "spare" coreapps. (I only did this for singles.)

Using a DaemonSet for the webapp is a very innovative idea! Do you see a significant improvement in performance when a web server is on all nodes? (I did this for multis.) Yes, I found that my webs were something of a bottleneck under heavy traffic, so I made this change and it instantly solved my problem.

About auto-scaling, I also hope we can come up with something soon! What do you do now? Do you have an auto-scaling workaround, or do you scale up only when a client asks? For now we have an idea of how big our customers are, so we can scale as we go, and in some cases we pre-scale the customer. I could think of a way to trigger the curl call to resize the shards as well, but this is not a big problem right now!

I've got a question. I've been testing the application over and over, and I think the coreapps are fairly safe in case of a crash, do you know about this? If a coreapp dies while sending a message from the queue, is there a retry? I'm designing the application to work across different node pools now if this is the case!!

mengyiyuan commented 4 years ago

Hi Stephan,

Thanks again for sharing your learnings, it is great to see what you have figured out!

Sorry for my late reply to your question. I'm actually not sure about this; let me check with the team and get back to you as soon as possible.

w0oh0o commented 4 years ago

Hello mengyiyuan,

I think I already know the answer: coreapps are failsafe as long as the messages are already stored in the queue. When the coreapps have problems, I'm not sure why, but the web starts returning errors. So if the coreapps are down you may face problems with your API, but if you have already received all the messages and they are in the queue, you can kill or restart the coreapps (or whatever) and the messages will still be delivered.

Happy 2020!

mengyiyuan commented 4 years ago

Hi Stephan,

Happy new year! Sorry again for my late reply. You are right that web containers will return errors when the system is in an unhealthy state. I guess this is more about hiding the container orchestration details from the API, because web containers are responsible for responding to API calls. You need to retry sending outbound messages if you receive errors when sending them.

For inbound messages, WA servers will retry sending messages to Coreapp once they are back online.

Closing the ticket for now. Happy with our discussion!