Scalable Deployment of OpenWhisk 2.0.0

singhsegv commented 1 month ago

I am very confused about how the OpenWhisk 2.0.0 is meant to be deployed for a scalable bechmarking setting. Need some help from the maintainers to understand what am I missing since I've spent a large amount of time now and still missing some key pieces.

Context

We are using OpenWhisk for a research project where workflows (sequential as well as fork/join) are to be deployed and benchmarked at 1/4/8 RPS etc for long period of times. This is to compare private cloud FaaS vs public cloud FaaS.

Current Infrastructure Setting

We have a in-house cluster with around 10 VMs running on different nodes, 50 vCPUs and around 200Gb of memory. Since I am new to this, I've initially followed https://github.com/apache/openwhisk-deploy-kube to deploy it and along with OpenWhisk Composer, was able to get the workflows running with a lot of small fixes and changes.

Problems with Current Infrastructure

I am not able to scale it properly. Running even 1 RPS for 5-10 minutes leads to a lot of random errors like "failed to get binary" and some other errors too that don't occur when running a workflow once manually.
Even when I reduce the benchmarking time to 10-20s, the inter-action comm time is coming out to be around 1.5-2 minutes. Grafana shows around 2 minutes going in the /init and I am unable to debug why that is happening.

Main Doubts about scaling

Since that openwhisk-deploy-kube take a very old version of OpenWhisk, so I thought running the latest version of it without k8s and on a single machine might give some benefits. But what I've understood now is the standalone mode is not supposed to be scalable since the controller is responsible for a lot of things in v1.0.0 and haven't checked that in 2.0.0.
Since, deploy-kube doesn't support the latest version of OpenWhisk due to major changes in scheduler, how is OpenWhisk supposed to be deployed for a scalable infrastructure? Is there some documentation that I've missed?
Also in 1.0.0, the results that I've got, is there something that I am missing? Why aren't the workflows scaling? How to go about debugging the delay? Or is purely that more infra needs to be added?

@style95 @dgrove-oss Since you people have been active in the community and have answered some of my previous queries too, any help on this will be very appreciated.

We are planning to go all in with OpenWhisk for our research and planning to contribute some good changes back to the community relating to FaaS at edge and improving the communication times in FaaS. But since none of us have infrastructure as our strong suite, getting over these initial hiccups is a becoming a blocker for us. So looking forward to some help, thanks :).

style95 commented 1 month ago

@singhsegv

I just want to ask you a couple of questions first. How did you deploy OpenWhisk-2.0.0? Are you able to invoke an action? What is your concurrency limit and userMemory assigned to invokers?

singhsegv commented 1 month ago

Hey @style95 thank for taking this up

How did you deploy OpenWhisk-2.0.0? Are you able to invoke an action? I haven't deployed this till now. I've been working with OpenWhisk 1.0.0 which is the one that comes with https://github.com/apache/openwhisk-deploy-kube repository. I've raised an issue there to understand how to get OpenWhisk 2.0.0 up and running with k8s where you gave some pointers https://github.com/apache/openwhisk-deploy-kube/issues/781#issuecomment-2316569110 . I'll be working on getting this up in the meantime.

So this question is something that is part of my doubts. As in, if I want to deploy OpenWhisk 2.0.0 in a multi node setting in a scalable manner, how should I go about it? Is ansible a way for that, or is there some way to user kubernetes for this?

What is your concurrency limit and userMemory assigned to invokers? My action concurrency limits are 1 and there are 3 invokers each with ~20000m of memory. After checking in grafana dashboard this seems to be fine since each invoker showed 40 pods of 512mb memory each at peak.

Some more updates from the benchmarking I did in the meantime:

Instead of benchmarking the scale of workflows, I went back to load testing 1 action at a time. I had a graph processing workflow in which let's say there were 2 actions A and B.
Benchmarking A with multiple RPS made it scale well. I tried with 1/4/8/16 RPS for 5 minutes each and it reused warm containers and got stable soon.
But when I ran benchmarking for another action B after A, the warm containers were not used. Even though both A and B use same runtimes, same amount of memory and same requirements for pip.

So I think the warm containers not being used amongst different actions is what's causing my workflows to not scale. I saw invoker tags based scheduling in the docs and that is some temporary fix for my use case, that is in 2.0.0 and not 1.0.0.

But bigger concern is my limited understanding of warm containers reuse amongst different actions. Where do I get more information about this? Is this the intentional way warm containers reuse is supposed to run?

bdoyle0182 commented 1 month ago

But when I ran benchmarking for another action B after A, the warm containers were not used. Even though both A and B use same runtimes, same amount of memory and same requirements for pip.

So I think the warm containers not being used amongst different actions is what's causing my workflows to not scale. I saw invoker tags based scheduling in the docs and that is some temporary fix for my use case, that is in 2.0.0 and not 1.0.0.

But bigger concern is my limited understanding of warm containers reuse amongst different actions. Where do I get more information about this? Is this the intentional way warm containers reuse is supposed to run?

You're experiencing the hot spotting / container swapping problem of the best effort 1.0.0 scheduling algorithm. If your container pool is full and no functions exist for Action B, you need to evict an Action A in order to cold start an Action B. But also want to clarify that you shouldn't expect containers to get reused for multiple actions. Once a container is bound to an action, it can only run executions of that action even if it's using the same runtime and memory profile; there are many reasons for this but most importantly is security and data isolation.

You will find that performance should be significantly better on Openwhisk 2.0 with the new scheduler for the traffic pattern you're trying to test.

dgrove-oss commented 1 month ago

I think @bdoyle0182 comment hits the root of the confusion. OpenWhisk will never reuse a container that was running action A to now run action B, even if A and B are actions of the same user that used the same runtime+memory combination.

There is a related concept of a stem cell container (the default configuration is here https://github.com/apache/openwhisk/blob/master/ansible/files/runtimes.json#L44-L56), where if there is unused capacity the system tries to hide container creation latency by keeping a few unused containers for popular runtimes+memory combinations up and running into which it can inject the code for a function on its first invocation. But once the code is injected, these containers are bound to a specific function and will never be used for anything else.

singhsegv commented 1 month ago

Indeed @dgrove-oss, the explanation by @bdoyle0182 made my understanding about the underlying problem in my benchmarking technique and warm container re-use much clear. Thanks a lot @bdoyle0182.

Circling back to my main question back again, is setting up OpenWhisk 2.0.0 on a kubernetes cluster for benchmarking is a good way forward? Or are there any other well tested scalable ways for multi-node deployment of the whole stack. I've some experience with Ansible but haven't used ansible with a goal of multi-node clustering.

Since I've realized that OpenWhisk 2.0.0 comes with a lot of improvements and worth spending time into instead of writing hackish fixes into version 1 for my use cases, I am trying to get the helm chart to support 2.0.0 as this should help others looking to run the latest version too.

dgrove-oss commented 1 month ago

Sorry to be slow; I thought I had responded but actually hadn't.

I'll have to defer to others (@style95 @bdoyle0182) to comment about how they are deploying OpenWhisk 2.0. From a community standpoint, I think it would be great if we could get the helm chart updated. It's unfortunately a couple of years out-of-date, so it may take some effort to update version dependencies, etc.

style95 commented 1 month ago

Yes, that’s my long-overdue task. I’ve just started looking into it but haven’t completed it yet. I see some areas for improvement, but I’m still struggling to set up my own Kubernetes environment first. Since I’m involved in this project purely out of personal interest, it’s taking more time than I expected. @singhsegv, if you could help update the chart, that would be great. I think I can assist with your work as well.

singhsegv commented 1 month ago

Hey @style95 I've started working on it. Planning to update and do sanity testing for non openwhisk images first like redis, kafka, etc. Then I'll move on to controller, invoker and other openwhisk related images. Kinda stuck with a paper deadlines, but expecting to make a PR soon and then iterate over it.

style95 commented 1 month ago

@singhsegv Great! Feel free to reach out to me on Slack.

apache / openwhisk