tysonnorris opened this issue 7 years ago
This would be a radical departure from the current model and comes with some significant baggage. I'll add to the list of complications:
If someone were to try - and it's certainly doable (the very first whisk prototypes did this) - I can't yet imagine how it would be deployed in a production environment. I can imagine that, with some program analysis, it's possible to identify actions for which intra-container concurrency may be plausible. The annotation is the early escape clause to "release" the platform from undefined or unintended behavior.
The serverless promise is that the platform scales to handle any given load. Before going down this road to increase density, might it be more prudent to explore leaner isolation technologies instead?
@rabbah I'm not sure this baggage is different from what a developer would deal with if, instead of using a serverless approach, they were to package their code into a node container and run it that way. That is, when you run a container as a traditional web service, you have to gauge cpu/memory requirements in line with traffic and usage characteristics. Instead of analysis to infer concurrent activation support, I would expect making this more explicit would be better - for example via a different function name (main() vs mainConcurrent()) or an annotation - but maybe there are cases where inferred behavior would also be useful.
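As a purely hypothetical illustration of the explicit opt-in (the mainConcurrent name is just the example floated above, not an existing convention):

```javascript
// Hypothetical: a runtime could treat the presence of mainConcurrent()
// (rather than main()) as an explicit opt-in to intra-container
// concurrency. Sketch only; no such convention exists today.
function mainConcurrent(params) {
    return { msg: 'Hello, ' + (params.name || 'world') };
}
```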
I look at it as one additional knob for tuning resource utilization, where you can gain some benefits of serverless (scaling is done, but now based on reaching max allowed concurrency), and you also gain some benefits of increasing resource efficiency (in some cases functions may be just as happy serving 200 (or 2000+) concurrent activations as a single activation). Tuning cpu shares for concurrent requests support may be too coarse grained of an adjustment for many situations.
Another approach would be to avoid statically defining concurrency and instead allow actions to define some backpressure function (in addition to main()), so that more custom logic can be applied to the concurrency limits (e.g. backpressure based on concurrent downstream data usage, but not for function-managed internal cache hits).
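A minimal sketch of what that might look like; the backpressure() hook and its contract (return true to ask the platform to stop routing activations here) are entirely hypothetical, not an existing OpenWhisk API:

```javascript
// Hypothetical sketch: alongside main(), the action exports a
// backpressure() hook that the runtime could consult before routing
// another activation into this container.

var cache = new Map();        // function-managed internal cache
var inFlightDownstream = 0;   // concurrent downstream data calls

function fetchFromDownstream(key) {
    // stand-in for a real downstream call (db, http, etc.)
    return new Promise(function (resolve) {
        setTimeout(function () { resolve('value for ' + key); }, 100);
    });
}

function main(params) {
    var key = params.key || 'default';
    if (cache.has(key)) {
        // cache hits are cheap and do not count toward backpressure
        return Promise.resolve({ result: cache.get(key) });
    }
    inFlightDownstream++;
    return fetchFromDownstream(key).then(function (value) {
        inFlightDownstream--;
        cache.set(key, value);
        return { result: value };
    });
}

// true => ask the platform to stop sending new activations for now,
// based on downstream usage rather than a static concurrency limit
function backpressure() {
    return inFlightDownstream >= 50;
}
```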
My 2c on this: the implicit assumption here is that concurrency implemented through Node's EventLoop is vastly more efficient than concurrency by running containers in parallel, where "vastly" is large enough to justify looking into the complexity stated above. I think that in order to have a meaningful discussion on this, we will need a set of shared performance/throughput test scripts (maintained within OW) and results.
FYI: running perf tests in a docker-compose env via an ab load test against helloworld.js with #2060, using -a concurrent-max 1000, I get:
```
Percentage of the requests served within a certain time (ms)
  50%    567
  66%    630
  75%    678
  80%    707
  90%    790
  95%    881
  98%   1010
  99%   1124
 100%   1420 (longest request)
```
Compare to the same test without the concurrent-max annotation:
```
Percentage of the requests served within a certain time (ms)
  50%   1653
  66%   1819
  75%   1931
  80%   2005
  90%   2330
  95%   2584
  98%   2932
  99%   4060
 100%   7674 (longest request)
```
I'll submit a separate PR to enable rerunning these tests easily in openwhisk-devtools
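For reference, the general shape of such an ab run (illustrative values only - the actual request count, concurrency level, and URL used above aren't stated):

```sh
# illustrative: substitute the real edge host and -n/-c values
ab -n 10000 -c 100 "https://<edge-host>/api/v1/web/guest/default/helloworld.json"
```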
a suggestion: rerun on master but suppress pause/unpause entirely; from past results you can get 2x or more from that alone.
@tysonnorris thanks for sharing these numbers. Interesting! Made me wonder how these would change (or not) when the action is i/o-bound - e.g. the weather app from IBM's Bluemix tutorial (which IMHO is quite representative of typical OW actions):
```javascript
var request = require('request');

function main(params) {
    var location = params.location || 'Vermont';
    var url = 'https://query.yahooapis.com/v1/public/yql?q=select item.condition from weather.forecast where woeid in (select woeid from geo.places(1) where text="' + location + '")&format=json';
    return new Promise(function(resolve, reject) {
        request.get(url, function(error, response, body) {
            if (error) {
                reject(error);
            } else {
                var condition = JSON.parse(body).query.results.channel.item.condition;
                var text = condition.text;
                var temperature = condition.temp;
                var output = 'It is ' + temperature + ' degrees in ' + location + ' and ' + text;
                resolve({msg: output});
            }
        });
    });
}
```
@rabbah, as you mention above, this issue is about a rather fundamental design aspect. Do you think it makes sense to discuss this on the mailing list to get broader attention? It would be interesting to understand the history better (why early prototypes implemented this approach and why it was abandoned). Also, I think it would be a good discussion to look at use cases other than public cloud (i.e. private OW deployments). Reason I mention that: it might render moot some of the complexity arguments above. I really don't know the outcome of such a discussion, but I think it's worth having and worth having "on record", because I believe this topic is likely to pop up again.
@rabbah thanks - yes, I tried that originally, and it's better but still not good performance.
No-pause + without concurrent-max annotation: Much better but still not great performance.
```
Percentage of the requests served within a certain time (ms)
  50%    584
  66%    654
  75%    712
  80%    749
  90%    843
  95%    929
  98%   1438
  99%   2785
 100%   6727 (longest request)
```
No-pause + with concurrent-max annotation: Slightly worse than above, possibly due to log file parsing issues that come up with concurrent requests - I will take a look at that.
```
Percentage of the requests served within a certain time (ms)
  50%    625
  66%    700
  75%    759
  80%    800
  90%    919
  95%   1082
  98%   1579
  99%   1767
 100%   2301 (longest request)
```
When creating an action, it would be nice to be able to annotate it with
-a concurrent <max concurrent activations>
to indicate that concurrent activation executions are allowed in the same container, up to some maximum.
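For illustration, usage might look like this (hypothetical syntax; the flag name here follows the proposal above, while #2060 uses concurrent-max):

```sh
# hypothetical: allow up to 200 concurrent activations in one container
wsk action create myWebAction action.js --web true -a concurrent 200
```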
This would, for example, allow web actions to scale beyond the limit on the number of concurrently running containers. Concurrent execution would only be allowed for repeated invocations of the same action by the same user: parameters may be unique per activation, but the action+user would have to be the same. This would complicate: