tysonnorris opened this issue 7 years ago
This would be a radical departure from the current model and comes with some significant baggage. I'll add to the list of complications:
If someone were to try - and it's certainly doable (the very first whisk prototypes did this) - I can't yet imagine how it would be deployed in a production environment. I can imagine that, with some program analysis, it's possible to identify actions for which intra-container concurrency may be plausible. The annotation is the early escape clause to "release" the platform from undefined or unintended behavior.
The serverless promise is that the platform scales to handle any given load. Before going down this road to increase density, might it be more prudent to explore leaner isolation technologies instead?
@rabbah I'm not sure this baggage is different from what a developer would deal with if, instead of using a serverless approach, they were to package their code into a node container and run it that way. That is, when you run a container as a traditional web service, you have to gauge cpu/memory requirements in line with traffic and usage characteristics. Instead of analysis to infer concurrent activation support, I would expect making this more explicit would be better - for example via a different function name (main() vs mainConcurrent()) or an annotation - but maybe there are cases where inferred behavior would also be useful.
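As a purely hypothetical illustration of the explicit opt-in (the mainConcurrent name is just the example floated above, not an existing convention):

```javascript
// Hypothetical: a runtime could treat the presence of mainConcurrent()
// (rather than main()) as an explicit opt-in to intra-container
// concurrency. Sketch only; no such convention exists today.
function mainConcurrent(params) {
    return { msg: 'Hello, ' + (params.name || 'world') };
}
```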
I look at it as one additional knob for tuning resource utilization, where you can gain some benefits of serverless (scaling is done, but now based on reaching max allowed concurrency), and you also gain some benefits of increasing resource efficiency (in some cases functions may be just as happy serving 200 (or 2000+) concurrent activations as a single activation). Tuning cpu shares for concurrent requests support may be too coarse grained of an adjustment for many situations.
Another approach would be to avoid statically defining concurrency and instead allow actions to define some backpressure function (in addition to main()), so that more custom logic can be applied to the concurrency limits (e.g. backpressure based on concurrent downstream data usage, but not for function-managed internal cache hits).
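A minimal sketch of what that might look like; the backpressure() hook and its contract (return true to ask the platform to stop routing activations here) are entirely hypothetical, not an existing OpenWhisk API:

```javascript
// Hypothetical sketch: alongside main(), the action exports a
// backpressure() hook that the runtime could consult before routing
// another activation into this container.

var cache = new Map();        // function-managed internal cache
var inFlightDownstream = 0;   // concurrent downstream data calls

function fetchFromDownstream(key) {
    // stand-in for a real downstream call (db, http, etc.)
    return new Promise(function (resolve) {
        setTimeout(function () { resolve('value for ' + key); }, 100);
    });
}

function main(params) {
    var key = params.key || 'default';
    if (cache.has(key)) {
        // cache hits are cheap and do not count toward backpressure
        return Promise.resolve({ result: cache.get(key) });
    }
    inFlightDownstream++;
    return fetchFromDownstream(key).then(function (value) {
        inFlightDownstream--;
        cache.set(key, value);
        return { result: value };
    });
}

// true => ask the platform to stop sending new activations for now,
// based on downstream usage rather than a static concurrency limit
function backpressure() {
    return inFlightDownstream >= 50;
}
```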
My 2c on this: the implicit assumption here is that concurrency implemented through Node's EventLoop is vastly more efficient than concurrency by running containers in parallel, where "vastly" is large enough to justify looking into the complexity stated above. I think that in order to have a meaningful discussion on this, we will need a set of shared performance/throughput test scripts (maintained within OW) and results.
FYI: running perf tests in a docker-compose env via an ab load test against helloworld.js with #2060, using -a concurrent-max 1000, I get:
```
Percentage of the requests served within a certain time (ms)
  50%    567
  66%    630
  75%    678
  80%    707
  90%    790
  95%    881
  98%   1010
  99%   1124
 100%   1420 (longest request)
```
Compare to the same test without the concurrent-max annotation:
```
Percentage of the requests served within a certain time (ms)
  50%   1653
  66%   1819
  75%   1931
  80%   2005
  90%   2330
  95%   2584
  98%   2932
  99%   4060
 100%   7674 (longest request)
```
I'll submit a separate PR to enable rerunning these tests easily in openwhisk-devtools
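For reference, the general shape of such an ab run (illustrative values only - the actual request count, concurrency level, and URL used above aren't stated):

```sh
# illustrative: substitute the real edge host and -n/-c values
ab -n 10000 -c 100 "https://<edge-host>/api/v1/web/guest/default/helloworld.json"
```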
a suggestion: rerun on master but suppress pause/unpause entirely; from past results you can get 2x or more from that alone.
@tysonnorris thanks for sharing these numbers. Interesting! Made me wonder how these would change (or not) when the action is i/o-bound - e.g. the weather app from IBM's Bluemix tutorial (which IMHO is quite representative of typical OW actions):
```javascript
var request = require('request');

function main(params) {
    var location = params.location || 'Vermont';
    var url = 'https://query.yahooapis.com/v1/public/yql?q=select item.condition from weather.forecast where woeid in (select woeid from geo.places(1) where text="' + location + '")&format=json';
    return new Promise(function(resolve, reject) {
        request.get(url, function(error, response, body) {
            if (error) {
                reject(error);
            } else {
                var condition = JSON.parse(body).query.results.channel.item.condition;
                var text = condition.text;
                var temperature = condition.temp;
                var output = 'It is ' + temperature + ' degrees in ' + location + ' and ' + text;
                resolve({msg: output});
            }
        });
    });
}
```
@rabbah, as you mention above, this issue is about a rather fundamental design aspect. Do you think it makes sense to discuss this on the mailing list to get broader attention? It would be interesting to understand the history better (why early prototypes implemented this approach and why it was abandoned). Also, I think it would be a good discussion to look at use cases other than public cloud (i.e. private OW deployments). Reason I mention that: it might render moot some of the complexity arguments above. I really don't know the outcome of such a discussion, but I think it's worth having and worth having "on record", because I believe this topic is likely to pop up again.
@rabbah thanks - yes, I tried that originally, and it's better but still not good performance.
No-pause + without concurrent-max annotation: Much better but still not great performance.
```
Percentage of the requests served within a certain time (ms)
  50%    584
  66%    654
  75%    712
  80%    749
  90%    843
  95%    929
  98%   1438
  99%   2785
 100%   6727 (longest request)
```
No-pause + with concurrent-max annotation: Slightly worse than above, possibly due to log file parsing issues that come up with concurrent requests - I will take a look at that.
```
Percentage of the requests served within a certain time (ms)
  50%    625
  66%    700
  75%    759
  80%    800
  90%    919
  95%   1082
  98%   1579
  99%   1767
 100%   2301 (longest request)
```
When creating an action, it would be nice to be able to annotate it with
-a concurrent <max concurrent activations>
to indicate that concurrent activation executions are allowed in the same container, up to some maximum.
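For illustration, usage might look like this (hypothetical syntax; the flag name here follows the proposal above, while #2060 uses concurrent-max):

```sh
# hypothetical: allow up to 200 concurrent activations in one container
wsk action create myWebAction action.js --web true -a concurrent 200
```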
This would, for example, allow web actions to scale beyond the limit on the number of concurrently running containers. Concurrent execution would only be allowed for repeated invocations of the same action by the same user: parameters may be unique per activation, but the action+user would have to be the same. This would complicate: