apache / openwhisk

Apache OpenWhisk is an open source serverless cloud platform
https://openwhisk.apache.org/
Apache License 2.0

health actions can reuse containers and may not flag an unhealthy invoker #2884

Open rabbah opened 6 years ago

rabbah commented 6 years ago

There are scenarios for which the health actions will not correctly flag an unhealthy invoker:

  1. health action reuses a previous container
  2. health action is allocated a stem cell container

In either case, if the invoker is actually unhealthy and cannot create a new container, the health action gives a false indication of health, causing the invoker state to flip between healthy and unhealthy when in fact Docker or the VM is already sick.

The invoker should know that it is unhealthy in these cases, and could suspend its pings and generate an alert instead. This would cause new requests to be reassigned until health is properly restored. Another option is for the pings to carry error counts for Cloudant, Kafka, and Docker, which could be used to take an invoker topic offline. If the errors are communicated more explicitly, I wonder if we still need the health action tests, apart from an initial prologue/warm-up of the invoker or when the invoker topic doesn't already have messages in its queue.
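
A minimal sketch of pings that carry error counts, to make the idea concrete. This is not OpenWhisk code: the `Ping` type, the `should_take_offline` function, and the threshold value are all hypothetical names and numbers chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical threshold, not an OpenWhisk setting.
MAX_ERRORS_PER_PING = 3


@dataclass
class Ping:
    """Hypothetical ping payload sent by an invoker."""
    invoker: str
    # Errors seen since the previous ping, keyed by dependency name.
    errors: Dict[str, int] = field(default_factory=dict)


def should_take_offline(ping: Ping, threshold: int = MAX_ERRORS_PER_PING) -> bool:
    """Return True if any dependency reported at least `threshold` errors."""
    return any(count >= threshold for count in ping.errors.values())


healthy = Ping("invoker0", {"cloudant": 0, "kafka": 1, "docker": 0})
sick = Ping("invoker1", {"cloudant": 0, "kafka": 0, "docker": 5})

for ping in (healthy, sick):
    state = "offline" if should_take_offline(ping) else "online"
    print(f"{ping.invoker}: {state}")
```

With something like this, the receiver of the pings decides from the reported counts alone, without needing a health action to succeed or fail first.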

Alternatively, the health actions must not be allowed to reuse a warm container since otherwise they're just misleading.
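
A rough sketch of that rule, assuming a simplified container pool. The `Container` type, the `schedule` function, and the `create` callback are hypothetical and do not mirror the invoker's real container pool. The sketch also skips stem cell containers for health actions, since the second scenario above is misleading in the same way.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


class ContainerCreationError(Exception):
    """Raised by `create` when a new container cannot be started."""


@dataclass
class Container:
    kind: str                     # "warm" or "prewarm" (stem cell)
    action: Optional[str] = None  # action a warm container already ran


def schedule(action: str,
             is_health_action: bool,
             pool: List[Container],
             create: Callable[[], Container]) -> Container:
    """Pick a container for `action`; health actions always cold start."""
    if not is_health_action:
        # Regular actions may reuse a warm container for the same action.
        for c in pool:
            if c.kind == "warm" and c.action == action:
                return c
        # Otherwise they may take over a stem cell container.
        for c in pool:
            if c.kind == "prewarm":
                c.kind, c.action = "warm", action
                return c
    # Health actions (and regular actions with no match) need a brand-new
    # container, so a broken container runtime surfaces as an error here.
    return create()
```

If `create` raises `ContainerCreationError`, a health action fails even when the pool still holds usable warm containers, which is the signal the health check is supposed to provide.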

FYI @cbickel @sven-lange-last

rabbah commented 6 years ago

With multiple active controllers and no sharding, a status bit flip will cause multiple controllers to send health actions. Is that right @cbickel?

cbickel commented 6 years ago

Yes, that's correct.