apache / openwhisk

Apache OpenWhisk is an open source serverless cloud platform
https://openwhisk.apache.org/
Apache License 2.0

Add Scheduler Queue Metric for Not Processing Any Activations #5386

Closed bdoyle0182 closed 1 year ago

bdoyle0182 commented 1 year ago

Description

The memory queue now tracks the last time an activation was pulled by a gRPC request to be sent to an invoker. If an activation is dropped from the queue because it aged out, and no invoker pulled a single activation during the entire time that activation sat in the queue, this gauge fires. It is needed as a fail-safe to surface edge cases where etcd data gets out of sync and the scheduler believes containers exist that do not. Looking at activation timeouts alone is not enough to detect such an issue, since the action may simply be hitting its throttling limit on the number of containers.

So this should fire in only one case:

  1. There's a bug in the scheduler that requires restarting either the invokers or the schedulers to get the queue back into a healthy state.
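
The firing condition described above can be sketched as follows. This is a minimal illustration, not the code in `MemoryQueue.scala`: the class `StuckQueueDetector` and its method names are hypothetical, and it returns a `Boolean` where the real change emits a gauge metric. It assumes each activation's enqueue time is known when it ages out.

```scala
import java.time.Instant

// Hypothetical stand-in for the queue's bookkeeping; names are illustrative only.
final class StuckQueueDetector(start: Instant) {
  // Last time any activation was pulled from this queue by an invoker.
  private var lastActivationPulledAt: Instant = start

  // Call whenever an activation is handed to an invoker.
  def onActivationPulled(now: Instant): Unit =
    lastActivationPulledAt = now

  // Call when an activation ages out of the queue.
  // Returns true when nothing was pulled after this activation was enqueued,
  // i.e. the queue made no progress for the activation's entire wait.
  def onActivationTimedOut(enqueuedAt: Instant): Boolean =
    !lastActivationPulledAt.isAfter(enqueuedAt)
}

object StuckQueueDemo {
  def main(args: Array[String]): Unit = {
    val t0 = Instant.parse("2023-01-01T00:00:00Z")
    val detector = new StuckQueueDetector(t0)
    detector.onActivationPulled(t0.plusSeconds(5))

    // Enqueued after the last pull and never served: the metric should fire.
    println(detector.onActivationTimedOut(t0.plusSeconds(10)))
    // Enqueued before the last pull: the queue was making progress, so this
    // timeout looks like ordinary throttling and the metric should not fire.
    println(detector.onActivationTimedOut(t0.plusSeconds(1)))
  }
}
```

Comparing against the enqueue time rather than a fixed idle window is what separates a stuck queue from a throttled one: a throttled action still has activations pulled while others time out.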

codecov-commenter commented 1 year ago

Codecov Report

Merging #5386 (62d7eb0) into master (96ff327) will decrease coverage by 72.13%. The diff coverage is 0.00%.

:exclamation: Current head 62d7eb0 differs from pull request most recent head 830da9c. Consider uploading reports for the commit 830da9c to get more accurate results

@@            Coverage Diff             @@
##           master   #5386       +/-   ##
==========================================
- Coverage   76.65%   4.52%   -72.13%     
==========================================
  Files         240     240               
  Lines       14574   14588       +14     
  Branches      646     629       -17     
==========================================
- Hits        11171     660    -10511     
- Misses       3403   13928    +10525     
Impacted Files Coverage Δ
...in/scala/org/apache/openwhisk/common/Logging.scala 40.08% <0.00%> (-39.66%) :arrow_down:
...e/openwhisk/core/scheduler/queue/MemoryQueue.scala 0.00% <0.00%> (-81.55%) :arrow_down:
.../main/scala/org/apache/openwhisk/core/WarmUp.scala 0.00% <0.00%> (-100.00%) :arrow_down:
...ain/scala/org/apache/openwhisk/spi/SpiLoader.scala 0.00% <0.00%> (-100.00%) :arrow_down:
...n/scala/org/apache/openwhisk/utils/JsHelpers.scala 0.00% <0.00%> (-100.00%) :arrow_down:
...scala/org/apache/openwhisk/common/Prometheus.scala 0.00% <0.00%> (-100.00%) :arrow_down:
...scala/org/apache/openwhisk/common/time/Clock.scala 0.00% <0.00%> (-100.00%) :arrow_down:
...scala/org/apache/openwhisk/core/FeatureFlags.scala 0.00% <0.00%> (-100.00%) :arrow_down:
...scala/org/apache/openwhisk/http/CorsSettings.scala 0.00% <0.00%> (-100.00%) :arrow_down:
...ala/org/apache/openwhisk/common/ConfigMXBean.scala 0.00% <0.00%> (-100.00%) :arrow_down:
... and 196 more
