We currently alert when requests take more than 200ms, but that is roughly the expected response time for an API->OpenSearch request/response cycle. This change raises the threshold to 500ms, which better represents an actual problem. The lower threshold causes alert fatigue and can mask real performance issues that require investigation.
Acceptance criteria
Alarms no longer fire when the time taken is below 500ms.
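As a rough illustration of the acceptance criterion, the check can be sketched as below. This is not the real alarm configuration: `ALARM_THRESHOLD_MS` and `should_alarm` are hypothetical names, and treating exactly 500ms as non-alarming is an assumption based on the "more than" wording above.

```python
# Hypothetical sketch of the threshold check; names are illustrative only.
ALARM_THRESHOLD_MS = 500  # raised from the previous 200ms threshold


def should_alarm(time_taken_ms: float, threshold_ms: float = ALARM_THRESHOLD_MS) -> bool:
    """Return True only when the request took longer than the threshold."""
    return time_taken_ms > threshold_ms


# A typical API->OpenSearch cycle (~200ms) should no longer alarm.
print(should_alarm(200))
# A request well above the new threshold should still alarm.
print(should_alarm(750))
```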