CDCgov / trusted-intermediary

Bringing together healthcare providers by reducing the connection burden.
Apache License 2.0
11 stars 5 forks source link

Update 4XX Alert to Ignore RS Receiver Status Check #1519

Closed halprin closed 3 weeks ago

halprin commented 3 weeks ago

Description

Our 4XX alert was constantly triggering due to RS (and other things) calling non-existent endpoints. Changed the alert to be log based and filtering out the pre-live slot, the RS receiver status check, and Microsoft pinging non-existent endpoints.

Issue

1517

Checklist

github-actions[bot] commented 3 weeks ago

PR Reviewer Guide ๐Ÿ”

Here are some key observations to aid the review process:

**๐ŸŽซ Ticket compliance analysis ๐Ÿ”ถ** **[1517](https://github.com/CDCgov/trusted-intermediary/issues/1517) - Partially compliant** Fully compliant requirements: - Investigate what is triggering 4xx alerts in staging. - Test fix in TI Internal. Not compliant requirements: - Merge PR with final fix.
โฑ๏ธ Estimated effort to review: 3 ๐Ÿ”ต๐Ÿ”ต๐Ÿ”ตโšชโšช
๐Ÿงช No relevant tests
๐Ÿ”’ No security concerns identified
โšก Recommended focus areas for review

Query Logic
Ensure the query logic effectively filters out the intended 4xx errors without excluding other potential critical alerts.
github-actions[bot] commented 3 weeks ago

PR Code Suggestions โœจ

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Score
Possible issue
Refine the log filtering conditions to ensure necessary logs are not excluded ___ **Ensure that the query does not inadvertently exclude necessary logs by refining the
conditions that filter out specific methods and URIs. Consider using more specific
conditions or adding additional URIs that might be relevant to ignore.** [operations/template/alert.tf [142-148]](https://github.com/CDCgov/trusted-intermediary/pull/1519/files#diff-ea7b0de5fd253c48860400c7536561ab82d76414012b803eeb3a2cb75d8a439aR142-R148) ```diff +query = <<-QUERY + AppServiceHTTPLogs | + where CsHost !contains "pre-live" | + where UserAgent != "AlwaysOn" | + where not(CsMethod == "GET" and CsUriStem startswith "/v1/etor/") | // ignore RS receiver status check that uses GET + where not(CsMethod == "GET" and CsUriStem startswith "/admin/") | // ignore Microsoft hitting non-existent URLs + where ScStatus >= 400 and ScStatus < 500 + QUERY - ```
Suggestion importance[1-10]: 5 Why: The suggestion to refine log filtering conditions is relevant as it ensures important logs are not excluded, which is crucial for accurate monitoring. However, the improved code provided is identical to the existing code, indicating no specific changes were suggested.
5
Performance
Optimize the alert's frequency and time window for effective monitoring ___ **Consider adjusting the frequency and time_window parameters to optimize the alert's
responsiveness and accuracy, ensuring it effectively captures relevant data without
causing unnecessary noise.** [operations/template/alert.tf [152-153]](https://github.com/CDCgov/trusted-intermediary/pull/1519/files#diff-ea7b0de5fd253c48860400c7536561ab82d76414012b803eeb3a2cb75d8a439aR152-R153) ```diff +frequency = 5 +time_window = 60 - ```
Suggestion importance[1-10]: 4 Why: Optimizing the frequency and time window can enhance the alert system's effectiveness. However, the suggestion lacks specific improvements or alternatives, making it less impactful.
4
Maintainability
Ensure all necessary tags are included in the ignore_changes lifecycle configuration ___ **Review the ignore_changes lifecycle configuration to ensure it includes all
necessary tags that should not trigger a redeployment or update of the resource,
potentially adding more tags as needed.** [operations/template/alert.tf [162-167]](https://github.com/CDCgov/trusted-intermediary/pull/1519/files#diff-ea7b0de5fd253c48860400c7536561ab82d76414012b803eeb3a2cb75d8a439aR162-R167) ```diff +ignore_changes = [ + tags["business_steward"], + tags["center"], + tags["environment"], + tags["escid"], - ```
Suggestion importance[1-10]: 4 Why: Ensuring all necessary tags are included in the `ignore_changes` configuration is important for maintainability. However, the suggestion does not provide specific tags to add, limiting its practical applicability.
4
Best practice
Check and adjust the severity level of the alert to match organizational standards ___ **Verify the severity level set for the alert to ensure it aligns with organizational
standards for 4xx error monitoring, potentially adjusting it if necessary.** [operations/template/alert.tf [151]](https://github.com/CDCgov/trusted-intermediary/pull/1519/files#diff-ea7b0de5fd253c48860400c7536561ab82d76414012b803eeb3a2cb75d8a439aR151-R151) ```diff +severity = 3 - ```
Suggestion importance[1-10]: 3 Why: This suggestion is a good practice to ensure the severity level aligns with organizational standards. However, it's more of a verification step rather than a direct code improvement, hence the moderate score.
3
sonarcloud[bot] commented 3 weeks ago

Quality Gate Passed Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarCloud