Closed yolabingo closed 2 months ago
remove multithreading from system probes
The system probe code is prematurely optimized - it was trying to use circuit breakers and threadpools when building the dotCMS "am I alive" responses.
We just need to return if the system is up and working, without any of the multithreaded craziness.
@bryanboza @josemejias11 the feedback/QA for this one will come from cloud-eng. cc: @yolabingo
We discussed adding a Postman check of the happy path here.
These endpoints are used to confirm the health of dotCMS containers using curl or k8s httpGet. The health checks send a plain GET request to these endpoints. The health check confirms the HTTP response code is 200.
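For reference, the check described above can be sketched roughly as follows. This is only an illustration of "plain GET, expect 200", not the actual cloud-eng healthcheck. The base URL, the timeout, and the helper names `probe_status` and `is_healthy` are assumptions made for the example.

```python
import urllib.error
import urllib.request

# Probe paths quoted in this issue; the base URL is an assumption for local testing.
BASE_URL = "http://localhost:8080"
PROBE_PATHS = ("/api/v1/probes/alive", "/api/v1/probes/startup")


def probe_status(url, timeout=5.0):
    """Send a plain GET and return the HTTP status code, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # Non-2xx responses (e.g. 403, 503) are raised as HTTPError; keep the code.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return None


def is_healthy(status):
    """The health check passes only when the response code is exactly 200."""
    return status == 200


if __name__ == "__main__":
    for path in PROBE_PATHS:
        status = probe_status(BASE_URL + path)
        print(path, status, "OK" if is_healthy(status) else "FAIL")
```

Run against a local dotCMS instance, this prints one line per probe path. Any status other than 200, or no response at all, counts as a failed check.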
Filing a new issue to add the Postman tests
The status for the http://localhost:8080/api/v1/probes/alive and http://localhost:8080/api/v1/probes/startup endpoints is returned as expected:
SYSTEM_STATUS_API_IP_ACL property: 403 HTTP Status is returned.
Fixed, tested on trunk // Postman
Postman test added on card: #29267
Note: this is rather critical, as the current healthcheck config for k8s spams the dotcms log.
Problem Statement
We switched most cloud servers to use /api/v1/probes/startup and /api/v1/probes/alive for healthcheck endpoints. We observed that on some servers, these endpoints continually returned HTTP 503, even after the application was healthy and responsive.
We also saw in the logs
This impacted only a handful of environments, but the problem persisted for these. Anecdotally we saw it on 23.01 only. We reverted the healthchecks to other API endpoints to work around the issue.
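Since the steps to reproduce are unknown, one way to spot the symptom from the monitoring side is to flag a probe whose most recent results are all HTTP 503. The sketch below is only an illustration: `stuck_unavailable` and its `threshold` default are hypothetical and not part of dotCMS or of any existing healthcheck config.

```python
def stuck_unavailable(statuses, threshold=3):
    """Return True when the last `threshold` probe results are all HTTP 503.

    `statuses` is a list of HTTP status codes in the order they were observed.
    The default threshold of 3 is an arbitrary choice for illustration.
    """
    if threshold <= 0 or len(statuses) < threshold:
        return False
    return all(code == 503 for code in statuses[-threshold:])


if __name__ == "__main__":
    # A probe that recovered is not flagged; one that keeps returning 503 is.
    print(stuck_unavailable([503, 503, 200]))
    print(stuck_unavailable([200, 503, 503, 503]))
```

Comparing this flag with a request to another API endpoint on the same server would help separate a genuinely unhealthy instance from a probe that is reporting 503 incorrectly.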
Steps to Reproduce
unknown
Acceptance Criteria
healthcheck APIs are accurate
dotCMS Version
We saw it only on 23.01, not sure if it impacts other versions
Proposed Objective
Reliability
Proposed Priority
Priority 2 - Important
External Links... Slack Conversations, Support Tickets, Figma Designs, etc.
https://dotcms.slack.com/archives/G5VQBQ4H0/p1712369356322479
Assumptions & Initiation Needs
No response
Quality Assurance Notes & Workarounds
No response
Sub-Tasks & Estimates
No response