Closed mjtravers closed 1 month ago
@mjtravers I'd also like to consider changing the Pingdom alerts for the API to also run queries rather than hitting the status page. It seems beneficial to have a generic test query endpoint like the one for celery. If you agree, we could add it to this ticket or create a separate one.
Yes, that sounds good to me. What kind of query? Are we talking about a custom api endpoint that returns just some python generated response or a query to the database, too?
@lbeaufort Added a database query check for pingdom in this ticket and increased the points accordingly.
@MitchellTCG why is this on hold? Can we add that to a comment in this ticket?
Passes CR. Sending to QA.
Shelly Wise commented: QA review verified per DEV when selecting the following URL the following valid response comes back:
!image-20240827-180528.png|width=650,height=186,alt="image-20240827-180528.png"! See image in Jira
QA review also verified, when selecting the following URL the following valid response comes back:
!image-20240827-180707.png|width=650,height=186,alt="image-20240827-180707.png"!
QA Review Completed. Moved to Stage Ready.
[~accountid:712020:169a1b29-e3ab-43ca-a22d-7d6f230207bd] -We’ve discovered an issue on DEV with this ticket. The code that manages the pingdom URL and protects it from a DoS attack is preventing the Celery tasks from launching correctly. We’re rolling this change back out of the sprint and moving it to Sprint 48 for further work.-
We were able to figure out the code issue and all looks good. Moving this ticket back to QA for another look same as before.
Shelly Wise commented: QA retested per DEV findings. QA review verified per DEV when selecting the following URL the following valid response comes back:
!image-20240827-223137.png|width=50%,alt="image-20240827-223137.png"! See image in Jira
QA review also verified, when selecting the following URL the following valid response comes back:
!image-20240828-122050.png|width=502,height=150,alt="image-20240828-122050.png"!
QA Review Completed. Moved to Stage Ready.
Automation for Jira commented: Sprint accepted by Paul Clark at Sprint Review on comment date.
There is a risk that the celery workers can fail silently. Pingdom will be set up to periodically hit a /celery-test/ endpoint that will indicate if the celery service is up and running.
Additionally, a Pingdom alert should be set up to hit an endpoint that runs a test query against the database and reports back whether the database connection and query were successful.
DEV NOTES
There is already an endpoint, "/celery-test", set up in the fecfiler/urls.py file. The function for this test can be updated if necessary to return whatever Pingdom needs to see to verify the service is up.
To report that the db is up without letting an unauthenticated caller fire off a db task, the API should monitor the state of the database, store that state, and return the stored state on the ping endpoint.
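A minimal sketch of the "monitor, store, return" idea, under stated assumptions: the class name CachedDatabaseStatus, the response keys, and the 30-second freshness window are all hypothetical, and sqlite3 stands in for the real database so the example runs on its own. It is not the code in fecfiler; it only illustrates that the ping endpoint can read a stored result instead of running a query per request.

```python
import sqlite3
import threading
import time


class CachedDatabaseStatus:
    """Stores the outcome of a test query so a ping endpoint never hits the DB itself.

    Hypothetical helper for illustration only.
    """

    def __init__(self, run_test_query, max_age_seconds=30.0, clock=time.monotonic):
        self._run_test_query = run_test_query  # callable that raises on failure
        self._max_age = max_age_seconds        # assumed freshness window
        self._clock = clock
        self._lock = threading.Lock()
        self._is_up = False
        self._checked_at = None

    def refresh(self):
        """Run the test query and store the result (meant for a background job)."""
        try:
            self._run_test_query()
            is_up = True
        except Exception:
            is_up = False
        with self._lock:
            self._is_up = is_up
            self._checked_at = self._clock()
        return is_up

    def snapshot(self):
        """Return the stored state without touching the database."""
        with self._lock:
            if self._checked_at is None:
                # Never checked yet: report down rather than guess.
                return {"database_is_running": False, "stale": True}
            stale = (self._clock() - self._checked_at) > self._max_age
            return {"database_is_running": self._is_up and not stale, "stale": stale}


def sqlite_test_query():
    """Stand-in test query; the real check would use the application's database."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("SELECT 1").fetchone()
    finally:
        conn.close()


if __name__ == "__main__":
    status = CachedDatabaseStatus(sqlite_test_query)
    status.refresh()          # done periodically by the API, not by the caller
    print(status.snapshot())  # what the ping endpoint would return
```

With this shape, an unauthenticated request only ever calls snapshot(), so the request rate from outside cannot drive database load. A stale result is reported as down, so a stalled refresh job also surfaces in Pingdom.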
Acceptance Criteria
A message is automatically sent from Pingdom to the Slack #fecfile-alerts channel when the Celery web-service is not responding for DEV, STAGE, or PROD
QA Notes
To verify, you can look at 2 URLs in the browser and see if a valid response comes back:
[https://dev-api.fecfile.fec.gov/devops/celery-status/|https://dev-api.fecfile.fec.gov/devops/celery-status/]
should display
!image-20240827-174144.png|width=326,height=66,alt="image-20240827-174144.png"!
[https://dev-api.fecfile.fec.gov/devops/database-status/|https://dev-api.fecfile.fec.gov/devops/database-status/]
should display
!image-20240827-174236.png|width=326,height=66,alt="image-20240827-174236.png"!
See full ticket and images here: FECFILE-243