fecgov / fecfile-web-api

Back-end API for FECfile application

Set up Pingdom to ping celery and database #649

Closed mjtravers closed 1 month ago

mjtravers commented 10 months ago

There is a risk that the celery workers can fail silently. Pingdom will be set up to periodically hit a /celery-test/ endpoint that will indicate if the celery service is up and running.

Additionally, a Pingdom alert should be set up to hit an endpoint that runs a test query against the database and reports back whether the database connection and query succeeded.
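A minimal sketch of what such a database check could look like. It uses the standard-library sqlite3 module as a stand-in for the project's real database driver, and the function name `database_status`, the `connect` parameter, and the response shape are all assumptions for illustration rather than the project's actual code.

```python
import sqlite3
from typing import Callable


def database_status(connect: Callable[[], sqlite3.Connection]) -> dict:
    """Open a connection, run a trivial query, and report the outcome.

    `connect` is a hypothetical factory so the check can be pointed at
    any database; sqlite3 is only a stand-in here.
    """
    try:
        conn = connect()
        try:
            # "SELECT 1" exercises both the connection and query execution
            # without touching application tables.
            row = conn.execute("SELECT 1").fetchone()
        finally:
            conn.close()
    except sqlite3.Error:
        return {"database": "down"}
    return {"database": "up" if row == (1,) else "down"}
```

For example, `database_status(lambda: sqlite3.connect(":memory:"))` reports the database as up, while a factory that raises `sqlite3.OperationalError` is reported as down.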

DEV NOTES

There is already an endpoint, "/celery-test", defined in the fecfiler/urls.py file. The function for this test can be updated if necessary to return whatever Pingdom needs to see to verify that the service is up.
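One possible shape for the test function, sketched under assumptions: it relies on Celery's `app.control.ping(timeout=...)`, which broadcasts a ping and returns a list of worker replies. The function name `celery_status` and the response payload are hypothetical, and the existing "/celery-test" function may work differently.

```python
from typing import Any


def celery_status(app: Any, timeout: float = 1.0) -> dict:
    """Report whether at least one Celery worker answered a ping.

    `app` is expected to be a Celery application object; it is typed as
    Any so this sketch does not require Celery to be importable.
    """
    try:
        # Each responding worker contributes one entry to the list.
        replies = app.control.ping(timeout=timeout) or []
    except Exception:
        # A broker outage surfaces as an exception; treat it as "down".
        return {"status": "down", "workers": 0}
    return {"status": "up" if replies else "down", "workers": len(replies)}
```

A view wired to the endpoint could return this dict as JSON with a non-200 status code when the status is "down", so that Pingdom treats the check as failed.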

To report that the database is up without letting unauthenticated users trigger a database task, we should monitor the state of the database on the API, store that state, and return the stored value from the ping endpoint.
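The store-and-return idea can be sketched as a small cache with a maximum age. The class name `CachedStatus`, its parameters, and the 30-second default are assumptions for illustration. The project may store the state elsewhere, for example in a shared cache.

```python
import threading
import time
from typing import Callable, Optional


class CachedStatus:
    """Run a health check at most once per `max_age` seconds."""

    def __init__(self, check: Callable[[], bool], max_age: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self._check = check
        self._max_age = max_age
        self._clock = clock
        self._lock = threading.Lock()
        self._value: Optional[bool] = None
        self._checked_at: Optional[float] = None

    def get(self) -> bool:
        """Return the stored state, refreshing it only when it is stale."""
        with self._lock:
            now = self._clock()
            stale = (self._checked_at is None
                     or now - self._checked_at >= self._max_age)
            if stale:
                try:
                    self._value = bool(self._check())
                except Exception:
                    # A failing check is recorded as "down" rather than raised.
                    self._value = False
                self._checked_at = now
            return bool(self._value)
```

With this design, repeated anonymous requests to the ping endpoint read the stored value, so the underlying check runs no more than once per interval regardless of request volume.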

Acceptance Criteria

A message is automatically sent from Pingdom to the Slack #fecfile-alerts channel when the Celery web-service is not responding for DEV, STAGE, or PROD

QA Notes

To verify, you can look at 2 URLs in the browser and see if a valid response comes back:

https://dev-api.fecfile.fec.gov/devops/celery-status/

should display

[Screenshot: image-20240827-174144.png]

https://dev-api.fecfile.fec.gov/devops/database-status/

should display

[Screenshot: image-20240827-174236.png]
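The same check can be done from a script instead of the browser. This is a rough sketch that only tests for an HTTP 200 response, because the expected response body is shown only in the screenshots. The helper names `http_status` and `check_endpoints` are hypothetical.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable

URLS = [
    "https://dev-api.fecfile.fec.gov/devops/celery-status/",
    "https://dev-api.fecfile.fec.gov/devops/database-status/",
]


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for `url`, or 0 if it is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # 4xx and 5xx responses arrive as exceptions carrying the code.
        return exc.code
    except OSError:
        # Covers URLError, DNS failures, and socket timeouts.
        return 0


def check_endpoints(urls: Iterable[str],
                    fetch: Callable[[str], int] = http_status) -> Dict[str, bool]:
    """Map each URL to True when it answered with HTTP 200."""
    return {url: fetch(url) == 200 for url in urls}
```

Calling `check_endpoints(URLS)` returns a dict keyed by URL. Passing a different `fetch` makes it possible to exercise the logic without network access.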

See full ticket and images here: FECFILE-243

lbeaufort commented 9 months ago

@mjtravers I'd also like to consider changing the Pingdom alerts for the API to also run queries rather than hitting the status page. It seems beneficial to have a generic test query endpoint like the one for celery. If you agree, we could add it to this ticket or create a separate one.

mjtravers commented 9 months ago

> @mjtravers I'd also like to consider changing the Pingdom alerts for the API to also run queries rather than hitting the status page. It seems beneficial to have a generic test query endpoint like the one for celery. If you agree, we could add it to this ticket or create a separate one.

Yes, that sounds good to me. What kind of query? Are we talking about a custom API endpoint that returns just a Python-generated response, or one that also queries the database?

mjtravers commented 9 months ago

@lbeaufort Added a database query check for Pingdom to this ticket and increased the points accordingly.

AureliaKhorsand commented 4 months ago

@MitchellTCG why is this on hold? Can we add that to a comment in this ticket?

exalate-issue-sync[bot] commented 2 months ago

Passes CR. Sending to QA.

exalate-issue-sync[bot] commented 2 months ago

Shelly Wise commented: QA review verified on DEV that the following URL returns a valid response:

https://dev-api.fecfile.fec.gov/devops/celery-status/

[Screenshot: image-20240827-180528.png; see image in Jira]

QA review also verified that the following URL returns a valid response:

https://dev-api.fecfile.fec.gov/devops/database-status/

[Screenshot: image-20240827-180707.png]

QA Review Completed. Moved to Stage Ready.

exalate-issue-sync[bot] commented 2 months ago

~~We’ve discovered an issue on DEV with this ticket. The code that manages the Pingdom URL and protects it from a DoS attack is preventing Celery tasks from launching correctly. We’re rolling this change back out of the sprint and moving it to Sprint 48 for further work.~~

We were able to figure out the code issue, and all looks good. Moving this ticket back to QA for another look, same as before.

exalate-issue-sync[bot] commented 2 months ago

Shelly Wise commented: QA retested per DEV findings. QA review verified on DEV that the following URL returns a valid response:

https://dev-api.fecfile.fec.gov/devops/celery-status/

[Screenshot: image-20240827-223137.png; see image in Jira]

QA review also verified that the following URL returns a valid response:

https://dev-api.fecfile.fec.gov/devops/database-status/

[Screenshot: image-20240828-122050.png]

QA Review Completed. Moved to Stage Ready.

exalate-issue-sync[bot] commented 1 month ago

Automation for Jira commented: Sprint accepted by Paul Clark at Sprint Review on comment date.