We need an application-level monitoring component that tracks our tools and services, notifies us promptly when a service goes down, and lets us validate our uptime requirements.
Some thoughts on components we need to monitor:
Prod and Staging Websites - return HTTP 200
OpenSearch Registry - up and contains expected data
Registry API - Postman (maybe some subset of queries?)
DOI Service - up and returns data for a query (Postman?)
Nucleus - up and running, pipelines are not stuck?
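
To make the simplest check in the list above concrete (websites return 200), here is a minimal sketch using only the Python standard library. The names `CheckResult`, `fetch_status`, `check_returns_200`, and `run_checks` are hypothetical and not part of any existing tool of ours, and the target names and URLs passed in are placeholders to be filled with the real endpoints. The deeper checks (expected data in the registry, stuck pipelines) would need service-specific logic that is not shown here.

```python
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CheckResult:
    name: str    # label for the monitored service (hypothetical naming)
    ok: bool     # True only when the endpoint answered with HTTP 200
    detail: str  # short human-readable explanation for notifications


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code of a GET request to `url`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx; the status code is still what we want.
        return err.code


def check_returns_200(
    name: str, url: str, fetch: Callable[[str], int] = fetch_status
) -> CheckResult:
    """Run one availability check; any non-200 answer counts as a failure."""
    try:
        status = fetch(url)
    except Exception as exc:  # e.g. DNS failure, refused connection, timeout
        return CheckResult(name, False, f"request failed: {exc}")
    return CheckResult(name, status == 200, f"HTTP {status}")


def run_checks(
    targets: Dict[str, str], fetch: Callable[[str], int] = fetch_status
) -> List[CheckResult]:
    """Check every target and return results in the order given."""
    return [check_returns_200(name, url, fetch) for name, url in targets.items()]
```

Calling `run_checks({"prod-website": "<prod URL>", "staging-website": "<staging URL>"})` on a schedule and alerting on any result with `ok == False` would cover the first item; the `fetch` parameter exists so the logic can be exercised without network access.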
💡 Description
Ideas for technology choices: