CSTARS / spatial-cimis

New repository for the DWR Spatial CIMIS program
MIT License

DWR Monitoring, Alerting and Issue Resolution Strategy #48

Open gjscheer-ucd opened 5 years ago

gjscheer-ucd commented 5 years ago

Goals

Enable DWR to monitor, identify and rectify most, if not all, of the DWR GOES17 data issues.

Justification

Because GOES17 data is significantly larger and arrives more frequently than GOES15 data, data processing for Spatial CIMIS carries a significantly higher probability of data corruption. For this reason, a premature promotion of the DWR GOES17 Spatial CIMIS processes to production / live status would unnecessarily put the team (DWR & UCD) on endless alert and could delay the delivery of ETo data to customers.

What is the strategy?

Before the DWR GOES17 Spatial CIMIS processes are promoted to production / live status, DWR should demonstrate that it can go 2 weeks without a major processing issue, and that it can address live data delivery issues quickly enough to avoid data loss and any interruption to its ETo delivery responsibilities.

To accomplish this, the following strategy should be considered:

Specific resources to monitor

AppDynamics can monitor and provide basic host alert information such as general availability and CPU, RAM and disk usage. In addition to general availability alerts, the following specific services need monitoring with alerts:

CIMIS grb-box

CIMIS processor - test

CIMIS processor - prod

Requested Strategy

This strategy requires DWR firewall rules that allow the remote monitoring service to access ports 22, 80 and 443 on the following Spatial CIMIS servers:

source IP     | port(s)   | protocol | destination IP                     | host
see whitelist | 22,80,443 | TCP      | see "Monitoring" in request log    | dev
see whitelist | 22,80,443 | TCP      | see "Monitoring" in request log    | testing
see whitelist | 22,80,443 | TCP      | see "Monitoring" in request log    | prod
see whitelist | 22,80,443 | TCP      | see "Monitoring" in request log    | W Sac dsp-box
see whitelist | 22,80,443 | TCP      | see "Monitoring" in request log    | W Sac grb-box

The public-facing IPs of all Spatial CIMIS servers must be known.
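Once the firewall rules are in place, it may help to confirm from an outside machine that ports 22, 80 and 443 actually accept TCP connections. The following is a minimal sketch of such a check, not part of the existing Spatial CIMIS tooling. The function names and the host labels passed in are hypothetical, and the real public-facing IPs would have to be supplied by DWR.

```python
import socket

# Ports named in the firewall request.
MONITORED_PORTS = (22, 80, 443)


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def check_hosts(hosts, ports=MONITORED_PORTS, timeout=3.0):
    """Check every port on every host.

    hosts maps a label (for example "dev") to an address.
    Returns a dict keyed by (label, port) with True/False reachability.
    """
    results = {}
    for label, address in hosts.items():
        for port in ports:
            results[(label, port)] = port_is_open(address, port, timeout)
    return results
```

A call such as `check_hosts({"dev": "<public IP of dev>"})` would return one entry per port. A `False` result only shows that the connection failed from the machine running the check; it does not distinguish a blocked firewall rule from a service that is down.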

The remote monitoring service is Uptime Robot. The IPs to whitelist are listed here: https://uptimerobot.com/locations.php https://uptimerobot.com/inc/files/ips/IPv4.txt
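To turn the published IP list into something a firewall request can be checked against, a small parser may be useful. The sketch below assumes the list is plain text with one IPv4 address or CIDR block per line, which should be verified against the actual file. The function names are hypothetical, and the sample addresses in the usage note come from the reserved documentation ranges, not from Uptime Robot.

```python
import ipaddress


def parse_ip_list(text):
    """Parse one address or CIDR block per line into network objects.

    Blank lines, lines starting with '#', and lines that are not valid
    addresses are skipped.
    """
    networks = []
    for raw in text.splitlines():
        entry = raw.strip()
        if not entry or entry.startswith("#"):
            continue
        try:
            # A bare address such as 192.0.2.1 becomes 192.0.2.1/32.
            networks.append(ipaddress.ip_network(entry, strict=False))
        except ValueError:
            continue
    return networks


def is_whitelisted(address, networks):
    """Return True if address falls inside any of the parsed networks."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in networks)
```

For example, `parse_ip_list("192.0.2.1\n198.51.100.0/24\n")` yields two networks, and `is_whitelisted("198.51.100.7", ...)` on that result is true. The text itself would need to be downloaded separately, for instance with `urllib.request.urlopen`, and re-fetched periodically in case the published list changes.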