CDLUC3 / ezid


Investigate Nagios alerts on search using OpenSearch #737

Open jsjiang opened 2 months ago

jsjiang commented 2 months ago

We received two Nagios alerts on 9/16 for the uc3-ezid-ui-search-prd_7x16 service:

Alert time:

Message:

PROBLEM alert - ezid.cdlib.org/uc3-ezid-ui-search-prd_7x16 is CRITICAL

Additional Info: HTTP CRITICAL: Status line output matched HTTP/1.1 200 OK - 16446 bytes in 9.685 second response time

Investigate the root cause.

jsjiang commented 2 months ago

Traffic and workload on EC2 and RDS were not heavy when the alerts were triggered. The ELB log records request_processing_time, target_processing_time, and response_processing_time for each request. For the requests that triggered Nagios alerts, target_processing_time was long, while the request and response processing times were very short in all investigated cases. Here are the details:

9/16, 3:57pm (PDT), with Nagios error:

"fields": { "@timestamp": [ "2024-09-16T23:16:19.589Z" ], "request_creation_time": [ "2024-09-16T22:56:57.304Z" ], "timestamp": [ "2024-09-16T22:57:05.076Z" ] },

"sent_bytes": 16446, "received_bytes": 180, "request_processing_time": 0.002, "target_processing_time": 7.769, "response_processing_time": 0,

9/16, 4:17pm (PDT), with Nagios error:

"fields": { "@timestamp": [ "2024-09-16T23:37:50.315Z" ], "request_creation_time": [ "2024-09-16T23:17:05.730Z" ], "timestamp": [ "2024-09-16T23:17:15.399Z" ] },

"sent_bytes": 16446, "received_bytes": 180, "request_processing_time": 0.001, "target_processing_time": 9.667, "response_processing_time": 0,

9/16, 3:44pm (PDT), no Nagios error:

"fields": { "@timestamp": [ "2024-09-16T23:04:57.679Z" ], "request_creation_time": [ "2024-09-16T22:44:49.080Z" ], "timestamp": [ "2024-09-16T22:44:52.517Z" ] },

"sent_bytes": 16446, "received_bytes": 180, "request_processing_time": 0.002, "target_processing_time": 3.435, "response_processing_time": 0,

9/17, 4:18pm (PDT):

"fields": { "@timestamp": [ "2024-09-17T23:35:19.871Z" ], "request_creation_time": [ "2024-09-17T23:18:13.973Z" ], "timestamp": [ "2024-09-17T23:18:14.987Z" ] },

"sent_bytes": 16446, "received_bytes": 180, "request_processing_time": 0.001, "target_processing_time": 1.013, "response_processing_time": 0,
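To make the comparison above easier to repeat, here is a minimal Python sketch that sums the three ELB timing fields for each quoted entry, cross-checks the sum against the gap between request_creation_time and timestamp, and reports what share of the time was spent in the target. The field values are copied from the log entries above; the `ENTRIES` list, the `label` strings, and the `summarize` helper are hypothetical names introduced for illustration and are not part of the EZID code base or the ELB log format.

```python
from datetime import datetime

# Timing fields copied from the four ELB log entries quoted in this comment.
# The "label" key is an illustrative addition, not an ELB field.
ENTRIES = [
    {"label": "9/16 3:57pm PDT (Nagios error)",
     "request_creation_time": "2024-09-16T22:56:57.304Z",
     "timestamp": "2024-09-16T22:57:05.076Z",
     "request_processing_time": 0.002,
     "target_processing_time": 7.769,
     "response_processing_time": 0},
    {"label": "9/16 4:17pm PDT (Nagios error)",
     "request_creation_time": "2024-09-16T23:17:05.730Z",
     "timestamp": "2024-09-16T23:17:15.399Z",
     "request_processing_time": 0.001,
     "target_processing_time": 9.667,
     "response_processing_time": 0},
    {"label": "9/16 3:44pm PDT (no Nagios error)",
     "request_creation_time": "2024-09-16T22:44:49.080Z",
     "timestamp": "2024-09-16T22:44:52.517Z",
     "request_processing_time": 0.002,
     "target_processing_time": 3.435,
     "response_processing_time": 0},
    {"label": "9/17 4:18pm PDT",
     "request_creation_time": "2024-09-17T23:18:13.973Z",
     "timestamp": "2024-09-17T23:18:14.987Z",
     "request_processing_time": 0.001,
     "target_processing_time": 1.013,
     "response_processing_time": 0},
]


def parse_utc(ts):
    """Parse an ISO 8601 UTC timestamp with a trailing 'Z' and milliseconds."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")


def summarize(entry):
    """Return total ELB time, wall-clock gap, and the target's share of the total."""
    total = (entry["request_processing_time"]
             + entry["target_processing_time"]
             + entry["response_processing_time"])
    wall = (parse_utc(entry["timestamp"])
            - parse_utc(entry["request_creation_time"])).total_seconds()
    return {
        "label": entry["label"],
        "total": round(total, 3),
        "wall": round(wall, 3),
        "target_share": entry["target_processing_time"] / total,
    }


if __name__ == "__main__":
    for entry in ENTRIES:
        s = summarize(entry)
        print(f'{s["label"]}: total={s["total"]}s, wall={s["wall"]}s, '
              f'target share={s["target_share"]:.1%}')
```

For all four entries the summed processing time agrees with the request_creation_time to timestamp gap to within a few milliseconds, and nearly all of it is target_processing_time, which is consistent with the slowness being in the backend rather than in the load balancer.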