CDLUC3 / mrt-doc

Documentation and Information regarding the Merritt repository

Inventory socket timeout Nagios alerts - update jvm memory #1930

Closed by terrywbrady 4 weeks ago

terrywbrady commented 1 month ago

Ashley's analysis suggests that increasing the JVM memory will affect the swap spikes we see. We will increase the JVM memory for prod Inventory from 1GB to 1.5GB on Monday; the reboot after patching on Tuesday will allow it to take effect.

elopatin-uc3 commented 4 weeks ago

Suggest testing on Stage: take one inv host out of the load balancer, request that the other inv host be moved to a larger instance type, and test with the largest manifest we have to see where incremental increases in memory no longer help. Then we can determine the optimal JVM memory configuration.
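One way to read the results of such a test is sketched below. This is only an illustration, not an agreed procedure: the function name, the `(heap_mb, seconds)` input shape, and the 5% cutoff are all assumptions, and the timings would have to come from actual Stage runs against the same manifest.

```python
def smallest_sufficient_heap(measurements, min_gain=0.05):
    """Pick the heap size after which adding memory stops helping.

    measurements: list of (heap_mb, seconds) pairs, one per Stage run of the
                  same manifest (hypothetical input shape).
    min_gain:     assumed cutoff; a step up that improves run time by less
                  than this fraction counts as "no longer helps".
    """
    if not measurements:
        raise ValueError("need at least one measurement")
    ordered = sorted(measurements)
    # Compare each heap size with the next larger one.
    for (heap, secs), (_, next_secs) in zip(ordered, ordered[1:]):
        if secs <= 0:
            raise ValueError("run times must be positive")
        gain = (secs - next_secs) / secs
        if gain < min_gain:
            return heap
    # Every step still helped; the largest size tested is the best so far.
    return ordered[-1][0]
```

If every step still shows a meaningful gain, the function returns the largest size tested, which would be a signal to keep testing with more memory rather than a final answer.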

mreyescdl commented 4 weeks ago

Increased Inventory JVM for prod to 1.5GB. Will take effect after Tuesday 6/4 patching.

dloy commented 4 weeks ago

A brief analysis of the logs

Some things to note with these exceptions:

Comments:

Notification Type: PROBLEM

Service: uc3-mrt-inventory-prd_state_7x16 Host: uc3-mrtinv-prd01 Address: uc3-mrtinv-prd01.cdlib.org State: CRITICAL

Date/Time: Tue Jun 4 10:40:21 PDT 2024

Additional Info: HTTP CRITICAL: Status line output matched HTTP/1.1 200 - 986 bytes in 5.684 second response time

Nagios

Notification Type: PROBLEM

Service: uc3-mrt-inventory-prd_status-running_7x16 Host: uc3-mrtinv-prd01 Address: uc3-mrtinv-prd01.cdlib.org State: CRITICAL

Date/Time: Tue Jun 4 10:41:07 PDT 2024

Additional Info: HTTP CRITICAL: HTTP/1.1 200 - 986 bytes in 9.534 second response time

Nagios

Notification Type: RECOVERY

Service: uc3-mrt-inventory-prd_state_7x16 Host: uc3-mrtinv-prd01 Address: uc3-mrtinv-prd01.cdlib.org State: OK

Date/Time: Tue Jun 4 11:00:21 PDT 2024
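The alerts above report HTTP 200 with 5.684 and 9.534 second response times, so the CRITICAL state appears to come from latency rather than from a failed request. A rough sketch of that kind of timed check follows. It is not the actual Nagios check definition: the function names are made up, and the warning and critical thresholds are left as parameters because the real values are not shown in this thread. Only the exit codes (0 OK, 1 WARNING, 2 CRITICAL) follow the standard Nagios plugin convention.

```python
import time
import urllib.request

OK, WARNING, CRITICAL = 0, 1, 2  # standard Nagios plugin exit codes


def classify(status, elapsed, warn_seconds, crit_seconds):
    """Map an HTTP status and response time to a Nagios state."""
    if status != 200 or elapsed >= crit_seconds:
        return CRITICAL
    if elapsed >= warn_seconds:
        return WARNING
    return OK


def timed_check(url, warn_seconds, crit_seconds):
    """Fetch url once and return (state, elapsed_seconds)."""
    start = time.monotonic()
    try:
        # Give up well past the critical threshold (assumed timeout choice).
        with urllib.request.urlopen(url, timeout=crit_seconds * 2) as resp:
            resp.read()
            status = resp.status
    except OSError:
        # Covers URLError, HTTPError, and socket timeouts.
        return CRITICAL, time.monotonic() - start
    elapsed = time.monotonic() - start
    return classify(status, elapsed, warn_seconds, crit_seconds), elapsed
```

Running `timed_check` against the `/state` endpoint from the inventory host itself, with whatever thresholds the real check uses, would help separate slow responses in the service from network delay between Nagios and the host.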


less localhost_access_log.2024-06-04.txt

172.30.32.237 - - [04/Jun/2024:10:40:21 -0700] "GET /state?t=xml HTTP/1.1" 200 747
172.31.14.167 - - [04/Jun/2024:10:40:21 -0700] "GET /state HTTP/1.1" 200 855

172.31.14.167 - - [04/Jun/2024:10:41:07 -0700] "GET /state HTTP/1.1" 200 855

172.31.14.167 - - [04/Jun/2024:11:00:21 -0700] "GET /state HTTP/1.1" 200 855
172.30.32.237 - - [04/Jun/2024:11:00:21 -0700] "GET /state?t=xml HTTP/1.1" 200 747
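To line up access log entries with the Nagios alert times, something like the following could be used. It is a minimal sketch that assumes the log keeps the format shown in the excerpt (host, two placeholder fields, bracketed timestamp, quoted request, status, bytes) and English month abbreviations; `parse_entries` and `entries_near` are illustrative names, not existing Merritt tooling.

```python
import re
from datetime import datetime

# Entry layout as it appears in the excerpt:
# host ident user [timestamp] "request" status bytes
LINE_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_entries(text):
    """Return one dict per log entry, even if several entries share a line."""
    entries = []
    for m in LINE_RE.finditer(text):
        entries.append({
            "host": m.group("host"),
            "time": datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z"),
            "request": m.group("request"),
            "status": int(m.group("status")),
            "size": None if m.group("size") == "-" else int(m.group("size")),
        })
    return entries


def entries_near(entries, alert_time, window_seconds=60):
    """Keep entries logged within window_seconds of a timezone-aware alert time."""
    return [
        e for e in entries
        if abs((e["time"] - alert_time).total_seconds()) <= window_seconds
    ]
```

Note that entries in this format carry no response time, so they can confirm that a request arrived and returned 200 but not how long it took. If this is Tomcat's AccessLogValve, as the file name suggests, adding `%D` or `%T` to its pattern would record the processing time per request.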