First order of business will be to put Ice on larger (more memory) servers.
Second order of business is to build some logic around making sure the processor daemon stays running (I thought the Linux 'service' pattern was supposed to take care of this, but I need to do some reading and will probably pick @clstokes' brain for help); a rough watchdog sketch is below.
Third order of business: look at the west coast server.
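A minimal sketch of the kind of keep-alive logic I have in mind for the second item: a cron-driven watchdog that restarts the processor when its pidfile points at a dead process. The pidfile path and service name are assumptions based on the init script we use.

```bash
#!/bin/bash
# Hypothetical watchdog for the ice-processor daemon (pidfile path and service
# name are assumptions). Run from cron, e.g.:
#   */5 * * * * /usr/local/bin/ice-processor-watchdog.sh

PIDFILE=/var/run/ice-processor.pid
SERVICE=ice_processor

# If the pidfile exists but no process with that pid is alive, restart the service.
if [ -f "$PIDFILE" ] && ! kill -0 "$(cat "$PIDFILE")" 2>/dev/null; then
  logger -t ice-watchdog "pidfile present but process dead; restarting $SERVICE"
  service "$SERVICE" restart
fi
```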
Same story on the west coast server: no indication of why it died.
I am rolling out m3.larges in place of c3.larges. The m3s have twice as much memory as the c3s.
If for some reason that doesn't suffice, I will look at moving to r3.large.
memory snapshot from the west-2 server this morning:
```
free -m
             total       used       free     shared    buffers     cached
Mem:          3641       3197        444          0        113        354
-/+ buffers/cache:       2729        912
Swap:          815        362        453
```
and the processes themselves:
```
ps auxf
USER       PID %CPU %MEM     VSZ     RSS TTY  STAT START  TIME COMMAND
root     31698  0.1 26.1 5485080  973932 ?    SNl  01:13   1:17 java -Xmx2G -Xms2048m -XX:MaxPermSize=2048m -jar ice-processor.jar port=1234
tomcat    1707  0.1 44.3 5653424 1654456 ?    Sl   Jan23  12:58 /usr/lib/jvm/java/bin/java -Djavax.sql.DataSource.Factory=org.apache.commons.dbcp.BasicDataSourceFactory -Xmx2G -Xms2048...
```
If `-Xmx2G` was the same setting as before, then increasing the memory of the box isn't going to change anything. If memory is the problem, we need to increase the max memory of the JVM with that flag (i.e. `-Xmx4G`).
Let's also add `-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps` to the script starting the JVM so we can see its memory usage.
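For reference, the processor launch line with those flags added might look something like this; a sketch only, reusing the jar name and port from the ps output above, with the larger heap and the GC log path as assumptions.

```bash
# Sketch of the processor launch command with GC logging enabled.
# -Xmx4G applies only if we raise the heap; the -Xloggc path is an assumption.
java -Xmx4G -Xms2048m -XX:MaxPermSize=2048m \
  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
  -Xloggc:/var/log/ice/ice-processor-gc.log \
  -jar ice-processor.jar port=1234
```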
@clstokes the box only has 3.75 GB to give and two processes each consuming upwards of 2 GB. Does your statement still hold true given that?
I'll add those flags. I'm working on a script to push memory info into cloudwatch so we can get a better picture.
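Something along these lines is what I have in mind; a rough sketch, where the CloudWatch namespace and metric names are placeholders, and it assumes the AWS CLI is installed with an instance role that allows cloudwatch:PutMetricData.

```bash
#!/bin/bash
# Push memory usage from `free -m` to CloudWatch. Namespace and metric names
# are placeholders; region would be parameterized per server.

REGION=us-west-2
INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)

USED_MB=$(free -m | awk '/^Mem:/ {print $3}')
FREE_MB=$(free -m | awk '/^Mem:/ {print $4}')

aws --region "$REGION" cloudwatch put-metric-data \
  --namespace "Grid/Ice" \
  --dimensions InstanceId="$INSTANCE_ID" \
  --metric-name MemoryUsedMB --unit Megabytes --value "$USED_MB"

aws --region "$REGION" cloudwatch put-metric-data \
  --namespace "Grid/Ice" \
  --dimensions InstanceId="$INSTANCE_ID" \
  --metric-name MemoryFreeMB --unit Megabytes --value "$FREE_MB"
```

Run from cron every minute or so, that would give us a memory graph alongside the other CloudWatch metrics.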
I should say, the box only had 3.75 GB (c3.large) -- now running m3.large (7.5 GB memory).
I didn't realize there were two 2G processes. Were both processes dead?
The max memory flag would still need to be increased if memory is indeed our problem.
No, only one process was dead. Tomcat was still running, but the process we daemonized (seen here: https://github.com/TheWeatherCompany/grid-config-mgmt/blob/master/provisioners/puppet/modules/grid-ice/files/ice-processor) was dead with no indication of why. The log just stopped. The daemon script reported: `Process dead but pidfile exists`
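For context, that message is what a typical Red Hat style init script's status check prints when the pidfile is still on disk but the pid it names is gone. Roughly (a simplified sketch, not the actual script linked above; the pidfile path is an assumption):

```bash
# Simplified status logic from a Red Hat style init script (not the actual
# ice-processor script). The pidfile path is an assumption.
PIDFILE=/var/run/ice-processor.pid

status() {
  if [ -f "$PIDFILE" ]; then
    pid=$(cat "$PIDFILE")
    if kill -0 "$pid" 2>/dev/null; then
      echo "ice-processor (pid $pid) is running..."
      return 0
    fi
    # The JVM exited (or was killed) without removing its pidfile.
    echo "Process dead but pidfile exists"
    return 1
  fi
  echo "ice-processor is stopped"
  return 3
}
```

So the message only tells us the JVM went away uncleanly, not why; hence the need for better memory and process monitoring.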
@clstokes suggested we use boundary to monitor the memory. Example usage for config-mgmt here: https://github.com/TheWeatherCompany/grid-config-mgmt/blob/master/provisioners/puppet/roles/grid/prod-grid-console-web-east-runtime.pp#L154-L158
Memory seems to be slowly creeping up.
I've got boundary alarms in place to alert on the memory once it reaches a certain threshold.
```
USER       PID %CPU %MEM     VSZ     RSS TTY  STAT START  TIME COMMAND
tomcat    1807  0.2 26.5 5656172 2001568 ?    Sl   Jan29   7:22 /usr/lib/jvm/java/bin/java -Djavax.sql.DataSource.Factory=org.apache.commons.dbcp.BasicDataSourceFactory -Xmx2G -Xms2048
root      1829  0.3 30.6 5475700 2308772 ?    SNl  Jan29   9:27 java -Xmx2G -Xms2048m -XX:MaxPermSize=2048m -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -jar ice-processor.ja
```
Looks like they've settled in at around 65% for the time being.
We've added a boundary HTTP status check as a means of monitoring the process as well. Everything has been stable for a while now.
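For anyone reproducing this check outside boundary, it amounts to a simple HTTP probe against the processor; a minimal sketch, assuming the processor answers HTTP on the port=1234 seen in the launch command above (the path and the restart step are assumptions):

```bash
#!/bin/bash
# Stand-in for the boundary HTTP status check: probe the processor over HTTP
# and log (optionally restart) if the probe fails. The port comes from the
# launch command seen earlier; everything else here is an assumption.
URL="http://localhost:1234/"

if ! curl -sf --max-time 10 "$URL" > /dev/null; then
  logger -t ice-healthcheck "HTTP check against $URL failed; ice-processor may be down"
  # service ice_processor restart   # optional, if we want self-healing
fi
```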
Contacted by Landon that the latest data was not there.
Logged into the server and checked the service status for ice_processor; it was not running.
Checked the log and found no errors; the last message was from Jan 25.
No indication of why it died.
I started the processor again.
Disk space looks OK.
Not a whole lot of free memory, though.
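For reference, the checks above amount to roughly the following commands (the log path is an assumption):

```bash
service ice_processor status             # was not running
tail -n 100 /var/log/ice/processor.log   # no errors; last message Jan 25
service ice_processor start              # bring the processor back up
df -h                                    # disk space looks ok
free -m                                  # not a whole lot of free memory
```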