SVT: pod restarts using the 9.0.5.2-ubi8 or 9.0.5.3.

hanczaryk commented 4 years ago

I chatted briefly with Glenn Marcy on Slack, who pointed me here to open an issue for what I've observed.

I have successfully been creating my docker image for a while using 9.0.5.2 with the line below.

FROM ibmcom/websphere-traditional:9.0.5.2-ubi

I'm able to successfully build and deploy this image. Here is the output from 'oc get pods' showing that it has been up and running for over an hour with no restarts. It serves app requests at https://hanczar-b2-hanczar.apps.9.46.124.17.nip.io/B/

[root@avl-rhos1 tWAS_B]# oc get pods
NAME                 READY   STATUS    RESTARTS   AGE
hanczar-b2-1-tgcjf   1/1     Running   0          1h

Today, I attempted to use the new 9.0.5.3 image that I saw was added to https://hub.docker.com/r/ibmcom/websphere-traditional/tags with the line below.

FROM ibmcom/websphere-traditional:9.0.5.3-ubi

I was able to build and deploy that image. It successfully served app requests at https://hanczar-b3-hanczar.apps.9.46.124.17.nip.io/B/ but I noticed that a large number of java processes were being started on my OCP 31 system.

After about 5 minutes, I had over 150 java processes that reference LogViewer like the one I've shown below.

1000310+ 66558 66537 1 11:17 ? 00:00:06 /opt/IBM/WebSphere/AppServer/java/8.0/bin/java -Dcom.ibm.CORBA.ConfigURL=file:/opt/IBM/WebSphere/AppServer/profiles/AppSrv01/properties/sas.client.props -Dwas.install.root=/opt/IBM/WebSphere/AppServer -Duser.install.root=/opt/IBM/WebSphere/AppServer/profiles/AppSrv01 -Dlog.repository.root=/opt/IBM/WebSphere/AppServer/profiles/AppSrv01/logs -Dws.ext.dirs=/opt/IBM/WebSphere/AppServer/java/8.0/lib:/opt/IBM/WebSphere/AppServer/classes:/opt/IBM/WebSphere/AppServer/lib:/opt/IBM/WebSphere/AppServer/installedChannels:/opt/IBM/WebSphere/AppServer/lib/ext:/opt/IBM/WebSphere/AppServer/web/help:/opt/IBM/WebSphere/AppServer/deploytool/itp/plugins/com.ibm.etools.ejbdeploy/runtime -Djava.endorsed.dirs=/opt/IBM/WebSphere/AppServer/endorsed_apis:/opt/IBM/WebSphere/AppServer/java/8.0/jre/lib/endorsed -Djava.ext.dirs=/opt/IBM/WebSphere/AppServer/javaext:/opt/IBM/WebSphere/AppServer/java/8.0/lib/ext:/opt/IBM/WebSphere/AppServer/java/8.0/jre/lib/ext -Xmx64M -Djava.util.logging.manager=com.ibm.ws.bootstrap.WsLogManager -Djava.util.logging.configureByServer=true -Dlogviewer.custom.header=/opt/IBM/WebSphere/AppServer/properties/WsHeader -Dlogviewer.custom.levels=/opt/IBM/WebSphere/AppServer/properties/WsLevels.properties -classpath /opt/IBM/WebSphere/AppServer/profiles/AppSrv01/properties:/opt/IBM/WebSphere/AppServer/properties:/opt/IBM/WebSphere/AppServer/lib/startup.jar:/opt/IBM/WebSphere/AppServer/lib/bootstrap.jar:/opt/IBM/WebSphere/AppServer/java/8.0/lib/tools.jar:/opt/IBM/WebSphere/AppServer/lib/lmproxy.jar:/opt/IBM/WebSphere/AppServer/lib/urlprotocols.jar com.ibm.ws.bootstrap.WSLauncher com.ibm.ws.logging.hpel.viewer.LogViewer -monitor 1 -resumable -resume -format json

This growth in java processes consumed all the free memory, as shown by top, until the pod restarted, in this case after about 7 minutes. Once the pod restarted, the java processes were gone and the pod served app requests again until the java process growth resumed.

Glenn mentioned that 9.0.5.3 uses ubi8 and that I should attempt with 9.0.5.2-ubi8 to see if it also occurred.

So, I attempted to use the 9.0.5.2-ubi8 image with the line below.

FROM ibmcom/websphere-traditional:9.0.5.2-ubi8

I saw the same behavior as with 9.0.5.3: the number of java processes grew and the pod restarted. The following 'oc get pods' output shows that the pod restarted a few times.

[root@avl-rhos1 tWAS_B]# oc get pods
NAME                  READY   STATUS    RESTARTS   AGE
hanczar-b2-1-tgcjf    1/1     Running   0          1h
hanczar-b28-1-z7cwx   1/1     Running   2          45m
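For anyone reproducing this, a minimal sketch of picking the restarted pods out of that listing. The function name restarted_pods is hypothetical, and it assumes the default 'oc get pods' column order (NAME READY STATUS RESTARTS AGE).

```shell
# Hypothetical helper: read `oc get pods` output on stdin and print the name
# and restart count of every pod that has restarted at least once.
# Assumes the default column order: NAME READY STATUS RESTARTS AGE.
restarted_pods() {
  awk '$1 != "NAME" && $4 + 0 > 0 { print $1, $4 }'
}
```

It would be used as: oc get pods | restarted_pods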

Here is the top output head:

top - 12:38:58 up 87 days, 21:50, 1 user, load average: 9.91, 13.40, 21.87
Tasks: 847 total, 2 running, 845 sleeping, 0 stopped, 0 zombie
%Cpu(s): 15.7 us, 17.7 sy, 0.3 ni, 66.0 id, 0.0 wa, 0.0 hi, 0.3 si, 0.0 st
KiB Mem : 16266524 total, 284260 free, 13549604 used, 2432660 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 2350900 avail Mem

At this point in time there were 163 java processes with 157 of them referencing LogViewer shown above.
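A rough sketch of how counts like these can be taken from a process listing. The function name count_java_procs is hypothetical, and it assumes java is invoked by a full path ending in /java followed by its arguments, as in the process line shown earlier.

```shell
# Hypothetical helper: read a process listing (for example `ps -ef` output)
# on stdin and print how many lines are java processes and how many of those
# reference LogViewer.
# Assumes java is launched via a path ending in "/java" followed by a space.
count_java_procs() {
  awk '
    /\/java / { java++; if (/LogViewer/) viewer++ }
    END { printf "java=%d logviewer=%d\n", java, viewer }
  '
}
```

On the node it would be run as: ps -ef | count_java_procs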

I deleted the hanczar-b28 application and observed the following.

The number of java processes dropped to 7 with only 1 referencing LogViewer.
The amount of free memory increased from ~280MB to ~6.5GB.

donbourne commented 4 years ago

Looks like it's due to two commands not being available in the image.

During container startup the following message appears once:

/work/start_server.sh: line 61: cmp: command not found

Then this one appears repeatedly while the image is running:

/work/start_server.sh: line 93: ps: command not found

The script that's emitting these messages is https://github.com/WASdev/ci.docker.websphere-traditional/blob/master/docker-build/9.0.5.2/scripts/start_server.sh

The script relies on ps to see if logViewer is up, and will start a new copy if it thinks it is down. With ps missing, that check presumably fails every time, so another LogViewer gets started on each pass.

https://github.com/WASdev/ci.docker.websphere-traditional/blob/5fca22eafab210c3dc48f845c7f44750a022fd13/docker-build/9.0.5.2/scripts/start_server.sh#L92-L98

So, I think the problem is that the newer images don't have the cmp and ps commands. Either those need to be added to the image or the start_server.sh script needs to be changed to not rely on them.
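As an illustration of the second option, here is a minimal sketch of a liveness check that treats "ps is unavailable" differently from "process is not running". This is not the code in start_server.sh. The function names missing_commands and is_running are hypothetical, and grep is assumed to be present in the image.

```shell
# Hypothetical helper: print the names of any given commands that are not
# available in this shell (prints nothing if all are found).
missing_commands() {
  missing=""
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      missing="$missing $cmd"
    fi
  done
  printf '%s' "${missing# }"
}

# Hypothetical helper: check whether any process command line contains the
# given fixed string.
# Returns 0 if found, 1 if not found, 2 if ps is unavailable or fails, so a
# caller can avoid treating "cannot check" as "not running".
is_running() {
  pattern=$1
  command -v ps >/dev/null 2>&1 || return 2
  # Capture the listing first so the grep below is not part of it.
  listing=$(ps -e -o args) || return 2
  printf '%s\n' "$listing" | grep -F -q -- "$pattern"
}
```

A monitor loop built on this would only start a new LogViewer when is_running returns 1, and could log an error and stop retrying when it returns 2. A startup check such as missing_commands cmp ps could also fail fast before the server is launched.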

hanczaryk commented 4 years ago

New images were pushed to ibmcom/websphere-traditional about 16 hours ago. I just tried the 9.0.5.2-ubi8 image, and the bug where 150+ java processes pile up no longer occurs.

hanczaryk commented 4 years ago

Also, I just validated that with the new 9.0.5.3-ubi image, the bug no longer exists. As such, I'll close this issue.

WASdev / ci.docker.websphere-traditional

SVT: pod restarts using the 9.0.5.2-ubi8 or 9.0.5.3. #210