Open tam512 opened 1 year ago
I've tried using getting-started as base for instant on and it worked for me. I assume that once add_trace.xml
is added into the pod there should be something in the liberty log about reloading configuration. Maybe, try checking other locations for trace file such as /output
folder. Maybe it still being written but to a different folder
I don't see trace file in /output either
% oc get pods
NAME READY STATUS RESTARTS AGE
ebuy-inston-0 1/1 Running 0 3d
% oc get oltrace -o wide
NAME PODNAME TRACING REASON MESSAGE
oltrace-ebuy-inston ebuy-inston-0 True
% oc exec ebuy-inston-0 -- ls -l /config/configDropins/overrides
total 8
-rw-r--r--. 1 default root 180 Nov 10 15:17 add_trace.xml
-rw-r--r--. 1 default root 435 Nov 10 15:16 truststore.xml
% oc exec ebuy-inston-0 -- ls -l /opt/ol/wlp/output/defaultServer/
total 0
drwxrwx---. 1 default root 19 Nov 7 22:00 logs
drwxrwx---. 1 default root 22 Nov 7 21:59 resources
drwxrwx---. 1 default root 39 Nov 7 21:59 tranlog
drwxrwx---. 1 default root 100 Nov 10 15:16 workarea
% oc exec ebuy-inston-0 -- ls -lR /opt/ol/wlp/output/defaultServer/logs
/opt/ol/wlp/output/defaultServer/logs:
total 0
drwxrwx---. 2 default root 6 Nov 7 22:00 state
/opt/ol/wlp/output/defaultServer/logs/state:
total 0
% oc exec ebuy-inston-0 -- ls -lR /opt/ol/wlp/output/defaultServer/resources
/opt/ol/wlp/output/defaultServer/resources:
total 0
drwxrwx---. 1 default root 38 Nov 10 15:16 security
/opt/ol/wlp/output/defaultServer/resources/security:
total 16
-rw-rwx---. 1 default root 3612 Nov 10 15:16 key.p12
-rw-rwx---. 1 default root 897 Nov 7 21:59 ltpa.keys
-rw-r--r--. 1 default root 7094 Nov 10 15:16 trust.p12
% oc exec ebuy-inston-0 -- ls -lR /opt/ol/wlp/output/defaultServer/tranlog
/opt/ol/wlp/output/defaultServer/tranlog:
total 0
drwxrwx---. 1 default root 61 Nov 7 21:59 partnerlog
drwxrwx---. 1 default root 61 Nov 7 21:59 tranlog
/opt/ol/wlp/output/defaultServer/tranlog/partnerlog:
total 2048
-rw-rw----. 1 default root 0 Nov 7 21:59 DO NOT DELETE LOG FILES
-rw-rw----. 1 default root 1048576 Nov 7 22:00 log1
-rw-rw----. 1 default root 1048576 Nov 7 22:00 log2
/opt/ol/wlp/output/defaultServer/tranlog/tranlog:
total 2048
-rw-rw----. 1 default root 0 Nov 7 21:59 DO NOT DELETE LOG FILES
-rw-rw----. 1 default root 1048576 Nov 7 22:00 log1
-rw-rw----. 1 default root 1048576 Nov 7 22:00 log2
this problem seems to be intermittent. I can get wltrace ok on OCP 4.14 with Liberty image 24.0.0.1
% oc get wlapps
NAME IMAGE EXPOSED RECONCILED RESOURCESREADY READY AGE
ebuy-olk-j17-amdrh90 docker-na-public.artifactory.swg-devops.com/hyc-wassvt-team-image-registry-docker-local/instanton/24.0.0.1/ebuy:ol-kernel-java17-amd-rhel90 true True True True 18m
% oc get pods
NAME READY STATUS RESTARTS AGE
ebuy-olk-j17-amdrh90-0 1/1 Running 0 20m
% cat wltrace.yml
apiVersion: liberty.websphere.ibm.com/v1
kind: WebSphereLibertyTrace
metadata:
name: get-wltrace
spec:
license:
accept: true
podName: ebuy-olk-j17-amdrh90-0
traceSpecification: "*=info:com.ibm.ws.webcontainer*=all"
maxFileSize: 20
maxFiles: 5
disable: false
% oc apply -f wltrace.yml
webspherelibertytrace.liberty.websphere.ibm.com/get-wltrace created
% oc get webspherelibertytrace get-wltrace -o wide
NAME PODNAME TRACING REASON MESSAGE
get-wltrace ebuy-olk-j17-amdrh90-0 True
% oc exec ebuy-olk-j17-amdrh90-0 -- ls -l /serviceability/ebuy-olk-amdrh90/ebuy-olk-j17-amdrh90-0
total 347
-rw-r-----. 1 default root 828 Jan 10 18:38 messages.log
-rw-r-----. 1 default root 354166 Jan 10 18:39 trace.log
% oc exec ebuy-olk-j17-amdrh90-0 -- cat /serviceability/ebuy-olk-amdrh90/ebuy-olk-j17-amdrh90-0/messages.log
********************************************************************************
product = Open Liberty 24.0.0.1 (wlp-1.0.85.cl240120231230-1902)
wlp.install.dir = /opt/ol/wlp/
server.output.dir = /opt/ol/wlp/output/defaultServer/
java.home = /opt/java/openjdk
java.version = 17.0.9
java.runtime = IBM Semeru Runtime Open Edition (17.0.9+9)
os = Linux (5.14.0-284.30.1.el9_2.x86_64; amd64) (en_US)
process = 1025@8f987fde0a07
Classpath = /opt/ol/wlp/bin/tools/ws-server.jar
Java Library path = /opt/java/openjdk/lib/default:/opt/java/openjdk/lib:/usr/lib64:/usr/lib
********************************************************************************
[1/10/24, 18:38:57:081 UTC] 0000004b com.ibm.ws.config.xml.internal.ConfigRefresher A CWWKG0017I: The server configuration was successfully updated in 10.738 seconds.
.......
I see the same problem in AKS - using azurefile-premium storageclass.
[root@3fef9c46a087 ansible]# kubectl -n ebuy-wlk-j21-ub22 get WebSphereLibertyTrace -o wide
NAME PODNAME TRACING REASON MESSAGE
wl-day2op-trace ebuy-wlk-j21-ub22-0 True
[root@3fef9c46a087 ansible]# kubectl -n ebuy-wlk-j21-ub22 exec ebuy-wlk-j21-ub22-0 -- ls -l /serviceability/ebuy-wlk-j21-ub22/ebuy-wlk-j21-ub22-0
total 40692
-rwxrwxrwx 1 root root 41667864 Feb 26 16:10 2024-02-26_16:09:42.zip
[root@3fef9c46a087 ansible]#
@tam512 @rumanaHaque To determine whether this issue is also caused by the storage permission issue seen in https://github.com/OpenLiberty/open-liberty/issues/27163, try to create a file within the trace folder (/serviceability/
In the same env where I wanted to get the trace, I did successfully generate dump, and I was able to see that file created in the folder you mentioned. It was just the trace files that were missing. Here is the output for that folder.
[root@3fef9c46a087 ansible]# kubectl -n ebuy-wlk-j21-ub22 exec ebuy-wlk-j21-ub22-0 -- ls -l /serviceability/ebuy-wlk-j21-ub22/ebuy-wlk-j21-ub22-0
total 40692
-rwxrwxrwx 1 root root 41667864 Feb 26 16:10 2024-02-26_16:09:42.zip
@Rumana Haque Thanks for the quick reponse. Seems like that the dump operation worked on the same Pod where trace files failed to be created. In that case, I agree the trace issue is not storage related.
It seems that Liberty runtime is not processing the server config to enable trace.
This problem is consistently recreated on Power 9 OCP with managed-nfs-storage storageClass
% oc get sc
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
managed-nfs-storage (default) fuseim.pri/ifs Delete Immediate false 20m
% oc get wltrace
NAME PODNAME TRACING
websphereliberty-trace ebuy-wlf-j21-0 True
% oc exec ebuy-wlf-j21-0 -- ls -l /serviceability/ebuy-wlo/ebuy-wlf-j21-0
total 40392
-rw-r-----. 1 default root 41360794 Mar 12 20:31 2024-03-12_20:30:41.zip
From the original defect https://github.com/WASdev/websphere-liberty-operator/issues/540#issuecomment-1806083513 , we have narrowed down that this problem only happens with InstantOn. It was asked to collect config=all trace, and I was be able to collect the trace logs2-trace.zip
@tam512 I've updated the issue's title to reflect the actual problem.
@hlhoots Liberty runtime is not processing the server config to enable trace. Is that handled by the kernel? If not, do you know who should be engaged to further investigate? Thank you
fyi @tjwatson
What is the content of the /config/configDropins/overrides/add_trace.xml
file that gets added by the operator?
The content of _/config/configDropins/overrides/addtrace.xml was in the defect description:
% oc exec ebuy-olf-j11-0 -- cat /config/configDropins/overrides/add_trace.xml
<server><logging traceSpecification="*=info:com.ibm.ws.webcontainer*=all" logDirectory="/serviceability/ebuy-j11-full-amdub22-olo/ebuy-olf-j11-0" maxFileSize="20" maxFiles="5"/></server>
I was able to collect the trace with fileMonitor enabled as requested.
It may be easiest to place it in your bootstrap.properties with:
com.ibm.ws.logging.trace.specification=*=info:fileMonitor=all
Here is the zip file which includes this trace. logs_26908.zip
This issue was opened in operator repo https://github.com/WASdev/websphere-liberty-operator/issues/540, but it was not operators problem so open this new one
Enabled trace using OpenLibertyTrace but trace.log is not generated. This problem happens with InstantOn app images only.
The Liberty Operator is adding the expected server config (/config/configDropins/overrides/add_trace.xml), but it’s not processed by the Liberty runtime.
Below is the steps to recreate the problem