OpenLiberty / open-liberty

Open Liberty is a highly composable, fast to start, dynamic application server runtime environment
https://openliberty.io
Eclipse Public License 2.0
1.15k stars 592 forks source link

Server config to enable trace is not processed for InstantOn apps #26908

Open tam512 opened 1 year ago

tam512 commented 1 year ago

This issue was opened in operator repo https://github.com/WASdev/websphere-liberty-operator/issues/540, but it was not operators problem so open this new one

Enabled trace using OpenLibertyTrace but trace.log is not generated. This problem happens with InstantOn app images only.
The Liberty Operator is adding the expected server config (/config/configDropins/overrides/add_trace.xml), but it’s not processed by the Liberty runtime.

Below is the steps to recreate the problem

- Deploy app using Open Liberty Operator 1.3.0
% oc get olapps -n ebuy-j11-full-amdub22-olo
NAME           IMAGE                                                                                                                                                                                          EXPOSED   RECONCILED   RESOURCESREADY   READY   AGE
ebuy-olf-j11   docker-na-public.artifactory.swg-devops.com/hyc-wassvt-team-image-registry-docker-local/instanton/liberty-23.0.0.11-gm/23.0.0.11-full-java11-openj9-ubi-amd64-ol/ebuy/afterappstart:amd-ub22   true      True         True             True    23h

% oc get pods -n ebuy-j11-full-amdub22-olo
NAME             READY   STATUS    RESTARTS   AGE
ebuy-olf-j11-0   1/1     Running   0          67m

% cat oltrace.yml         
apiVersion: apps.openliberty.io/v1
kind: OpenLibertyTrace
metadata:
  name: getoltrace
spec:
  maxFileSize: 20
  maxFiles: 5
  podName: ebuy-olf-j11-0
  traceSpecification: '*=info:com.ibm.ws.webcontainer*=all'

% oc apply -f oltrace.yml 
openlibertytrace.apps.openliberty.io/getoltrace created

% oc get openlibertytrace -o wide
NAME         PODNAME          TRACING   REASON   MESSAGE
getoltrace   ebuy-olf-j11-0   True 

% oc exec ebuy-olf-j11-0 -- ls -l /serviceability/ebuy-j11-full-amdub22-olo/ebuy-olf-j11-0/
total 38743
-rw-rw----. 1 default 1000730000 39672160 Nov  8 22:21 2023-11-08_22:21:02.zip
%

 % oc exec ebuy-olf-j11-0 -- cat /config/configDropins/overrides/add_trace.xml
<server><logging traceSpecification="*=info:com.ibm.ws.webcontainer*=all" logDirectory="/serviceability/ebuy-j11-full-amdub22-olo/ebuy-olf-j11-0" maxFileSize="20" maxFiles="5"/></server>
arturdzm commented 12 months ago

I've tried using getting-started as base for instant on and it worked for me. I assume that once add_trace.xml is added into the pod there should be something in the liberty log about reloading configuration. Maybe, try checking other locations for trace file such as /output folder. Maybe it still being written but to a different folder

tam512 commented 12 months ago

I don't see trace file in /output either

% oc get pods
NAME            READY   STATUS    RESTARTS   AGE
ebuy-inston-0   1/1     Running   0          3d

% oc get oltrace -o wide
NAME                  PODNAME         TRACING   REASON   MESSAGE
oltrace-ebuy-inston   ebuy-inston-0   True   

% oc exec ebuy-inston-0 -- ls -l /config/configDropins/overrides
total 8
-rw-r--r--. 1 default root 180 Nov 10 15:17 add_trace.xml
-rw-r--r--. 1 default root 435 Nov 10 15:16 truststore.xml

% oc exec ebuy-inston-0 -- ls -l /opt/ol/wlp/output/defaultServer/ 
total 0
drwxrwx---. 1 default root  19 Nov  7 22:00 logs
drwxrwx---. 1 default root  22 Nov  7 21:59 resources
drwxrwx---. 1 default root  39 Nov  7 21:59 tranlog
drwxrwx---. 1 default root 100 Nov 10 15:16 workarea

% oc exec ebuy-inston-0 -- ls -lR /opt/ol/wlp/output/defaultServer/logs
/opt/ol/wlp/output/defaultServer/logs:
total 0
drwxrwx---. 2 default root 6 Nov  7 22:00 state

/opt/ol/wlp/output/defaultServer/logs/state:
total 0

% oc exec ebuy-inston-0 -- ls -lR /opt/ol/wlp/output/defaultServer/resources
/opt/ol/wlp/output/defaultServer/resources:
total 0
drwxrwx---. 1 default root 38 Nov 10 15:16 security

/opt/ol/wlp/output/defaultServer/resources/security:
total 16
-rw-rwx---. 1 default root 3612 Nov 10 15:16 key.p12
-rw-rwx---. 1 default root  897 Nov  7 21:59 ltpa.keys
-rw-r--r--. 1 default root 7094 Nov 10 15:16 trust.p12

% oc exec ebuy-inston-0 -- ls -lR /opt/ol/wlp/output/defaultServer/tranlog  
/opt/ol/wlp/output/defaultServer/tranlog:
total 0
drwxrwx---. 1 default root 61 Nov  7 21:59 partnerlog
drwxrwx---. 1 default root 61 Nov  7 21:59 tranlog

/opt/ol/wlp/output/defaultServer/tranlog/partnerlog:
total 2048
-rw-rw----. 1 default root       0 Nov  7 21:59 DO NOT DELETE LOG FILES
-rw-rw----. 1 default root 1048576 Nov  7 22:00 log1
-rw-rw----. 1 default root 1048576 Nov  7 22:00 log2

/opt/ol/wlp/output/defaultServer/tranlog/tranlog:
total 2048
-rw-rw----. 1 default root       0 Nov  7 21:59 DO NOT DELETE LOG FILES
-rw-rw----. 1 default root 1048576 Nov  7 22:00 log1
-rw-rw----. 1 default root 1048576 Nov  7 22:00 log2
tam512 commented 10 months ago

this problem seems to be intermittent. I can get wltrace ok on OCP 4.14 with Liberty image 24.0.0.1

% oc get wlapps
NAME                   IMAGE                                                                                                                                         EXPOSED   RECONCILED   RESOURCESREADY   READY   AGE
ebuy-olk-j17-amdrh90   docker-na-public.artifactory.swg-devops.com/hyc-wassvt-team-image-registry-docker-local/instanton/24.0.0.1/ebuy:ol-kernel-java17-amd-rhel90   true      True         True             True    18m

% oc get pods
NAME                     READY   STATUS    RESTARTS   AGE
ebuy-olk-j17-amdrh90-0   1/1     Running   0          20m

% cat wltrace.yml 
apiVersion: liberty.websphere.ibm.com/v1
kind: WebSphereLibertyTrace
metadata:
  name: get-wltrace
spec:
  license:
    accept: true
  podName: ebuy-olk-j17-amdrh90-0
  traceSpecification: "*=info:com.ibm.ws.webcontainer*=all"
  maxFileSize: 20
  maxFiles: 5
  disable: false

% oc apply -f wltrace.yml 
webspherelibertytrace.liberty.websphere.ibm.com/get-wltrace created

% oc get webspherelibertytrace get-wltrace -o wide
NAME          PODNAME                  TRACING   REASON   MESSAGE
get-wltrace   ebuy-olk-j17-amdrh90-0   True               

% oc exec ebuy-olk-j17-amdrh90-0 -- ls -l /serviceability/ebuy-olk-amdrh90/ebuy-olk-j17-amdrh90-0
total 347
-rw-r-----. 1 default root    828 Jan 10 18:38 messages.log
-rw-r-----. 1 default root 354166 Jan 10 18:39 trace.log

% oc exec ebuy-olk-j17-amdrh90-0 -- cat /serviceability/ebuy-olk-amdrh90/ebuy-olk-j17-amdrh90-0/messages.log
********************************************************************************
product = Open Liberty 24.0.0.1 (wlp-1.0.85.cl240120231230-1902)
wlp.install.dir = /opt/ol/wlp/
server.output.dir = /opt/ol/wlp/output/defaultServer/
java.home = /opt/java/openjdk
java.version = 17.0.9
java.runtime = IBM Semeru Runtime Open Edition (17.0.9+9)
os = Linux (5.14.0-284.30.1.el9_2.x86_64; amd64) (en_US)
process = 1025@8f987fde0a07
Classpath = /opt/ol/wlp/bin/tools/ws-server.jar
Java Library path = /opt/java/openjdk/lib/default:/opt/java/openjdk/lib:/usr/lib64:/usr/lib
********************************************************************************
[1/10/24, 18:38:57:081 UTC] 0000004b com.ibm.ws.config.xml.internal.ConfigRefresher               A CWWKG0017I: The server configuration was successfully updated in 10.738 seconds.
.......
rumanaHaque commented 8 months ago

I see the same problem in AKS - using azurefile-premium storageclass.

[root@3fef9c46a087 ansible]# kubectl -n ebuy-wlk-j21-ub22 get WebSphereLibertyTrace -o wide
NAME              PODNAME               TRACING   REASON   MESSAGE
wl-day2op-trace   ebuy-wlk-j21-ub22-0   True
[root@3fef9c46a087 ansible]# kubectl -n ebuy-wlk-j21-ub22 exec ebuy-wlk-j21-ub22-0 -- ls -l /serviceability/ebuy-wlk-j21-ub22/ebuy-wlk-j21-ub22-0
total 40692
-rwxrwxrwx 1 root root 41667864 Feb 26 16:10 2024-02-26_16:09:42.zip
[root@3fef9c46a087 ansible]#
leochr commented 8 months ago

@tam512 @rumanaHaque To determine whether this issue is also caused by the storage permission issue seen in https://github.com/OpenLiberty/open-liberty/issues/27163, try to create a file within the trace folder (/serviceability//). If the file can be created where the trace files are supposed to be created then this is a different problem (not a storage permission issue).

rumanaHaque commented 8 months ago

In the same env where I wanted to get the trace, I did successfully generate dump, and I was able to see that file created in the folder you mentioned. It was just the trace files that were missing. Here is the output for that folder.

[root@3fef9c46a087 ansible]# kubectl -n ebuy-wlk-j21-ub22 exec ebuy-wlk-j21-ub22-0 -- ls -l /serviceability/ebuy-wlk-j21-ub22/ebuy-wlk-j21-ub22-0
total 40692
-rwxrwxrwx 1 root root 41667864 Feb 26 16:10 2024-02-26_16:09:42.zip
leochr commented 8 months ago

@Rumana Haque Thanks for the quick reponse. Seems like that the dump operation worked on the same Pod where trace files failed to be created. In that case, I agree the trace issue is not storage related.

It seems that Liberty runtime is not processing the server config to enable trace.

tam512 commented 8 months ago

This problem is consistently recreated on Power 9 OCP with managed-nfs-storage storageClass

% oc get sc
NAME                            PROVISIONER      RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
managed-nfs-storage (default)   fuseim.pri/ifs   Delete          Immediate           false                  20m

% oc get wltrace 
NAME                     PODNAME          TRACING
websphereliberty-trace   ebuy-wlf-j21-0   True

% oc exec ebuy-wlf-j21-0 -- ls -l /serviceability/ebuy-wlo/ebuy-wlf-j21-0                           
total 40392
-rw-r-----. 1 default root 41360794 Mar 12 20:31 2024-03-12_20:30:41.zip
tam512 commented 8 months ago

From the original defect https://github.com/WASdev/websphere-liberty-operator/issues/540#issuecomment-1806083513 , we have narrowed down that this problem only happens with InstantOn. It was asked to collect config=all trace, and I was be able to collect the trace logs2-trace.zip

leochr commented 7 months ago

@tam512 I've updated the issue's title to reflect the actual problem.

@hlhoots Liberty runtime is not processing the server config to enable trace. Is that handled by the kernel? If not, do you know who should be engaged to further investigate? Thank you

fyi @tjwatson

tjwatson commented 7 months ago

What is the content of the /config/configDropins/overrides/add_trace.xml file that gets added by the operator?

tam512 commented 7 months ago

The content of _/config/configDropins/overrides/addtrace.xml was in the defect description:

% oc exec ebuy-olf-j11-0 -- cat /config/configDropins/overrides/add_trace.xml
<server><logging traceSpecification="*=info:com.ibm.ws.webcontainer*=all" logDirectory="/serviceability/ebuy-j11-full-amdub22-olo/ebuy-olf-j11-0" maxFileSize="20" maxFiles="5"/></server>
rumanaHaque commented 6 months ago

I was able to collect the trace with fileMonitor enabled as requested.

It may be easiest to place it in your bootstrap.properties with:
com.ibm.ws.logging.trace.specification=*=info:fileMonitor=all

Here is the zip file which includes this trace. logs_26908.zip