OpenLiberty / open-liberty-operator

Eclipse Public License 2.0

Day 2 operation OpenLibertyDump not working due to storage permission denied issue #548

Open rumanaHaque opened 9 months ago

rumanaHaque commented 9 months ago

Describe the bug
When I try to take a server dump on OCP 4.14 using OLO and InstantOn, the server dump is not created. I ran into this issue while running TER: https://github.ibm.com/IBMCloudPak4Apps/WSHE-System-Test/issues/544

root@rhaqueSVT1:/opt/Acme_Automation/gitops# oc get openlibertydump -o wide
NAME                                            STARTED   REASON   MESSAGE   COMPLETED   REASON   MESSAGE   DUMP FILE
get-oldumps                                     True                         False       Error    Encountered error while running command: [/bin/sh -c mkdir -p /serviceability/ebuy-olo/ebuy-dbos-5b6577785f-695hd &&  server dump --archive=/serviceability/ebuy-olo/ebuy-dbos-5b6577785f-695hd/2023-12-12_20:35:22.zip --include=thread,heap,] ; Stderr: mkdir: cannot create directory ‘/serviceability’: Permission denied
 ; Error: command terminated with exit code 1  

Steps to Reproduce
Deploy an image with InstantOn, with the OLA (OpenLibertyApplication) having this spec:

spec:
    serviceability:
    size: 1Gi

Use this YAML to request the dump:

cat oldump.yml 
apiVersion: apps.openliberty.io/v1
kind: OpenLibertyDump
metadata:
  name: get-oldumps
spec:
  podName: ebuy-dbos-7597589b7f-4whp8
  include:
    - thread
    - heap

Apply this YAML file in your environment. When you then run the following command, it reports the error shown above.

root@rhaqueSVT1:/opt/Acme_Automation/gitops# oc apply -f oldump.yml 
openlibertydump.apps.openliberty.io/get-oldumps unchanged
root@rhaqueSVT1:/opt/Acme_Automation/gitops# oc get openlibertydump -o wide

Expected behavior
The server dump should be created.

Diagnostic information:

product = Open Liberty 23.0.0.12 (wlp-1.0.84.cl231220231127-1901)
wlp.install.dir = /opt/ol/wlp/
server.output.dir = /opt/ol/wlp/output/defaultServer/
java.home = /opt/java/openjdk
java.version = 17.0.9
java.runtime = IBM Semeru Runtime Open Edition (17.0.9+9)
os = Linux (5.14.0-284.30.1.el9_2.x86_64; amd64) (en_US)


tam512 commented 8 months ago

FYI, testing with the Liberty 24.0.0.1 image, I can get the wldump successfully on OCP 4.14.

********************************************************************************
product = Open Liberty 24.0.0.1 (wlp-1.0.85.cl240120231230-1902)
wlp.install.dir = /opt/ol/wlp/
server.output.dir = /opt/ol/wlp/output/defaultServer/
java.home = /opt/java/openjdk
java.version = 17.0.9
java.runtime = IBM Semeru Runtime Open Edition (17.0.9+9)
os = Linux (5.14.0-284.30.1.el9_2.x86_64; amd64) (en_US)
process = 1025@8f987fde0a07
Classpath = /opt/ol/wlp/bin/tools/ws-server.jar
Java Library path = /opt/java/openjdk/lib/default:/opt/java/openjdk/lib:/usr/lib64:/usr/lib
********************************************************************************

% oc get wlapps
NAME                   IMAGE                                                                                                                                         EXPOSED   RECONCILED   RESOURCESREADY   READY   AGE
ebuy-olk-j17-amdrh90   docker-na-public.artifactory.swg-devops.com/hyc-wassvt-team-image-registry-docker-local/instanton/24.0.0.1/ebuy:ol-kernel-java17-amd-rhel90   true      True         True             True    18m

% oc get pods
NAME                     READY   STATUS    RESTARTS   AGE
ebuy-olk-j17-amdrh90-0   1/1     Running   0          20m

% cat wldump.yml 
apiVersion: liberty.websphere.ibm.com/v1
kind: WebSphereLibertyDump
metadata:
  name: get-wldump
spec:
  license:
    accept: true
  podName: ebuy-olk-j17-amdrh90-0
  include:
    - thread
    - heap

% oc apply -f wldump.yml 
webspherelibertydump.liberty.websphere.ibm.com/get-wldump created

% oc get webspherelibertydump -o wide 
NAME         STARTED   REASON   MESSAGE   COMPLETED   REASON   MESSAGE   DUMP FILE                                                    
get-wldump   True                         True                           /serviceability/ebuy-olk-amdrh90/ebuy-olk-j17-amdrh90-0/2024-01-10_18:49:14.zip                                                                                                                       

% oc exec ebuy-olk-j17-amdrh90-0 -- ls -l /serviceability/ebuy-olk-amdrh90/ebuy-olk-j17-amdrh90-0
total 46600
-rw-r-----. 1 default root 41140488 Jan 10 18:49 2024-01-10_18:49:14.zip
-rw-r-----. 1 default root     2187 Jan 10 18:47 messages.log
-rw-r-----. 1 default root  6574293 Jan 10 18:47 trace.log

leochr commented 7 months ago

@rumanaHaque In the issue description you have the following:

spec:
    serviceability:
    size: 1Gi

But the size field should be nested under serviceability, like this:

spec:
    serviceability:
        size: 1Gi

Please check and confirm whether a PVC named get-oldumps-serviceability is present before running the dump operation. Thank you.
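
The check requested above could be sketched roughly as follows. This is only an illustration: the function name `check_serviceability` and the `OC`, `PVC_NAME`, and `POD_NAME` variables are hypothetical, and the default values are simply the names quoted in this thread, so they may need adjusting for your environment.

```shell
#!/bin/sh
# Hypothetical helper; names are taken from this thread and may differ in your cluster.
OC="${OC:-oc}"                                        # CLI to use (assumption: oc is on PATH)
PVC_NAME="${PVC_NAME:-get-oldumps-serviceability}"    # PVC name as quoted in the comment above
POD_NAME="${POD_NAME:-ebuy-dbos-7597589b7f-4whp8}"    # pod name from the issue description

check_serviceability() {
  # 1. Confirm the PVC exists in the current namespace.
  if ! "$OC" get pvc "$PVC_NAME" >/dev/null 2>&1; then
    echo "PVC $PVC_NAME not found"
    return 1
  fi
  # 2. Confirm /serviceability exists in the pod and is writable by the container user.
  if ! "$OC" exec "$POD_NAME" -- sh -c 'test -d /serviceability && test -w /serviceability' >/dev/null 2>&1; then
    echo "/serviceability is missing or not writable in pod $POD_NAME"
    return 2
  fi
  echo "serviceability storage looks usable"
}
```

Calling `check_serviceability` before applying the dump resource would show whether the volume is missing (which matches the `mkdir: cannot create directory` error) or mounted but not writable.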

leochr commented 6 months ago

The issue is also seen without InstantOn, so I am moving this issue to the Liberty Operator backlog.

The issue is intermittent. There have been no recent changes to Day 2 operations, so this is not a regression.

leochr commented 6 months ago

Based on the discussion in the Slack thread here, the symptoms seem to indicate a storage-related issue (it could be an OpenShift issue or a combination of the two).

@rumanaHaque please try with a different storage provider and see if the issue still occurs.
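
As a starting point for trying a different storage provider, something like the following could show which storage class the serviceability PVC currently uses and which classes the cluster offers. It is a sketch under assumptions: `show_storage`, `OC`, and `PVC_NAME` are hypothetical names, and the default PVC name is the one quoted earlier in this thread.

```shell
#!/bin/sh
# Hypothetical helper; the PVC name is the one mentioned earlier in this thread.
OC="${OC:-oc}"                                        # CLI to use (assumption: oc is on PATH)
PVC_NAME="${PVC_NAME:-get-oldumps-serviceability}"

show_storage() {
  # Read the storage class recorded on the PVC; stop if the PVC cannot be read.
  current=$("$OC" get pvc "$PVC_NAME" -o jsonpath='{.spec.storageClassName}') || return 1
  echo "PVC $PVC_NAME uses storage class: ${current:-<none>}"
  # List the storage classes available in the cluster for comparison.
  "$OC" get storageclass
}
```

Running `show_storage` prints the current class followed by the cluster's list, which makes it easier to pick an alternative provider for a retest.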