Open tam512 opened 3 weeks ago
This problem was discussed in slack https://ibm-cloud.slack.com/archives/C03MR7EC3NG/p1717621969960729?thread_ts=1717187652.885869&cid=C03MR7EC3NG
@tam512 do you see any logs persisted with a container image that doesn't use InstantOn in this setup? I see no way logs can be persisted if the mount of the /logs/
folder isn't writable by the user in the running container.
I can recreate outside of OCP if I mount a directory which is not writable by the user in the container.
When I attempt to start the same application image without the use of InstantOn then I get loads of output from verbose GC (because it cannot write to the verbosegc log and ends up logging to the standard out. I also don't see any messages or trace logs in the mounted logs directory.
For the OCP case to work the logs mount has to be writable by the user in the container. That does not appear to be the case here. I am unsure how you configure the deployment on OCP to have a writable mount here.
I can see the logs persisted without instantOn. I just deployed the app image to OCP and I can see older logs with older time stamps
% oc exec quicksec-jakarta-0 -- ls -l /
total 8
...........
drwxr-xr-x. 1 root root 4096 Jun 4 12:53 licenses
drwxrwsr-x. 3 root 1000790000 7 Jun 18 19:41 logs
.............
drwxr-xr-x. 1 root root 17 Jun 6 13:42 opt
lrwxrwxrwx. 1 root root 33 Jun 4 12:54 output -> /opt/ibm/wlp/output/defaultServer
..........
% oc exec quicksec-jakarta-0 -- ls -l /logs
total 3088
drwxrws---. 2 1000790000 1000790000 366 Jun 4 15:35 ffdc
-rw-rw----. 1 1000790000 1000790000 65060 Jun 5 18:53 messages_24.06.18_19.41.57.0.log
-rw-r-----. 1 1000790000 1000790000 53270 Jun 18 19:42 messages.log
-rw-rw----. 1 1000790000 1000790000 161940 May 29 21:45 trace_24.05.30_21.50.28.0.log
-rw-rw----. 1 1000790000 1000790000 278480 May 30 21:56 trace.log
-rw-rw----. 1 1000790000 1000790000 294580 Jun 18 19:42 verbosegc.log
-rw-rw----. 1 1000790000 1000790000 2306837 May 24 21:14 verbosegc.log.001
% oc exec quicksec-jakarta-0 -- ls -l /logs/ffdc
total 23197
-rw-rw----. 1 1000790000 1000790000 1205 Jun 4 15:35 exception_summary_24.06.04_15.32.29.0.log
-rw-rw----. 1 1000790000 1000790000 1205 Jun 4 15:38 exception_summary_24.06.04_15.35.49.0.log
-rw-rw----. 1 1000790000 1000790000 30554 May 23 16:44 ffdc_24.05.23_16.44.08.0.log
...........
-rw-rw----. 1 1000790000 1000790000 36550 May 24 02:22 ffdc_24.05.24_02.22.08.0.log
-rw-rw----. 1 1000790000 1000790000 32494 May 24 02:22 ffdc_24.05.24_02.22.09.0.log
........
-rw-rw----. 1 1000790000 1000790000 6907 May 25 11:08 ffdc_24.05.25_11.08.22.0.log
-rw-rw----. 1 1000790000 1000790000 129314 May 28 20:41 ffdc_24.05.28_20.41.27.0.log
.......
-rw-rw----. 1 1000790000 1000790000 129334 May 30 23:43 ffdc_24.05.30_23.43.39.0.log
-rw-rw----. 1 1000790000 1000790000 6906 May 31 00:07 ffdc_24.05.31_00.07.58.0.log
.........
-rw-rw----. 1 1000790000 1000790000 87754 Jun 4 15:15 ffdc_24.06.04_15.15.51.0.log
I just deployed the app image to OCP and I can see older logs with older time stamps
If you just deployed the app image right now to OCP I would expect new logs with today's timestamp.
sorry I was not clear about the logs. I saw today's log (Jun 18) and older logs
-rw-rw----. 1 1000790000 1000790000 65060 Jun 5 18:53 messages_24.06.18_19.41.57.0.log -rw-r-----. 1 1000790000 1000790000 53270 Jun 18 19:42 messages.log
I suspect the issue may be the change of the security context used when deploying the instanton image. Something has to be different there that causes the mounted /logs
folder to be read-only.
Below is the SecurityContextConstraints I used. Do I need to update anything to make it writeable to /logs
# https://docs.openshift.com/container-platform/4.14/authentication/managing-security-context-constraints.html
kind: SecurityContextConstraints
metadata:
annotations:
kubernetes.io/description: criu-scc is based on the restricted SCC but removes any restrictions
that prevent the restore image from running successfully.
generation: 1
name: cap-cr-scc
allowHostDirVolumePlugin: true
allowHostIPC: false
allowHostNetwork: false
allowHostPID: false
allowHostPorts: false
allowPrivilegeEscalation: true
allowPrivilegedContainer: false
allowedCapabilities: null
apiVersion: security.openshift.io/v1
defaultAddCapabilities:
- CHECKPOINT_RESTORE
- SETPCAP
fsGroup:
type: RunAsAny
groups:
- system:authenticated
priority: null
readOnlyRootFilesystem: false
requiredDropCapabilities:
- KILL
- MKNOD
- SETUID
- SETGID
runAsUser:
type: RunAsAny
seLinuxContext:
type: RunAsAny
supplementalGroups:
type: RunAsAny
users: []
volumes:
- configMap
- downwardAPI
- emptyDir
- persistentVolumeClaim
- projected
- secret
Maybe @leochr knows. I'm not sure if you can compare with what the effective one is for normal apps.
In OpenShift restricted-v2
is used as the SecurityContextConstraints (SCC) by default. Comparing the above SCC to it, I couldn't find any difference that could explain the storage permission issue. Suggest to try the above SCC cap-cr-scc
with a non-instanton app.
@leochr I also see the permission errors when using the above SCC with non-instanton app, but the app pod says it's running and the server actually started.
quicksec-jakarta-0 1/1 Running 0 6m42s
........
JVMJ9GC063E Unable to open file '/logs/verbosegc.log' for writing
.............
[6/26/24, 21:12:31:378 UTC] 0000002e org.infinispan.HOTROD I ISPN004021: Infinispan version: Infinispan 'Flying Saucer' 14.0.29.Final
TRAS0036E: The system could not create file /logs/messages.log because of the following exception: java.security.PrivilegedActionException: java.io.IOException: Permission denied
...............
[6/26/24, 21:12:35:595 UTC] 0000002b com.ibm.ws.kernel.feature.internal.FeatureManager I CWWKF0008I: Feature update completed in 7.904 seconds.
[6/26/24, 21:12:35:595 UTC] 0000002b com.ibm.ws.kernel.feature.internal.FeatureManager A CWWKF0011I: The defaultServer server is ready to run a smarter planet. The defaultServer server started in 9.782 seconds.
When I try to copy the app logs, it also throws permission errors
% oc cp quicksec-jakarta-0:/logs quicksec-jakarta-0/
tar: Removing leading `/' from member names
tar: /logs/verbosegc.log.001: Cannot open: Permission denied
tar: /logs/messages.log: Cannot open: Permission denied
tar: /logs/trace.log: Cannot open: Permission denied
tar: /logs/verbosegc.log: Cannot open: Permission denied
tar: /logs/ffdc: Cannot open: Permission denied
tar: /logs/trace_24.05.30_21.50.28.0.log: Cannot open: Permission denied
tar: /logs/messages_24.06.25_20.41.29.0.log: Cannot open: Permission denied
tar: Exiting with failure status due to previous errors
I posted the full log in slack here https://ibm-cloud.slack.com/archives/CMM0D6JHF/p1719437551545339
when we deploy InstantOn app image to OCP using the following statefulSet settings
the server will fail to start with the following errors: