Closed hippyod closed 2 weeks ago
A couple of years ago, I saw someone from the buildah/podman folks post a bug that might be related: #2968.
So I downloaded a CRC version from back in Feb (OCP 4.14) that I knew was good, and tested just to be sure. It doesn't work either, and I've been mounting and using these same volumes with CRC for years. I'm totally at a loss as to how to proceed at this point.
@hippyod Hi, did you try creating a test pod and mounting a PV in it, and did you still face the "unable to write" issue, or does it happen only with Jenkins?
Could you please share the steps to deploy Jenkins on CRC so we can try to reproduce the issue? It could also be that something has changed in the PVC definition Jenkins uses.
@anjannath @praveenkumar @cfergeau OK, after MANY HOURS of playing around today and yesterday, here's what I figured out (rename it to a *.yaml file; GitHub wouldn't let me upload one): test.txt
On CRC (of course), create a namespace called `my-pvc-test` and apply the YAML from the file. This will create two Deployments, two PVCs, one Service, and one ServiceAccount with one ClusterRoleBinding granting the ServiceAccount cluster-admin privileges.
Each PVC is mounted into the pod from the `read-write` or `read-only` Deployment. In the `read-only` Deployment, the pod is assigned the ServiceAccount bound to cluster-admin; in `read-write`, the default ServiceAccount. Both use the bash image from Docker Hub and sleep for a long time in the command entry point, just to keep the pods up for testing.
Once the pods are running, `rsh` into each. In `read-only`, `touch /var/lib/jenkins/foo` will fail; in `read-write`, it will not. The difference is that in `read-only`, entering `id` gives the following output: `uid=1001(1001) gid=0(root) groups=0(root)`, whereas in `read-write` it's `uid=1000680000(1000680000) gid=0(root) groups=0(root),1000680000`.
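A diagnostic sketch, not from the original report: the differing `id` output suggests the two pods were admitted under different SecurityContextConstraints (the `read-write` pod gets a project-range UID plus a matching supplemental fsGroup, which is what grants group write access to the volume). One way to confirm is to read the `openshift.io/scc` annotation that OpenShift stamps on each pod; the pod names below are hypothetical placeholders:

```shell
# Substitute the real pod names from `oc get pods -n my-pvc-test`.
# The openshift.io/scc annotation records which SCC admitted the pod.
oc -n my-pvc-test get pod read-only-xxxxx \
  -o jsonpath='{.metadata.annotations.openshift\.io/scc}{"\n"}'
oc -n my-pvc-test get pod read-write-xxxxx \
  -o jsonpath='{.metadata.annotations.openshift\.io/scc}{"\n"}'
```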
Please confirm you can replicate the problem.
I tried the YAML file you shared, on F40 with SELinux enabled, and I am not able to reproduce the issue you are facing :(
```
10:48 $ oc get pvc
NAME        STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                   AGE
my-data-1   Bound    pvc-c0f367ac-c494-4d74-8042-542fe716ae8b   49Gi       RWX            crc-csi-hostpath-provisioner   14s
my-data-2   Bound    pvc-28e8988e-611c-4ca1-8bb0-18b39aa57e01   49Gi       RWX            crc-csi-hostpath-provisioner   14s
10:48 $ oc get all
Warning: apps.openshift.io/v1 DeploymentConfig is deprecated in v4.14+, unavailable in v4.10000+
NAME                              READY   STATUS    RESTARTS   AGE
pod/read-only-596cdf84c5-vldwj    1/1     Running   0          23s
pod/read-write-84f4b58fb9-lhlgs   1/1     Running   0          23s

NAME                 TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
service/read-only    ClusterIP   10.217.4.100   <none>        8080/TCP   23s
service/read-write   ClusterIP   10.217.5.111   <none>        8080/TCP   23s

NAME                         READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/read-only    1/1     1            1           23s
deployment.apps/read-write   1/1     1            1           23s

NAME                                    DESIRED   CURRENT   READY   AGE
replicaset.apps/read-only-596cdf84c5    1         1         1       23s
replicaset.apps/read-write-84f4b58fb9   1         1         1       23s
10:48 $ oc rsh read-only-596cdf84c5-vldwj
~ $ id
uid=1000670000(1000670000) gid=0(root) groups=0(root),1000670000
~ $ ls /var/lib/jenkins/
~ $ touch /var/lib/jenkins/test
~ $ vi /var/lib/jenkins/test
~ $ cat /var/lib/jenkins/test
afajd
afkadjfk
adfjakf
~ $ id
uid=1000670000(1000670000) gid=0(root) groups=0(root),1000670000
~ $ exit
10:50 $ oc rsh read-write-84f4b58fb9-lhlgs
~ $ touch /var/lib/jenkins/test
~ $ vi /var/lib/jenkins/test
~ $ cat /var/lib/jenkins/test
adfja
akdfja
adfkja
~ $ id
uid=1000670000(1000670000) gid=0(root) groups=0(root),1000670000
~ $ exit
10:51 $ ./crc status
CRC VM:          Running
OpenShift:       Running (v4.15.17)
RAM Usage:       7.524GB of 10.92GB
Disk Usage:      37.11GB of 53.08GB (Inside the CRC VM)
Cache Usage:     174.5GB
Cache Directory: /home/prkumar/.crc/cache
10:51 $ getenforce
Enforcing
```
Are you using the CSB-provided F40?
@praveenkumar @anjannath @cfergeau Deep apologies for taking a while to get back to all y'all. I ran a huge number of tests, up to and including rebuilding my machine from scratch to make sure it wasn't an OS problem. I wanted to be thorough.
The long and short of it is that what I thought were some minor changes to my custom SecurityContextConstraints to support non-root podman builds weren't trivial at all. Somehow they corrupted everything, and I have no idea why (I do NOT understand SCCs as well as I'd hoped).
Everything is working as it did before. I want to emphasize this quite a bit: I do not understand why the example I sent you failed on my machine even though it had no relation to the SCC, and I don't understand why my custom SCC changes made such a mess of things, but I'm only an application developer.
For posterity's sake, the fixes I stumbled on were:
- `allowPrivilegeEscalation: true` (for podman to work correctly in rootless mode)
- `fsGroup: type: MustRunAs` (for cluster-admins to properly set the volumes)

Alternative workarounds I avoided as more privileged:
- `allowHostDirVolumePlugin: true`
- `fsGroup: type: RunAsAny` (in this case, I had to set `fsGroup: 0` on the Deployment)
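For reference, the relevant pieces of a custom SCC with the two fixes above might look like the following. This is a hedged sketch: the SCC name and the strategy fields not mentioned in the thread are assumptions, not the author's actual file.

```yaml
# Sketch of the relevant SCC fields only; not the author's full SCC.
apiVersion: security.openshift.io/v1
kind: SecurityContextConstraints
metadata:
  name: rootless-builds          # hypothetical name
allowPrivilegeEscalation: true   # fix 1: needed for rootless podman builds
fsGroup:
  type: MustRunAs                # fix 2: lets OpenShift assign a project-range
                                 # fsGroup, giving group write access to volumes
# The more-privileged alternatives mentioned above, avoided here:
# allowHostDirVolumePlugin: true
# fsGroup:
#   type: RunAsAny               # would require fsGroup: 0 on the Deployment
runAsUser:
  type: MustRunAsRange           # assumed; not stated in the thread
seLinuxContext:
  type: MustRunAs                # assumed; not stated in the thread
```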
Sorry for the confusion. I wish I understood OpenShift security better, but hopefully if someone makes a stupid mistake like this again, they'll find this info. Thanks again for the help.
General information
- `crc setup` before starting it (Yes/No)?
- CRC version
- CRC status
- CRC config
- Host Operating System
Steps to reproduce
Expected
It should be read/write by default (it always was this way).
Actual
It's read-only now. I also tested with the latest version, and went back one release to test as well, just in case.
Logs
I get the following from the pod when the Jenkins utility tries to download plugins using its utility. It now fails when I try using a persistent volume. This had worked for me for a couple of years, until the last couple of releases or so.
Before gathering the logs, try the following to see if it fixes your issue
I had to go back to the normal container filesystem to get it working. I'd like to have my persistent volumes back. Did something change in the VM, or in OpenShift 4.15 that I wasn't aware of, or in the way dynamic persistent volumes are configured in CRC now?
I also started to notice this error, and I don't know if it's related or whether I should open another bug. It's from a recent Jenkins build that caused a failure in my Jenkins agent.
That's from my Jenkins logs while trying to build an image in a pod. I am using the same method I have used for two years: creating a custom SCC to enable rootless builds in-cluster. Again, did something change in how the image is created, or did something change in OpenShift that breaks this?