Closed andrewegel closed 3 years ago
@andrewegel This change makes it unable to run the images on Openshift without elevated Openshift permissions. Openshift runs the container as a random user/uid. Making the /etc/schema-registry
directory recursively writable to the root
group should fix this. Is this something you can fix? We are currently running on version 6.2.1 (that works), but unable to upgrade to version 6.2.2.
===> User
uid=1001680000(1001680000) gid=0(root) groups=0(root),1001680000
===> Configuring ...
Command [/usr/local/bin/dub path /etc/schema-registry/ writable] FAILED !
@erikgb Whats the technical reason why you can't run this as the appuser
[1]? Running things as root isn't recommended from a docker standpoint, nor is having a world write able directory (what this PR fixes).
I don't see a scenario where we can support both requirements:
If this were being ran by true root, root could write to it:
aegelhofer@Andrew-Egelhofer's- ~ % docker run -u0 -it confluentinc/cp-schema-registry:6.2.2 /usr/local/bin/dub path /etc/schema-registry/ writable
aegelhofer@Andrew-Egelhofer's- ~ % docker run -it confluentinc/cp-schema-registry:6.2.2 /usr/local/bin/dub path /etc/schema-registry/ writable
But your situation shows precisely what this PR was trying to fix: Which was unknown users should not have write access in the container.
To a certain extent, we called this out in the changelog:
https://docs.confluent.io/platform/6.2.2/release-notes/changelog.html#cp-docker-images
Confluent Platform Docker images formerly shipped with world writable data and configuration directories (chmod a+w). Those permissions are now removed, and adjustments made to make those directories writable from the image’s appuser USER.
These updates were tested with openshift before release and testing didn't run into these issues.
@andrewegel The reason we can't run as appuser
is because we're not able to choose UID without elevated privileges (in Openshift). Currently every pod is running with the "restricted" SecurityContextConstraints (SCC), which only allows
containers with a "random" high UID, selected by OpenShift. The only thing we know about the user inside the container is that it's member of the root
group.
How do you test this in OpenShift? Is it running with the "restricted" SCC or an elevated permission, like anyuid
?
Our current situation, is that we have to "patch" all Confluent images, since they do not work in our Openshift-context. And this is painful and something we would like to avoid. Our "patch" for the schema-registry image is:
RUN chgrp -R root /etc/schema-registry/ /etc/confluent/docker/
But we also had to patch rest-proxy and multiple connect images equivalently. 😓 And that represents a maintenance burden for us. Would you consider to change the ownership to root
group on folders/files that have to be writable by the user inside the container? To make it more "OpenShift friendly"....
An OpenShift reference: https://cloud.redhat.com/blog/a-guide-to-openshift-and-uids
And even the simplest Deployment is failing on OpenShift:
kind: Deployment
apiVersion: apps/v1
metadata:
name: cp-sr
labels:
app: cp-sr
spec:
replicas: 1
selector:
matchLabels:
app: cp-sr
template:
metadata:
labels:
app: cp-sr
spec:
containers:
- name: app
image: confluentinc/cp-schema-registry:6.2.2
ports:
- name: app
containerPort: 8081
protocol: TCP
env:
- name: SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS
value: 'PLAINTEXT://kafka:9092'
- name: SCHEMA_REGISTRY_LISTENERS
value: 'http://0.0.0.0:8081'
- name: SCHEMA_REGISTRY_HOST_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: status.podIP
As for your testing on OpenShift, I would expect a Deployment like this to successfully run on OpenShift.
Firstly, lets open a separate issue and move the conversation to there.
Secondary, contact Confluent's support and have them escalate to the Operator team who have conducted these tests on an OpenShift cluster.
Your change of moving to appuser:root
sounds OK to me, but I need to be cautious as that represents a large amount of work with no validation that it'll resolve your issue and not introduce problems for others, so I would like the guidance on what the fix should be to come from our internal team based on their recommendations from testing.
And I'm locking this PR's conversation so issues can be raised and tracked elsewhere.
Furthermore @erikgb - If you don't want to go through the Confluent Support route, join the confluentcommunity.slack.com
slack Org, and find me @Ego Waffle
or @Rohit Bakhshi
(PM for Operator) and I'll put you in touch with our Operator team to come to a resolution.
Disables world write access on directories.