confluentinc / schema-registry-images

Docker Images for Schema Registry
Apache License 2.0
3 stars 21 forks source link

feat: Disable world write access on directories #48

Closed andrewegel closed 3 years ago

andrewegel commented 3 years ago

Disables world write access on directories.

erikgb commented 2 years ago

@andrewegel This change makes it unable to run the images on Openshift without elevated Openshift permissions. Openshift runs the container as a random user/uid. Making the /etc/schema-registry directory recursively writable to the root group should fix this. Is this something you can fix? We are currently running on version 6.2.1 (that works), but unable to upgrade to version 6.2.2.

===> User
uid=1001680000(1001680000) gid=0(root) groups=0(root),1001680000
===> Configuring ...
Command [/usr/local/bin/dub path /etc/schema-registry/ writable] FAILED !
andrewegel commented 2 years ago

@erikgb Whats the technical reason why you can't run this as the appuser [1]? Running things as root isn't recommended from a docker standpoint, nor is having a world write able directory (what this PR fixes).

I don't see a scenario where we can support both requirements:

If this were being ran by true root, root could write to it:

aegelhofer@Andrew-Egelhofer's- ~ % docker run -u0 -it confluentinc/cp-schema-registry:6.2.2 /usr/local/bin/dub path /etc/schema-registry/ writable
aegelhofer@Andrew-Egelhofer's- ~ % docker run -it confluentinc/cp-schema-registry:6.2.2 /usr/local/bin/dub path /etc/schema-registry/ writable

But your situation shows precisely what this PR was trying to fix: Which was unknown users should not have write access in the container.

To a certain extent, we called this out in the changelog:

https://docs.confluent.io/platform/6.2.2/release-notes/changelog.html#cp-docker-images

Confluent Platform Docker images formerly shipped with world writable data and configuration directories (chmod a+w). Those permissions are now removed, and adjustments made to make those directories writable from the image’s appuser USER.

These updates were tested with openshift before release and testing didn't run into these issues.

[1] https://github.com/confluentinc/schema-registry-images/blob/5.4.x/schema-registry/Dockerfile.ubi8#L80

erikgb commented 2 years ago

@andrewegel The reason we can't run as appuser is because we're not able to choose UID without elevated privileges (in Openshift). Currently every pod is running with the "restricted" SecurityContextConstraints (SCC), which only allows containers with a "random" high UID, selected by OpenShift. The only thing we know about the user inside the container is that it's member of the root group.

How do you test this in OpenShift? Is it running with the "restricted" SCC or an elevated permission, like anyuid?

Our current situation, is that we have to "patch" all Confluent images, since they do not work in our Openshift-context. And this is painful and something we would like to avoid. Our "patch" for the schema-registry image is:

RUN chgrp -R root /etc/schema-registry/ /etc/confluent/docker/

But we also had to patch rest-proxy and multiple connect images equivalently. 😓 And that represents a maintenance burden for us. Would you consider to change the ownership to root group on folders/files that have to be writable by the user inside the container? To make it more "OpenShift friendly"....

An OpenShift reference: https://cloud.redhat.com/blog/a-guide-to-openshift-and-uids

erikgb commented 2 years ago

And even the simplest Deployment is failing on OpenShift:

kind: Deployment
apiVersion: apps/v1
metadata:
  name: cp-sr
  labels:
    app: cp-sr
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cp-sr
  template:
    metadata:
      labels:
        app: cp-sr
    spec:
      containers:
        - name: app
          image: confluentinc/cp-schema-registry:6.2.2
          ports:
            - name: app
              containerPort: 8081
              protocol: TCP
          env:
            - name: SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS
              value: 'PLAINTEXT://kafka:9092'
            - name: SCHEMA_REGISTRY_LISTENERS
              value: 'http://0.0.0.0:8081'
            - name: SCHEMA_REGISTRY_HOST_NAME
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: status.podIP

As for your testing on OpenShift, I would expect a Deployment like this to successfully run on OpenShift.

andrewegel commented 2 years ago

Firstly, lets open a separate issue and move the conversation to there.

Secondary, contact Confluent's support and have them escalate to the Operator team who have conducted these tests on an OpenShift cluster.

Your change of moving to appuser:root sounds OK to me, but I need to be cautious as that represents a large amount of work with no validation that it'll resolve your issue and not introduce problems for others, so I would like the guidance on what the fix should be to come from our internal team based on their recommendations from testing.

andrewegel commented 2 years ago

And I'm locking this PR's conversation so issues can be raised and tracked elsewhere.

andrewegel commented 2 years ago

Furthermore @erikgb - If you don't want to go through the Confluent Support route, join the confluentcommunity.slack.com slack Org, and find me @Ego Waffle or @Rohit Bakhshi (PM for Operator) and I'll put you in touch with our Operator team to come to a resolution.