Closed by juliohm1978, 5 years ago
Hey @juliohm1978,
I personally see the need for this and the pain you might have. But Graylog needs to be able to write to some specific directories. I could imagine a function like the following that checks those specific directories and returns a meaningful error message:
```shell
function check_target {
  check_dir=$(dirname "${1}")
  if [ ! -d "${check_dir}" ]; then
    echo "Error: Target directory ${check_dir} not available." >&2
    exit 1
  fi
  if [ ! -x "${check_dir}" ]; then
    echo "Error: Target directory ${check_dir} not accessible." >&2
    exit 1
  fi
  if [ ! -w "${check_dir}" ]; then
    echo "Error: Target directory ${check_dir} not writable." >&2
    exit 1
  fi
}
```
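To illustrate, a minimal sketch of how an entrypoint might call such a check before writing a file. The `GRAYLOG_HOME` path here is a temp directory for demonstration only, not the image's actual layout:

```shell
#!/bin/sh
# Sketch: fail fast with a clear message before attempting any writes.
# GRAYLOG_HOME is a temp dir here purely for a self-contained demo.

check_target() {
  check_dir=$(dirname "${1}")
  if [ ! -d "${check_dir}" ]; then
    echo "Error: Target directory ${check_dir} not available." >&2
    exit 1
  fi
  if [ ! -x "${check_dir}" ] || [ ! -w "${check_dir}" ]; then
    echo "Error: Target directory ${check_dir} not accessible/writable." >&2
    exit 1
  fi
}

GRAYLOG_HOME="$(mktemp -d)"
mkdir -p "${GRAYLOG_HOME}/data/config"

# Check the target before the entrypoint tries to copy config into it.
check_target "${GRAYLOG_HOME}/data/config/graylog.conf"
echo "ok: ${GRAYLOG_HOME}/data/config is writable"
```

The benefit is purely diagnostic: the container still fails, but with a message pointing at the directory instead of a cryptic stack trace later.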
I'm not sure that will help if the process in the container (which deliberately runs as a non-root user) can't write to the mounted volume. I'm not that deep into Kubernetes or OpenEBS, but how do others handle this situation? How did you deal with it for Elasticsearch?
The error messages would be an improvement, but they don't really resolve the issue. The real problem, for the use case I mention, is that some container volume solutions are not easily accessible, so you cannot simply `cd` and `chown` a directory on the host where the container is running.
The elasticsearch image uses a different approach. It creates its user during `docker build`, but leaves the entrypoint running as root. During the entrypoint, it sets up its runtime as root (that includes chowning data/volume directories) and, at the end, drops to uid 1000 to run the elasticsearch process. It uses `chroot` to drop privileges, a binary already available in this graylog image. It would be possible to implement a similar behavior here.
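A minimal sketch of that pattern, assuming the graylog user is uid:gid 1100:1100 and that the server is started via `/usr/share/graylog/bin/graylogctl` (both are assumptions for illustration; check the real image before relying on them):

```shell
#!/bin/sh
# Sketch of the Elasticsearch-style privilege drop. The uid:gid and the
# graylogctl path are assumptions, not taken from the actual image.

GRAYLOG_UID=1100
GRAYLOG_GID=1100
DATA_DIR=/usr/share/graylog/data

if [ "$(id -u)" = "0" ] && [ -d "${DATA_DIR}" ]; then
  # Entrypoint started as root: fix volume ownership while we still can,
  # then drop privileges for the server process. `chroot /` keeps the
  # filesystem root unchanged; --userspec switches uid:gid before exec.
  chown -R "${GRAYLOG_UID}:${GRAYLOG_GID}" "${DATA_DIR}"
  exec chroot --userspec="${GRAYLOG_UID}:${GRAYLOG_GID}" / \
    /usr/share/graylog/bin/graylogctl run
else
  # Already non-root (e.g. `docker run --user`), or no data dir to fix:
  # start without the chown/privilege-drop step.
  echo "skipping privilege drop (uid $(id -u))"
fi
```

The trade-off is the one debated below: the entrypoint must start as root for this to work at all.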
I think that's the wrong way to go. The graylog container holds sovereignty over its data as a defined user. If you use an ordinary Docker volume, everything works. If you map the data directory into the host system, you have to take care of the correct permissions yourself.
In the Kubernetes environment this is exactly what is required: a container must not run with root privileges.
I will try to extend the tests to a Kubernetes cluster and revise the compose file accordingly.
I fully understand the security concerns for not using root inside the container, but I'm still trying to reconcile this with the implications on how volume provisioning works today.
Fundamentally, the container with its limited user permissions expects its volumes to be fully provisioned. The provisioner -- either manual or dynamic -- should embrace the responsibility of providing proper storage and the permissions so the container is able to read/write as expected.
For manual volume provisioning, such as local host directories, that means an administrator issuing `chown`/`chmod` on `/path/to/vol/on/host`. NFS volumes, for example, need to be mounted temporarily for that.
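A quick sketch of that manual step. A temp directory stands in for the real host path (or a temporarily mounted NFS export), and the `1100:1100` uid:gid matches the graylog user mentioned in this thread:

```shell
#!/bin/sh
# Sketch of manual provisioning for a host-path volume. The temp dir is a
# stand-in so this runs without root; the chown shown in the comment is
# what an administrator would actually run on the host.

VOL="$(mktemp -d)"   # in real life e.g. /srv/graylog-data
mkdir -p "${VOL}/journal" "${VOL}/config" "${VOL}/log"

# As root on the host you would hand the tree to the container's user:
#   chown -R 1100:1100 "${VOL}"
# Without root, we can at least set the expected modes:
chmod -R u+rwX "${VOL}"

ls -ld "${VOL}"
```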
For dynamic provisioning, the adopted solution should be able to change these permissions dynamically before delivering a volume.
Either way, information about the permissions that need to be granted must be available to the provisioner. In practice, it needs to know which `uid:gid` and which file modes are expected in order to make these adjustments. That is metadata about the volume that needs to be available somehow.
Kubernetes allows a pod to run Init Containers before the application container starts, which can bootstrap the environment. The Graylog chart uses `busybox` to clean up the volume beforehand. Several charts also run `chown` and `chmod` at that moment. This discussion itself started as a PR to the Graylog chart requesting exactly that.
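For context, a chart-level init container of that kind looks roughly like the following sketch. The image tag, volume name, and the hard-coded `1100:1100` are illustrative assumptions mirroring the approach under discussion, not the chart's actual values:

```yaml
initContainers:
  - name: fix-permissions
    image: busybox:1.31
    # 1100:1100 is the graylog uid:gid hard coded from the Dockerfile --
    # exactly the coupling criticized below.
    command: ["sh", "-c", "chown -R 1100:1100 /usr/share/graylog/data"]
    volumeMounts:
      - name: graylog-data
        mountPath: /usr/share/graylog/data
```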
The concern raised in the chart PR seems valid to me at this point. If the init container needs to `chown` directories for the application to work correctly, then it needs to know the `uid:gid` the application container uses internally. That is only readily available during `docker build` or at the runtime of the application. Fixing this in the chart means hard coding the `uid:gid` used in this Dockerfile. It sounds like a hack, because that's what it is: the chart becomes dependent on metadata only available inside the container, and compatibility breaks as soon as the Dockerfile in this project goes through an overhaul.
So, right now, I'm not sure which is a better solution. Both PRs can be considered.
With the options available now, I'm slightly biased towards your opinion. Should we sacrifice some container security or just add a line of code to the volume provisioning solution? I'm not sure.
A recent PR from the Graylog chart community works around the issue by hard coding the `uid:gid` in their init container:
https://github.com/helm/charts/pull/12983
If this sparks any interest in the future, feel free to reopen or request further discussion.
Thank you for the support!
I'm getting these errors today:
```
chown: changing ownership of '/usr/share/graylog/data/journal': Operation not permitted
Warning can not change owner to graylog:graylog
chown: changing ownership of '/usr/share/graylog/data/log': Operation not permitted
Warning can not change owner to graylog:graylog
chown: changing ownership of '/usr/share/graylog/data/plugin': Operation not permitted
Warning can not change owner to graylog:graylog
chown: changing ownership of '/usr/share/graylog/data/config': Operation not permitted
Warning can not change owner to graylog:graylog
chown: changing ownership of '/usr/share/graylog/data/contentpacks': Operation not permitted
Warning can not change owner to graylog:graylog
ERROR StatusLogger File not found in file system or classpath: /usr/share/graylog/data/config/log4j2.xml
ERROR StatusLogger Reconfiguration failed: No configuration found for '70dea4e' at 'null' in 'null'
01:34:42.806 [main] ERROR org.graylog2.bootstrap.CmdLineTool - Couldn't load configuration: Properties file /usr/share/graylog/data/config/graylog.conf doesn't exist!
```
The respective configuration (I'm using Docker Compose v3 file format):
```yaml
app:
  image: graylog/graylog:3.1
  user: $UID:$GID
  networks:
    local:
    # reverse_proxy:
    #   aliases:
    #     - graylog
  volumes:
    - ./graylog:/usr/share/graylog/data
  depends_on:
    - mongo
    - elastic
  environment:
    - GRAYLOG_HTTP_EXTERNAL_URI=http://192.168.1.50:9000/
  ports:
    - 9000:9000   # Web interface
    - 1514:1514   # Syslog
    - 12201:12201 # GELF
```
`UID` and `GID` are set from my `.bashrc` as follows (I'm running Ubuntu Server):

```shell
export UID
export GID="$(id -g)"
```
This is my way of setting my logged-in user as the user for my containers. In this case, the container should have permission to write to the bind mount that I provide. But it's trying to `chown` those directories for some reason. Has this been resolved with this issue?
Hi, @rcdailey.
The `chown` messages are just warnings. The actual error message is probably more relevant:

```
ERROR StatusLogger File not found in file system or classpath: /usr/share/graylog/data/config/log4j2.xml
ERROR StatusLogger Reconfiguration failed: No configuration found for '70dea4e' at 'null' in 'null'
```

If the volume directory does not exist before the container runs, Docker creates it as `root` by default. Check the owner and permissions on your local `./graylog` directory. You might have to `chown` it to your user manually. In that case, you can avoid the error by creating the directory before running the container.
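A small sketch of that pre-creation step. A temp path stands in for `./graylog` so the demo is side-effect free; in practice you would run `mkdir` next to the compose file before the first `docker-compose up`:

```shell
#!/bin/sh
# Sketch: pre-create the bind-mount directory so Docker does not create it
# as root on first run. The temp path is a stand-in for ./graylog.

demo_root="$(mktemp -d)"
vol_dir="${demo_root}/graylog"

mkdir -p "${vol_dir}"

# The directory is owned by whoever ran mkdir -- the same uid the container
# runs as via `user: $UID:$GID` -- so no chown inside the container is needed.
owner_uid="$(stat -c %u "${vol_dir}" 2>/dev/null || stat -f %u "${vol_dir}")"
echo "volume dir ${vol_dir} owned by uid ${owner_uid}"
```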
Graylog fails to start if volumes mounted into `${GRAYLOG_HOME}/data` are not owned by the same user inside the container (uid:1100 gid:1100). This refers to the docker-entrypoint.sh script (line 51).

The entrypoint will list entries in `${GRAYLOG_HOME}/data` and try to `chown` them to the `graylog:graylog` user. This only works if the directories are already owned by that user. On that note, the graylog container might as well not even try to `chown` directories if it's running as non-root.

A common way to work around this is to adjust volume permissions from the host where the volume is located and restart the container. That is simple enough if you are running `docker-compose`, NFS volumes, or just testing on your local machine.

However, in some cases, volume contents are not accessible from outside the container. As an example, volumes provisioned automatically by OpenEBS in a Kubernetes cluster hide their data in block files replicated throughout the cluster. Changing these permissions is not just a matter of chowning a directory in the host OS, and further hacks need to be improvised (such as this one, where I'm trying to work around it by adjusting the Helm chart for a Kubernetes deployment).
I'm still trying to think of ways to improve this. I'm not sure what the best approach would be.
Maybe run the Graylog container as root and, at the end of the entrypoint, launch the graylog process as another user?