docker-library / cassandra

Docker Official Image packaging for Cassandra
Apache License 2.0

Can't lock memory in Kubernetes #189

Closed allamand closed 4 years ago

allamand commented 4 years ago

To run Cassandra correctly in Kubernetes, we need to be able to lock the JVM memory.

With the correct parameters, we should see a line like this at Cassandra startup:

INFO  [main] 2019-09-27 09:50:52,165 NativeLibrary.java:174 - JNA mlockall successful

With the current image, we get this warning instead:

WARN  [main] 2019-09-27 06:58:27,999 NativeLibrary.java:187 - Unable to lock JVM memory (ENOMEM). This can result in part of the JVM being swapped out, especially with mmapped I/O enabled. Increase RLIMIT_MEMLOCK or run Cassandra as root.

In Kubernetes, we need to have IPC_LOCK capability in order to do so, but this is not sufficient.
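To see which piece is missing in a given pod, it can help to look at the effective capability mask of the process that calls mlockall. The sketch below is only an illustration and is not part of the image. It assumes bash, a Linux `/proc` filesystem, and that `CAP_IPC_LOCK` is capability number 14 as defined in the Linux capability headers. The helper names `has_cap_ipc_lock` and `capeff_of` are made up for this example.

```shell
#!/usr/bin/env bash
# Sketch: report whether a process's effective capability set includes CAP_IPC_LOCK.
# Assumption: CAP_IPC_LOCK is capability number 14 (Linux <linux/capability.h>).
# has_cap_ipc_lock and capeff_of are hypothetical helper names.

CAP_IPC_LOCK_BIT=14

# Usage: has_cap_ipc_lock <hex mask as printed on the CapEff line>
# Succeeds (exit status 0) when bit 14 is set in the mask.
has_cap_ipc_lock() {
    local mask="$1"
    (( (0x$mask >> CAP_IPC_LOCK_BIT) & 1 ))
}

# Usage: capeff_of [pid]   (defaults to the current shell)
capeff_of() {
    local pid="${1:-self}"
    awk '/^CapEff:/ { print $2 }' "/proc/$pid/status"
}

# Only attempt the check where /proc is available.
if [ -r /proc/self/status ]; then
    if has_cap_ipc_lock "$(capeff_of)"; then
        echo "CAP_IPC_LOCK: present in effective set"
    else
        echo "CAP_IPC_LOCK: absent from effective set"
    fi
fi
```

Passing the JVM's PID to `capeff_of` checks the Cassandra process itself rather than the shell. If the capability was added to the container but is still absent here, the process is probably running as a non-root user without a file capability on the java binary.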

We should add a setcap line to the Dockerfile so that memory can be locked correctly.

Example with this specific Dockerfile:

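FROM cassandra:3.11

# Install setcap (from libcap2-bin) and grant CAP_IPC_LOCK to the real java binary
RUN apt-get update && apt-get -qq -y install \
      libcap2-bin \
    && rm -rf /var/lib/apt/lists/* \
    && setcap cap_ipc_lock=ep "$(readlink -f "$(which java)")"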

With this, Cassandra is now able to lock memory correctly.

Would you accept a PR that adds this to the repo?

yosifkit commented 4 years ago

Unfortunately not.

setcap will not work with some Docker storage drivers:

- https://github.com/docker-library/httpd/issues/118#issuecomment-439207960

See also https://github.com/docker-library/logstash/pull/14#issuecomment-268670305

allamand commented 4 years ago

@yosifkit do you have another recommendation for achieving this?

We haven't encountered any problems with this in more than a year, and this feature (being able to lock memory as a non-root user) is mandatory for us.

yosifkit commented 4 years ago

Increase RLIMIT_MEMLOCK:

$ docker run -it --rm --ulimit=memlock=-1 cassandra
...
INFO  [main] 2019-09-27 22:35:10,827 NativeLibrary.java:174 - JNA mlockall successful

cscetbon commented 4 years ago

@yosifkit, that's not sufficient when the user is not root. See https://github.com/kubernetes/kubernetes/issues/3595#issuecomment-469928432

yosifkit commented 4 years ago

It works fine here. Note also that the image does not run as root for long; see #48 and this part of the entrypoint.

$ docker run -it --rm --ulimit=memlock=-1 --user 999 cassandra
...
INFO  [main] 2019-09-30 19:47:04,083 NativeLibrary.java:174 - JNA mlockall successful
docker info:

```console
$ docker info
Client:
 Debug Mode: false

Server:
 Containers: 3
  Running: 3
  Paused: 0
  Stopped: 0
 Images: 42
 Server Version: 19.03.2
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: active
  NodeID: kmtwti5wl9lwmdgozgb8nod15
  Is Manager: true
  ClusterID: sejbwq7y8u7abbfs4dhb0zr5a
  Managers: 1
  Nodes: 1
  Default Address Pool: 10.0.0.0/8
  SubnetSize: 24
  Data Path Port: 4789
  Orchestration:
   Task History Retention Limit: 5
  Raft:
   Snapshot Interval: 10000
   Number of Old Snapshots to Retain: 0
   Heartbeat Tick: 1
   Election Tick: 10
  Dispatcher:
   Heartbeat Period: 5 seconds
  CA Configuration:
   Expiry Duration: 3 months
   Force Rotate: 0
  Autolock Managers: false
  Root Rotation In Progress: false
  Node Address: 192.168.1.7
  Manager Addresses:
   192.168.1.7:2377
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 894b81a4b802e4eb2a91d1ce216b8817763c29fb
 runc version: 425e105d5a03fabd737a126ad93d62a9eeede87f
 init version: fec3683
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 4.9.0-6-amd64
 Operating System: Debian GNU/Linux 9 (stretch)
 OSType: linux
 Architecture: x86_64
 CPUs: 6
 Total Memory: 23.55GiB
 Name: minas-morgul
 ID: FKMO:UBPJ:UT3U:BQ4O:ALZK:VBHC:KJ7A:O3YQ:2HHM:OU6Q:7WWD:UFXG
 Docker Root Dir: /mnt/infosiftr/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
```

cscetbon commented 4 years ago

@yosifkit it works fine with Docker, but not with Kubernetes. To convince yourself, you can test a deployment with this config.

You'll get:

WARN  [main] 2019-10-02 03:14:55,781 NativeLibrary.java:187 - Unable to lock JVM memory (ENOMEM). This can result in part of the JVM being swapped out, especially with mmapped I/O enabled. Increase RLIMIT_MEMLOCK or run Cassandra as root.

Then look at https://github.com/kubernetes/kubernetes/issues/3595#issuecomment-438507708

yosifkit commented 4 years ago

setcap and the related part of the YAML just work around the problem; they are not the correct fix.

        securityContext:
          capabilities:
            add:
            - IPC_LOCK

Instead, you need to set the ulimit for memlock to unlimited, or to something big enough that Java/Cassandra does not complain. The fact that Kubernetes does not yet support ulimit does not mean that we will change the image just to work around its lack of configuration.
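To check which memlock limit a container actually received, one option is to compare the output of `ulimit -l` with the amount of memory you expect the JVM to lock. This is a minimal sketch, assuming bash, where `ulimit -l` reports the soft limit in KiB or the word `unlimited`. The function name `memlock_sufficient` and the 8 GiB figure are illustrative assumptions, not values from this thread.

```shell
#!/usr/bin/env bash
# Sketch: decide whether the current RLIMIT_MEMLOCK is large enough.
# memlock_sufficient is a hypothetical helper name.

# Usage: memlock_sufficient <limit from `ulimit -l`> <required KiB>
# Succeeds when the limit is "unlimited" or at least the required size.
memlock_sufficient() {
    local limit="$1" required_kib="$2"
    [ "$limit" = "unlimited" ] && return 0
    [ "$limit" -ge "$required_kib" ]
}

# Hypothetical requirement: 8 GiB expressed in KiB. Adjust to your heap size.
required_kib=$(( 8 * 1024 * 1024 ))
limit="$(ulimit -l)"

if memlock_sufficient "$limit" "$required_kib"; then
    echo "memlock limit ($limit) is large enough"
else
    echo "memlock limit ($limit KiB) is below the required $required_kib KiB"
fi
```

Run inside a container started with `--ulimit=memlock=-1`, this should take the `unlimited` branch. With the default limit it is likely to report a shortfall.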

Users are free to make an image FROM this that adds the setcap (that might not work on some hosts) or something similar to https://github.com/kubernetes/kubernetes/issues/3595#issuecomment-288451522. Combine that with automated builds and it's reasonably easy to have an up-to-date image built FROM this one with the required modifications.