ceph / ceph-csi

CSI driver for Ceph

Nomad Cannot Mount Ceph CSI Volume onto Prometheus Container #3682

Closed: acziryak closed this issue 1 year ago

acziryak commented 1 year ago

Describe the bug

The csi_hook pre-run hook fails when the container attempts to mount a Ceph CSI volume.

Environment details

Steps to reproduce

Steps to reproduce the behavior:

  1. Setup details: a Nomad cluster tries to run a Prometheus v2.41.0 container and mount a previously created Ceph CSI volume
  2. Deployment to trigger the issue: deploy the Prometheus container and see that it starts up correctly; then add the volume stanza to the group and the volume_mount to the task config, and see that it fails
  3. See the error in the logs below

Actual results

The job never even registers as having started.

Expected behavior

The container should mount the volume and start up.

Logs

Nomad client logs:

Feb 20 13:14:31 ind-test-nomad-worker13 nomad[1791773]:     2023-02-20T13:14:31.479-0500 [ERROR] client.alloc_runner: prerun failed: alloc_id=2ca1aa43-74ab-93ed-eb47-180b025b223c
Feb 20 13:14:31 ind-test-nomad-worker13 nomad[1791773]:   error=
Feb 20 13:14:31 ind-test-nomad-worker13 nomad[1791773]:   | pre-run hook "csi_hook" failed: node plugin returned an internal error, check the plugin allocation logs for more information: rpc error: code = Internal desc = mount failed: exit status 1
Feb 20 13:14:31 ind-test-nomad-worker13 nomad[1791773]:   | Mounting command: mount
Feb 20 13:14:31 ind-test-nomad-worker13 nomad[1791773]:   | Mounting arguments: -t ext4 -o _netdev,defaults /dev/rbd0 /local/csi/staging/prometheus-us-ind-test/rw-file-system-single-node-writer/0001-0024-1e35f6bc-1257-45b6-aa9d-16f9ecd30652-0000000000000024-35dd4807-b136-11ed-b190-3a54371b2958
Feb 20 13:14:31 ind-test-nomad-worker13 nomad[1791773]:   | Output: mount: only root can use "--options" option (effective UID is 100000)

CSI Node logs:

I0220 18:14:21.655751       7 utils.go:212] ID: 25031 GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":3}}},{"Type":{"Rpc":{"type":5}}}]}
I0220 18:14:31.139151       7 utils.go:195] ID: 25032 Req-ID: 0001-0024-1e35f6bc-1257-45b6-aa9d-16f9ecd30652-0000000000000024-35dd4807-b136-11ed-b190-3a54371b2958 GRPC call: /csi.v1.Node/NodeStageVolume
I0220 18:14:31.139270       7 utils.go:206] ID: 25032 Req-ID: 0001-0024-1e35f6bc-1257-45b6-aa9d-16f9ecd30652-0000000000000024-35dd4807-b136-11ed-b190-3a54371b2958 GRPC request: {"secrets":"***stripped***","staging_target_path":"/local/csi/staging/prometheus-us-ind-test/rw-file-system-single-node-writer","volume_capability":{"AccessType":{"Mount":{}},"access_mode":{"mode":1}},"volume_context":{"clusterID":"1e35f6bc-1257-45b6-aa9d-16f9ecd30652","imageFeatures":"layering","imageName":"csi-vol-35dd4807-b136-11ed-b190-3a54371b2958","journalPool":"ind-nonprod2","pool":"ind-nonprod2"},"volume_id":"0001-0024-1e35f6bc-1257-45b6-aa9d-16f9ecd30652-0000000000000024-35dd4807-b136-11ed-b190-3a54371b2958"}
I0220 18:14:31.139513       7 rbd_util.go:1279] ID: 25032 Req-ID: 0001-0024-1e35f6bc-1257-45b6-aa9d-16f9ecd30652-0000000000000024-35dd4807-b136-11ed-b190-3a54371b2958 setting disableInUseChecks: false image features: [layering] mounter: rbd
I0220 18:14:31.142240       7 omap.go:88] ID: 25032 Req-ID: 0001-0024-1e35f6bc-1257-45b6-aa9d-16f9ecd30652-0000000000000024-35dd4807-b136-11ed-b190-3a54371b2958 got omap values: (pool="ind-nonprod2", namespace="", name="csi.volume.35dd4807-b136-11ed-b190-3a54371b2958"): map[csi.imageid:65e309b2e3eef2 csi.imagename:csi-vol-35dd4807-b136-11ed-b190-3a54371b2958 csi.volname:prometheus-us-ind-test]
I0220 18:14:31.160384       7 rbd_util.go:346] ID: 25032 Req-ID: 0001-0024-1e35f6bc-1257-45b6-aa9d-16f9ecd30652-0000000000000024-35dd4807-b136-11ed-b190-3a54371b2958 checking for ImageFeatures: [layering]
I0220 18:14:31.197037       7 cephcmds.go:105] ID: 25032 Req-ID: 0001-0024-1e35f6bc-1257-45b6-aa9d-16f9ecd30652-0000000000000024-35dd4807-b136-11ed-b190-3a54371b2958 command succeeded: rbd [device list --format=json --device-type krbd]
I0220 18:14:31.213997       7 rbd_attach.go:420] ID: 25032 Req-ID: 0001-0024-1e35f6bc-1257-45b6-aa9d-16f9ecd30652-0000000000000024-35dd4807-b136-11ed-b190-3a54371b2958 rbd: map mon 10.2.64.128,10.2.64.129,10.2.64.130,10.2.64.131,10.2.64.132
I0220 18:14:31.323490       7 cephcmds.go:105] ID: 25032 Req-ID: 0001-0024-1e35f6bc-1257-45b6-aa9d-16f9ecd30652-0000000000000024-35dd4807-b136-11ed-b190-3a54371b2958 command succeeded: rbd [--id ind-nonprod2 -m 10.2.64.128,10.2.64.129,10.2.64.130,10.2.64.131,10.2.64.132 --keyfile=***stripped*** map ind-nonprod2/csi-vol-35dd4807-b136-11ed-b190-3a54371b2958 --device-type krbd --options noudev]
I0220 18:14:31.323538       7 nodeserver.go:414] ID: 25032 Req-ID: 0001-0024-1e35f6bc-1257-45b6-aa9d-16f9ecd30652-0000000000000024-35dd4807-b136-11ed-b190-3a54371b2958 rbd image: ind-nonprod2/csi-vol-35dd4807-b136-11ed-b190-3a54371b2958 was successfully mapped at /dev/rbd0
I0220 18:14:31.323621       7 mount_linux.go:563] Attempting to determine if disk "/dev/rbd0" is formatted using blkid with args: ([-p -s TYPE -s PTTYPE -o export /dev/rbd0])
I0220 18:14:31.349670       7 mount_linux.go:566] Output: "DEVNAME=/dev/rbd0\nTYPE=ext4\n"
I0220 18:14:31.349769       7 mount_linux.go:563] Attempting to determine if disk "/dev/rbd0" is formatted using blkid with args: ([-p -s TYPE -s PTTYPE -o export /dev/rbd0])
I0220 18:14:31.372035       7 mount_linux.go:566] Output: "DEVNAME=/dev/rbd0\nTYPE=ext4\n"
I0220 18:14:31.372132       7 mount_linux.go:452] Checking for issues with fsck on disk: /dev/rbd0
I0220 18:14:31.411500       7 mount_linux.go:553] Attempting to mount disk /dev/rbd0 in ext4 format at /local/csi/staging/prometheus-us-ind-test/rw-file-system-single-node-writer/0001-0024-1e35f6bc-1257-45b6-aa9d-16f9ecd30652-0000000000000024-35dd4807-b136-11ed-b190-3a54371b2958
I0220 18:14:31.411617       7 mount_linux.go:219] Mounting cmd (mount) with arguments (-t ext4 -o _netdev,defaults /dev/rbd0 /local/csi/staging/prometheus-us-ind-test/rw-file-system-single-node-writer/0001-0024-1e35f6bc-1257-45b6-aa9d-16f9ecd30652-0000000000000024-35dd4807-b136-11ed-b190-3a54371b2958)
E0220 18:14:31.413792       7 mount_linux.go:231] Mount failed: exit status 1
Mounting command: mount
Mounting arguments: -t ext4 -o _netdev,defaults /dev/rbd0 /local/csi/staging/prometheus-us-ind-test/rw-file-system-single-node-writer/0001-0024-1e35f6bc-1257-45b6-aa9d-16f9ecd30652-0000000000000024-35dd4807-b136-11ed-b190-3a54371b2958
Output: mount: only root can use "--options" option (effective UID is 100000)

E0220 18:14:31.413842       7 nodeserver.go:780] ID: 25032 Req-ID: 0001-0024-1e35f6bc-1257-45b6-aa9d-16f9ecd30652-0000000000000024-35dd4807-b136-11ed-b190-3a54371b2958 failed to mount device path (/dev/rbd0) to staging path (/local/csi/staging/prometheus-us-ind-test/rw-file-system-single-node-writer/0001-0024-1e35f6bc-1257-45b6-aa9d-16f9ecd30652-0000000000000024-35dd4807-b136-11ed-b190-3a54371b2958) for volume (0001-0024-1e35f6bc-1257-45b6-aa9d-16f9ecd30652-0000000000000024-35dd4807-b136-11ed-b190-3a54371b2958) error: mount failed: exit status 1
Mounting command: mount
Mounting arguments: -t ext4 -o _netdev,defaults /dev/rbd0 /local/csi/staging/prometheus-us-ind-test/rw-file-system-single-node-writer/0001-0024-1e35f6bc-1257-45b6-aa9d-16f9ecd30652-0000000000000024-35dd4807-b136-11ed-b190-3a54371b2958
Output: mount: only root can use "--options" option (effective UID is 100000)
 Check dmesg logs if required.
I0220 18:14:31.478129       7 cephcmds.go:105] ID: 25032 Req-ID: 0001-0024-1e35f6bc-1257-45b6-aa9d-16f9ecd30652-0000000000000024-35dd4807-b136-11ed-b190-3a54371b2958 command succeeded: rbd [unmap /dev/rbd0 --device-type krbd --options noudev]
E0220 18:14:31.478296       7 utils.go:210] ID: 25032 Req-ID: 0001-0024-1e35f6bc-1257-45b6-aa9d-16f9ecd30652-0000000000000024-35dd4807-b136-11ed-b190-3a54371b2958 GRPC error: rpc error: code = Internal desc = mount failed: exit status 1
Mounting command: mount
Mounting arguments: -t ext4 -o _netdev,defaults /dev/rbd0 /local/csi/staging/prometheus-us-ind-test/rw-file-system-single-node-writer/0001-0024-1e35f6bc-1257-45b6-aa9d-16f9ecd30652-0000000000000024-35dd4807-b136-11ed-b190-3a54371b2958
Output: mount: only root can use "--options" option (effective UID is 100000)
I0220 18:14:31.788252       7 utils.go:195] ID: 25033 Req-ID: 0001-0024-1e35f6bc-1257-45b6-aa9d-16f9ecd30652-0000000000000024-35dd4807-b136-11ed-b190-3a54371b2958 GRPC call: /csi.v1.Node/NodeUnpublishVolume
I0220 18:14:31.788338       7 utils.go:206] ID: 25033 Req-ID: 0001-0024-1e35f6bc-1257-45b6-aa9d-16f9ecd30652-0000000000000024-35dd4807-b136-11ed-b190-3a54371b2958 GRPC request: {"target_path":"/local/csi/per-alloc/2ca1aa43-74ab-93ed-eb47-180b025b223c/prometheus-us-ind-test/rw-file-system-single-node-writer","volume_id":"0001-0024-1e35f6bc-1257-45b6-aa9d-16f9ecd30652-0000000000000024-35dd4807-b136-11ed-b190-3a54371b2958"}
I0220 18:14:31.788383       7 nodeserver.go:865] ID: 25033 Req-ID: 0001-0024-1e35f6bc-1257-45b6-aa9d-16f9ecd30652-0000000000000024-35dd4807-b136-11ed-b190-3a54371b2958 targetPath: /local/csi/per-alloc/2ca1aa43-74ab-93ed-eb47-180b025b223c/prometheus-us-ind-test/rw-file-system-single-node-writer has already been deleted
I0220 18:14:31.788407       7 utils.go:212] ID: 25033 Req-ID: 0001-0024-1e35f6bc-1257-45b6-aa9d-16f9ecd30652-0000000000000024-35dd4807-b136-11ed-b190-3a54371b2958 GRPC response: {}
I0220 18:14:31.788930       7 utils.go:195] ID: 25034 Req-ID: 0001-0024-1e35f6bc-1257-45b6-aa9d-16f9ecd30652-0000000000000024-35dd4807-b136-11ed-b190-3a54371b2958 GRPC call: /csi.v1.Node/NodeUnstageVolume
I0220 18:14:31.788978       7 utils.go:206] ID: 25034 Req-ID: 0001-0024-1e35f6bc-1257-45b6-aa9d-16f9ecd30652-0000000000000024-35dd4807-b136-11ed-b190-3a54371b2958 GRPC request: {"staging_target_path":"/local/csi/staging/prometheus-us-ind-test/rw-file-system-single-node-writer","volume_id":"0001-0024-1e35f6bc-1257-45b6-aa9d-16f9ecd30652-0000000000000024-35dd4807-b136-11ed-b190-3a54371b2958"}
I0220 18:14:31.789076       7 nodeserver.go:961] ID: 25034 Req-ID: 0001-0024-1e35f6bc-1257-45b6-aa9d-16f9ecd30652-0000000000000024-35dd4807-b136-11ed-b190-3a54371b2958 failed to find image metadata: missing stash: open /local/csi/staging/prometheus-us-ind-test/rw-file-system-single-node-writer/image-meta.json: no such file or directory
I0220 18:14:31.789105       7 utils.go:212] ID: 25034 Req-ID: 0001-0024-1e35f6bc-1257-45b6-aa9d-16f9ecd30652-0000000000000024-35dd4807-b136-11ed-b190-3a54371b2958 GRPC response: {}

Additional context

Nomad Volume:

id = "{{ alm_id }}"
name = "{{ alm_id }}"
type = "csi"
plugin_id = "ceph-csi"
capacity_max = "20G"
capacity_min = "10G"
namespace = "test"

capability {
  access_mode     = "single-node-writer"
  attachment_mode = "file-system"
}

secrets {
  userID  = "{{ alm_datacenter }}"
  userKey = "{{ lookup('community.hashi_vault.vault_kv2_get', alm_env + '/ceph_csi/shared_secret/prometheus', engine_mount_point='secret') | json_query('data.data.user_key') }}"
}

parameters {
  clusterID = "{{ ceph['cluster_id'][alm_loc] }}"
  pool = "{{ alm_datacenter }}"
  imageFeatures = "layering"
}
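
A volume spec like this is created ahead of the job with the Nomad CLI; a minimal sketch, assuming the spec above is saved as volume.hcl (file name illustrative):

# Create the volume through the ceph-csi controller plugin
nomad volume create volume.hcl

# Confirm it registered and is schedulable
nomad volume status -namespace=test "{{ alm_id }}"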

Nomad Job:

job "{{ alm_id }}" {
    datacenters = ["{{ alm_datacenter }}"]
    namespace = "{{ alm_env }}"
    group "{{ alm_id }}" {
        network {
            mode = "bridge"
            port "{{ alm_id }}" {
                to = 9090
            }
        }

        # volume "certs-{{ alm_id }}" {
        #     type = "host"
        #     read_only = true
        #     source = "ca-certificates"
        # }

        volume "{{ alm_id }}" {
            type      = "csi"
            attachment_mode = "file-system"
            access_mode     = "single-node-writer"
            read_only = false
            source    = "{{ alm_id }}"
        }

        service {
            name = "{{ alm_id }}"
            port = "{{ alm_id }}"
        }

        task "web-ui" {
            # template {
            #     change_mode = "noop"
            #     destination = "local/prometheus.yml"

            #     data = file("prometheus.yml")

            # }

            # volume_mount {
            #     volume      = "certs"
            #     destination = "/etc/ssl/certs"
            # }

            volume_mount {
                volume      = "{{ alm_id }}"
                destination = "/prometheus"
                read_only   = false
            }

            driver = "docker"
            config {
                image = "prom/prometheus:{{ prometheus[alm_env]['config_version'] }}"
                userns_mode = "host"
                # mount {
                #     type = "bind"
                #     source = "local/prometheus.yml"
                #     target = "/etc/prometheus/prometheus.yml"
                # }
            }
        }
    }
}

May be relevant: https://docs.docker.com/engine/security/userns-remap/#user-namespace-known-limitations. Note that the task config above sets userns_mode = "host", which should opt the Prometheus container itself out of remapping; the failing mount, however, is executed by the cephcsi node plugin, not by this task.

Instructions followed: https://docs.ceph.com/en/latest/rbd/rbd-nomad/
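
The node-plugin job in those docs runs the cephcsi container privileged. A rough sketch of the relevant task as deployed here, reconstructed from those docs (image tag and node-id interpolation are illustrative):

task "ceph-node" {
    driver = "docker"
    config {
        image      = "quay.io/cephcsi/cephcsi:v3.7.2"
        privileged = true  # the plugin maps RBD devices and runs mount(8), so it needs real root
        args = [
            "--type=rbd",
            "--drivername=rbd.csi.ceph.com",
            "--nodeserver=true",
            "--endpoint=unix://csi/csi.sock",
            "--nodeid=${node.unique.name}",
        ]
    }
    csi_plugin {
        id        = "ceph-csi"
        type      = "node"
        mount_dir = "/csi"
    }
}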

Madhu-1 commented 1 year ago

Not sure how cephcsi is deployed; you need to run cephcsi as the root user with privileged access.

acziryak commented 1 year ago

@Madhu-1 per the Instructions followed in https://docs.ceph.com/en/latest/rbd/rbd-nomad/

I am running quay.io/cephcsi/cephcsi.

This results in the following processes:

root     1541944 1541923  0 Feb17 ?        00:00:12 /sbin/docker-init -- /usr/local/bin/cephcsi --type=rbd --drivername=rbd.csi.ceph.com --nodeserver=true --endpoint=unix://csi/csi.sock --nodeid=ind-test-nomad-worker11 --instanceid=ind-test-nomad-worker11-nodes --pidlimit=-1 --logtostderr=true --v=5 --metricsport=27522
root     1541956 1541944  0 Feb17 ?        00:00:59 /usr/local/bin/cephcsi --type=rbd --drivername=rbd.csi.ceph.com --nodeserver=true --endpoint=unix://csi/csi.sock --nodeid=ind-test-nomad-worker11 --instanceid=ind-test-nomad-worker11-nodes --pidlimit=-1 --logtostderr=true --v=5 --metricsport=27522

EDIT: This is with user namespacing enabled. Is this plugin incompatible with that?

Docker logs seem to indicate that the plugin container was created with UsernsMode set to host:

Feb 21 11:50:06 ind-test-nomad-worker12 dockerd[1888307]: time="2023-02-21T11:50:06.389976821-05:00" level=debug msg="form data: {\"Cmd\":[\"--type=rbd\",\"--drivername=rbd.csi.ceph.com\",\"--nodeserver=true\",\"--endpoint=unix://csi/csi.sock\",\"--nodeid=ind-test-nomad-worker12\",\"--instanceid=ind-test-nomad-worker12-nodes\",\"--pidlimit=-1\",\"--logtostderr=true\",\"--v=5\",\"--metricsport=22039\"],\"Entrypoint\":null,\"Env\":[\"CSI_ENDPOINT=unix:///csi/csi.sock\",\"NOMAD_ADDR_metrics=10.2.42.209:22039\",\"NOMAD_ALLOC_DIR=/alloc\",\"NOMAD_ALLOC_ID=e4fb0742-5d96-c08a-b10e-13195a0a9c3a\",\"NOMAD_ALLOC_INDEX=0\",\"NOMAD_ALLOC_NAME=ceph-csi-nodes-cephcsi-us-ind-test.ceph-csi-nodes-cephcsi-us-ind-test[0]\",\"NOMAD_ALLOC_PORT_metrics=22039\",\"NOMAD_CPU_LIMIT=500\",\"NOMAD_DC=ind-nonprod2\",\"NOMAD_GROUP_NAME=ceph-csi-nodes-cephcsi-us-ind-test\",\"NOMAD_HOST_ADDR_metrics=10.2.42.209:22039\",\"NOMAD_HOST_IP_metrics=10.2.42.209\",\"NOMAD_HOST_PORT_metrics=22039\",\"NOMAD_IP_metrics=10.2.42.209\",\"NOMAD_JOB_ID=ceph-csi-nodes-cephcsi-us-ind-test\",\"NOMAD_JOB_NAME=ceph-csi-nodes-cephcsi-us-ind-test\",\"NOMAD_MEMORY_LIMIT=256\",\"NOMAD_NAMESPACE=test\",\"NOMAD_PARENT_CGROUP=nomad.slice\",\"NOMAD_PORT_metrics=22039\",\"NOMAD_REGION=us\",\"NOMAD_SECRETS_DIR=/secrets\",\"NOMAD_SHORT_ALLOC_ID=e4fb0742\",\"NOMAD_TASK_DIR=/local\",\"NOMAD_TASK_NAME=ceph-csi-nodes-cephcsi-us-ind-test\"],\"HostConfig\":{\"Binds\":[\"/opt/nomad/data/alloc/e4fb0742-5d96-c08a-b10e-13195a0a9c3a/alloc:/alloc\",\"/opt/nomad/data/alloc/e4fb0742-5d96-c08a-b10e-13195a0a9c3a/ceph-csi-nodes-cephcsi-us-ind-test/local:/local\",\"/opt/nomad/data/alloc/e4fb0742-5d96-c08a-b10e-13195a0a9c3a/ceph-csi-nodes-cephcsi-us-ind-test/secrets:/secrets\",\"/opt/nomad/data/alloc/e4fb0742-5d96-c08a-b10e-13195a0a9c3a/ceph-csi-nodes-cephcsi-us-ind-test/local/config.json:/etc/ceph-csi-config/config.json\"],\"CapDrop\":[\"net_raw\"],\"CgroupParent\":\"nomad.slice\",\"ConsoleSize\":[0,0],\"CpuShares\":500,\"LogConfig\":{\"Config\":{\"max-file\":\"2\",\"max-size\":\"2m\"},\"Type\":\"json-file\"},\"Memory\":268435456,\"MemorySwap\":268435456,\"MemorySwappiness\":0,\"Mounts\":[{\"Target\":\"/tmp/csi/keys\",\"TmpfsOptions\":{\"SizeBytes\":1000000},\"Type\":\"tmpfs\"},{\"BindOptions\":{},\"Source\":\"/sys\",\"Target\":\"/sys\",\"Type\":\"bind\"},{\"BindOptions\":{\"Propagation\":\"rshared\"},\"Source\":\"/opt/nomad/data/client/csi/plugins/e4fb0742-5d96-c08a-b10e-13195a0a9c3a\",\"Target\":\"/csi\",\"Type\":\"bind\"},{\"BindOptions\":{\"Propagation\":\"rshared\"},\"Source\":\"/opt/nomad/data/client/csi/node/ceph-csi\",\"Target\":\"/local/csi\",\"Type\":\"bind\"},{\"BindOptions\":{\"Propagation\":\"rprivate\"},\"Source\":\"/dev\",\"Target\":\"/dev\",\"Type\":\"bind\"}],\"NetworkMode\":\"host\",\"PidsLimit\":0,\"Privileged\":true,\"RestartPolicy\":{},\"UsernsMode\":\"host\"},\"Image\":\"quay.io/cephcsi/cephcsi:v3.7.2\",\"Labels\":{\"com.hashicorp.nomad.alloc_id\":\"e4fb0742-5d96-c08a-b10e-13195a0a9c3a\"},\"User\":\"root\"}"
acziryak commented 1 year ago

Per https://github.com/moby/moby/issues/28986, it does not seem possible to run ceph-csi when the Docker daemon has user namespacing (userns-remap) enabled.
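
For reference, userns-remap is a daemon-wide setting in /etc/docker/daemon.json, along these lines (value illustrative):

{
    "userns-remap": "default"
}

With "default", dockerd remaps container root into a subordinate UID range (the dockremap user, whose range commonly starts at 100000, matching the effective UID in the mount error above). Based on this thread, the workaround appears to be removing that setting and restarting dockerd on the CSI nodes; per the Docker logs above, even UsernsMode host plus Privileged true on the plugin container was not enough to make the mount succeed.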

Madhu-1 commented 1 year ago

That might be the issue; I'm not an expert on this. As it's an environment issue, nothing can be done in cephcsi for it.

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

github-actions[bot] commented 1 year ago

This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation.