ceph / ceph-ansible

Ansible playbooks to deploy Ceph, the distributed filesystem.
Apache License 2.0

Stuck: create dashboard admin user #5984

Closed: jeevadotnet closed this issue 3 years ago

jeevadotnet commented 3 years ago

Bug Report What happened: Building Ceph with ceph-ansible gets stuck on the task "[ceph-dashboard : create dashboard admin user]".

What you expected to happen: For ceph-ansible 5.0 to run through and complete the build

A weird thing I pick up on every run is that a 5th container tries to start on one of my controller nodes (apart from mgr, mds, mon & node-exporter), even though this specific node is not even set as the grafana/monitoring node.

A-08-34-cephctl.maas | CHANGED | rc=0 >>
CONTAINER ID        IMAGE                        COMMAND                  CREATED             STATUS              PORTS               NAMES
3bfd5df71ef3        ceph/daemon:latest-octopus   "ceph --cluster ceph…"   42 minutes ago      Up 42 minutes                           zealous_beaver

How to reproduce it (minimal and precise): Latest ceph-ansible 5.0 stable (30/10/2020)

ansible-playbook -v -i /opt/ceph-ansible/inventory -e 'ansible_python_interpreter=/usr/bin/python3' /opt/ceph-ansible/site-container.yml

Environment:

dsavineau commented 3 years ago

Are you able to see what docker command is running on your first monitor?

You should have a docker run command executing ceph dashboard xxx.

Otherwise, could you try to run this command manually?

$ docker run --rm --net=host -v /etc/ceph:/etc/ceph:z -v /var/lib/ceph/:/var/lib/ceph/:z -v /var/log/ceph/:/var/log/ceph/:z --entrypoint=ceph docker.io/ceph/daemon:latest-octopus --cluster ceph dashboard ac-user-show admin

If the user doesn't exist then run

$ docker run --rm --net=host -v /etc/ceph:/etc/ceph:z -v /var/lib/ceph/:/var/lib/ceph/:z -v /var/log/ceph/:/var/log/ceph/:z --entrypoint=ceph docker.io/ceph/daemon:latest-octopus --cluster ceph dashboard ac-user-create admin p@ssw0rd

I'm just a little bit concerned by the password you're using, which doesn't match the dashboard password policy [1].

[1] https://docs.ceph.com/en/latest/mgr/dashboard/#password-policy
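
For reference, a minimal sketch for checking or relaxing the dashboard password policy, assuming the standard pwd-policy-enabled dashboard setting and reusing the container wrapper shown above:

$ docker run --rm --net=host -v /etc/ceph:/etc/ceph:z -v /var/lib/ceph/:/var/lib/ceph/:z -v /var/log/ceph/:/var/log/ceph/:z --entrypoint=ceph docker.io/ceph/daemon:latest-octopus --cluster ceph dashboard get-pwd-policy-enabled
$ docker run --rm --net=host -v /etc/ceph:/etc/ceph:z -v /var/lib/ceph/:/var/lib/ceph/:z -v /var/log/ceph/:/var/log/ceph/:z --entrypoint=ceph docker.io/ceph/daemon:latest-octopus --cluster ceph dashboard set-pwd-policy-enabled false

Disabling the policy would only be a stop-gap; picking a compliant dashboard_admin_password in group_vars is the cleaner fix.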

jeevadotnet commented 3 years ago

Hi @dsavineau, I will get back to you after running those commands, but I just want to point out a few things I did in the meantime, before I read your post.

2020-10-30 18:46:30,253 p=ilifu-adm u=30564 | TASK [ceph-dashboard : create dashboard admin user] ******************************************************************************************************
2020-10-30 18:46:30,254 p=ilifu-adm u=30564 | Friday 30 October 2020 18:46:30 +0200 (0:00:02.004) 1:28:14.213 ********
2020-10-30 18:46:30,398 p=ilifu-adm u=5300 | Using module file /opt/ceph-ansible/library/ceph_dashboard_user.py
2020-10-30 18:46:30,398 p=ilifu-adm u=5300 | Pipelining is enabled.
2020-10-30 18:46:30,398 p=ilifu-adm u=5300 | <A-08-34-cephctl.maas> ESTABLISH SSH CONNECTION FOR USER: ubuntu
2020-10-30 18:46:30,399 p=ilifu-adm u=5300 | <A-08-34-cephctl.maas> SSH: EXEC ssh -o ControlMaster=auto -o ControlPersist=600s -o StrictHostKeyChecking=no -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="ubuntu"' -o ConnectTimeout=60 -o ControlPath=/home/ilifu-adm/.ansible/cp/%h-%r-%p A-08-34-cephctl.maas '/bin/sh -c '"'"'sudo -H -S -n -u root /bin/sh -c '"'"'"'"'"'"'"'"'echo BECOME-SUCCESS-imliegqdimnxjgrnlwbdbalrhkqyvqwg ; CEPH_CONTAINER_IMAGE=docker.io/ceph/daemon:latest-octopus CEPH_CONTAINER_BINARY=docker /usr/bin/python3'"'"'"'"'"'"'"'"' && sleep 0'"'"''
2020-10-30 18:46:30,429 p=ilifu-adm u=5300 | Escalation succeeded

docker inspect

            "Cmd": [
                "--cluster",
                "ceph",
                "dashboard",
                "ac-user-show",
                "admin",
                "--format=json"

verbose

Submitting command:  {'prefix': 'dashboard ac-user-show', 'username': 'admin', 'target': ('mon-mgr', ''), 'format': 'json'}
submit ['{"prefix": "dashboard ac-user-show", "username": "admin", "target": ["mon-mgr", ""], "format": "json"}'] to mon-mgr
unifiedcommsguy commented 3 years ago

I can also replicate this issue.

        "Args": [
            "--cluster",
            "ceph",
            "dashboard",
            "ac-user-show",
            "admin",
            "--format=json"
        ],

If you kill this container, another spawns with the following args:

        "Args": [
            "--cluster",
            "ceph",
            "dashboard",
            "ac-user-create",
            "admin",
            "<complex password here>"
        ],

And further killing this container spawns a third with:

        "Args": [
            "--cluster",
            "ceph",
            "dashboard",
            "ac-user-set-roles",
            "admin",
            "administrator"
        ],

If this third container is killed, the playbook fails as follows:

fatal: [hostname -> None]: FAILED! => changed=true
  cmd:
  - docker
  - run
  - --rm
  - --net=host
  - -v
  - /etc/ceph:/etc/ceph:z
  - -v
  - /var/lib/ceph/:/var/lib/ceph/:z
  - -v
  - /var/log/ceph/:/var/log/ceph/:z
  - --entrypoint=ceph
  - docker.io/ceph/daemon:latest-octopus
  - --cluster
  - ceph
  - dashboard
  - ac-user-set-roles
  - admin
  - administrator
  delta: '0:06:03.905827'
  end: '2020-11-02 11:49:44.777892'
  invocation:
    module_args:
      cluster: ceph
      name: admin
      password: VALUE_SPECIFIED_IN_NO_LOG_PARAMETER
      roles:
      - administrator
      state: present
  rc: 137
  start: '2020-11-02 11:43:40.872065'
  stderr: ''
  stderr_lines: <omitted>
  stdout: ''
  stdout_lines: <omitted>
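
Side note: rc 137 is 128 + 9, i.e. the container was terminated by SIGKILL (the manual kill), not by a failure of the ceph command itself. A small sketch for watching the chain of one-shot helper containers instead of killing them; the grep relies on the fact that they run with the plain ceph entrypoint, unlike the long-running daemons whose command shows as /opt/ceph-container…:

# list only the throwaway dashboard helper containers
docker ps --format '{{.ID}} {{.Names}} {{.Command}} {{.Status}}' | grep 'ceph --cluster'
# follow what the current helper is doing
docker logs -f <container-id-from-the-line-above>
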
jeevadotnet commented 3 years ago

$ sudo docker run --rm --net=host -v /etc/ceph:/etc/ceph:z -v /var/lib/ceph/:/var/lib/ceph/:z -v /var/log/ceph/:/var/log/ceph/:z --entrypoint=ceph docker.io/ceph/daemon:latest-octopus --cluster ceph dashboard ac-user-show admin

[
    {
        "Id": "f8f38d6f42a885ea271effddcb52a7885c79e07457912586c0494e24ea4cc0ac",
        "Created": "2020-11-02T06:54:26.927094685Z",
        "Path": "ceph",
        "Args": [
            "--cluster",
            "ceph",
            "dashboard",
            "ac-user-show",
            "admin"
        ],
        "State": {
            "Status": "running",
            "Running": true,
            "Paused": false,
            "Restarting": false,
            "OOMKilled": false,
            "Dead": false,
            "Pid": 38975,
            "ExitCode": 0,
            "Error": "",
            "StartedAt": "2020-11-02T06:54:27.077838675Z",
            "FinishedAt": "0001-01-01T00:00:00Z"
        },
        "Image": "sha256:570d50684a596c75fb17e203aef6d7e7a96a6a28fac3cca644c15e86d0e57789",
        "ResolvConfPath": "/var/lib/docker/containers/f8f38d6f42a885ea271effddcb52a7885c79e07457912586c0494e24ea4cc0ac/resolv.conf",
        "HostnamePath": "/var/lib/docker/containers/f8f38d6f42a885ea271effddcb52a7885c79e07457912586c0494e24ea4cc0ac/hostname",
        "HostsPath": "/var/lib/docker/containers/f8f38d6f42a885ea271effddcb52a7885c79e07457912586c0494e24ea4cc0ac/hosts",
        "LogPath": "/var/lib/docker/containers/f8f38d6f42a885ea271effddcb52a7885c79e07457912586c0494e24ea4cc0ac/f8f38d6f42a885ea271effddcb52a7885c79e07457912586c0494e24ea4cc0ac-json.log",
        "Name": "/crazy_cori",
        "RestartCount": 0,
        "Driver": "overlay2",
        "Platform": "linux",
        "MountLabel": "",
        "ProcessLabel": "",
        "AppArmorProfile": "docker-default",
        "ExecIDs": null,
        "HostConfig": {
            "Binds": [
                "/etc/ceph:/etc/ceph:z",
                "/var/lib/ceph/:/var/lib/ceph/:z",
                "/var/log/ceph/:/var/log/ceph/:z"
            ],
            "ContainerIDFile": "",
            "LogConfig": {
                "Type": "json-file",
                "Config": {}
            },
            "NetworkMode": "host",
            "PortBindings": {},
            "RestartPolicy": {
                "Name": "no",
                "MaximumRetryCount": 0
            },
            "AutoRemove": true,
            "VolumeDriver": "",
            "VolumesFrom": null,
            "CapAdd": null,
            "CapDrop": null,
            "Capabilities": null,
            "Dns": [],
            "DnsOptions": [],
            "DnsSearch": [],
            "ExtraHosts": null,
            "GroupAdd": null,
            "IpcMode": "private",
            "Cgroup": "",
            "Links": null,
            "OomScoreAdj": 0,
            "PidMode": "",
            "Privileged": false,
            "PublishAllPorts": false,
            "ReadonlyRootfs": false,
            "SecurityOpt": null,
            "UTSMode": "",
            "UsernsMode": "",
            "ShmSize": 67108864,
            "Runtime": "runc",
            "ConsoleSize": [
                0,
                0
            ],
            "Isolation": "",
            "CpuShares": 0,
            "Memory": 0,
            "NanoCpus": 0,
            "CgroupParent": "",
            "BlkioWeight": 0,
            "BlkioWeightDevice": [],
            "BlkioDeviceReadBps": null,
            "BlkioDeviceWriteBps": null,
            "BlkioDeviceReadIOps": null,
            "BlkioDeviceWriteIOps": null,
            "CpuPeriod": 0,
            "CpuQuota": 0,
            "CpuRealtimePeriod": 0,
            "CpuRealtimeRuntime": 0,
            "CpusetCpus": "",
            "CpusetMems": "",
            "Devices": [],
            "DeviceCgroupRules": null,
            "DeviceRequests": null,
            "KernelMemory": 0,
            "KernelMemoryTCP": 0,
            "MemoryReservation": 0,
            "MemorySwap": 0,
            "MemorySwappiness": null,
            "OomKillDisable": false,
            "PidsLimit": null,
            "Ulimits": null,
            "CpuCount": 0,
            "CpuPercent": 0,
            "IOMaximumIOps": 0,
            "IOMaximumBandwidth": 0,
            "MaskedPaths": [
                "/proc/asound",
                "/proc/acpi",
                "/proc/kcore",
                "/proc/keys",
                "/proc/latency_stats",
                "/proc/timer_list",
                "/proc/timer_stats",
                "/proc/sched_debug",
                "/proc/scsi",
                "/sys/firmware"
            ],
            "ReadonlyPaths": [
                "/proc/bus",
                "/proc/fs",
                "/proc/irq",
                "/proc/sys",
                "/proc/sysrq-trigger"
            ]
        },
        "GraphDriver": {
            "Data": {
                "LowerDir": "/var/lib/docker/overlay2/cc9685b722012b24c907e23a1d625d2a28709ee8619fb388bac15d7be4d0e199-init/diff:/var/lib/docker/overlay2/cb9e8974fddfb18f7d65310380a2ec4b405b491d3411befcd8372fe47cdf2723/diff:/var/lib/docker/overlay2/7e73f83f69b92856e522d97f50827ee38c29987891d6ef1e5a6e4da53b326bf0/diff:/var/lib/docker/overlay2/9db2586bad69c5c5a21877dcc5dd39526c31d0dd1887ba4334968e00b6a8f639/diff:/var/lib/docker/overlay2/8da8df6ac277f7a59f0cb5bb51cc0ef6ec66f4560de3e023fc4143d9a11717f4/diff:/var/lib/docker/overlay2/625f9863807829a3c78a4a7e1ec4ff4e140b7d623eba8ec10ccba28aab6cc63f/diff:/var/lib/docker/overlay2/5b1bacc3c043ec4828f079ebe4cb9544c378da4418d253d83422f40b51d8bbc7/diff:/var/lib/docker/overlay2/fec53cb51fdf3a86f25007676eb6a080684345ebdb73d7f203d174e5816af18d/diff:/var/lib/docker/overlay2/d1c4daf7a7e608e29bbb40db9f6341ed0157a200cada98af8b72b182ad7cfe3b/diff:/var/lib/docker/overlay2/cf5c8289199b47d98483ffaa7a0e2ac7a1c44f3aa7e008b2ba049db734223ad8/diff:/var/lib/docker/overlay2/6509740887af9ca7245a4627f15699b940a86635a13b274bfeb52b2cef9a4208/diff",
                "MergedDir": "/var/lib/docker/overlay2/cc9685b722012b24c907e23a1d625d2a28709ee8619fb388bac15d7be4d0e199/merged",
                "UpperDir": "/var/lib/docker/overlay2/cc9685b722012b24c907e23a1d625d2a28709ee8619fb388bac15d7be4d0e199/diff",
                "WorkDir": "/var/lib/docker/overlay2/cc9685b722012b24c907e23a1d625d2a28709ee8619fb388bac15d7be4d0e199/work"
            },
            "Name": "overlay2"
        },
        "Mounts": [
            {
                "Type": "bind",
                "Source": "/var/log/ceph",
                "Destination": "/var/log/ceph",
                "Mode": "z",
                "RW": true,
                "Propagation": "rprivate"
            },
            {
                "Type": "bind",
                "Source": "/etc/ceph",
                "Destination": "/etc/ceph",
                "Mode": "z",
                "RW": true,
                "Propagation": "rprivate"
            },
            {
                "Type": "bind",
                "Source": "/var/lib/ceph",
                "Destination": "/var/lib/ceph",
                "Mode": "z",
                "RW": true,
                "Propagation": "rprivate"
            }
        ],
        "Config": {
            "Hostname": "A-08-34-cephctl",
            "Domainname": "",
            "User": "",
            "AttachStdin": false,
            "AttachStdout": true,
            "AttachStderr": true,
            "Tty": false,
            "OpenStdin": false,
            "StdinOnce": false,
            "Env": [
                "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
                "CEPH_VERSION=octopus",
                "CEPH_POINT_RELEASE=",
                "CEPH_DEVEL=false",
                "CEPH_REF=octopus",
                "OSD_FLAVOR=default"
            ],
            "Cmd": [
                "--cluster",
                "ceph",
                "dashboard",
                "ac-user-show",
                "admin"
            ],
            "Image": "docker.io/ceph/daemon:latest-octopus",
            "Volumes": null,
            "WorkingDir": "/",
            "Entrypoint": [
                "ceph"
            ],
            "OnBuild": null,
            "Labels": {
                "CEPH_POINT_RELEASE": "",
                "GIT_BRANCH": "HEAD",
                "GIT_CLEAN": "False",
                "GIT_COMMIT": "aa644a5fd03aecb9eda8b95002d25ded9c57b25e",
                "GIT_REPO": "https://github.com/ceph/ceph-container.git",
                "RELEASE": "master-aa644a5",
                "ceph": "True",
                "maintainer": "Dimitri Savineau <dsavinea@redhat.com>",
                "org.label-schema.build-date": "20200809",
                "org.label-schema.license": "GPLv2",
                "org.label-schema.name": "CentOS Base Image",
                "org.label-schema.schema-version": "1.0",
                "org.label-schema.vendor": "CentOS"
            }
        },
        "NetworkSettings": {
            "Bridge": "",
            "SandboxID": "c0adef0a440fdb656b3ec5f766d94eff2f7138f7f1c743352716808a0c146fcc",
            "HairpinMode": false,
            "LinkLocalIPv6Address": "",
            "LinkLocalIPv6PrefixLen": 0,
            "Ports": {},
            "SandboxKey": "/var/run/docker/netns/default",
            "SecondaryIPAddresses": null,
            "SecondaryIPv6Addresses": null,
            "EndpointID": "",
            "Gateway": "",
            "GlobalIPv6Address": "",
            "GlobalIPv6PrefixLen": 0,
            "IPAddress": "",
            "IPPrefixLen": 0,
            "IPv6Gateway": "",
            "MacAddress": "",
            "Networks": {
                "host": {
                    "IPAMConfig": null,
                    "Links": null,
                    "Aliases": null,
                    "NetworkID": "6b329fe3c32f140a30de2587ccf53bb73181171c72ce6d8625f186ebc05f284a",
                    "EndpointID": "95e63fac233a3027f8625b0b9792be9e1d63e1cd4c05150dbce0ce2f2daf985b",
                    "Gateway": "",
                    "IPAddress": "",
                    "IPPrefixLen": 0,
                    "IPv6Gateway": "",
                    "GlobalIPv6Address": "",
                    "GlobalIPv6PrefixLen": 0,
                    "MacAddress": "",
                    "DriverOpts": null
                }
            }
        }
    }
]

sudo docker run --rm --net=host -v /etc/ceph:/etc/ceph:z -v /var/lib/ceph/:/var/lib/ceph/:z -v /var/log/ceph/:/var/log/ceph/:z --entrypoint=ceph docker.io/ceph/daemon:latest-octopus --cluster ceph dashboard ac-user-create admin p@ssw0rd

docker inspect

[
    {
        "Id": "f33d8db9bfeb04d8c5db50d5db03122e9741483f534945e6a95c4022a0d9d496",
        "Created": "2020-11-02T06:50:26.410518751Z",
        "Path": "ceph",
        "Args": [
            "--cluster",
            "ceph",
            "dashboard",
            "ac-user-create",
            "admin",
            "p@ssw0rd"
        ],
        "State": {
            "Status": "running",
            "Running": true,
            "Paused": false,
            "Restarting": false,
            "OOMKilled": false,
            "Dead": false,
            "Pid": 38774,
            "ExitCode": 0,
            "Error": "",
            "StartedAt": "2020-11-02T06:50:26.55076458Z",
            "FinishedAt": "0001-01-01T00:00:00Z"
        },
        "Image": "sha256:570d50684a596c75fb17e203aef6d7e7a96a6a28fac3cca644c15e86d0e57789",
        "ResolvConfPath": "/var/lib/docker/containers/f33d8db9bfeb04d8c5db50d5db03122e9741483f534945e6a95c4022a0d9d496/resolv.conf",
        "HostnamePath": "/var/lib/docker/containers/f33d8db9bfeb04d8c5db50d5db03122e9741483f534945e6a95c4022a0d9d496/hostname",
        "HostsPath": "/var/lib/docker/containers/f33d8db9bfeb04d8c5db50d5db03122e9741483f534945e6a95c4022a0d9d496/hosts",
        "LogPath": "/var/lib/docker/containers/f33d8db9bfeb04d8c5db50d5db03122e9741483f534945e6a95c4022a0d9d496/f33d8db9bfeb04d8c5db50d5db03122e9741483f534945e6a95c4022a0d9d496-json.log",
        "Name": "/unruffled_jepsen",
        "RestartCount": 0,
        "Driver": "overlay2",
        "Platform": "linux",
        "MountLabel": "",
        "ProcessLabel": "",
        "AppArmorProfile": "docker-default",
        "ExecIDs": null,
        "HostConfig": {
            "Binds": [
                "/etc/ceph:/etc/ceph:z",
                "/var/lib/ceph/:/var/lib/ceph/:z",
                "/var/log/ceph/:/var/log/ceph/:z"
            ],
            "ContainerIDFile": "",
            "LogConfig": {
                "Type": "json-file",
                "Config": {}
            },
            "NetworkMode": "host",
            "PortBindings": {},
            "RestartPolicy": {
                "Name": "no",
                "MaximumRetryCount": 0
            },
            "AutoRemove": true,
            "VolumeDriver": "",
            "VolumesFrom": null,
            "CapAdd": null,
            "CapDrop": null,
            "Capabilities": null,
            "Dns": [],
            "DnsOptions": [],
            "DnsSearch": [],
            "ExtraHosts": null,
            "GroupAdd": null,
            "IpcMode": "private",
            "Cgroup": "",
            "Links": null,
            "OomScoreAdj": 0,
            "PidMode": "",
            "Privileged": false,
            "PublishAllPorts": false,
            "ReadonlyRootfs": false,
            "SecurityOpt": null,
            "UTSMode": "",
            "UsernsMode": "",
            "ShmSize": 67108864,
            "Runtime": "runc",
            "ConsoleSize": [
                0,
                0
            ],
            "Isolation": "",
            "CpuShares": 0,
            "Memory": 0,
            "NanoCpus": 0,
            "CgroupParent": "",
            "BlkioWeight": 0,
            "BlkioWeightDevice": [],
            "BlkioDeviceReadBps": null,
            "BlkioDeviceWriteBps": null,
            "BlkioDeviceReadIOps": null,
            "BlkioDeviceWriteIOps": null,
            "CpuPeriod": 0,
            "CpuQuota": 0,
            "CpuRealtimePeriod": 0,
            "CpuRealtimeRuntime": 0,
            "CpusetCpus": "",
            "CpusetMems": "",
            "Devices": [],
            "DeviceCgroupRules": null,
            "DeviceRequests": null,
            "KernelMemory": 0,
            "KernelMemoryTCP": 0,
            "MemoryReservation": 0,
            "MemorySwap": 0,
            "MemorySwappiness": null,
            "OomKillDisable": false,
            "PidsLimit": null,
            "Ulimits": null,
            "CpuCount": 0,
            "CpuPercent": 0,
            "IOMaximumIOps": 0,
            "IOMaximumBandwidth": 0,
            "MaskedPaths": [
                "/proc/asound",
                "/proc/acpi",
                "/proc/kcore",
                "/proc/keys",
                "/proc/latency_stats",
                "/proc/timer_list",
                "/proc/timer_stats",
                "/proc/sched_debug",
                "/proc/scsi",
                "/sys/firmware"
            ],
            "ReadonlyPaths": [
                "/proc/bus",
                "/proc/fs",
                "/proc/irq",
                "/proc/sys",
                "/proc/sysrq-trigger"
            ]
        },
        "GraphDriver": {
            "Data": {
                "LowerDir": "/var/lib/docker/overlay2/6b1e2448c440e0ce2c910e76097eca65c4b5262c3eed3568456f4ffbbf012fe3-init/diff:/var/lib/docker/overlay2/cb9e8974fddfb18f7d65310380a2ec4b405b491d3411befcd8372fe47cdf2723/diff:/var/lib/docker/overlay2/7e73f83f69b92856e522d97f50827ee38c29987891d6ef1e5a6e4da53b326bf0/diff:/var/lib/docker/overlay2/9db2586bad69c5c5a21877dcc5dd39526c31d0dd1887ba4334968e00b6a8f639/diff:/var/lib/docker/overlay2/8da8df6ac277f7a59f0cb5bb51cc0ef6ec66f4560de3e023fc4143d9a11717f4/diff:/var/lib/docker/overlay2/625f9863807829a3c78a4a7e1ec4ff4e140b7d623eba8ec10ccba28aab6cc63f/diff:/var/lib/docker/overlay2/5b1bacc3c043ec4828f079ebe4cb9544c378da4418d253d83422f40b51d8bbc7/diff:/var/lib/docker/overlay2/fec53cb51fdf3a86f25007676eb6a080684345ebdb73d7f203d174e5816af18d/diff:/var/lib/docker/overlay2/d1c4daf7a7e608e29bbb40db9f6341ed0157a200cada98af8b72b182ad7cfe3b/diff:/var/lib/docker/overlay2/cf5c8289199b47d98483ffaa7a0e2ac7a1c44f3aa7e008b2ba049db734223ad8/diff:/var/lib/docker/overlay2/6509740887af9ca7245a4627f15699b940a86635a13b274bfeb52b2cef9a4208/diff",
                "MergedDir": "/var/lib/docker/overlay2/6b1e2448c440e0ce2c910e76097eca65c4b5262c3eed3568456f4ffbbf012fe3/merged",
                "UpperDir": "/var/lib/docker/overlay2/6b1e2448c440e0ce2c910e76097eca65c4b5262c3eed3568456f4ffbbf012fe3/diff",
                "WorkDir": "/var/lib/docker/overlay2/6b1e2448c440e0ce2c910e76097eca65c4b5262c3eed3568456f4ffbbf012fe3/work"
            },
            "Name": "overlay2"
        },
        "Mounts": [
            {
                "Type": "bind",
                "Source": "/etc/ceph",
                "Destination": "/etc/ceph",
                "Mode": "z",
                "RW": true,
                "Propagation": "rprivate"
            },
            {
                "Type": "bind",
                "Source": "/var/lib/ceph",
                "Destination": "/var/lib/ceph",
                "Mode": "z",
                "RW": true,
                "Propagation": "rprivate"
            },
            {
                "Type": "bind",
                "Source": "/var/log/ceph",
                "Destination": "/var/log/ceph",
                "Mode": "z",
                "RW": true,
                "Propagation": "rprivate"
            }
        ],
        "Config": {
            "Hostname": "A-08-34-cephctl",
            "Domainname": "",
            "User": "",
            "AttachStdin": false,
            "AttachStdout": true,
            "AttachStderr": true,
            "Tty": false,
            "OpenStdin": false,
            "StdinOnce": false,
            "Env": [
                "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
                "CEPH_VERSION=octopus",
                "CEPH_POINT_RELEASE=",
                "CEPH_DEVEL=false",
                "CEPH_REF=octopus",
                "OSD_FLAVOR=default"
            ],
            "Cmd": [
                "--cluster",
                "ceph",
                "dashboard",
                "ac-user-create",
                "admin",
                "p@ssw0rd"
            ],
            "Image": "docker.io/ceph/daemon:latest-octopus",
            "Volumes": null,
            "WorkingDir": "/",
            "Entrypoint": [
                "ceph"
            ],
            "OnBuild": null,
            "Labels": {
                "CEPH_POINT_RELEASE": "",
                "GIT_BRANCH": "HEAD",
                "GIT_CLEAN": "False",
                "GIT_COMMIT": "aa644a5fd03aecb9eda8b95002d25ded9c57b25e",
                "GIT_REPO": "https://github.com/ceph/ceph-container.git",
                "RELEASE": "master-aa644a5",
                "ceph": "True",
                "maintainer": "Dimitri Savineau <dsavinea@redhat.com>",
                "org.label-schema.build-date": "20200809",
                "org.label-schema.license": "GPLv2",
                "org.label-schema.name": "CentOS Base Image",
                "org.label-schema.schema-version": "1.0",
                "org.label-schema.vendor": "CentOS"
            }
        },
        "NetworkSettings": {
            "Bridge": "",
            "SandboxID": "3dd2538c1499271dd13ade46983eefc817f85f1af028ed3c1a500a70959b581d",
            "HairpinMode": false,
            "LinkLocalIPv6Address": "",
            "LinkLocalIPv6PrefixLen": 0,
            "Ports": {},
            "SandboxKey": "/var/run/docker/netns/default",
            "SecondaryIPAddresses": null,
            "SecondaryIPv6Addresses": null,
            "EndpointID": "",
            "Gateway": "",
            "GlobalIPv6Address": "",
            "GlobalIPv6PrefixLen": 0,
            "IPAddress": "",
            "IPPrefixLen": 0,
            "IPv6Gateway": "",
            "MacAddress": "",
            "Networks": {
                "host": {
                    "IPAMConfig": null,
                    "Links": null,
                    "Aliases": null,
                    "NetworkID": "6b329fe3c32f140a30de2587ccf53bb73181171c72ce6d8625f186ebc05f284a",
                    "EndpointID": "3bdbb8a96e3546d9f0560a3d5f4ae6d0d473fb45ab4390702d4bdc67bf5853b2",
                    "Gateway": "",
                    "IPAddress": "",
                    "IPPrefixLen": 0,
                    "IPv6Gateway": "",
                    "GlobalIPv6Address": "",
                    "GlobalIPv6PrefixLen": 0,
                    "MacAddress": "",
                    "DriverOpts": null
                }
            }
        }
    }
]
dsavineau commented 3 years ago

"I deploy these same variables (just monitor, cluster & public network changes)"

@jeevadotnet Could you tell me the network changes between those two deployments?

jeevadotnet commented 3 years ago

Noticeable differences:

My Testbed Ceph-ansible Git Log

git log --pretty=oneline HEAD~5..HEAD
0c66f909684ea91ccfda35668471c6c6c8c5d88f (HEAD -> stable-5.0, origin/stable-5.0) ceph-osd: start osd after systemd overrides
3f610811febafca47e82b8979a375b2e74954b51 ceph-osd: don't start the OSD services twice
d258bf4d2de2fd38444a31b76111f13c325bba06 handler: refact check_socket_non_container
c733af9d43e572c97f301e7f704676f2ea5660dc Fix Ansible check mode for site.yml.sample playbook
3fa84cf44a0cce38c1a5ff1dfe422df63e66db43 tests: change cephfs pool size

My New Cluster (with issues) Git Log

git log --pretty=oneline HEAD~5..HEAD
b5985d2e8308f7ad885166194e8ca7810375f82c (HEAD -> stable-5.0, origin/stable-5.0) common: drop `fetch_directory` feature
76be9a42925e7e4abf3f44d550faabb3652a33a4 ceph-config: ceph.conf rendering refactor
3eed44907b0befc3962432f14c048fdeeaf69adb iscsi: fix ownership on iscsi-gateway.cfg
a6dac8c93d35a564a6710365112c4079a2efe1c8 crash: refact caps definition
e6b3186420d575c2cb72f84d2c64fbb91cbd2869 ceph-volume: refresh lvm metadata cache

Group_vars: https://pastebin.com/7HG8267p

group_vars/all.yml:---
group_vars/all.yml:dummy:
group_vars/all.yml:ceph_release_num:
group_vars/all.yml:  dumpling: 0.67
group_vars/all.yml:  emperor: 0.72
group_vars/all.yml:  firefly: 0.80
group_vars/all.yml:  giant: 0.87
group_vars/all.yml:  hammer: 0.94
group_vars/all.yml:  infernalis: 9
group_vars/all.yml:  jewel: 10
group_vars/all.yml:  kraken: 11
group_vars/all.yml:  luminous: 12
group_vars/all.yml:  mimic: 13
group_vars/all.yml:  nautilus: 14
group_vars/all.yml:  octopus: 15
group_vars/all.yml:  pacific: 16
group_vars/all.yml:  dev: 99
group_vars/all.yml:cluster: ceph
group_vars/all.yml:mon_group_name: mons
group_vars/all.yml:osd_group_name: osds
group_vars/all.yml:rgw_group_name: rgws
group_vars/all.yml:mds_group_name: mdss
group_vars/all.yml:nfs_group_name: nfss
group_vars/all.yml:rbdmirror_group_name: rbdmirrors
group_vars/all.yml:client_group_name: clients
group_vars/all.yml:iscsi_gw_group_name: iscsigws
group_vars/all.yml:mgr_group_name: mgrs
group_vars/all.yml:rgwloadbalancer_group_name: rgwloadbalancers
group_vars/all.yml:grafana_server_group_name: grafana-server
group_vars/all.yml:ceph_conf_local: false
group_vars/all.yml:ntp_service_enabled: true
group_vars/all.yml:ntp_daemon_type: chronyd
group_vars/all.yml:upgrade_ceph_packages: True
group_vars/all.yml:ceph_origin: 'repository'
group_vars/all.yml:ceph_repository: 'community'
group_vars/all.yml:ceph_mirror: http://download.ceph.com
group_vars/all.yml:ceph_stable_key: https://download.ceph.com/keys/release.asc
group_vars/all.yml:ceph_stable_release: octopus
group_vars/all.yml:ceph_stable_repo: "{{ ceph_mirror }}/debian-{{ ceph_stable_release }}"
group_vars/all.yml:fsid: "3f5452e1-ff5d-4a41-9e8d-5a3467036771"
group_vars/all.yml:generate_fsid: false
group_vars/all.yml:ceph_conf_key_directory: /etc/ceph
group_vars/all.yml:ceph_uid: "{{ '64045' if not containerized_deployment | bool and ansible_os_family == 'Debian' else '167' }}"
group_vars/all.yml:ceph_keyring_permissions: '0600'
group_vars/all.yml:cephx: true
group_vars/all.yml:monitor_interface: 'enp132s0f0'
group_vars/all.yml:ip_version: ipv4
group_vars/all.yml:cephfs: cephfs # name of the ceph filesystem
group_vars/all.yml:cephfs_data_pool:
group_vars/all.yml:  name: "{{ cephfs_data if cephfs_data is defined else 'cephfs_data' }}"
group_vars/all.yml:cephfs_metadata_pool:
group_vars/all.yml:  name: "{{ cephfs_metadata if cephfs_metadata is defined else 'cephfs_metadata' }}"
group_vars/all.yml:cephfs_pools:
group_vars/all.yml:  - "{{ cephfs_data_pool }}"
group_vars/all.yml:  - "{{ cephfs_metadata_pool }}"
group_vars/all.yml:is_hci: false
group_vars/all.yml:hci_safety_factor: 0.2
group_vars/all.yml:non_hci_safety_factor: 0.7
group_vars/all.yml:osd_memory_target: 4294967296
group_vars/all.yml:public_network: '10.102.17.0/24'
group_vars/all.yml:cluster_network: '10.102.17.0/24'
group_vars/all.yml:osd_objectstore: bluestore
group_vars/all.yml:ceph_docker_image: "ceph/daemon"
group_vars/all.yml:ceph_docker_image_tag: latest-octopus
group_vars/all.yml:ceph_docker_registry: docker.io
group_vars/all.yml:ceph_docker_registry_auth: false
group_vars/all.yml:containerized_deployment: true
group_vars/all.yml:rolling_update: false
group_vars/all.yml:openstack_config: true
group_vars/all.yml:openstack_glance_pool:
group_vars/all.yml:  name: "images"
group_vars/all.yml:  application: "rbd"
group_vars/all.yml:openstack_cinder_pool:
group_vars/all.yml:  name: "volumes"
group_vars/all.yml:  application: "rbd"
group_vars/all.yml:openstack_nova_pool:
group_vars/all.yml:  name: "vms"
group_vars/all.yml:  application: "rbd"
group_vars/all.yml:openstack_cinder_backup_pool:
group_vars/all.yml:  name: "backups"
group_vars/all.yml:  application: "rbd"
group_vars/all.yml:openstack_cephfs_data_pool:
group_vars/all.yml:  name: "manila_data"
group_vars/all.yml:  application: "cephfs"
group_vars/all.yml:openstack_cephfs_metadata_pool:
group_vars/all.yml:  name: "manila_metadata"
group_vars/all.yml:  application: "cephfs"
group_vars/all.yml:openstack_pools:
group_vars/all.yml:  - "{{ openstack_glance_pool }}"
group_vars/all.yml:  - "{{ openstack_cinder_pool }}"
group_vars/all.yml:  - "{{ openstack_nova_pool }}"
group_vars/all.yml:  - "{{ openstack_cinder_backup_pool }}"
group_vars/all.yml:  - "{{ openstack_cephfs_data_pool }}"
group_vars/all.yml:  - "{{ openstack_cephfs_metadata_pool }}"
group_vars/all.yml:openstack_keys:
group_vars/all.yml:  - { name: client.glance, caps: { mon: "profile rbd", osd: "profile rbd pool={{ openstack_cinder_pool.name }}, profile rbd pool={{ openstack_glance_pool.name }}"}, mode: "0600" }
group_vars/all.yml:  - { name: client.cinder, caps: { mon: "profile rbd", osd: "profile rbd pool={{ openstack_cinder_pool.name }}, profile rbd pool={{ openstack_nova_pool.name }}, profile rbd pool={{ openstack_glance_pool.name }}"}, mode: "0600" }
group_vars/all.yml:  - { name: client.cinder-backup, caps: { mon: "profile rbd", osd: "profile rbd pool={{ openstack_cinder_backup_pool.name }}"}, mode: "0600" }
group_vars/all.yml:  - { name: client.openstack, caps: { mon: "profile rbd", osd: "profile rbd pool={{ openstack_glance_pool.name }}, profile rbd pool={{ openstack_nova_pool.name }}, profile rbd pool={{ openstack_cinder_pool.name }}, profile rbd pool={{ openstack_cinder_backup_pool.name }}"}, mode: "0600" }
group_vars/all.yml:dashboard_enabled: True
group_vars/all.yml:dashboard_protocol: http
group_vars/all.yml:dashboard_port: 8443
group_vars/all.yml:dashboard_admin_user: admin
group_vars/all.yml:dashboard_admin_user_ro: false
group_vars/all.yml:dashboard_admin_password: p@ssw0rd
group_vars/all.yml:dashboard_crt: ''
group_vars/all.yml:dashboard_key: ''
group_vars/all.yml:dashboard_tls_external: false
group_vars/all.yml:dashboard_grafana_api_no_ssl_verify: False
group_vars/all.yml:dashboard_rgw_api_user_id: ceph-dashboard
group_vars/all.yml:dashboard_rgw_api_admin_resource: ''
group_vars/all.yml:dashboard_rgw_api_no_ssl_verify: False
group_vars/all.yml:dashboard_frontend_vip: ''
group_vars/all.yml:node_exporter_container_image: "docker.io/prom/node-exporter:v0.17.0"
group_vars/all.yml:node_exporter_port: 9100
group_vars/all.yml:grafana_admin_user: admin
group_vars/all.yml:grafana_admin_password: admin
group_vars/all.yml:grafana_crt: ''
group_vars/all.yml:grafana_key: ''
group_vars/all.yml:grafana_server_fqdn: ''
group_vars/all.yml:grafana_container_image: "docker.io/grafana/grafana:5.4.3"
group_vars/all.yml:grafana_container_cpu_period: 100000
group_vars/all.yml:grafana_container_cpu_cores: 2
group_vars/all.yml:grafana_container_memory: 4
group_vars/all.yml:grafana_uid: 472
group_vars/all.yml:grafana_datasource: Dashboard
group_vars/all.yml:grafana_dashboards_path: "/etc/grafana/dashboards/ceph-dashboard"
group_vars/all.yml:grafana_dashboard_version: master
group_vars/all.yml:grafana_dashboard_files:
group_vars/all.yml:  - ceph-cluster.json
group_vars/all.yml:  - cephfs-overview.json
group_vars/all.yml:  - host-details.json
group_vars/all.yml:  - hosts-overview.json
group_vars/all.yml:  - osd-device-details.json
group_vars/all.yml:  - osds-overview.json
group_vars/all.yml:  - pool-detail.json
group_vars/all.yml:  - pool-overview.json
group_vars/all.yml:  - radosgw-detail.json
group_vars/all.yml:  - radosgw-overview.json
group_vars/all.yml:  - rbd-overview.json
group_vars/all.yml:grafana_plugins:
group_vars/all.yml:  - vonage-status-panel
group_vars/all.yml:  - grafana-piechart-panel
group_vars/all.yml:grafana_allow_embedding: True
group_vars/all.yml:grafana_port: 3000
group_vars/all.yml:prometheus_container_image: "docker.io/prom/prometheus:v2.7.2"
group_vars/all.yml:prometheus_container_cpu_period: 100000
group_vars/all.yml:prometheus_container_cpu_cores: 2
group_vars/all.yml:prometheus_container_memory: 4
group_vars/all.yml:prometheus_data_dir: /var/lib/prometheus
group_vars/all.yml:prometheus_conf_dir: /etc/prometheus
group_vars/all.yml:prometheus_user_id: '65534'  # This is the UID used by the prom/prometheus container image
group_vars/all.yml:prometheus_port: 9092
group_vars/all.yml:alertmanager_container_image: "docker.io/prom/alertmanager:v0.16.2"
group_vars/all.yml:alertmanager_container_cpu_period: 100000
group_vars/all.yml:alertmanager_container_cpu_cores: 2
group_vars/all.yml:alertmanager_container_memory: 4
group_vars/all.yml:alertmanager_data_dir: /var/lib/alertmanager
group_vars/all.yml:alertmanager_conf_dir: /etc/alertmanager
group_vars/all.yml:alertmanager_port: 9093
group_vars/all.yml:alertmanager_cluster_port: 9094
group_vars/all.yml:gateway_iqn: ""
group_vars/all.yml:gateway_ip_list: 0.0.0.0
group_vars/all.yml:rbd_devices: {}
group_vars/all.yml:client_connections: {}
group_vars/all.yml:container_exec_cmd:
group_vars/all.yml:docker: false
group_vars/all.yml:ceph_volume_debug: "{{ enable_ceph_volume_debug | ternary(1, 0)  }}"
group_vars/mdss.yml:---
group_vars/mdss.yml:dummy:
group_vars/mgrs.yml:---
group_vars/mgrs.yml:dummy:
group_vars/mgrs.yml:ceph_mgr_modules: [status, dashboard]
group_vars/mons.yml:---
group_vars/mons.yml:dummy:
group_vars/osds.yml:---
group_vars/osds.yml:dummy:
group_vars/osds.yml:osd_auto_discovery: false

Testbed Inventory: https://pastebin.com/k7ipi4xX

[all:vars]
ansible_ssh_user=ubuntu

[nodepool01]
A-08-02-storage.maas
A-08-08-storage.maas
A-09-02-storage.maas

[mons:children]
nodepool01

[osds]
A-08-02-storage.maas osd_objectstore=bluestore devices="[ '/dev/sdc', '/dev/sdd', '/dev/sde', '/dev/sdf', '/dev/sdg', '/dev/sdh', '/dev/sdi', '/dev/sdj', '/dev/sdk', '/dev/sdl', '/dev/sdm', '/dev/sdn', '/dev/sdo', '/dev/sdp', '/dev/sdq', '/dev/sdr', '/dev/sds', '/dev/sdt', '/dev/sdu', '/dev/sdv', '/dev/sdw', '/dev/sdx', '/dev/sdy', '/dev/sdz', '/dev/sdaa', '/dev/sdab', '/dev/sdac', '/dev/sdad', '/dev/sdae', '/dev/sdaf', '/dev/sdag', '/dev/sdah', '/dev/sdai', '/dev/sdaj', '/dev/sdak', '/dev/sdal', '/dev/sdam', '/dev/sdan' ]" dedicated_devices="[ '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb' ]"
A-08-08-storage.maas osd_objectstore=bluestore devices="[ '/dev/sdc', '/dev/sdd', '/dev/sde', '/dev/sdf', '/dev/sdg', '/dev/sdh', '/dev/sdi', '/dev/sdj', '/dev/sdk', '/dev/sdl', '/dev/sdm', '/dev/sdn', '/dev/sdo', '/dev/sdp', '/dev/sdq', '/dev/sdr', '/dev/sds', '/dev/sdt', '/dev/sdu', '/dev/sdv', '/dev/sdw', '/dev/sdx', '/dev/sdy', '/dev/sdz', '/dev/sdaa', '/dev/sdab', '/dev/sdac', '/dev/sdad', '/dev/sdae', '/dev/sdaf', '/dev/sdag', '/dev/sdah', '/dev/sdai', '/dev/sdaj', '/dev/sdak', '/dev/sdal', '/dev/sdam', '/dev/sdan' ]" dedicated_devices="[ '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb' ]"
A-09-02-storage.maas osd_objectstore=bluestore devices="[ '/dev/sdc', '/dev/sdd', '/dev/sde', '/dev/sdf', '/dev/sdg', '/dev/sdh', '/dev/sdi', '/dev/sdj', '/dev/sdk', '/dev/sdl', '/dev/sdm', '/dev/sdn', '/dev/sdo', '/dev/sdp', '/dev/sdq', '/dev/sdr', '/dev/sds', '/dev/sdt', '/dev/sdu', '/dev/sdv', '/dev/sdw', '/dev/sdx', '/dev/sdy', '/dev/sdz', '/dev/sdaa', '/dev/sdab', '/dev/sdac', '/dev/sdad', '/dev/sdae', '/dev/sdaf', '/dev/sdag', '/dev/sdah', '/dev/sdai', '/dev/sdaj', '/dev/sdak', '/dev/sdal', '/dev/sdam', '/dev/sdan' ]" dedicated_devices="[ '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb', '/dev/sdb' ]"

[mdss:children]
nodepool01

[mgrs:children]
nodepool01

[grafana-server]
A-09-02-storage.maas

[monitoring]
A-09-02-storage.maas

Testbed Ceph Status

[root@A-09-02-storage /]# ceph status
  cluster:
    id:     3f5452e1-ff5d-4a41-9e8d-5a3467036771
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum A-09-02-storage,A-08-08-storage,A-08-02-storage (age 4d)
    mgr: A-08-02-storage(active, since 4d), standbys: A-09-02-storage, A-08-08-storage
    mds: cephfs:1 {0=A-08-02-storage=up:active} 2 up:standby
    osd: 114 osds: 114 up (since 4d), 114 in (since 4d)

  task status:
    scrub status:
        mds.A-08-02-storage: idle

  data:
    pools:   9 pools, 257 pgs
    objects: 139 objects, 3.3 KiB
    usage:   11 TiB used, 415 TiB / 426 TiB avail
    pgs:     257 active+clean

Testbed node A-09-02 (set as grafana/monitoring server), sudo docker ps

ubuntu@A-09-02-storage:~$ sudo docker ps
CONTAINER ID        IMAGE                        COMMAND                  CREATED             STATUS              PORTS               NAMES
430b882cef13        prom/prometheus:v2.7.2       "/bin/prometheus --c…"   4 days ago          Up 4 days                               prometheus
f6e3c5ca04f0        prom/alertmanager:v0.16.2    "/bin/alertmanager -…"   4 days ago          Up 4 days                               alertmanager
4b182129cbfb        grafana/grafana:5.4.3        "/run.sh"                4 days ago          Up 4 days                               grafana-server
2ee17e8b629c        prom/node-exporter:v0.17.0   "/bin/node_exporter …"   4 days ago          Up 4 days                               node-exporter
47977e504084        ceph/daemon:latest-octopus   "/opt/ceph-container…"   4 days ago          Up 4 days                               ceph-mds-A-09-02-storage
7c9b4b6e6900        ceph/daemon:latest-octopus   "/opt/ceph-container…"   4 days ago          Up 4 days                               ceph-osd-32
bb70d5fb3673        ceph/daemon:latest-octopus   "/opt/ceph-container…"   4 days ago          Up 4 days                               ceph-osd-89
e69352f3c3a4        ceph/daemon:latest-octopus   "/opt/ceph-container…"   4 days ago          Up 4 days                               ceph-osd-35
06a1a37f03f8        ceph/daemon:latest-octopus   "/opt/ceph-container…"   4 days ago          Up 4 days                               ceph-osd-51
89e508e84b45        ceph/daemon:latest-octopus   "/opt/ceph-container…"   4 days ago          Up 4 days                               ceph-osd-57
2e212d08f1e4        ceph/daemon:latest-octopus   "/opt/ceph-container…"   4 days ago          Up 4 days                               ceph-osd-54
4f1d1c52926c        ceph/daemon:latest-octopus   "/opt/ceph-container…"   4 days ago          Up 4 days                               ceph-osd-17
4a0304878071        ceph/daemon:latest-octopus   "/opt/ceph-container…"   4 days ago          Up 4 days                               ceph-osd-14
fd7c76f09b9f        ceph/daemon:latest-octopus   "/opt/ceph-container…"   4 days ago          Up 4 days                               ceph-osd-38
15761d119706        ceph/daemon:latest-octopus   "/opt/ceph-container…"   4 days ago          Up 4 days                               ceph-osd-11
05405c05e197        ceph/daemon:latest-octopus   "/opt/ceph-container…"   4 days ago          Up 4 days                               ceph-osd-104
ee2b9b895f3a        ceph/daemon:latest-octopus   "/opt/ceph-container…"   4 days ago          Up 4 days                               ceph-osd-107
4109b41d4a0e        ceph/daemon:latest-octopus   "/opt/ceph-container…"   4 days ago          Up 4 days                               ceph-osd-95
12eb3474ca7d        ceph/daemon:latest-octopus   "/opt/ceph-container…"   4 days ago          Up 4 days                               ceph-osd-101
14fdbba6597f        ceph/daemon:latest-octopus   "/opt/ceph-container…"   4 days ago          Up 4 days                               ceph-osd-72
4ed8ab0a238b        ceph/daemon:latest-octopus   "/opt/ceph-container…"   4 days ago          Up 4 days                               ceph-osd-74
7d4edda469b6        ceph/daemon:latest-octopus   "/opt/ceph-container…"   4 days ago          Up 4 days                               ceph-osd-98
7d54c80eb3aa        ceph/daemon:latest-octopus   "/opt/ceph-container…"   4 days ago          Up 4 days                               ceph-osd-77
9c58089494e6        ceph/daemon:latest-octopus   "/opt/ceph-container…"   4 days ago          Up 4 days                               ceph-osd-8
167c2bb4820c        ceph/daemon:latest-octopus   "/opt/ceph-container…"   4 days ago          Up 4 days                               ceph-osd-5
7d1bbe14df87        ceph/daemon:latest-octopus   "/opt/ceph-container…"   4 days ago          Up 4 days                               ceph-osd-2
b19248669bec        ceph/daemon:latest-octopus   "/opt/ceph-container…"   4 days ago          Up 4 days                               ceph-osd-0
60d27d8ac359        ceph/daemon:latest-octopus   "/opt/ceph-container…"   4 days ago          Up 4 days                               ceph-osd-41
cd1efdbc504c        ceph/daemon:latest-octopus   "/opt/ceph-container…"   4 days ago          Up 4 days                               ceph-osd-29
5a83b36e75e9        ceph/daemon:latest-octopus   "/opt/ceph-container…"   4 days ago          Up 4 days                               ceph-osd-44
c5bdc89c4bcd        ceph/daemon:latest-octopus   "/opt/ceph-container…"   4 days ago          Up 4 days                               ceph-osd-47
ba8d6a916a38        ceph/daemon:latest-octopus   "/opt/ceph-container…"   4 days ago          Up 4 days                               ceph-osd-20
9fe239917415        ceph/daemon:latest-octopus   "/opt/ceph-container…"   4 days ago          Up 4 days                               ceph-osd-92
5b5eaeb4a1b3        ceph/daemon:latest-octopus   "/opt/ceph-container…"   4 days ago          Up 4 days                               ceph-osd-26
209c2a4e1c1d        ceph/daemon:latest-octopus   "/opt/ceph-container…"   4 days ago          Up 4 days                               ceph-osd-24
2a42918ff9ca        ceph/daemon:latest-octopus   "/opt/ceph-container…"   4 days ago          Up 4 days                               ceph-osd-86
3a5d614f816b        ceph/daemon:latest-octopus   "/opt/ceph-container…"   4 days ago          Up 4 days                               ceph-osd-80
2ec363cfa360        ceph/daemon:latest-octopus   "/opt/ceph-container…"   4 days ago          Up 4 days                               ceph-osd-69
6804479a418c        ceph/daemon:latest-octopus   "/opt/ceph-container…"   4 days ago          Up 4 days                               ceph-osd-83
16f943a9357f        ceph/daemon:latest-octopus   "/opt/ceph-container…"   4 days ago          Up 4 days                               ceph-osd-66
fd9aab8eea44        ceph/daemon:latest-octopus   "/opt/ceph-container…"   4 days ago          Up 4 days                               ceph-osd-110
4e8e075e2b7a        ceph/daemon:latest-octopus   "/opt/ceph-container…"   4 days ago          Up 4 days                               ceph-osd-63
40182a8410df        ceph/daemon:latest-octopus   "/opt/ceph-container…"   4 days ago          Up 4 days                               ceph-osd-60
190dc94da7c8        ceph/daemon:latest-octopus   "/opt/ceph-container…"   5 days ago          Up 5 days                               ceph-mgr-A-09-02-storage
719642c8bc49        ceph/daemon:latest-octopus   "/opt/ceph-container…"   5 days ago          Up 5 days                               ceph-mon-A-09-02-storage
s2n-Gribbly commented 3 years ago

I am seeing the same issue here, but only if I try to deploy with a crush map.

If I have the following in my inventory for ceph-ansible, it will get stuck on the setting password task:

[osds]
node01 osd_crush_location="{ 'root': 'dev-root', 'rack': 'rack-1', 'chassis': 'chassis-1', 'host': 'node01'}"
node02 osd_crush_location="{ 'root': 'dev-root', 'rack': 'rack-1', 'chassis': 'chassis-1', 'host': 'node02'}"
node03 osd_crush_location="{ 'root': 'dev-root', 'rack': 'rack-1', 'chassis': 'chassis-2', 'host': 'node03'}"
node04 osd_crush_location="{ 'root': 'dev-root', 'rack': 'rack-1', 'chassis': 'chassis-2', 'host': 'node04'}"
node05 osd_crush_location="{ 'root': 'dev-root', 'rack': 'rack-2', 'chassis': 'chassis-3', 'host': 'node05'}"
node06 osd_crush_location="{ 'root': 'dev-root', 'rack': 'rack-2', 'chassis': 'chassis-3', 'host': 'node06'}"
node07 osd_crush_location="{ 'root': 'dev-root', 'rack': 'rack-2', 'chassis': 'chassis-4', 'host': 'node07'}"
node08 osd_crush_location="{ 'root': 'dev-root', 'rack': 'rack-2', 'chassis': 'chassis-4', 'host': 'node08'}"

If I remove the osd_crush_location from the inventory, it deploys fine.
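
A rough sketch of what might be worth checking when the hang only appears with osd_crush_location set: whether the generated CRUSH tree still lets the PGs become active, since later comments in this thread tie the hanging dashboard commands to inactive PGs (the container name ceph-mon-node01 is just an assumption following the naming seen above):

# inspect the CRUSH layout produced by osd_crush_location and the resulting PG states
docker exec ceph-mon-node01 ceph osd tree
docker exec ceph-mon-node01 ceph osd crush rule dump
docker exec ceph-mon-node01 ceph pg stat
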

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

kgfathur commented 2 years ago

Hi Everyone,

I am facing exactly the same issue with my Ceph cluster. The Ceph daemons (MON, MGR) are colocated with the controllers, because Ceph is deployed by OpenStack (TripleO). However, the issue is exactly the same. I am not sure that osd_crush_location is the real root cause, or that removing it solves the issue, because I already did that and still ran into the same problem.

In my case, I noticed that my cluster had some health warnings and some PGs were not in the active+clean state. I also noticed that my RGW services were not running (I have the ceph-rgw services enabled as well).

[root@osp-ctrl01 /]# ceph status
  cluster:
    id:     8a141276-7314-44bc-824f-***************
    health: HEALTH_WARN
            Reduced data availability: 601 pgs inactive
            Degraded data redundancy: 676 pgs undersized
            mons are allowing insecure global_id reclaim

  services:
    mon: 3 daemons, quorum osp-ctrl01,osp-ctrl02,osp-ctrl03 (age 24h)
    mgr: osp-ctrl01(active, since 28m), standbys: osp-ctrl02, osp-ctrl03
    mds: cephfs:1 {0=osp-ctrl01=up:active} 2 up:standby
    osd: 84 osds: 84 up (since 3d), 84 in (since 3d); 231 remapped pgs

  data:
    pools:   13 pools, 832 pgs
    objects: 22 objects, 3.6 KiB
    usage:   28 TiB used, 541 TiB / 569 TiB avail
    pgs:     72.236% pgs not active
             44/66 objects misplaced (66.667%)
             601 undersized+peered
             128 active+clean+remapped
             75  active+undersized+remapped
             28  active+clean

[root@osp-ctrl01 /]#

That is because I set the failure domain of my replicated crush rules to rack.

[root@osp-ctrl01 /]# ceph osd crush rule dump
[
    {
        "rule_id": 0,
        "rule_name": "replicated_rule",
        "ruleset": 0,
        "type": 1,
        "min_size": 1,
        "max_size": 10,
        "steps": [
            {
                "op": "take",
                "item": -1,
                "item_name": "default"
            },
            {
                "op": "chooseleaf_firstn",
                "num": 0,
                "type": "host"
            },
            {
                "op": "emit"
            }
        ]
    },
    {
        "rule_id": 1,
        "rule_name": "SSD",
        "ruleset": 1,
        "type": 1,
        "min_size": 1,
        "max_size": 10,
        "steps": [
            {
                "op": "take",
                "item": -51,
                "item_name": "SSD_ROOT~ssd"
            },
            {
                "op": "chooseleaf_firstn",
                "num": 0,
                "type": "rack"
            },
            {
                "op": "emit"
            }
        ]
    },
    {
        "rule_id": 2,
        "rule_name": "HDD",
        "ruleset": 2,
        "type": 1,
        "min_size": 1,
        "max_size": 10,
        "steps": [
            {
                "op": "take",
                "item": -48,
                "item_name": "HDD_ROOT~hdd"
            },
            {
                "op": "chooseleaf_firstn",
                "num": 0,
                "type": "rack"
            },
            {
                "op": "emit"
            }
        ]
    }
]

[root@osp-ctrl01 /]#

However, I have so far only deployed some of the nodes, all in the same rack, and have not yet deployed all available hosts.

[root@osp-ctrl01 /]# ceph osd tree
ID  CLASS WEIGHT    TYPE NAME                               STATUS REWEIGHT PRI-AFF
-47       174.21890 root HDD_ROOT
-28       174.21890     rack HDD_RACK2
 -4        58.07297         host osp-hdd01
  0   hdd   5.80730             osd.0                           up  1.00000 1.00000
  6   hdd   5.80730             osd.6                           up  1.00000 1.00000
  9   hdd   5.80730             osd.9                           up  1.00000 1.00000
 14   hdd   5.80730             osd.14                          up  1.00000 1.00000
 19   hdd   5.80730             osd.19                          up  1.00000 1.00000
 22   hdd   5.80730             osd.22                          up  1.00000 1.00000
 25   hdd   5.80730             osd.25                          up  1.00000 1.00000
 28   hdd   5.80730             osd.28                          up  1.00000 1.00000
 34   hdd   5.80730             osd.34                          up  1.00000 1.00000
 37   hdd   5.80730             osd.37                          up  1.00000 1.00000
-10        58.07297         host osp-hdd02
  1   hdd   5.80730             osd.1                           up  1.00000 1.00000
  7   hdd   5.80730             osd.7                           up  1.00000 1.00000
 11   hdd   5.80730             osd.11                          up  1.00000 1.00000
 12   hdd   5.80730             osd.12                          up  1.00000 1.00000
 20   hdd   5.80730             osd.20                          up  1.00000 1.00000
 23   hdd   5.80730             osd.23                          up  1.00000 1.00000
 26   hdd   5.80730             osd.26                          up  1.00000 1.00000
 32   hdd   5.80730             osd.32                          up  1.00000 1.00000
 35   hdd   5.80730             osd.35                          up  1.00000 1.00000
 38   hdd   5.80730             osd.38                          up  1.00000 1.00000
 -7        58.07297         host osp-hdd03
  2   hdd   5.80730             osd.2                           up  1.00000 1.00000
  8   hdd   5.80730             osd.8                           up  1.00000 1.00000
 10   hdd   5.80730             osd.10                          up  1.00000 1.00000
 13   hdd   5.80730             osd.13                          up  1.00000 1.00000
 18   hdd   5.80730             osd.18                          up  1.00000 1.00000
 21   hdd   5.80730             osd.21                          up  1.00000 1.00000
 24   hdd   5.80730             osd.24                          up  1.00000 1.00000
 27   hdd   5.80730             osd.27                          up  1.00000 1.00000
 33   hdd   5.80730             osd.33                          up  1.00000 1.00000
 36   hdd   5.80730             osd.36                          up  1.00000 1.00000
-46       394.72339 root SSD_ROOT
-29       394.72339     rack SSD_RACK2
-19       131.57446         host osp-ssd01
  3   ssd   7.30969             osd.3                           up  1.00000 1.00000
 15   ssd   7.30969             osd.15                          up  1.00000 1.00000
 29   ssd   7.30969             osd.29                          up  1.00000 1.00000
 39   ssd   7.30969             osd.39                          up  1.00000 1.00000
 42   ssd   7.30969             osd.42                          up  1.00000 1.00000
 45   ssd   7.30969             osd.45                          up  1.00000 1.00000
 48   ssd   7.30969             osd.48                          up  1.00000 1.00000
 51   ssd   7.30969             osd.51                          up  1.00000 1.00000
 54   ssd   7.30969             osd.54                          up  1.00000 1.00000
 57   ssd   7.30969             osd.57                          up  1.00000 1.00000
 60   ssd   7.30969             osd.60                          up  1.00000 1.00000
 63   ssd   7.30969             osd.63                          up  1.00000 1.00000
 66   ssd   7.30969             osd.66                          up  1.00000 1.00000
 69   ssd   7.30969             osd.69                          up  1.00000 1.00000
 72   ssd   7.30969             osd.72                          up  1.00000 1.00000
 75   ssd   7.30969             osd.75                          up  1.00000 1.00000
 78   ssd   7.30969             osd.78                          up  1.00000 1.00000
 81   ssd   7.30969             osd.81                          up  1.00000 1.00000
-16       131.57446         host osp-ssd02
  4   ssd   7.30969             osd.4                           up  1.00000 1.00000
 17   ssd   7.30969             osd.17                          up  1.00000 1.00000
 31   ssd   7.30969             osd.31                          up  1.00000 1.00000
 41   ssd   7.30969             osd.41                          up  1.00000 1.00000
 44   ssd   7.30969             osd.44                          up  1.00000 1.00000
 47   ssd   7.30969             osd.47                          up  1.00000 1.00000
 50   ssd   7.30969             osd.50                          up  1.00000 1.00000
 53   ssd   7.30969             osd.53                          up  1.00000 1.00000
 56   ssd   7.30969             osd.56                          up  1.00000 1.00000
 59   ssd   7.30969             osd.59                          up  1.00000 1.00000
 62   ssd   7.30969             osd.62                          up  1.00000 1.00000
 65   ssd   7.30969             osd.65                          up  1.00000 1.00000
 68   ssd   7.30969             osd.68                          up  1.00000 1.00000
 71   ssd   7.30969             osd.71                          up  1.00000 1.00000
 74   ssd   7.30969             osd.74                          up  1.00000 1.00000
 77   ssd   7.30969             osd.77                          up  1.00000 1.00000
 80   ssd   7.30969             osd.80                          up  1.00000 1.00000
 83   ssd   7.30969             osd.83                          up  1.00000 1.00000
-13       131.57446         host osp-ssd03
  5   ssd   7.30969             osd.5                           up  1.00000 1.00000
 16   ssd   7.30969             osd.16                          up  1.00000 1.00000
 30   ssd   7.30969             osd.30                          up  1.00000 1.00000
 40   ssd   7.30969             osd.40                          up  1.00000 1.00000
 43   ssd   7.30969             osd.43                          up  1.00000 1.00000
 46   ssd   7.30969             osd.46                          up  1.00000 1.00000
 49   ssd   7.30969             osd.49                          up  1.00000 1.00000
 52   ssd   7.30969             osd.52                          up  1.00000 1.00000
 55   ssd   7.30969             osd.55                          up  1.00000 1.00000
 58   ssd   7.30969             osd.58                          up  1.00000 1.00000
 61   ssd   7.30969             osd.61                          up  1.00000 1.00000
 64   ssd   7.30969             osd.64                          up  1.00000 1.00000
 67   ssd   7.30969             osd.67                          up  1.00000 1.00000
 70   ssd   7.30969             osd.70                          up  1.00000 1.00000
 73   ssd   7.30969             osd.73                          up  1.00000 1.00000
 76   ssd   7.30969             osd.76                          up  1.00000 1.00000
 79   ssd   7.30969             osd.79                          up  1.00000 1.00000
 82   ssd   7.30969             osd.82                          up  1.00000 1.00000
 -1               0 root default
[root@osp-ctrl01 /]#

All of my pools have the replica size set to 3. Maybe the crush rule cannot allocate 3 replicas for the pools because I only have one rack available. I think that is what caused my ceph-rgw issues, and also why ceph dashboard ac-user-show <user> or ceph dashboard ac-user-create <option> gets stuck.
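
A rough sketch of how to check which rule each pool uses, and two possible workarounds until more racks exist: point the pools back at the host-level replicated_rule, or temporarily drop the replica size (the "no replicas configured" warning in the status below suggests the latter was done here; the pool name volumes is only an example):

# show per-pool size and crush_rule assignments
ceph osd pool ls detail
# workaround A: use the host-level rule until more racks are added
ceph osd pool set volumes crush_rule replicated_rule
# workaround B (stop-gap only): drop replicas so PGs can go active with one rack
ceph osd pool set volumes size 1
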

After troubleshooting to bring all the PGs back to active+clean:

[root@osp-ctrl01 /]# ceph status
  cluster:
    id:    8a141276-7314-44bc-824f-***************
    health: HEALTH_WARN
            13 pool(s) have no replicas configured
            1 pools have too many placement groups
            mons are allowing insecure global_id reclaim

  services:
    mon: 3 daemons, quorum osp-ctrl01,osp-ctrl02,osp-ctrl03 (age 26h)
    mgr: osp-ctrl03(active, since 102m), standbys: osp-ctrl01, osp-ctrl02
    mds: cephfs:1 {0=osp-ctrl02=up:active} 2 up:standby
    osd: 84 osds: 84 up (since 3d), 84 in (since 3d)
    rgw: 3 daemons active (osp-ctrl01.rgw0, osp-ctrl02.rgw0, osp-ctrl03.rgw0)

  data:
    pools:   13 pools, 832 pgs
    objects: 226 objects, 11 KiB
    usage:   28 TiB used, 541 TiB / 569 TiB avail
    pgs:     832 active+clean

[root@osp-ctrl01 /]#

The command now executes successfully (the error is expected because the user doesn't exist):

[root@osp-ctrl01 /]# ceph dashboard ac-user-show admin
Error ENOENT: User 'admin' does not exist
[root@osp-ctrl01 /]#
[root@osp-ctrl01 /]# ceph dashboard ac-user-show fathur-in-the-world
Error ENOENT: User 'fathur-in-the-world' does not exist
[root@osp-ctrl01 /]#
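
Once ac-user-show responds like this, the remaining steps the playbook would run (mirroring the ac-user-* container args captured earlier in this thread) can also be finished by hand; a sketch with a placeholder password:

[root@osp-ctrl01 /]# ceph dashboard ac-user-create admin '<compliant-password>'
[root@osp-ctrl01 /]# ceph dashboard ac-user-set-roles admin administrator
[root@osp-ctrl01 /]# ceph dashboard ac-user-show admin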

Maybe we have a different root cause, but that is what it was in my case. I hope this can help someone out there.