ceph / ceph-ansible

Ansible playbooks to deploy Ceph, the distributed filesystem.
Apache License 2.0

Unable to find a keyring #3443

Closed SIM0N-F closed 5 years ago

SIM0N-F commented 5 years ago

Bug Report

What happened:

When I deploy a cluster with the latest version of ceph-ansible (cloned from git this morning), the playbook does not create /etc/ceph/ceph.client.admin.keyring or any other key.

The playbook stops at the create ceph mgr keyring(s) task:

failed: [X.X.X.X] (item=X.X.X.X) => {
    "changed": true,
    "cmd": [
        "ceph",
        "-n",
        "client.admin",
        "-k",
        "/etc/ceph/ceph.client.admin.keyring",
        "--cluster",
        "ceph",
        "auth",
        "import",
        "-i",
        "/etc/ceph//ceph.mgr.ceph-mutu-3.keyring"
    ],
    "delta": "0:00:00.619335",
    "end": "2018-12-12 17:04:40.165201",
    "invocation": {
        "module_args": {
            "attributes": null,
            "backup": null,
            "caps": {
                "mds": "allow *",
                "mon": "allow profile mgr",
                "osd": "allow *"
            },
            "cluster": "ceph",
            "content": null,
            "delimiter": null,
            "dest": "/etc/ceph/",
            "directory_mode": null,
            "follow": false,
            "force": null,
            "group": "ceph",
            "import_key": true,
            "mode": "0400",
            "name": "mgr.ceph-mutu-3",
            "owner": "ceph",
            "regexp": null,
            "remote_src": null,
            "secret": null,
            "selevel": null,
            "serole": null,
            "setype": null,
            "seuser": null,
            "src": null,
            "state": "present",
            "unsafe_writes": null
        }
    },
    "item": "X.X.X.X",
    "msg": "non-zero return code",
    "rc": 1,
    "start": "2018-12-12 17:04:39.545866",
    "stderr": "2018-12-12 17:04:40.138 7f2f8b96d700 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring: (2) No such file or directory\n2018-12-12 17:04:40.140 7f2f8b96d700 -1 monclient: authenticate NOTE: no keyring found; disabled cephx authentication\n[errno 95] error connecting to the cluster",
    "stderr_lines": [
        "2018-12-12 17:04:40.138 7f2f8b96d700 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring: (2) No such file or directory",
        "2018-12-12 17:04:40.140 7f2f8b96d700 -1 monclient: authenticate NOTE: no keyring found; disabled cephx authentication",
        "[errno 95] error connecting to the cluster"
    ],
    "stdout": "",
    "stdout_lines": []
}

How to reproduce it (minimal and precise):

Just launch the playbook with my group_vars/all.yml.


Share your group_vars files, inventory

Environment:

My ./group_vars/all.yml:

---

dummy:

###########
# GENERAL #
###########
vm_min_free_kbytes: 4194303

######################################
# Releases name to number dictionary #
######################################
ceph_release_num:
  dumpling: 0.67
  emperor: 0.72
  firefly: 0.80
  giant: 0.87
  hammer: 0.94
  infernalis: 9
  jewel: 10
  kraken: 11
  luminous: 12
  mimic: 13

# Directory to fetch cluster fsid, keys etc...
fetch_directory: fetch/

# The 'cluster' variable determines the name of the cluster.
cluster: ceph

# Inventory host group variables
mon_group_name: mons
osd_group_name: osds
rgw_group_name: rgws
mds_group_name: mdss
nfs_group_name: nfss
restapi_group_name: restapis
rbdmirror_group_name: rbdmirrors
client_group_name: clients
iscsi_gw_group_name: iscsi-gws
mgr_group_name: mgrs

# no fw
check_firewall: False

############
# PACKAGES #
############

centos_package_dependencies:
  - python-pycurl
  - hdparm
  - epel-release
  - python-setuptools
  - libselinux-python

# Enable the ntp service by default to avoid clock skew on ceph nodes
ntp_service_enabled: true

# Set uid/gid to default '64045' for bootstrap directories.
bootstrap_dirs_owner: "64045"
bootstrap_dirs_group: "64045"

# This variable determines if ceph packages can be updated.
upgrade_ceph_packages: true

###########
# INSTALL #
###########
ceph_stable: true # backward compatibility with stable-2.2, will disappear in stable 3.1

# ORIGIN SOURCE
ceph_origin: "repository"
valid_ceph_origins: repository
ceph_repository:  community
valid_ceph_repository: community

# REPOSITORY: COMMUNITY VERSION
#
# Enabled when ceph_repository == 'community'
#
ceph_mirror: http://download.ceph.com
ceph_stable_key: https://download.ceph.com/keys/release.asc
ceph_stable_release: mimic

######################
# CEPH CONFIGURATION #
######################

## Ceph options
fsid: "{{ cluster_uuid.stdout }}"
generate_fsid: true

ceph_conf_key_directory: /etc/ceph
# Permissions for keyring files in /etc/ceph
ceph_keyring_permissions: '0600'
cephx: true

## Client options

rbd_cache: "true"
rbd_cache_writethrough_until_flush: "true"
rbd_concurrent_management_ops: 20

rbd_client_directories: true # this will create rbd_client_log_path and rbd_client_admin_socket_path directories with proper permissions

## Monitor options
#

monitor_interface: bond0.3538
monitor_address_block: X.X.X.X/26
ip_version: ipv4
mon_use_fqdn: false # if set to true, the MON name used will be the fqdn in the ceph.conf

## OSD options
#
journal_size: 5120 # OSD journal size in MB
public_network: X.X.X.X/26
cluster_network: X.X.X.X/26
osd_objectstore: bluestore

## Rados Gateway options
#
radosgw_dns_name: rgw.xxx.fr
radosgw_resolve_cname: false 
radosgw_civetweb_port: 8080
radosgw_civetweb_num_threads: 100
radosgw_interface: bond0.3538
radosgw_keystone: false
email_address: sysadmin@xxx.fr

## REST API options
#
restapi_interface: "{{ monitor_interface }}"
restapi_address: "{{ monitor_address }}"
restapi_port: 5000

# Monitor handler checks
handler_health_mon_check_retries: 5
handler_health_mon_check_delay: 10
#
# OSD handler checks
handler_health_osd_check_retries: 40
handler_health_osd_check_delay: 30
handler_health_osd_check: true
#
# MDS handler checks
handler_health_mds_check_retries: 5
handler_health_mds_check_delay: 10
#
# RGW handler checks
handler_health_rgw_check_retries: 5
handler_health_rgw_check_delay: 10

# NFS handler checks
handler_health_nfs_check_retries: 5
handler_health_nfs_check_delay: 10

# RBD MIRROR handler checks
handler_health_rbd_mirror_check_retries: 5
handler_health_rbd_mirror_check_delay: 10

# MGR handler checks
handler_health_mgr_check_retries: 5
handler_health_mgr_check_delay: 10

###############
# NFS-GANESHA #
###############

# Set this to true to enable File access via NFS.  Requires an MDS role.
nfs_file_gw: true
# Set this to true to enable Object access via NFS. Requires an RGW role.
nfs_obj_gw: true

#############
# OS TUNING #
#############
disable_transparent_hugepage: true
os_tuning_params:
  - { name: kernel.pid_max, value: 4194303 }
  - { name: fs.file-max, value: 26234859 }
  - { name: vm.zone_reclaim_mode, value: 0 }
  - { name: vm.swappiness, value: 10 }
  - { name: vm.min_free_kbytes, value: "{{ vm_min_free_kbytes }}" }

# For Debian & Red Hat/CentOS installs set TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES
# Set this to a byte value (e.g. 134217728)
# A value of 0 will leave the package default.
ceph_tcmalloc_max_total_thread_cache: 0

#### ajouts spécifiques

### Ceph OSD
devices:
  - '/dev/sdb'
  - '/dev/sdc'
  - '/dev/sdd'
  - '/dev/sde'
  - '/dev/sdf'
  - '/dev/sdg'
  - '/dev/sdh'
  - '/dev/sdi'
  - '/dev/sdj'
  - '/dev/sdk'
osd_scenario: collocated

### CephFS
cephfs: cephfs # name of the ceph filesystem
cephfs_data: cephfs_data # name of the data pool for a given filesystem
cephfs_metadata: cephfs_metadata # name of the metadata pool for a given filesystem

cephfs_pools:
  - { name: "{{ cephfs_data }}", pgs: "64" }
  - { name: "{{ cephfs_metadata }}", pgs: "16" }

### Ceph management
ceph_mgr_modules: [status,dashboard]

So could you help me please? Many thanks in advance!

guits commented 5 years ago

@SIM0N-F the master branch of ceph-ansible is intended to deploy ceph@master
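
In other words, deployments should be run from the stable branch matching the Ceph release being targeted, not from master. A minimal sketch, assuming the repository is already cloned and that stable-3.2 is the branch matching mimic:

git fetch origin
git checkout stable-3.2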

SIM0N-F commented 5 years ago

Thanks for your quick answer. I have tried with ceph-ansible-3.2.0, but I have a new problem:

TASK [ceph-validate : validate provided configuration] *********************************************
task path: /home/hexanet/ansible/ceph-ansible-3.2.0/roles/ceph-validate/tasks/main.yml:2
Thursday 13 December 2018  09:36:48 +0100 (0:00:00.218)       0:00:19.179 *****
The full traceback is:
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/ansible/executor/task_executor.py", line 139, in run
    res = self._execute()
  File "/usr/lib/python2.7/dist-packages/ansible/executor/task_executor.py", line 584, in _execute
    result = self._handler.run(task_vars=variables)
  File "/home/hexanet/ansible/ceph-ansible-3.2.0/plugins/actions/validate.py", line 100, in run
    msg = "[{}] Validation failed for variable: {}".format(host, error.path[0])
IndexError: list index out of range

fatal: [100.127.2.2]: FAILED! => {
    "msg": "Unexpected failure during module execution.",
    "stdout": ""
}

Should I open another issue for it?

guits commented 5 years ago

@SIM0N-F stable-3.2 supports ansible 2.6
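
One way to pin Ansible to the 2.6 series, for example in a virtualenv (a sketch, not an official procedure; adapt it if you install Ansible from distribution packages):

pip install 'ansible>=2.6,<2.7'   # stay on the 2.6.x series
ansible --version                 # confirm which version ansible-playbook will use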

SIM0N-F commented 5 years ago

Yes I know, I have downgraded to ansible 2.6.10.

guits commented 5 years ago

@SIM0N-F are you saying that you are still facing the same issue even with ansible 2.6?

SIM0N-F commented 5 years ago

Yes, I installed 2.6 before trying ceph-ansible 3.2.0.

guits commented 5 years ago

please, share the full playbook log in that case

SIM0N-F commented 5 years ago

OK, here it is: ansible_without_ip_name.log

guits commented 5 years ago

@SIM0N-F which version of stable-3.2 are you using?

SIM0N-F commented 5 years ago

https://github.com/ceph/ceph-ansible/releases/tag/v3.2.0

guits commented 5 years ago

@SIM0N-F try the latest stable-3.2 version: v3.2.0rc8

wwyhy commented 5 years ago

> Thanks for your quick answer. I have tried with ceph-ansible-3.2.0, but I have a new problem:
>
> TASK [ceph-validate : validate provided configuration]
> [...]
> IndexError: list index out of range
>
> Should I open another issue for it?

I met the same issue with ansible 2.7.4 and ceph-ansible 3.2.0. Any solutions? Thanks!

SIM0N-F commented 5 years ago

With 3.2rc8, same issue. New log file: ansible-3.2RC8_without_ip_name.log

SIM0N-F commented 5 years ago

@wwyhy ceph-ansible 3.2 doesn't support Ansible 2.7.X (2.6 is needed).

wwyhy commented 5 years ago

@SIM0N-F

I downgraded ansible to 2.6.10 and got the same error.

wwyhy commented 5 years ago

I also tried ceph-ansible-3.2.0rc8, same error.

wwyhy commented 5 years ago

@SIM0N-F

I just found that this issue is caused by the variable "osd_scenario" in ceph-ansible-3.2.0/group_vars/osds.yml; this variable needs to be set to one of the following, then it works fine.

#osd_scenario: lvm
#valid_osd_scenarios:
#  - collocated
#  - non-collocated
#  - lvm
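
For reference, the fix is simply to uncomment the variable and set it to one of those values, e.g. in group_vars/osds.yml (an illustration; as noted further down, collocated also remains valid in stable-3.2):

osd_scenario: lvm        # or: collocated / non-collocated
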
SIM0N-F commented 5 years ago

@wwyhy I have commented out the osd_scenario line, but it still doesn't work.

SIM0N-F commented 5 years ago

I wonder if this problem is due to my notario version?

I use 0.0.16:

 pip show notario
Name: notario
Version: 0.0.16
Summary: A dictionary validator
Home-page: http://github.com/alfredodeza/notario
Author: Alfredo Deza
Author-email: UNKNOWN
License: MIT
Location: /usr/local/lib/python2.7/dist-packages
Requires:
Required-by:
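
As a side note, ceph-ansible ships a requirements.txt that lists the minimum notario version (among other Python dependencies) expected by its validation plugin; installing from it keeps things in sync (a sketch, assuming it is run from the repository root):

pip install -r requirements.txt
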
SIM0N-F commented 5 years ago

I have tried with 0.0.14, same issue.

guits commented 5 years ago

I think stable-3.2 is missing a backport.

guits commented 5 years ago

@SIM0N-F do you mind testing with this branch: guits_missing_backports?

Thanks!

guits commented 5 years ago

@rishabh-d-dave I think you've worked on a patch on the master branch for this error. Do you mind taking a look at this, in case the missing backport I just mentioned in the previous comment isn't enough to fix this issue? Thanks!

wwyhy commented 5 years ago

@SIM0N-F

Sorry for the misunderstanding, I set "osd_scenario: lvm" and got rid of the error.

But I met a new error; it seems like a yum repo configuration issue. I will check it tomorrow.

guits commented 5 years ago

@wwyhy osd_scenario: collocated is still a valid scenario in stable-3.2

SIM0N-F commented 5 years ago

I have tried with guits_missing_backports but unfortunately, same issue again.

$ git branch
* guits_missing_backports
TASK [ceph-validate : validate provided configuration] **************************************************************************************************************
task path: /home/xxx/ansible/ceph-ansible/roles/ceph-validate/tasks/main.yml:2
Thursday 13 December 2018  14:23:40 +0100 (0:00:00.241)       0:00:14.827 *****
The full traceback is:
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/ansible/executor/task_executor.py", line 139, in run
    res = self._execute()
  File "/usr/lib/python2.7/dist-packages/ansible/executor/task_executor.py", line 584, in _execute
    result = self._handler.run(task_vars=variables)
  File "/home/xxx/ansible/ceph-ansible/plugins/actions/validate.py", line 48, in run
    notario_store["containerized_deployment"] = host_vars["containerized_deployment"]
KeyError: 'containerized_deployment'

fatal: [X.X.X.X]: FAILED! => {
    "msg": "Unexpected failure during module execution.",
    "stdout": ""
}
rishabh-d-dave commented 5 years ago

I cherry-picked this commit to v3.2.0 and tried running a ceph-ansible scenario with the group_vars/all.yml posted in the bug report. I think the error message is now much clearer:

TASK [ceph-validate : fail if bond0.3538 does not exist on ceph-rgw0] **********
task path: /home/rishabh/run/ceph-ansible/issue-3443/roles/ceph-validate/tasks/check_eth_rgw.yml:2
Thursday 13 December 2018  19:04:19 +0530 (0:00:00.486)       0:00:36.091 ***** 
META: noop
META: noop
META: noop
META: noop
META: noop
fatal: [ceph-rgw0]: FAILED! => {
    "changed": false
}

MSG:

bond0.3538 does not exist on ceph-rgw0

@SIM0N-F @wwyhy @guits If this resolves the issue, I'll backport the commit.
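
If you hit this message, it just means the interface name given in monitor_interface / radosgw_interface does not exist on that host. A quick way to see what Ansible actually detects (a sketch; ceph-rgw0 is the host name from the log above, replace it and the inventory path with your own):

ansible -i <your-inventory> ceph-rgw0 -m setup -a 'filter=ansible_interfaces'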

SIM0N-F commented 5 years ago

Yes @rishabh-d-dave, many thanks, your commit helped me a lot to understand my problems; I had forgotten some variables like: is_hci: False, configure_firewall: False, ....

So your commit has been very useful for me.
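
For anyone landing on the same validation failure: those variables just need to be defined in group_vars/all.yml, for example (the values below are simply the ones that worked here, not general recommendations):

is_hci: false
configure_firewall: false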

But now I have a new issue in my deployment:

<X.X.X.X> (1, '\n{"changed": true, "end": "2018-12-13 22:25:08.646332", "stdout": "", "cmd": ["ceph-disk", "activate", "/dev/sdk1"], "failed": true, "delta": "0:00:00.348117", "stderr": "/usr/lib/python2.7/site-packages/ceph_disk/main.py:5689: UserWarning: \\n*******************************************************************************\\nThis tool is now deprecated in favor of ceph-volume.\\nIt is recommended to use ceph-volume for OSD deployments. For details see:\\n\\n    http://docs.ceph.com/docs/master/ceph-volume/#migrating\\n\\n*******************************************************************************\\n\\n  warnings.warn(DEPRECATION_WARNING)\\nmount_activate: Failed to activate\\n/usr/lib/python2.7/site-packages/ceph_disk/main.py:5750: UserWarning: \\n*******************************************************************************\\nThis tool is now deprecated in favor of ceph-volume.\\nIt is recommended to use ceph-volume for OSD deployments. For details see:\\n\\n    http://docs.ceph.com/docs/master/ceph-volume/#migrating\\n\\n*******************************************************************************\\n\\n  warnings.warn(DEPRECATION_WARNING)\\nceph-disk: Error: No cluster conf found in /etc/ceph with fsid b77ed55d-906c-4234-b5c3-8fe3b5dc1bf3", "rc": 1, "invocation": {"module_args": {"warn": true, "executable": null, "_uses_shell": false, "_raw_params": "ceph-disk activate \\"/dev/sdk1\\"", "removes": null, "argv": null, "creates": null, "chdir": null, "stdin": null}}, "start": "2018-12-13 22:25:08.298215", "msg": "non-zero return code"}\n', '')
failed: [X.X.X.X] (item=/dev/sdk) => {
    "changed": false,
    "cmd": [
        "ceph-disk",
        "activate",
        "/dev/sdk1"
    ],
    "delta": "0:00:00.348117",
    "end": "2018-12-13 22:25:08.646332",
    "invocation": {
        "module_args": {
            "_raw_params": "ceph-disk activate \"/dev/sdk1\"",
            "_uses_shell": false,
            "argv": null,
            "chdir": null,
            "creates": null,
            "executable": null,
            "removes": null,
            "stdin": null,
            "warn": true
        }
    },
    "item": "/dev/sdk",
    "msg": "non-zero return code",
    "rc": 1,
    "start": "2018-12-13 22:25:08.298215",
    "stderr": "/usr/lib/python2.7/site-packages/ceph_disk/main.py:5689: UserWarning: \n*******************************************************************************\nThis tool is now deprecated in favor of ceph-volume.\nIt is recommended to use ceph-volume for OSD deployments. For details see:\n\n    http://docs.ceph.com/docs/master/ceph-volume/#migrating\n\n*******************************************************************************\n\n  warnings.warn(DEPRECATION_WARNING)\nmount_activate: Failed to activate\n/usr/lib/python2.7/site-packages/ceph_disk/main.py:5750: UserWarning: \n*******************************************************************************\nThis tool is now deprecated in favor of ceph-volume.\nIt is recommended to use ceph-volume for OSD deployments. For details see:\n\n    http://docs.ceph.com/docs/master/ceph-volume/#migrating\n\n*******************************************************************************\n\n  warnings.warn(DEPRECATION_WARNING)\nceph-disk: Error: No cluster conf found in /etc/ceph with fsid b77ed55d-906c-4234-b5c3-8fe3b5dc1bf3",
    "stderr_lines": [
        "/usr/lib/python2.7/site-packages/ceph_disk/main.py:5689: UserWarning: ",
        "*******************************************************************************",
        "This tool is now deprecated in favor of ceph-volume.",
        "It is recommended to use ceph-volume for OSD deployments. For details see:",
        "",
        "    http://docs.ceph.com/docs/master/ceph-volume/#migrating",
        "",
        "*******************************************************************************",
        "",
        "  warnings.warn(DEPRECATION_WARNING)",
        "mount_activate: Failed to activate",
        "/usr/lib/python2.7/site-packages/ceph_disk/main.py:5750: UserWarning: ",
        "*******************************************************************************",
        "This tool is now deprecated in favor of ceph-volume.",
        "It is recommended to use ceph-volume for OSD deployments. For details see:",
        "",
        "    http://docs.ceph.com/docs/master/ceph-volume/#migrating",
        "",
        "*******************************************************************************",
        "",
        "  warnings.warn(DEPRECATION_WARNING)",
        "ceph-disk: Error: No cluster conf found in /etc/ceph with fsid b77ed55d-906c-4234-b5c3-8fe3b5dc1bf3"
    ],
    "stdout": "",
    "stdout_lines": []
}

And when I launch the command (# ceph-disk activate /dev/sdk1) manually on my ceph node, I have the same problem:

ceph-disk: Error: No cluster conf found in /etc/ceph with fsid b77ed55d-906c-4234-b5c3-8fe3b5dc1bf3

Could you help me again please?
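
For what it's worth, this ceph-disk error usually means the fsid recorded on the prepared partition does not match the fsid in the ceph.conf rendered on the node; comparing the two is a quick sanity check (a sketch; ceph-disk itself is deprecated, as the warning above says):

grep fsid /etc/ceph/ceph.conf              # fsid the node's configuration expects
cat /var/lib/ceph/osd/ceph-*/ceph_fsid     # fsid stamped on each mounted OSD, if any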

wwyhy commented 5 years ago

Another error:


Using module file /usr/lib/python2.7/site-packages/ansible/modules/commands/command.py
<mon1> ESTABLISH SSH CONNECTION FOR USER: None
<mon1> SSH: EXEC ssh -o ControlMaster=auto -o ControlPersist=600s -o StrictHostKeyChecking=no -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=60 -o ControlPath=/root/.ansible/cp/%h-%r-%p mon1 '/bin/sh -c '"'"'sudo -H -S -n -u root /bin/sh -c '"'"'"'"'"'"'"'"'echo BECOME-SUCCESS-bobftbxxxieqgpcsbblwugxbydnrkhum; /usr/bin/python'"'"'"'"'"'"'"'"' && sleep 0'"'"''
Escalation succeeded
<mon1> (1, '\n{"changed": true, "end": "2018-12-12 23:32:52.758230", "stdout": "", "cmd": ["ceph-create-keys", "--cluster", "ceph", "-i", "mon1", "-t", "30"], "failed": true, "delta": "0:00:00.051626", "stderr": "usage: ceph-create-keys [-h] [-v] [--cluster NAME] --id ID\\nceph-create-keys: error: unrecognized arguments: -t 30", "rc": 2, "invocation": {"module_args": {"warn": true, "executable": null, "_uses_shell": false, "_raw_params": "ceph-create-keys --cluster ceph -i mon1 -t 30", "removes": null, "argv": null, "creates": null, "chdir": null, "stdin": null}}, "start": "2018-12-12 23:32:52.706604", "msg": "non-zero return code"}\n', '')
fatal: [mon1]: FAILED! => {
    "changed": false,
    "cmd": [
        "ceph-create-keys",
        "--cluster",
        "ceph",
        "-i",
        "mon1",
        "-t",
        "30"
    ],
    "delta": "0:00:00.051626",
    "end": "2018-12-12 23:32:52.758230",
    "invocation": {
        "module_args": {
            "_raw_params": "ceph-create-keys --cluster ceph -i mon1 -t 30",
            "_uses_shell": false,
            "argv": null,
            "chdir": null,
            "creates": null,
            "executable": null,
            "removes": null,
            "stdin": null,
            "warn": true
        }
    },
    "msg": "non-zero return code",
    "rc": 2,
    "start": "2018-12-12 23:32:52.706604",
    "stderr": "usage: ceph-create-keys [-h] [-v] [--cluster NAME] --id ID\nceph-create-keys: error: unrecognized arguments: -t 30",
    "stderr_lines": [
        "usage: ceph-create-keys [-h] [-v] [--cluster NAME] --id ID",
        "ceph-create-keys: error: unrecognized arguments: -t 30"
    ],
    "stdout": "",
    "stdout_lines": []
}

So ceph-create-keys rejects the arguments "-t 30"; -t is not a valid parameter:

[root@mon1 ceph-ansible-3.2.0]# ceph-create-keys --help
usage: ceph-create-keys [-h] [-v] [--cluster NAME] --id ID

Create Ceph client.admin key when ceph-mon is ready

optional arguments:
  -h, --help      show this help message and exit
  -v, --verbose   be more verbose
  --cluster NAME  name of the cluster
  --id ID, -i ID  id of a ceph-mon that is coming up
[root@mon1 ceph-ansible-3.2.0]# ceph-create-keys --cluster ceph -i mon1 -t 30
usage: ceph-create-keys [-h] [-v] [--cluster NAME] --id ID
ceph-create-keys: error: unrecognized arguments: -t 30
rishabh-d-dave commented 5 years ago

@SIM0N-F wrote:

> Yes @rishabh-d-dave, many thanks, your commit helped me a lot to understand my problems; I had forgotten some variables like: is_hci: False, configure_firewall: False, ....
>
> So your commit has been very useful for me.

Cool. I'll proceed with backporting the commit.

@SIM0N-F @wwyhy Are both of you still using 3.2.0? ceph-disk cannot be used with master, since ceph-disk is deprecated for the master version of Ceph (see docs). And can you provide me with your new group_vars/all.yml, so I can try reproducing (and debugging/fixing) it on my machine?
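
For completeness, on releases where ceph-disk is gone, the lvm scenario is the replacement; in group_vars it looks roughly like this (a sketch, device names are placeholders):

osd_scenario: lvm
devices:
  - /dev/sdb
  - /dev/sdc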

SIM0N-F commented 5 years ago

Hello @rishabh-d-dave, I am trying with the ceph-ansible-3.2.0 stable release.

I use this all.yml.

all_without_ip_name.yml.txt

wwyhy commented 5 years ago

Finally, I got it working now.

PLAY RECAP ***********************************************************************************************************************************************************************************
admin                      : ok=237  changed=14   unreachable=0    failed=0
node0                      : ok=157  changed=7    unreachable=0    failed=0
node1                      : ok=159  changed=10   unreachable=0    failed=0

INSTALLER STATUS *****************************************************************************************************************************************************************************
Install Ceph Monitor        : Complete (0:01:09)
Install Ceph Manager        : Complete (0:00:33)
Install Ceph OSD            : Complete (0:01:00)

Friday 14 December 2018  03:48:29 -0500 (0:00:00.125)       0:03:13.427 *******
[root@admin ceph-ansible-3.2.0]# ceph -s
  cluster:
    id:     08939cdc-2167-4a7f-8e09-45b9460f6384
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum admin,node0,node1
    mgr: admin(active)
    osd: 3 osds: 3 up, 3 in

  data:
    pools:   0 pools, 0 pgs
    objects: 0  objects, 0 B
    usage:   3.0 GiB used, 117 GiB / 120 GiB avail
    pgs:
rishabh-d-dave commented 5 years ago

@wwyhy Great. What changes did you make? Did you choose the right version?

wwyhy commented 5 years ago

I deleted the parameter "-t 30" from the yml under roles.
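
For illustration only, the change amounts to dropping the -t flag from the ceph-create-keys invocation in the mon role's task file; the exact file path and variable names below are assumptions, shown just to make the edit concrete:

# roles/ceph-mon/tasks/... (hypothetical location)
- name: create ceph admin key
  command: ceph-create-keys --cluster {{ cluster }} -i {{ monitor_name }}   # previously ended with: -t 30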

rishabh-d-dave commented 5 years ago

@wwyhy Can you please create an issue for what happened and what fixed it? Getting that issue out of the way might help others. :)

SIM0N-F commented 5 years ago

Hi all,

I have a new issue and I don't see the problem, can you help me?

TASK [ceph-mds : customize pool size] *******************************************************************************************************************************
task path: /home/hexanet/ansible/ceph-ansible-3.2.0/roles/ceph-mds/tasks/create_mds_filesystems.yml:11
Saturday 15 December 2018  17:50:07 +0100 (0:00:02.322)       0:05:38.025 *****
META: noop
META: noop
fatal: [X.X.X.X]: FAILED! => {
    "msg": "The conditional check 'item.size | default(osd_pool_default_size) != ceph_osd_pool_default_size' failed. The error was: error while evaluating conditional (item.size | default(osd_pool_default_size) != ceph_osd_pool_default_size): 'ceph_osd_pool_default_size' is undefined\n\nThe error appears to have been in '/home/hexanet/ansible/ceph-ansible-3.2.0/roles/ceph-mds/tasks/create_mds_filesystems.yml': line 11, column 7, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n    - name: customize pool size\n      ^ here\n"
}
rishabh-d-dave commented 5 years ago

@SIM0N-F I tried reproducing the previous issue 3-4 times. I took a fresh copy of ceph-ansible's repo, checked out stable-3.2, backported the commit and used the docker_cluster and centos7_cluster scenarios to set up the Ceph cluster; both completed successfully (i.e. I couldn't reproduce the issue). Can you be more specific about how you are using ceph-ansible to set up the Ceph cluster (by running some scenario in tox, by using a modified site.yml.sample and/or site-docker.yml.sample, etc.)?
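
For reference, the tox-driven scenarios can be listed and run from the repository root roughly like this (the exact environment names depend on the tox.ini in your checkout):

tox -l                 # list the available scenarios
tox -e <scenario>      # run one of them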

guits commented 5 years ago

@SIM0N-F if the original issue is fixed, please close this one and open a new one.

SIM0N-F commented 5 years ago

Hello,

Thanks for answer,

I have added these options to correct this problem:

osd_pool_default_size: 3
ceph_osd_pool_default_size: 3

I use the playbook with a modified site.yml and this inventory file:

[mons]
xxxx monitor_address=xxxx
xxxx monitor_address=xxxx
xxxx monitor_address=xxxx

[osds]
xxxx
xxxx
xxxx

#[agents]
#10.84.21.13
#10.84.21.14
#10.84.21.15

[mdss]
100.127.2.2
100.127.2.3
100.127.2.4

[rgws]
xxxx

[restapis]
xxxx
xxxx
xxxx

[mgrs]
xxxx
xxxx
xxxx

site.yml.txt

Now the deployment seems successful:

PLAY RECAP *****************************************************************************************************
100.127.2.2                : ok=382  changed=17   unreachable=0    failed=0
100.127.2.3                : ok=272  changed=15   unreachable=0    failed=0
100.127.2.4                : ok=275  changed=15   unreachable=0    failed=0

INSTALLER STATUS ***********************************************************************************************
Install Ceph Monitor        : Complete (0:01:39)
Install Ceph Manager        : Complete (0:01:14)
Install Ceph OSD            : Complete (0:04:33)
Install Ceph MDS            : Complete (0:01:18)
Install Ceph RGW            : Complete (0:00:48)

But when I run ceph -s:

cluster:
    id:     b77ed55d-906c-4234-b5c3-8fe3b5dc1bf3
    health: HEALTH_WARN
            1 MDSs report slow metadata IOs
            1 filesystem is online with fewer MDS than max_mds
            Reduced data availability: 88 pgs inactive

  services:
    mon: 3 daemons, quorum ceph-mutu-1,ceph-mutu-2,ceph-mutu-3
    mgr: ceph-mutu-1(active), standbys: ceph-mutu-2, ceph-mutu-3
    mds: cephfs-1/1/3 up  {0=ceph-mutu-1=up:creating}, 2 up:standby
    osd: 2 osds: 0 up, 0 in

  data:
    pools:   3 pools, 88 pgs
    objects: 0  objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs:     100.000% pgs unknown
             88 unknown

And when I check the mounted disks:

/dev/sde1 on /var/lib/ceph/osd/ceph-11 type xfs (rw,noatime,seclabel,attr2,inode64,noquota)
/dev/sdi1 on /var/lib/ceph/osd/ceph-23 type xfs (rw,noatime,seclabel,attr2,inode64,noquota)
/dev/sdh1 on /var/lib/ceph/osd/ceph-20 type xfs (rw,noatime,seclabel,attr2,inode64,noquota)
/dev/sdf1 on /var/lib/ceph/osd/ceph-14 type xfs (rw,noatime,seclabel,attr2,inode64,noquota)
/dev/sdk1 on /var/lib/ceph/osd/ceph-29 type xfs (rw,noatime,seclabel,attr2,inode64,noquota)
/dev/sdc1 on /var/lib/ceph/osd/ceph-5 type xfs (rw,noatime,seclabel,attr2,inode64,noquota)
/dev/sdg1 on /var/lib/ceph/osd/ceph-17 type xfs (rw,noatime,seclabel,attr2,inode64,noquota)
/dev/sdj1 on /var/lib/ceph/osd/ceph-25 type xfs (rw,noatime,seclabel,attr2,inode64,noquota)

But:

# fdisk -l /dev/sdk
WARNING: fdisk GPT support is currently new, and therefore in an experimental phase. Use it with caution.

Disk /dev/sdk: 1999.8 GB, 1999844147200 bytes, 3905945600 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: gpt
Disk identifier: 02FF0C56-787A-451A-A307-1A3900A3D0C5

#         Start          End    Size  Type            Name
 1         2048       206847    100M  Ceph OSD        ceph data
 2       206848   3905945566    1.8T  unknown         ceph block
guits commented 5 years ago

@SIM0N-F please, open a new issue for this.

SIM0N-F commented 5 years ago

OK, sorry @guits, I have opened a new issue: https://github.com/ceph/ceph-ansible/issues/3450

guits commented 5 years ago

@SIM0N-F thanks, no worries, it's just better to not confuse people about the status of the current issue.

guits commented 5 years ago

@SIM0N-F @rishabh-d-dave what's the status of this issue?

rishabh-d-dave commented 5 years ago

@SIM0N-F @guits Since the previous issue reported here is identical to the one in the new issue report, I think we can close this issue.

SIM0N-F commented 5 years ago

Yes, I am closing this issue.