Closed SIM0N-F closed 5 years ago
@SIM0N-F the master branch of ceph-ansible is intended to deploy ceph@master
Thanks for your quick answer. I have tried with ceph-ansible-3.2.0, but I have a new problem:
TASK [ceph-validate : validate provided configuration] *********************************************
task path: /home/hexanet/ansible/ceph-ansible-3.2.0/roles/ceph-validate/tasks/main.yml:2
Thursday 13 December 2018 09:36:48 +0100 (0:00:00.218) 0:00:19.179 *****
The full traceback is:
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/ansible/executor/task_executor.py", line 139, in run
    res = self._execute()
  File "/usr/lib/python2.7/dist-packages/ansible/executor/task_executor.py", line 584, in _execute
    result = self._handler.run(task_vars=variables)
  File "/home/hexanet/ansible/ceph-ansible-3.2.0/plugins/actions/validate.py", line 100, in run
    msg = "[{}] Validation failed for variable: {}".format(host, error.path[0])
IndexError: list index out of range
fatal: [100.127.2.2]: FAILED! => {
"msg": "Unexpected failure during module execution.",
"stdout": ""
}
Should I open another issue for it?
@SIM0N-F stable-3.2 supports ansible 2.6
Yes, I know; I have downgraded to Ansible 2.6.10.
@SIM0N-F are you saying that you are still facing the same issue even with ansible 2.6 ?
Yes, I installed 2.6 before trying ceph-ansible 3.2.0.
Please share the full playbook log in that case.
OK, here it is: ansible_without_ip_name.log
@SIM0N-F which version of stable-3.2 are you using?
@SIM0N-F try the latest stable-3.2 version: v3.2.0rc8
I met the same issue with Ansible 2.7.4 and ceph-ansible 3.2.0. Any solutions? Thanks!
Same issue with 3.2rc8. New log file: ansible-3.2RC8_without_ip_name.log
@wwyhy ceph-ansible 3.2 doesn't support Ansible 2.7.X (2.6 is needed).
@SIM0N-F
I downgraded Ansible to 2.6.10 and got the same error.
I also tried ceph-ansible-3.2.0rc8, same error.
@SIM0N-F
I just found that this issue is caused by the variable "osd_scenario" in the file ceph-ansible-3.2.0/group_vars/osds.yml; this variable needs to be set to one of the following (see the sketch below), then it works fine.
#osd_scenario: lvm
#valid_osd_scenarios:
# - collocated
# - non-collocated
# - lvm
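For reference, a minimal group_vars/osds.yml sketch that satisfies the validation (illustrative only; the device names are examples, and any of the three scenarios listed above should be accepted by stable-3.2):
# group_vars/osds.yml -- sketch, adjust devices to your hardware
osd_scenario: collocated      # or: non-collocated, lvm
devices:
  - /dev/sdb
  - /dev/sdc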
@wwyhy I have commented out the osd_scenario line, but it still doesn't work.
I wonder if this problem is due to my notario version?
I use 0.0.16:
pip show notario
Name: notario
Version: 0.0.16
Summary: A dictionary validator
Home-page: http://github.com/alfredodeza/notario
Author: Alfredo Deza
Author-email: UNKNOWN
License: MIT
Location: /usr/local/lib/python2.7/dist-packages
Requires:
Required-by:
I have tried with 0.0.14, same issue.
I think stable-3.2 is missing a backport.
@SIM0N-F do you mind testing with this branch: guits_missing_backports?
Thanks!
@rishabh-d-dave I think you've worked on a patch on the master branch for this error.
Do you mind taking a look at this, in case the missing backport I just mentioned in the previous comment isn't enough to fix this issue? Thanks!
@SIM0N-F
Sorry for the misunderstanding; I set "osd_scenario: lvm" and got rid of the error.
But I met a new error; it seems like a yum repo configuration issue. I will check it tomorrow.
@wwyhy osd_scenario: collocated is still a valid scenario in stable-3.2.
I have tried with guits_missing_backports, but unfortunately it's the same issue again.
$ git branch
* guits_missing_backports
TASK [ceph-validate : validate provided configuration] **************************************************************************************************************
task path: /home/xxx/ansible/ceph-ansible/roles/ceph-validate/tasks/main.yml:2
Thursday 13 December 2018 14:23:40 +0100 (0:00:00.241) 0:00:14.827 *****
The full traceback is:
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/ansible/executor/task_executor.py", line 139, in run
    res = self._execute()
  File "/usr/lib/python2.7/dist-packages/ansible/executor/task_executor.py", line 584, in _execute
    result = self._handler.run(task_vars=variables)
  File "/home/xxx/ansible/ceph-ansible/plugins/actions/validate.py", line 48, in run
    notario_store["containerized_deployment"] = host_vars["containerized_deployment"]
KeyError: 'containerized_deployment'
fatal: [X.X.X.X]: FAILED! => {
"msg": "Unexpected failure during module execution.",
"stdout": ""
}
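For what it's worth, until the backport lands a possible workaround might be to define the variable explicitly so the validate plugin finds it in the host vars; a minimal sketch, assuming a non-containerized deployment:
# group_vars/all.yml -- workaround sketch; set to true for a containerized deployment
containerized_deployment: false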
I cherry-picked this commit to v3.2.0 and tried running a ceph-ansible scenario with the group_vars/all.yml posted in the bug report. I think the error message is now much clearer:
TASK [ceph-validate : fail if bond0.3538 does not exist on ceph-rgw0] **********
task path: /home/rishabh/run/ceph-ansible/issue-3443/roles/ceph-validate/tasks/check_eth_rgw.yml:2
Thursday 13 December 2018 19:04:19 +0530 (0:00:00.486) 0:00:36.091 *****
META: noop
META: noop
META: noop
META: noop
META: noop
fatal: [ceph-rgw0]: FAILED! => {
"changed": false
}
MSG:
bond0.3538 does not exist on ceph-rgw0
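For context, the interface that this task validates normally comes from the RGW interface setting in the group vars, so the fix is usually to point it at an interface that actually exists on the RGW node (or at an address instead); a sketch, assuming radosgw_interface/radosgw_address are the variables in play and the values are only examples:
# group_vars/all.yml -- illustrative; the interface must exist on every host in [rgws]
radosgw_interface: eth0
# or, alternatively, give an address instead of an interface name:
# radosgw_address: 192.168.1.42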
@SIM0N-F @wwyhy @guits If this resolves the issue, I'll backport the commit.
Yes @rishabh-d-dave, many thanks, your commit helped me a lot to understand my problems; I had forgotten some variables like is_hci: False, configure_firewall: False, .... (see the sketch below).
So your commit has been very useful for me.
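For reference, the settings I had missed look roughly like this in group_vars/all.yml (a minimal sketch; the values are from my setup and may differ elsewhere):
# group_vars/all.yml -- excerpt, illustrative values
is_hci: False
configure_firewall: False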
But now I have a new issue in my deployment:
<X.X.X.X> (1, '\n{"changed": true, "end": "2018-12-13 22:25:08.646332", "stdout": "", "cmd": ["ceph-disk", "activate", "/dev/sdk1"], "failed": true, "delta": "0:00:00.348117", "stderr": "/usr/lib/python2.7/site-packages/ceph_disk/main.py:5689: UserWarning: \\n*******************************************************************************\\nThis tool is now deprecated in favor of ceph-volume.\\nIt is recommended to use ceph-volume for OSD deployments. For details see:\\n\\n http://docs.ceph.com/docs/master/ceph-volume/#migrating\\n\\n*******************************************************************************\\n\\n warnings.warn(DEPRECATION_WARNING)\\nmount_activate: Failed to activate\\n/usr/lib/python2.7/site-packages/ceph_disk/main.py:5750: UserWarning: \\n*******************************************************************************\\nThis tool is now deprecated in favor of ceph-volume.\\nIt is recommended to use ceph-volume for OSD deployments. For details see:\\n\\n http://docs.ceph.com/docs/master/ceph-volume/#migrating\\n\\n*******************************************************************************\\n\\n warnings.warn(DEPRECATION_WARNING)\\nceph-disk: Error: No cluster conf found in /etc/ceph with fsid b77ed55d-906c-4234-b5c3-8fe3b5dc1bf3", "rc": 1, "invocation": {"module_args": {"warn": true, "executable": null, "_uses_shell": false, "_raw_params": "ceph-disk activate \\"/dev/sdk1\\"", "removes": null, "argv": null, "creates": null, "chdir": null, "stdin": null}}, "start": "2018-12-13 22:25:08.298215", "msg": "non-zero return code"}\n', '')
failed: [X.X.X.X] (item=/dev/sdk) => {
"changed": false,
"cmd": [
"ceph-disk",
"activate",
"/dev/sdk1"
],
"delta": "0:00:00.348117",
"end": "2018-12-13 22:25:08.646332",
"invocation": {
"module_args": {
"_raw_params": "ceph-disk activate \"/dev/sdk1\"",
"_uses_shell": false,
"argv": null,
"chdir": null,
"creates": null,
"executable": null,
"removes": null,
"stdin": null,
"warn": true
}
},
"item": "/dev/sdk",
"msg": "non-zero return code",
"rc": 1,
"start": "2018-12-13 22:25:08.298215",
"stderr": "/usr/lib/python2.7/site-packages/ceph_disk/main.py:5689: UserWarning: \n*******************************************************************************\nThis tool is now deprecated in favor of ceph-volume.\nIt is recommended to use ceph-volume for OSD deployments. For details see:\n\n http://docs.ceph.com/docs/master/ceph-volume/#migrating\n\n*******************************************************************************\n\n warnings.warn(DEPRECATION_WARNING)\nmount_activate: Failed to activate\n/usr/lib/python2.7/site-packages/ceph_disk/main.py:5750: UserWarning: \n*******************************************************************************\nThis tool is now deprecated in favor of ceph-volume.\nIt is recommended to use ceph-volume for OSD deployments. For details see:\n\n http://docs.ceph.com/docs/master/ceph-volume/#migrating\n\n*******************************************************************************\n\n warnings.warn(DEPRECATION_WARNING)\nceph-disk: Error: No cluster conf found in /etc/ceph with fsid b77ed55d-906c-4234-b5c3-8fe3b5dc1bf3",
"stderr_lines": [
"/usr/lib/python2.7/site-packages/ceph_disk/main.py:5689: UserWarning: ",
"*******************************************************************************",
"This tool is now deprecated in favor of ceph-volume.",
"It is recommended to use ceph-volume for OSD deployments. For details see:",
"",
" http://docs.ceph.com/docs/master/ceph-volume/#migrating",
"",
"*******************************************************************************",
"",
" warnings.warn(DEPRECATION_WARNING)",
"mount_activate: Failed to activate",
"/usr/lib/python2.7/site-packages/ceph_disk/main.py:5750: UserWarning: ",
"*******************************************************************************",
"This tool is now deprecated in favor of ceph-volume.",
"It is recommended to use ceph-volume for OSD deployments. For details see:",
"",
" http://docs.ceph.com/docs/master/ceph-volume/#migrating",
"",
"*******************************************************************************",
"",
" warnings.warn(DEPRECATION_WARNING)",
"ceph-disk: Error: No cluster conf found in /etc/ceph with fsid b77ed55d-906c-4234-b5c3-8fe3b5dc1bf3"
],
"stdout": "",
"stdout_lines": []
}
And when I run the command (# ceph-disk activate /dev/sdk1) manually on my Ceph node, I have the same problem:
ceph-disk: Error: No cluster conf found in /etc/ceph with fsid b77ed55d-906c-4234-b5c3-8fe3b5dc1bf3
Could you help me again please?
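For context, the "No cluster conf found in /etc/ceph with fsid ..." error above generally means that ceph-disk could not find a conf file under /etc/ceph whose fsid matches the one stamped on the OSD partition. One way to keep them in sync with ceph-ansible may be to pin the cluster fsid in the group vars (a sketch; the value shown is simply the fsid from the error message):
# group_vars/all.yml -- sketch; reuse the fsid the OSDs were prepared with
generate_fsid: false
fsid: b77ed55d-906c-4234-b5c3-8fe3b5dc1bf3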
Another error:
Using module file /usr/lib/python2.7/site-packages/ansible/modules/commands/command.py
<mon1> ESTABLISH SSH CONNECTION FOR USER: None
<mon1> SSH: EXEC ssh -o ControlMaster=auto -o ControlPersist=600s -o StrictHostKeyChecking=no -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=60 -o ControlPath=/root/.ansible/cp/%h-%r-%p mon1 '/bin/sh -c '"'"'sudo -H -S -n -u root /bin/sh -c '"'"'"'"'"'"'"'"'echo BECOME-SUCCESS-bobftbxxxieqgpcsbblwugxbydnrkhum; /usr/bin/python'"'"'"'"'"'"'"'"' && sleep 0'"'"''
Escalation succeeded
<mon1> (1, '\n{"changed": true, "end": "2018-12-12 23:32:52.758230", "stdout": "", "cmd": ["ceph-create-keys", "--cluster", "ceph", "-i", "mon1", "-t", "30"], "failed": true, "delta": "0:00:00.051626", "stderr": "usage: ceph-create-keys [-h] [-v] [--cluster NAME] --id ID\\nceph-create-keys: error: unrecognized arguments: -t 30", "rc": 2, "invocation": {"module_args": {"warn": true, "executable": null, "_uses_shell": false, "_raw_params": "ceph-create-keys --cluster ceph -i mon1 -t 30", "removes": null, "argv": null, "creates": null, "chdir": null, "stdin": null}}, "start": "2018-12-12 23:32:52.706604", "msg": "non-zero return code"}\n', '')
fatal: [mon1]: FAILED! => {
"changed": false,
"cmd": [
"ceph-create-keys",
"--cluster",
"ceph",
"-i",
"mon1",
"-t",
"30"
],
"delta": "0:00:00.051626",
"end": "2018-12-12 23:32:52.758230",
"invocation": {
"module_args": {
"_raw_params": "ceph-create-keys --cluster ceph -i mon1 -t 30",
"_uses_shell": false,
"argv": null,
"chdir": null,
"creates": null,
"executable": null,
"removes": null,
"stdin": null,
"warn": true
}
},
"msg": "non-zero return code",
"rc": 2,
"start": "2018-12-12 23:32:52.706604",
"stderr": "usage: ceph-create-keys [-h] [-v] [--cluster NAME] --id ID\nceph-create-keys: error: unrecognized arguments: -t 30",
"stderr_lines": [
"usage: ceph-create-keys [-h] [-v] [--cluster NAME] --id ID",
"ceph-create-keys: error: unrecognized arguments: -t 30"
],
"stdout": "",
"stdout_lines": []
}
The error is "ceph-create-keys: error: unrecognized arguments: -t 30"; the -t parameter is not accepted.
[root@mon1 ceph-ansible-3.2.0]# ceph-create-keys --help
usage: ceph-create-keys [-h] [-v] [--cluster NAME] --id ID
Create Ceph client.admin key when ceph-mon is ready
optional arguments:
-h, --help show this help message and exit
-v, --verbose be more verbose
--cluster NAME name of the cluster
--id ID, -i ID id of a ceph-mon that is coming up
[root@mon1 ceph-ansible-3.2.0]# ceph-create-keys --cluster ceph -i mon1 -t 30
usage: ceph-create-keys [-h] [-v] [--cluster NAME] --id ID
ceph-create-keys: error: unrecognized arguments: -t 30
@SIM0N-F wrote:
Yes @rishabh-d-dave, many thanks, your commit helped me a lot to understand my problems; I had forgotten some variables like is_hci: False, configure_firewall: False, ....
So your commit has been very useful for me.
Cool. I'll proceed with backporting the commit.
@SIM0N-F @wwyhy Are both of you still using 3.2.0? ceph-disk cannot be used with master since ceph-disk is deprecated for the master version of Ceph (see docs). And can you provide me with your new group_vars/all.yml, so I can try reproducing (and debugging/fixing) it on my machine?
Hello @rishabh-d-dave, I'm trying with the ceph-ansible 3.2.0 stable release.
I use this all.yml.
Finally, I got it to work now.
PLAY RECAP ***********************************************************************************************************************************************************************************
admin : ok=237 changed=14 unreachable=0 failed=0
node0 : ok=157 changed=7 unreachable=0 failed=0
node1 : ok=159 changed=10 unreachable=0 failed=0
INSTALLER STATUS *****************************************************************************************************************************************************************************
Install Ceph Monitor : Complete (0:01:09)
Install Ceph Manager : Complete (0:00:33)
Install Ceph OSD : Complete (0:01:00)
Friday 14 December 2018 03:48:29 -0500 (0:00:00.125) 0:03:13.427 *******
[root@admin ceph-ansible-3.2.0]# ceph -s
cluster:
id: 08939cdc-2167-4a7f-8e09-45b9460f6384
health: HEALTH_OK
services:
mon: 3 daemons, quorum admin,node0,node1
mgr: admin(active)
osd: 3 osds: 3 up, 3 in
data:
pools: 0 pools, 0 pgs
objects: 0 objects, 0 B
usage: 3.0 GiB used, 117 GiB / 120 GiB avail
pgs:
@wwyhy Great. What changes did you make? Did you choose the right version?
I deleted the parameter "-t 30" from the yml under roles; a sketch of the change is below.
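The failing task runs ceph-create-keys with a timeout flag that recent ceph-create-keys builds no longer accept, so the change amounts to something like this (a sketch; the exact task file under roles/ may vary between ceph-ansible versions):
# before (as seen in the failing task output):
#   ceph-create-keys --cluster ceph -i mon1 -t 30
# after (drop the unsupported -t/--timeout argument):
#   ceph-create-keys --cluster ceph -i mon1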
@wwyhy Can you please create an issue for what happened and what fixed it? Getting that issue out of the way might help others. :)
Hi all,
I have a new issue and I don't see the problem; can you help me?
TASK [ceph-mds : customize pool size] *******************************************************************************************************************************
task path: /home/hexanet/ansible/ceph-ansible-3.2.0/roles/ceph-mds/tasks/create_mds_filesystems.yml:11
Saturday 15 December 2018 17:50:07 +0100 (0:00:02.322) 0:05:38.025 *****
META: noop
META: noop
fatal: [X.X.X.X]: FAILED! => {
"msg": "The conditional check 'item.size | default(osd_pool_default_size) != ceph_osd_pool_default_size' failed. The error was: error while evaluating conditional (item.size | default(osd_pool_default_size) != ceph_osd_pool_default_size): 'ceph_osd_pool_default_size' is undefined\n\nThe error appears to have been in '/home/hexanet/ansible/ceph-ansible-3.2.0/roles/ceph-mds/tasks/create_mds_filesystems.yml': line 11, column 7, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n - name: customize pool size\n ^ here\n"
}
@SIM0N-F I tried reproducing the previous issue 3-4 times. I took a fresh copy of ceph-ansible's repo, checked out stable-3.2, backported the commit, and used the docker_cluster and centos7_cluster scenarios to set up the Ceph cluster; both completed successfully (i.e. I couldn't reproduce the issue). Can you be more specific about how you are using ceph-ansible to set up the Ceph cluster (by running some scenario in tox, by using a modified site.yml.sample and/or site-docker.yml.sample, etc.)?
@SIM0N-F if the original issue is fixed, please close this one and open a new one.
Hello,
Thanks for the answer.
I have added these options to correct the problem:
osd_pool_default_size: 3
ceph_osd_pool_default_size: 3
I use the playbook with a modified site.yml and this inventory file:
[mons]
xxxx monitor_address=xxxx
xxxx monitor_address=xxxx
xxxx monitor_address=xxxx
[osds]
xxxx
xxxx
xxxx
#[agents]
#10.84.21.13
#10.84.21.14
#10.84.21.15
[mdss]
100.127.2.2
100.127.2.3
100.127.2.4
[rgws]
xxxx
[restapis]
xxxx
xxxx
xxxx
[mgrs]
xxxx
xxxx
xxxx
Now the deployment seems to complete successfully:
PLAY RECAP *****************************************************************************************************
100.127.2.2 : ok=382 changed=17 unreachable=0 failed=0
100.127.2.3 : ok=272 changed=15 unreachable=0 failed=0
100.127.2.4 : ok=275 changed=15 unreachable=0 failed=0
INSTALLER STATUS ***********************************************************************************************
Install Ceph Monitor : Complete (0:01:39)
Install Ceph Manager : Complete (0:01:14)
Install Ceph OSD : Complete (0:04:33)
Install Ceph MDS : Complete (0:01:18)
Install Ceph RGW : Complete (0:00:48)
But when I run ceph -s:
cluster:
id: b77ed55d-906c-4234-b5c3-8fe3b5dc1bf3
health: HEALTH_WARN
1 MDSs report slow metadata IOs
1 filesystem is online with fewer MDS than max_mds
Reduced data availability: 88 pgs inactive
services:
mon: 3 daemons, quorum ceph-mutu-1,ceph-mutu-2,ceph-mutu-3
mgr: ceph-mutu-1(active), standbys: ceph-mutu-2, ceph-mutu-3
mds: cephfs-1/1/3 up {0=ceph-mutu-1=up:creating}, 2 up:standby
osd: 2 osds: 0 up, 0 in
data:
pools: 3 pools, 88 pgs
objects: 0 objects, 0 B
usage: 0 B used, 0 B / 0 B avail
pgs: 100.000% pgs unknown
88 unknown
And when I check the mounted disks:
/dev/sde1 on /var/lib/ceph/osd/ceph-11 type xfs (rw,noatime,seclabel,attr2,inode64,noquota)
/dev/sdi1 on /var/lib/ceph/osd/ceph-23 type xfs (rw,noatime,seclabel,attr2,inode64,noquota)
/dev/sdh1 on /var/lib/ceph/osd/ceph-20 type xfs (rw,noatime,seclabel,attr2,inode64,noquota)
/dev/sdf1 on /var/lib/ceph/osd/ceph-14 type xfs (rw,noatime,seclabel,attr2,inode64,noquota)
/dev/sdk1 on /var/lib/ceph/osd/ceph-29 type xfs (rw,noatime,seclabel,attr2,inode64,noquota)
/dev/sdc1 on /var/lib/ceph/osd/ceph-5 type xfs (rw,noatime,seclabel,attr2,inode64,noquota)
/dev/sdg1 on /var/lib/ceph/osd/ceph-17 type xfs (rw,noatime,seclabel,attr2,inode64,noquota)
/dev/sdj1 on /var/lib/ceph/osd/ceph-25 type xfs (rw,noatime,seclabel,attr2,inode64,noquota)
But:
# fdisk -l /dev/sdk
Warning: GPT support in fdisk is recent, and therefore still experimental. Use it with caution.
Disk /dev/sdk: 1999.8 GB, 1999844147200 bytes, 3905945600 sectors
Units = sectors of 1 × 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: gpt
Disk identifier: 02FF0C56-787A-451A-A307-1A3900A3D0C5
 #        Start          End    Size  Type      Name
 1         2048       206847    100M  Ceph OSD  ceph data
 2       206848   3905945566    1.8T  unknown   ceph block
@SIM0N-F please, open a new issue for this.
OK, sorry @guits, I have opened a new issue: https://github.com/ceph/ceph-ansible/issues/3450
@SIM0N-F thanks, no worries, it's just better to not confuse people about the status of the current issue.
@SIM0N-F @rishabh-d-dave what's the status of this issue?
@SIM0N-F @guits Since the previous issue reported here is identical to the one in the new issue report, I think we can close this issue.
Yes, I'm closing this issue.
Bug Report
What happened:
When I deploy a cluster with the latest version of ceph-ansible (git cloned this morning), the playbook doesn't create /etc/ceph/ceph.client.admin.keyring or any other key.
The playbook stops at "create ceph mgr keyring(s)":
How to reproduce it (minimal and precise):
Just launch the playbook with my group_vars/all.yml.
I use roles:
Share your group_vars files, inventory
Environment:
Kernel (uname -a): 3.10.0-957.el7.x86_64
Ansible version (ansible-playbook --version): ansible 2.7.4
ceph-ansible version (git head or tag or stable branch): git head
Ceph version (ceph -v): ceph version 13.2.2
My ./group_var/all.yml
So could you help me, please? Many thanks in advance!