ceph / ceph-ansible

Ansible playbooks to deploy Ceph, the distributed filesystem.
Apache License 2.0

ceph-mon : waiting for the monitor fails for jetson nano cluster (arm) #5129

Closed nylocx closed 4 years ago

nylocx commented 4 years ago

I'm currently trying to set up a very basic Ceph cluster on 5 Jetson Nano boards, with one MON node and 4 OSD nodes. The vars I have set are:

devices:
  - /dev/sda

ceph_origin: repository
ceph_repository: community
ceph_stable_release: nautilus
monitor_interface: eth0
public_network: 172.16.0.0/22
dashboard_admin_password: secret
grafana_admin_password: secret
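
As a quick sanity check on these vars, here is a minimal sketch that confirms the monitor address showing up in the error output further down (172.16.0.30) actually lies inside the configured public_network. It only uses Python's standard ipaddress module; the variable names are mine and not part of ceph-ansible.

```python
import ipaddress

# Values copied from the vars above and from the error output in this report.
public_network = ipaddress.ip_network("172.16.0.0/22")
mon_addr = ipaddress.ip_address("172.16.0.30")

# True when the monitor address falls inside the public network range.
in_network = mon_addr in public_network
print(in_network)
```

So the address itself looks consistent with public_network; the failure does not seem to come from a mismatch between the two.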

At first I did not have a grafana-server group in my inventory, which led to some checks failing, so I added the node from my MON group to the grafana-server group as well. (This gives me an Ansible warning because of the dash in the group name.)

The error message I get with -vvvv option is:

TASK [ceph-mon : waiting for the monitor(s) to form the quorum...] **************************************************************************************************************************
task path: /home/agoertz/ceph-ansible/roles/ceph-mon/tasks/ceph_keys.yml:2
Friday 06 March 2020  14:41:55 +0000 (0:00:00.777)       0:08:08.394 ********** 
Using module file /usr/lib/python3.8/site-packages/ansible/modules/commands/command.py
Pipelining is enabled.
<cluster-master00> ESTABLISH SSH CONNECTION FOR USER: None
<cluster-master00> SSH: EXEC ssh -vvv -o ControlMaster=auto -o ControlPersist=600s -o StrictHostKeyChecking=no -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=60 -o ControlPath=/home/agoertz/.ansible/cp/%h-%r-%p cluster-master00 '/bin/sh -c '"'"'sudo -H -S  -p "[sudo via ansible, key=jrfpegxxmkfaivcvfvknzycfpqdexrid] password:" -u root /bin/sh -c '"'"'"'"'"'"'"'"'echo BECOME-SUCCESS-jrfpegxxmkfaivcvfvknzycfpqdexrid ; /usr/bin/python'"'"'"'"'"'"'"'"' && sleep 0'"'"''                                                                                                                                                                               
Escalation succeeded
<cluster-master00> (1, b'\n{"changed": true, "end": "2020-03-06 15:41:56.108613", "stdout": "server name not found: [v2:172.16.0.30:3300 (Name or service not known)", "cmd": ["ceph", "--cluster", "ceph", "-n", "mon.", "-k", "/var/lib/ceph/mon/ceph-cluster-master00/keyring", "mon_status", "--format", "json"], "failed": true, "delta": "0:00:00.257905", "stderr": "unable to parse addrs in \'[v2:172.16.0.30:3300,v1:172.16.0.30:6789]\'\\n[errno 22] error connecting to the cluster", "rc": 1, "invocation": {"module_args": {"creates": null, "executable": null, "_uses_shell": false, "strip_empty_ends": true, "_raw_params": " ceph --cluster ceph -n mon. -k /var/lib/ceph/mon/ceph-cluster-master00/keyring mon_status --format json\\n", "removes": null, "argv": null, "warn": true, "chdir": null, "stdin_add_newline": true, "stdin": null}}, "start": "2020-03-06 15:41:55.850708", "msg": "non-zero return code"}\n', b'OpenSSH_8.2p1, OpenSSL 1.1.1d  10 Sep 2019\r\ndebug1: Reading configuration data /etc/ssh/ssh_config\r\ndebug1: auto-mux: Trying existing master\r\ndebug2: fd 3 setting O_NONBLOCK\r\ndebug2: mux_client_hello_exchange: master version 4\r\ndebug3: mux_client_forwards: request forwardings: 0 local, 0 remote\r\ndebug3: mux_client_request_session: entering\r\ndebug3: mux_client_request_alive: entering\r\ndebug3: mux_client_request_alive: done pid = 10194\r\ndebug3: mux_client_request_session: session request sent\r\ndebug3: mux_client_read_packet: read header failed: Broken pipe\r\ndebug2: Received exit status from master 1\r\n')                                                                                                                                                            
<cluster-master00> Failed to connect to the host via ssh: OpenSSH_8.2p1, OpenSSL 1.1.1d  10 Sep 2019
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: auto-mux: Trying existing master
debug2: fd 3 setting O_NONBLOCK
debug2: mux_client_hello_exchange: master version 4
debug3: mux_client_forwards: request forwardings: 0 local, 0 remote
debug3: mux_client_request_session: entering
debug3: mux_client_request_alive: entering
debug3: mux_client_request_alive: done pid = 10194
debug3: mux_client_request_session: session request sent
debug3: mux_client_read_packet: read header failed: Broken pipe
debug2: Received exit status from master 1
fatal: [cluster-master00]: FAILED! => 
  msg: |-
    The conditional check '(ceph_health_raw.stdout | length > 0) and (ceph_health_raw.stdout | default('{}') | from_json)['state'] in ['leader', 'peon']
    ' failed. The error was: Expecting value: line 1 column 1 (char 0)
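
As far as I can tell, the "Expecting value" part comes from the conditional itself: the command's stdout is non-empty, but it contains the "server name not found" message instead of JSON, so from_json cannot parse it. Below is a minimal sketch that reproduces this with Python's json module. The helper monitor_in_quorum is hypothetical and only mirrors the playbook's conditional; it is not ceph-ansible code.

```python
import json

# stdout copied from the failed task result above; it is an error message, not JSON.
stdout = "server name not found: [v2:172.16.0.30:3300 (Name or service not known)"

# Parsing it the way from_json would raises the same error the task reports.
try:
    json.loads(stdout)
    error_text = ""
except json.JSONDecodeError as exc:
    error_text = str(exc)
print(error_text)


def monitor_in_quorum(raw: str) -> bool:
    """Hypothetical helper: same check as the conditional, but tolerant of non-JSON output."""
    if not raw:
        return False
    try:
        status = json.loads(raw)
    except json.JSONDecodeError:
        # Output such as the error message above is treated as "not in quorum yet".
        return False
    return isinstance(status, dict) and status.get("state") in ("leader", "peon")


print(monitor_in_quorum(stdout))
```

So the JSON error looks like a symptom. The underlying problem seems to be that the ceph command fails with "unable to parse addrs" before it can return any monitor status.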

For reference I also posted this as a comment in: https://github.com/ceph/ceph-ansible/issues/5090

Environment:

I only have access to the cluster during the week, so if you need some more information that requires running commands on the cluster I will provide it on Monday.

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.