ProgrammeVitam / vitam

Digital Archives Management System developed by the French government / Programme interministériel archives numériques; core system.
CeCILL Free Software License Agreement v2.1

Deployment of Vitam #70

Closed lahatch closed 2 years ago

lahatch commented 3 years ago

Hello, I am trying to deploy Vitam on my dev platform, which is made up of 3 servers. The deployment gets stuck every time on the same action, TASK [init_es_cluster_index_template : Wait for Elasticsearch cluster elasticsearch-log to be resolved by Consul]; it seems the hostname elasticsearch-log.service.consul cannot be resolved.

I don't really see what I need to do to get past this...

Here is a more complete trace:

```
TASK [init_es_cluster_index_template : Wait for Elasticsearch cluster elasticsearch-log to be resolved by Consul] ******************************************************************************************
task path: /data/wksp/vitam/deployment/ansible-vitam/roles/init_es_cluster_index_template/tasks/main.yml:19
Monday 11 January 2021 10:39:53 +0100 (0:00:00.104) 0:07:12.905 ********
<10.89.14.178> ESTABLISH SSH CONNECTION FOR USER: A702394
<10.89.14.178> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="A702394"' -o ConnectTimeout=10 -o ControlPath=/home/a702394/.ansible/cp/5a1ab3c073 10.89.14.178 '/bin/sh -c '"'"'echo ~A702394 && sleep 0'"'"''
<10.89.14.178> (0, '/home/a702394\n', '')
<10.89.14.178> ESTABLISH SSH CONNECTION FOR USER: A702394
<10.89.14.178> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="A702394"' -o ConnectTimeout=10 -o ControlPath=/home/a702394/.ansible/cp/5a1ab3c073 10.89.14.178 '/bin/sh -c '"'"'( umask 77 && mkdir -p " echo /home/a702394/.ansible/tmp "&& mkdir " echo /home/a702394/.ansible/tmp/ansible-tmp-1610357993.62-99820-49311844653554 " && echo ansible-tmp-1610357993.62-99820-49311844653554=" echo /home/a702394/.ansible/tmp/ansible-tmp-1610357993.62-99820-49311844653554 " ) && sleep 0'"'"''
<10.89.14.178> (0, 'ansible-tmp-1610357993.62-99820-49311844653554=/home/a702394/.ansible/tmp/ansible-tmp-1610357993.62-99820-49311844653554\n', '')
Using module file /usr/lib/python2.7/site-packages/ansible/modules/utilities/logic/wait_for.py
<10.89.14.178> PUT /home/a702394/.ansible/tmp/ansible-local-95463SmU6Ll/tmpUARLtg TO /home/a702394/.ansible/tmp/ansible-tmp-1610357993.62-99820-49311844653554/AnsiballZ_wait_for.py
<10.89.14.178> SSH: EXEC sftp -b - -C -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="A702394"' -o ConnectTimeout=10 -o ControlPath=/home/a702394/.ansible/cp/5a1ab3c073 '[10.89.14.178]'
<10.89.14.178> (0, 'sftp> put /home/a702394/.ansible/tmp/ansible-local-95463SmU6Ll/tmpUARLtg /home/a702394/.ansible/tmp/ansible-tmp-1610357993.62-99820-49311844653554/AnsiballZ_wait_for.py\n', '')
<10.89.14.178> ESTABLISH SSH CONNECTION FOR USER: A702394
<10.89.14.178> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="A702394"' -o ConnectTimeout=10 -o ControlPath=/home/a702394/.ansible/cp/5a1ab3c073 10.89.14.178 '/bin/sh -c '"'"'chmod u+x /home/a702394/.ansible/tmp/ansible-tmp-1610357993.62-99820-49311844653554/ /home/a702394/.ansible/tmp/ansible-tmp-1610357993.62-99820-49311844653554/AnsiballZ_wait_for.py && sleep 0'"'"''
<10.89.14.178> (0, '', '')
<10.89.14.178> ESTABLISH SSH CONNECTION FOR USER: A702394
<10.89.14.178> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="A702394"' -o ConnectTimeout=10 -o ControlPath=/home/a702394/.ansible/cp/5a1ab3c073 -tt 10.89.14.178 '/bin/sh -c '"'"'sudo -H -S -p "[sudo via ansible, key=jmbpwpnsuzarlgjidkhzbooyvltwmqil] password:" -u root /bin/sh -c '"'"'"'"'"'"'"'"'echo BECOME-SUCCESS-jmbpwpnsuzarlgjidkhzbooyvltwmqil ; /usr/bin/python /home/a702394/.ansible/tmp/ansible-tmp-1610357993.62-99820-49311844653554/AnsiballZ_wait_for.py'"'"'"'"'"'"'"'"' && sleep 0'"'"''
Escalation succeeded
<10.89.14.178> (1, '\r\n\r\n{"msg": "Timeout when waiting for elasticsearch-log.service.consul:9201", "failed": true, "exception": "WARNING: The below traceback may *not* be related to the actual failure.\\n File \\"/tmp/ansible_wait_for_payload_PMMZIV/ansible_wait_for_payload.zip/ansible/modules/utilities/logic/wait_for.py\\", line 599, in main\\n File \\"/tmp/ansible_wait_for_payload_PMMZIV/ansible_wait_for_payload.zip/ansible/modules/utilities/logic/wait_for.py\\", line 456, in _create_connection\\n File \\"/usr/lib64/python2.7/socket.py\\", line 553, in create_connection\\n for res in getaddrinfo(host, port, 0, SOCK_STREAM):\\n", "elapsed": 303, "invocation": {"module_args": {"active_connection_states": ["ESTABLISHED", "FIN_WAIT1", "FIN_WAIT2", "SYN_RECV", "SYN_SENT", "TIME_WAIT"], "host": "elasticsearch-log.service.consul", "port": 9201, "delay": 0, "msg": null, "state": "started", "sleep": 1, "timeout": 300, "exclude_hosts": null, "search_regex": null, "path": null, "connect_timeout": 5}}}\r\n', 'Shared connection to 10.89.14.178 closed.\r\n')
<10.89.14.178> Failed to connect to the host via ssh: Shared connection to 10.89.14.178 closed.
<10.89.14.178> ESTABLISH SSH CONNECTION FOR USER: A702394
<10.89.14.178> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="A702394"' -o ConnectTimeout=10 -o ControlPath=/home/a702394/.ansible/cp/5a1ab3c073 10.89.14.178 '/bin/sh -c '"'"'rm -f -r /home/a702394/.ansible/tmp/ansible-tmp-1610357993.62-99820-49311844653554/ > /dev/null 2>&1 && sleep 0'"'"''
<10.89.14.178> (0, '', '')
The full traceback is:
WARNING: The below traceback may *not* be related to the actual failure.
  File "/tmp/ansible_wait_for_payload_PMMZIV/ansible_wait_for_payload.zip/ansible/modules/utilities/logic/wait_for.py", line 599, in main
  File "/tmp/ansible_wait_for_payload_PMMZIV/ansible_wait_for_payload.zip/ansible/modules/utilities/logic/wait_for.py", line 456, in _create_connection
  File "/usr/lib64/python2.7/socket.py", line 553, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
fatal: [10.89.14.178]: FAILED! => {
    "changed": false,
    "elapsed": 303,
    "invocation": {
        "module_args": {
            "active_connection_states": [
                "ESTABLISHED",
                "FIN_WAIT1",
                "FIN_WAIT2",
                "SYN_RECV",
                "SYN_SENT",
                "TIME_WAIT"
            ],
            "connect_timeout": 5,
            "delay": 0,
            "exclude_hosts": null,
            "host": "elasticsearch-log.service.consul",
            "msg": null,
            "path": null,
            "port": 9201,
            "search_regex": null,
            "sleep": 1,
            "state": "started",
            "timeout": 300
        }
    },
    "msg": "Timeout when waiting for elasticsearch-log.service.consul:9201"
}
```
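The timeout means that elasticsearch-log.service.consul never resolved on the target host, i.e. the Consul DNS interface is not answering. As a rough troubleshooting sketch (assuming Consul's default ports, 8600 for DNS and 8500 for the HTTP API, which the Vitam deployment may override), resolution can be checked directly on the failing node:

```sh
# Query the Consul DNS interface directly (default port 8600)
dig @127.0.0.1 -p 8600 elasticsearch-log.service.consul SRV

# Check whether the system resolver forwards *.consul names to the local agent
getent hosts elasticsearch-log.service.consul

# Ask the Consul HTTP API (default port 8500) whether the service is registered at all
curl -s http://127.0.0.1:8500/v1/catalog/service/elasticsearch-log
```

If the DNS query returns nothing and the catalog call fails with a leader error, the problem lies in the Consul cluster itself rather than in Elasticsearch, which matches the agent logs reported below.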

lahatch commented 3 years ago

Also, in the /var/log/messages file of the VM on which the task fails, I see the following messages:

```
Jan 11 11:24:43 vmvitam03 consul[130049]: agent: Check "Check Consul DNS resolution for node": Timed out (5s) running check
Jan 11 11:24:43 vmvitam03 consul: 2021/01/11 11:24:43 [WARN] agent: Check "Check Consul DNS resolution for node": Timed out (5s) running check
Jan 11 11:24:46 vmvitam03 consul[130049]: dns: rpc error: No cluster leader
Jan 11 11:24:46 vmvitam03 consul: 2021/01/11 11:24:46 [ERR] dns: rpc error: No cluster leader
Jan 11 11:24:48 vmvitam03 consul: 2021/01/11 11:24:48 [ERR] dns: rpc error: No cluster leader
Jan 11 11:24:48 vmvitam03 consul[130049]: dns: rpc error: No cluster leader
Jan 11 11:24:48 vmvitam03 consul: 2021/01/11 11:24:48 [ERR] dns: rpc error: No cluster leader
Jan 11 11:24:48 vmvitam03 consul[130049]: dns: rpc error: No cluster leader
Jan 11 11:24:54 vmvitam03 consul[130049]: dns: rpc error: No cluster leader
Jan 11 11:24:54 vmvitam03 consul: 2021/01/11 11:24:54 [ERR] dns: rpc error: No cluster leader
Jan 11 11:24:54 vmvitam03 consul[130049]: dns: rpc error: No cluster leader
Jan 11 11:24:54 vmvitam03 consul: 2021/01/11 11:24:54 [ERR] dns: rpc error: No cluster leader
```
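"No cluster leader" means the Consul server agents have not managed to elect a leader, so DNS and catalog queries cannot be served. A minimal check, assuming the consul binary is on the PATH of the VM and the HTTP API listens on its default port 8500 (the Vitam setup may differ), could look like this:

```sh
# List the agents known to this node, their role (server/client) and state
consul members

# An empty response here means no leader has been elected yet
curl -s http://127.0.0.1:8500/v1/status/leader

# Show the Raft peers seen by the servers; a majority of them must be up and reachable
consul operator raft list-peers
```

Leader election requires a majority of the declared Consul servers to be up and able to reach each other over the network, which is why the content of the [hosts_consul_server] group discussed below matters.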

TDevillechabrolle commented 3 years ago

Hello,

Sorry for the late reply.

What do you have in the [hosts_consul_server] and [hosts_elasticsearch_log] sections of your hosts file?

For a 3-server deployment, at least to begin with, I would recommend not deploying the Elastic-log cluster, to make co-locating the services easier. If it is essential for you, Elastic-log must not be placed on the same VM as Elastic-data.

Thierry
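To illustrate the suggestion above, here is a rough sketch, with hypothetical hostnames, of how the relevant inventory sections might be laid out for a 3-server platform. The group name [hosts_elasticsearch_data] is assumed here for the data cluster; the two points that matter are an odd number of Consul servers (so a leader can be elected) and elasticsearch-log not sharing a VM with elasticsearch-data.

```ini
# Illustrative excerpt from the deployment hosts file (hostnames are hypothetical)

# 3 Consul servers: an odd-sized quorum, so a leader can be elected
# as long as a majority of them can reach each other
[hosts_consul_server]
vmvitam01
vmvitam02
vmvitam03

# elasticsearch-log kept off the elasticsearch-data VMs
[hosts_elasticsearch_log]
vmvitam01

# assumed group name for the Elasticsearch data cluster in the Vitam inventory
[hosts_elasticsearch_data]
vmvitam02
vmvitam03
```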