ShubhamTatvamasi / magma-galaxy

https://galaxy.ansible.com/shubhamtatvamasi/magma

MetalLB - Create IP address pool #15

Closed: random1337-infosec closed this issue 11 months ago

random1337-infosec commented 11 months ago

Hi.

I'm having a problem during the installation (when running ansible-playbook deploy-orc8r.yml). It fails at:

TASK [metallb : Create IP Address Pool] *****
Saturday 23 September 2023 20:05:52 +0000 (0:00:00.033) 0:01:25.879 ****
FAILED - RETRYING: [10.83.46.192]: Create IP Address Pool (100 retries left).

Here are more debugging logs.

<10.83.46.192> ESTABLISH SSH CONNECTION FOR USER: magma
<10.83.46.192> SSH: EXEC sshpass -d12 ssh -C -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o 'User="magma"' -o ConnectTimeout=10 -o 'ControlPath="/root/.ansible/cp/d4f02d6718"' 10.83.46.192 '/bin/sh -c '"'"'echo ~magma && sleep 0'"'"''
<10.83.46.192> (0, b'/home/magma\n', b'')
<10.83.46.192> ESTABLISH SSH CONNECTION FOR USER: magma
<10.83.46.192> SSH: EXEC sshpass -d12 ssh -C -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o 'User="magma"' -o ConnectTimeout=10 -o 'ControlPath="/root/.ansible/cp/d4f02d6718"' 10.83.46.192 '/bin/sh -c '"'"'( umask 77 && mkdir -p "` echo /home/magma/.ansible/tmp `"&& mkdir "` echo /home/magma/.ansible/tmp/ansible-tmp-1695498515.764573-2335007-509088144473 `" && echo ansible-tmp-1695498515.764573-2335007-509088144473="` echo /home/magma/.ansible/tmp/ansible-tmp-1695498515.764573-2335007-509088144473 `" ) && sleep 0'"'"''
<10.83.46.192> (0, b'ansible-tmp-1695498515.764573-2335007-509088144473=/home/magma/.ansible/tmp/ansible-tmp-1695498515.764573-2335007-509088144473\n', b'')
Using module file /usr/lib/python3/dist-packages/ansible_collections/kubernetes/core/plugins/modules/k8s.py
<10.83.46.192> PUT /root/.ansible/tmp/ansible-local-2290544dmiexlbt/tmpbtxym44d TO /home/magma/.ansible/tmp/ansible-tmp-1695498515.764573-2335007-509088144473/AnsiballZ_k8s.py
<10.83.46.192> SSH: EXEC sshpass -d12 sftp -o BatchMode=no -b - -C -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o 'User="magma"' -o ConnectTimeout=10 -o 'ControlPath="/root/.ansible/cp/d4f02d6718"' '[10.83.46.192]'
<10.83.46.192> (0, b'sftp> put /root/.ansible/tmp/ansible-local-2290544dmiexlbt/tmpbtxym44d /home/magma/.ansible/tmp/ansible-tmp-1695498515.764573-2335007-509088144473/AnsiballZ_k8s.py\n', b'')
<10.83.46.192> ESTABLISH SSH CONNECTION FOR USER: magma
<10.83.46.192> SSH: EXEC sshpass -d12 ssh -C -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o 'User="magma"' -o ConnectTimeout=10 -o 'ControlPath="/root/.ansible/cp/d4f02d6718"' 10.83.46.192 '/bin/sh -c '"'"'chmod u+x /home/magma/.ansible/tmp/ansible-tmp-1695498515.764573-2335007-509088144473/ /home/magma/.ansible/tmp/ansible-tmp-1695498515.764573-2335007-509088144473/AnsiballZ_k8s.py && sleep 0'"'"''
<10.83.46.192> (0, b'', b'')
<10.83.46.192> ESTABLISH SSH CONNECTION FOR USER: magma
<10.83.46.192> SSH: EXEC sshpass -d12 ssh -C -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o 'User="magma"' -o ConnectTimeout=10 -o 'ControlPath="/root/.ansible/cp/d4f02d6718"' -tt 10.83.46.192 '/bin/sh -c '"'"'/usr/bin/python3 /home/magma/.ansible/tmp/ansible-tmp-1695498515.764573-2335007-509088144473/AnsiballZ_k8s.py && sleep 0'"'"''
<10.83.46.192> (1, b'\r\n{"reason": "Internal Server Error", "failed": true,
"msg": "Failed to create object: b\'{\"kind\":\"Status\",\"apiVersion\":\"v1\",\"metadata\":{},\"status\":\"Failure\",\"message\":\"Internal error occurred: failed calling webhook \\\\\"ipaddresspoolvalidationwebhook.metallb.io\\\\\": failed to call webhook: Post \\\\\"https://metallb-webhook-service.metallb-system.svc:443/validate-metallb-io-v1beta1-ipaddresspool?timeout=10s\\\\\": dial tcp 10.43.63.10:443: connect: connection refused\",\"reason\":\"InternalError\",\"details\":{\"causes\":[{\"message\":\"failed calling webhook \\\\\"ipaddresspoolvalidationwebhook.metallb.io\\\\\": failed to call webhook: Post \\\\\"https://metallb-webhook-service.metallb-system.svc:443/validate-metallb-io-v1beta1-ipaddresspool?timeout=10s\\\\\": dial tcp 10.43.63.10:443: connect: connection refused\"}]},\"code\":500}\\n\'",
"exception": " File \"/tmp/ansible_kubernetes.core.k8s_payload_8sdera77/ansible_kubernetes.core.k8s_payload.zip/ansible_collections/kubernetes/core/plugins/module_utils/k8s/runner.py\", line 68, in run_module\n result = perform_action(svc, definition, module.params)\n File \"/tmp/ansible_kubernetes.core.k8s_payload_8sdera77/ansible_kubernetes.core.k8s_payload.zip/ansible_collections/kubernetes/core/plugins/module_utils/k8s/runner.py\", line 152, in perform_action\n instance = svc.create(resource, definition)\n File \"/tmp/ansible_kubernetes.core.k8s_payload_8sdera77/ansible_kubernetes.core.k8s_payload.zip/ansible_collections/kubernetes/core/plugins/module_utils/k8s/service.py\", line 336, in create\n raise CoreException(msg) from e\n",
"invocation": {"module_args": {"namespace": "metallb-system", "definition": {"apiVersion": "metallb.io/v1beta1", "kind": "IPAddressPool", "metadata": {"name": "ip-pool", "namespace": "metallb-system"}, "spec": {"addresses": ["10.83.46.192/32"]}}, "resource_definition": {"apiVersion": "metallb.io/v1beta1", "kind": "IPAddressPool", "metadata": {"name": "ip-pool", "namespace": "metallb-system"}, "spec": {"addresses": ["10.83.46.192/32"]}}, "api_version": "v1", "wait": false, "wait_sleep": 5, "wait_timeout": 120, "append_hash": false, "apply": false, "continue_on_error": false, "state": "present", "force": false, "kind": null, "name": null, "src": null, "kubeconfig": null, "context": null, "host": null, "api_key": null, "username": null, "password": null, "validate_certs": null, "ca_cert": null, "client_cert": null, "client_key": null, "proxy": null, "no_proxy": null, "proxy_headers": null, "persist_config": null, "impersonate_user": null, "impersonate_groups": null, "wait_condition": null, "merge_type": null, "validate": null, "template": null, "delete_options": null, "label_selectors": null, "generate_name": null, "server_side_apply": null}}}\r\n', b'Shared connection to 10.83.46.192 closed.\r\n')
<10.83.46.192> Failed to connect to the host via ssh: Shared connection to 10.83.46.192 closed.
<10.83.46.192> ESTABLISH SSH CONNECTION FOR USER: magma
<10.83.46.192> SSH: EXEC sshpass -d12 ssh -C -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o 'User="magma"' -o ConnectTimeout=10 -o 'ControlPath="/root/.ansible/cp/d4f02d6718"' 10.83.46.192 '/bin/sh -c '"'"'rm -f -r /home/magma/.ansible/tmp/ansible-tmp-1695498515.764573-2335007-509088144473/ > /dev/null 2>&1 && sleep 0'"'"''
<10.83.46.192> (0, b'', b'')
FAILED - RETRYING: [10.83.46.192]: Create IP Address Pool (98 retries left). Result was:
{
    "attempts": 3,
    "changed": false,
    "invocation": {
        "module_args": {
            "api_key": null,
            "api_version": "v1",
            "append_hash": false,
            "apply": false,
            "ca_cert": null,
            "client_cert": null,
            "client_key": null,
            "context": null,
            "continue_on_error": false,
            "definition": {
                "apiVersion": "metallb.io/v1beta1",
                "kind": "IPAddressPool",
                "metadata": {
                    "name": "ip-pool",
                    "namespace": "metallb-system"
                },
                "spec": {
                    "addresses": [
                        "10.83.46.192/32"
                    ]
                }
            },
            "delete_options": null,
            "force": false,
            "generate_name": null,
            "host": null,
            "impersonate_groups": null,
            "impersonate_user": null,
            "kind": null,
            "kubeconfig": null,
            "label_selectors": null,
            "merge_type": null,
            "name": null,
            "namespace": "metallb-system",
            "no_proxy": null,
            "password": null,
            "persist_config": null,
            "proxy": null,
            "proxy_headers": null,
            "resource_definition": {
                "apiVersion": "metallb.io/v1beta1",
                "kind": "IPAddressPool",
                "metadata": {
                    "name": "ip-pool",
                    "namespace": "metallb-system"
                },
                "spec": {
                    "addresses": [
                        "10.83.46.192/32"
                    ]
                }
            },
            "server_side_apply": null,
            "src": null,
            "state": "present",
            "template": null,
            "username": null,
            "validate": null,
            "validate_certs": null,
            "wait": false,
            "wait_condition": null,
            "wait_sleep": 5,
            "wait_timeout": 120
        }
    },
    "msg": "Failed to create object: b'{\"kind\":\"Status\",\"apiVersion\":\"v1\",\"metadata\":{},\"status\":\"Failure\",\"message\":\"Internal error occurred: failed calling webhook \\\"ipaddresspoolvalidationwebhook.metallb.io\\\": failed to call webhook: Post \\\"https://metallb-webhook-service.metallb-system.svc:443/validate-metallb-io-v1beta1-ipaddresspool?timeout=10s\\\": dial tcp 10.43.63.10:443: connect: connection refused\",\"reason\":\"InternalError\",\"details\":{\"causes\":[{\"message\":\"failed calling webhook \\\"ipaddresspoolvalidationwebhook.metallb.io\\\": failed to call webhook: Post \\\"https://metallb-webhook-service.metallb-system.svc:443/validate-metallb-io-v1beta1-ipaddresspool?timeout=10s\\\": dial tcp 10.43.63.10:443: connect: connection refused\"}]},\"code\":500}\n'",
    "reason": "Internal Server Error",
    "retries": 101
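
For clarity, the object the failing task is trying to create, reconstructed from the module_args above, is equivalent to applying the following manifest by hand (a sketch for reproduction, not part of the playbook):

# Reconstructed from the invocation shown above; applying it manually
# reproduces the same webhook error while the controller is unreachable:
kubectl apply -f - <<EOF
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: ip-pool
  namespace: metallb-system
spec:
  addresses:
    - 10.83.46.192/32
EOF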

random1337-infosec commented 11 months ago

This seems to be a known MetalLB problem with newer versions, which has been reported upstream here: https://github.com/metallb/metallb/issues/1339
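
The "dial tcp 10.43.63.10:443: connect: connection refused" against metallb-webhook-service means the MetalLB controller pod, which serves the validation webhook, was not reachable when the pool was created. A quick way to confirm this on the cluster (plain kubectl, not part of the playbook):

# The controller pod backing the webhook should be Running and Ready:
kubectl -n metallb-system get pods
# The webhook Service should list at least one endpoint; "connection
# refused" usually means it has none yet:
kubectl -n metallb-system get endpoints metallb-webhook-service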

jblakley commented 11 months ago

I ran into this issue a while ago. It hasn't been an issue with recent deployments using magma-galaxy; I don't know what specifically fixed it. I did try downgrading MetalLB to 0.11.0 when I first ran into it, but I'm currently (successfully) running one Orc8r on MetalLB 0.12.1 and two on 0.13.10.
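
If anyone hits this again, one workaround is to block until the controller Deployment is Available before the pool gets created. This is a sketch, not something magma-galaxy does, and it assumes the stock manifest name controller (a Helm install may name the Deployment metallb-controller):

# Wait for the webhook-serving controller before creating the IPAddressPool:
kubectl -n metallb-system wait --for=condition=Available \
  deployment/controller --timeout=180s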

ShubhamTatvamasi commented 11 months ago

Hi @random1337-infosec, please use this code for your deployment. https://github.com/magma/magma-deployer

random1337-infosec commented 11 months ago

> Hi @random1337-infosec, please use this code for your deployment. https://github.com/magma/magma-deployer

Hi @ShubhamTatvamasi. I used that deployment process (on a Multipass instance), and during the installation I ran into an issue with the RKE binary. I'd like to note that SSH connectivity is working correctly. You can view a screenshot of the problem at this link: [screenshot URL]. How should I proceed? It appears to be an issue related to a TLS certificate.

INFO[0014] Waiting for [etcd-fix-perm] container to exit on host [10.83.46.39]
INFO[0014] Removing container [etcd-fix-perm] on host [10.83.46.39], try #1
INFO[0014] [remove/etcd-fix-perm] Successfully removed container on host [10.83.46.39]
INFO[0014] [etcd] Running rolling snapshot container [etcd-rolling-snapshots] on host [10.83.46.39]
INFO[0014] Removing container [etcd-rolling-snapshots] on host [10.83.46.39], try #1
INFO[0014] [remove/etcd-rolling-snapshots] Successfully removed container on host [10.83.46.39]
INFO[0014] Image [rancher/rke-tools:v0.1.90] exists on host [10.83.46.39]
INFO[0015] Starting container [etcd-rolling-snapshots] on host [10.83.46.39], try #1
INFO[0015] [etcd] Successfully started [etcd-rolling-snapshots] container on host [10.83.46.39]
INFO[0020] Image [rancher/rke-tools:v0.1.90] exists on host [10.83.46.39]
INFO[0021] Starting container [rke-bundle-cert] on host [10.83.46.39], try #1
INFO[0021] [certificates] Successfully started [rke-bundle-cert] container on host [10.83.46.39]
INFO[0021] Waiting for [rke-bundle-cert] container to exit on host [10.83.46.39]
INFO[0021] [certificates] successfully saved certificate bundle [/opt/rke/etcd-snapshots//pki.bundle.tar.gz] on host [10.83.46.39]
INFO[0021] Removing container [rke-bundle-cert] on host [10.83.46.39], try #1
INFO[0021] Image [rancher/rke-tools:v0.1.88] exists on host [10.83.46.39]
INFO[0022] Starting container [rke-log-linker] on host [10.83.46.39], try #1
INFO[0022] [etcd] Successfully started [rke-log-linker] container on host [10.83.46.39]
INFO[0022] Removing container [rke-log-linker] on host [10.83.46.39], try #1
INFO[0023] [remove/rke-log-linker] Successfully removed container on host [10.83.46.39]
INFO[0023] Image [rancher/rke-tools:v0.1.88] exists on host [10.83.46.39]
INFO[0023] Starting container [rke-log-linker] on host [10.83.46.39], try #1
INFO[0024] [etcd] Successfully started [rke-log-linker] container on host [10.83.46.39]
INFO[0024] Removing container [rke-log-linker] on host [10.83.46.39], try #1
INFO[0024] [remove/rke-log-linker] Successfully removed container on host [10.83.46.39]
INFO[0024] [etcd] Successfully started etcd plane.. Checking etcd cluster health
WARN[0118] [etcd] host [10.83.46.39] failed to check etcd health: failed to get /health for host [10.83.46.39]: Get "https://10.83.46.39:2379/health": remote error: tls: bad certificate
FATA[0118] [etcd] Failed to bring up Etcd Plane: etcd cluster is unhealthy: hosts [10.83.46.39] failed to report healthy. Check etcd container logs on each host for more information
magma@orc8r:~/magma-deployer/rke$
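
The final "remote error: tls: bad certificate" on the etcd health check usually isn't an SSH problem; a common cause is stale certificates or etcd state left on the host by an earlier RKE run. A possible way to investigate and reset, assuming a Docker-based RKE node that is disposable (the reset is destructive, so only run it on a throwaway instance):

# Inspect the etcd container logs, as the FATA line suggests:
docker logs etcd --tail 50
# If the node carried state from a previous run, tearing the cluster down
# and recreating it clears the mismatched certificates:
rke remove --config cluster.yml
rke up --config cluster.yml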