IntelSmartEdge / open-developer-experience-kits

Source code for experience kits that use Edge Software Provisioner for deployment.
Apache License 2.0

Smart Edge Open Deployment failed #5

Open 11991585 opened 2 years ago

11991585 commented 2 years ago

I am installing the image on the target system, but when I connect to the target system via SSH it shows "Smart Edge Open Deployment Status: failed". Can you please help me? Thanks.

Target System log screenshot

This is what is displayed after connecting to the target system over SSH:

login as: smartedge-open
Pre-authentication banner message from server:
| Ubuntu 20.04 LTS
|
| Smart Edge Open Deployment Status: failed. Check logs in /opt/seo/logs. To re
> start deployment run: systemctl restart seo
|
End of banner message from server

Viewing the log, as the banner suggests:

$ cat /opt/seo/logs/seo_dek_dek_2021-12-15_02-53-05_single_node_network_edge.yml.log
·
·
·
TASK [baseline_ansible/kubernetes/operator/sriov_network_operator/configure : apply sriov network configuration] ***
task path: /opt/seo/roles/baseline_ansible/kubernetes/operator/sriov_network_operator/configure/tasks/main.yml:45
Wednesday 15 December 2021  04:10:32 +0000 (0:00:01.987)       1:17:23.226 ****
FAILED - RETRYING: apply sriov network configuration (50 retries left).
·
·
·
FAILED - RETRYING: apply sriov network configuration (1 retries left).
fatal: [controller]: FAILED! => {
    "attempts": 50,
    "changed": true,
    "cmd": [
        "kubectl",
        "apply",
        "-f",
        "./"
    ],
    "delta": "0:00:13.652244",
    "end": "2021-12-15 04:20:47.438243",
    "rc": 1,
    "start": "2021-12-15 04:20:33.785999"
}

STDOUT:

sriovnetwork.sriovnetwork.openshift.io/sriov-netdev-network-c0p0 unchanged
sriovnetwork.sriovnetwork.openshift.io/sriov-netdev-network-c1p0 unchanged
sriovnetwork.sriovnetwork.openshift.io/sriov-vfio-network-c0p1 unchanged
sriovnetwork.sriovnetwork.openshift.io/sriov-vfio-network-c1p1 unchanged

STDERR:

Error from server (no supported NIC is selected by the nicSelector in CR sriov-netdev-net-c0p0): error when creating "sriov-netdev-net-c0p0-sriov_network_node_policy.yml": admission webhook "operator-webhook.sriovnetwork.openshift.io" denied the request: no supported NIC is selected by the nicSelector in CR sriov-netdev-net-c0p0
Error from server (no supported NIC is selected by the nicSelector in CR sriov-netdev-net-c1p0): error when creating "sriov-netdev-net-c1p0-sriov_network_node_policy.yml": admission webhook "operator-webhook.sriovnetwork.openshift.io" denied the request: no supported NIC is selected by the nicSelector in CR sriov-netdev-net-c1p0
Error from server (no supported NIC is selected by the nicSelector in CR sriov-vfio-pci-net-c0p1): error when creating "sriov-vfio-pci-net-c0p1-sriov_network_node_policy.yml": admission webhook "operator-webhook.sriovnetwork.openshift.io" denied the request: no supported NIC is selected by the nicSelector in CR sriov-vfio-pci-net-c0p1
Error from server (no supported NIC is selected by the nicSelector in CR sriov-vfio-pci-net-c1p1): error when creating "sriov-vfio-pci-net-c1p1-sriov_network_node_policy.yml": admission webhook "operator-webhook.sriovnetwork.openshift.io" denied the request: no supported NIC is selected by the nicSelector in CR sriov-vfio-pci-net-c1p1

MSG:

non-zero return code

PLAY RECAP *********************************************************************
controller                 : ok=384  changed=210  unreachable=0    failed=1    skipped=80   rescued=0    ignored=6
node01                     : ok=156  changed=51   unreachable=0    failed=0    skipped=92   rescued=0    ignored=1

Wednesday 15 December 2021  04:20:47 +0000 (0:10:15.209)       1:27:38.436 ****
===============================================================================
kubernetes/cni ------------------------------------------------------- 1254.44s
baseline_ansible/kubernetes/operator/sriov_network_operator/install -- 1145.06s
telemetry/grafana ----------------------------------------------------- 663.51s
baseline_ansible/kubernetes/operator/sriov_network_operator/configure - 619.52s
kubernetes/controlplane ----------------------------------------------- 233.09s
infrastructure/install_dependencies ----------------------------------- 210.86s
infrastructure/docker ------------------------------------------------- 206.76s
kubernetes/cni/multus/controlplane ------------------------------------ 165.45s
baseline_ansible/infrastructure/build_nic_drivers --------------------- 127.22s
telemetry/prometheus -------------------------------------------------- 118.32s
kubernetes/nfd --------------------------------------------------------- 91.27s
telemetry/collectd/controlplane ---------------------------------------- 65.34s
kubernetes/install ----------------------------------------------------- 59.20s
telemetry/statsd-exporter ---------------------------------------------- 57.59s
kubernetes/harbor_registry/controlplane -------------------------------- 57.49s
telemetry/cadvisor ----------------------------------------------------- 42.25s
baseline_ansible/infrastructure/install_golang ------------------------- 42.21s
kubernetes/cni/calico/controlplane ------------------------------------- 21.64s
baseline_ansible/infrastructure/install_packages ----------------------- 15.17s
kubernetes/harbor_registry/node ---------------------------------------- 10.55s
kubernetes/helm --------------------------------------------------------- 8.59s
infrastructure/os_setup ------------------------------------------------- 8.14s
baseline_ansible/infrastructure/os_requirements/disable_swap ------------ 4.47s
infrastructure/firewall_open_ports -------------------------------------- 4.08s
gather_facts ------------------------------------------------------------ 3.93s
telemetry/certs --------------------------------------------------------- 3.36s
kubernetes/custom_namespace --------------------------------------------- 2.74s
baseline_ansible/infrastructure/os_requirements/dns_stub_listener ------- 2.34s
telemetry/collectd/node ------------------------------------------------- 2.18s
kubernetes/default_netpol ----------------------------------------------- 1.94s
baseline_ansible/kubernetes/operator/sriov_network_operator/prepare_node --- 1.30s
infrastructure/git_repo ------------------------------------------------- 1.17s
baseline_ansible/infrastructure/install_openssl ------------------------- 1.12s
kubernetes/customize_kubelet -------------------------------------------- 0.98s
infrastructure/conditional_reboot --------------------------------------- 0.77s
baseline_ansible/infrastructure/os_proxy -------------------------------- 0.70s
baseline_ansible/infrastructure/configure_udev -------------------------- 0.69s
shell ------------------------------------------------------------------- 0.47s
baseline_ansible/infrastructure/os_requirements/enable_ipv4_forwarding --- 0.39s
infrastructure/build_noproxy -------------------------------------------- 0.38s
fail -------------------------------------------------------------------- 0.34s
kubernetes/create_namespaces -------------------------------------------- 0.29s
infrastructure/grub ----------------------------------------------------- 0.20s
infrastructure/e810_driver_update --------------------------------------- 0.14s
include_tasks ----------------------------------------------------------- 0.13s
baseline_ansible/infrastructure/selinux --------------------------------- 0.10s
set_fact ---------------------------------------------------------------- 0.09s
debug ------------------------------------------------------------------- 0.08s
baseline_ansible/infrastructure/disable_fingerprint_authentication ------ 0.08s
baseline_ansible/infrastructure/time_setup_ntp -------------------------- 0.06s
include_vars ------------------------------------------------------------ 0.05s
infrastructure/setup_baseline_ansible ----------------------------------- 0.04s
stat -------------------------------------------------------------------- 0.02s
infrastructure/setup_offline -------------------------------------------- 0.02s
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
total ---------------------------------------------------------------- 5258.34s

Check installation logs

$ sudo journalctl -xefu seo
·
·
·
Dec 15 04:20:47 ubuntu-a39bb0a8ad bash[793]: 2021-12-15 04:20:47.555 ERROR: seo_dek single_node_network_edge.yml: failed. Please check the logs: /opt/seo/logs/seo_dek_dek_2021-12-15_02-53-05_single_node_network_edge.yml.log
Dec 15 04:20:48 ubuntu-a39bb0a8ad bash[793]: 2021-12-15 04:20:48.557 INFO: ====================
Dec 15 04:20:48 ubuntu-a39bb0a8ad bash[793]: 2021-12-15 04:20:48.557 INFO: DEPLOYMENT RECAP:
Dec 15 04:20:48 ubuntu-a39bb0a8ad bash[793]: 2021-12-15 04:20:48.557 INFO: ====================
Dec 15 04:20:48 ubuntu-a39bb0a8ad bash[793]: 2021-12-15 04:20:48.557 INFO: DEPLOYMENT COUNT: 1
Dec 15 04:20:48 ubuntu-a39bb0a8ad bash[793]: 2021-12-15 04:20:48.557 INFO: SUCCESSFUL DEPLOYMENTS: 0

Dec 15 04:20:48 ubuntu-a39bb0a8ad bash[793]: 2021-12-15 04:20:48.558 INFO: FAILED DEPLOYMENTS: 1
Dec 15 04:20:48 ubuntu-a39bb0a8ad bash[793]: 2021-12-15 04:20:48.558 INFO: DEPLOYMENT "seo_dek": FAILED
Dec 15 04:20:48 ubuntu-a39bb0a8ad bash[793]: 2021-12-15 04:20:48.558 INFO: ====================
Dec 15 04:20:48 ubuntu-a39bb0a8ad bash[793]: 2021-12-15 04:20:48.558 INFO: Deployment failed, pulling logs
Ubuntu 20.04 LTS
Dec 15 04:20:50 ubuntu-a39bb0a8ad bash[195666]: Smart Edge Open Deployment Status: in progress
Dec 15 04:20:50 ubuntu-a39bb0a8ad bash[739]: + status=1
Dec 15 04:20:50 ubuntu-a39bb0a8ad bash[739]: + set -e
Dec 15 04:20:50 ubuntu-a39bb0a8ad bash[739]: + clear_status
Dec 15 04:20:50 ubuntu-a39bb0a8ad bash[739]: + for f in "${issue_files[@]}"
Dec 15 04:20:50 ubuntu-a39bb0a8ad bash[739]: + sed '/Smart Edge Open Deployment Status/d' -i /etc/issue
Dec 15 04:20:50 ubuntu-a39bb0a8ad bash[739]: + printf '%s\n' 'Ubuntu 20.04 LTS GNU/Linux 5.4.0-91-generic x86_64 \l
Dec 15 04:20:50 ubuntu-a39bb0a8ad bash[739]: IP Address:
Dec 15 04:20:50 ubuntu-a39bb0a8ad bash[739]: 192.168.1.6
Dec 15 04:20:50 ubuntu-a39bb0a8ad bash[739]: Routes:
Dec 15 04:20:50 ubuntu-a39bb0a8ad bash[739]: default via 192.168.1.1 dev enp10s0 proto dhcp src 192.168.1.6 metric 1024
Dec 15 04:20:50 ubuntu-a39bb0a8ad bash[739]: 192.168.1.0/24 dev enp10s0 proto kernel scope link src 192.168.1.6
Dec 15 04:20:50 ubuntu-a39bb0a8ad bash[739]: 192.168.1.1 dev enp10s0 proto dhcp scope link src 192.168.1.6 metric 1024
Dec 15 04:20:50 ubuntu-a39bb0a8ad bash[739]: LANs:
Dec 15 04:20:50 ubuntu-a39bb0a8ad bash[739]: 1: lo    inet 127.0.0.1/8 scope host lo\       valid_lft forever preferred_lft forever
Dec 15 04:20:50 ubuntu-a39bb0a8ad bash[739]: 2: enp10s0    inet 192.168.1.6/24 brd 192.168.1.255 scope global enp10s0\       valid_lft forever preferred_lft forever'
Dec 15 04:20:50 ubuntu-a39bb0a8ad bash[739]: + for f in "${issue_files[@]}"
Dec 15 04:20:50 ubuntu-a39bb0a8ad bash[739]: + sed '/Smart Edge Open Deployment Status/d' -i /etc/issue.net
Dec 15 04:20:50 ubuntu-a39bb0a8ad bash[739]: + printf '%s\n' 'Ubuntu 20.04 LTS'
Dec 15 04:20:50 ubuntu-a39bb0a8ad bash[739]: + '[' 1 -eq 0 ']'
Dec 15 04:20:50 ubuntu-a39bb0a8ad bash[739]: + set_status 'failed. Check logs in /opt/seo/logs. To restart deployment run: systemctl restart seo'
Dec 15 04:20:50 ubuntu-a39bb0a8ad bash[739]: + local 'deploy_status=failed. Check logs in /opt/seo/logs. To restart deployment run: systemctl restart seo'
Dec 15 04:20:50 ubuntu-a39bb0a8ad bash[739]: + for f in "${issue_files[@]}"
Dec 15 04:20:50 ubuntu-a39bb0a8ad bash[739]: + echo -e '\nSmart Edge Open Deployment Status: failed. Check logs in /opt/seo/logs. To restart deployment run: systemctl restart seo\n'
Dec 15 04:20:50 ubuntu-a39bb0a8ad bash[739]: + for f in "${issue_files[@]}"
Dec 15 04:20:50 ubuntu-a39bb0a8ad bash[739]: + echo -e '\nSmart Edge Open Deployment Status: failed. Check logs in /opt/seo/logs. To restart deployment run: systemctl restart seo\n'
Dec 15 04:20:50 ubuntu-a39bb0a8ad bash[739]: + exit 1
Dec 15 04:20:50 ubuntu-a39bb0a8ad systemd[1]: seo.service: Main process exited, code=exited, status=1/FAILURE
-- Subject: Unit process exited
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- An ExecStart= process belonging to unit seo.service has exited.
--
-- The process' exit code is 'exited' and its exit status is 1.
Dec 15 04:20:50 ubuntu-a39bb0a8ad systemd[1]: seo.service: Failed with result 'exit-code'.
-- Subject: Unit failed
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- The unit seo.service has entered the 'failed' state with result 'exit-code'.

inteltiger commented 2 years ago

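This means the SR-IOV deployment failed: the admission webhook rejected the node policies because no supported NIC matched the nicSelector.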

First of all, your hardware (mainboard, BIOS, and NIC) must support SR-IOV. You need an Intel-certified NIC, e.g. XXV710 or E810.

If you don't have such hardware, you should disable the SR-IOV feature: change sriov_network_operator_enable to false in 10-default.yml, and then run "systemctl restart seo".
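A minimal sketch of that edit as a shell function, in case it helps. It assumes the setting appears in the file as a plain `sriov_network_operator_enable: <value>` line; the function name `disable_sriov` is made up for illustration, and the location of 10-default.yml on the target system is not stated in this thread, so the path is passed in by the caller.

```shell
#!/usr/bin/env bash
# Sketch only: set sriov_network_operator_enable to false in a vars file.
# Assumption: the key sits on its own line as "sriov_network_operator_enable: <value>".
# The path to 10-default.yml is supplied by the caller (not given in this thread).
disable_sriov() {
    local vars_file="$1"
    # Rewrite the value on the line that starts with the key.
    # A backup of the original file is kept next to it with a .bak suffix.
    sed -i.bak -E \
        's/^([[:space:]]*sriov_network_operator_enable:[[:space:]]*).*/\1false/' \
        "$vars_file"
}
```

After calling it on the real file (for example `disable_sriov /path/to/10-default.yml`, with the path adjusted to your system), restart the deployment with `systemctl restart seo` as described above.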

https://smart-edge-open.github.io/docs/components/networking/sriov-network-operator/#limitations

Useful commands:

lshw -c network -businfo
lspci -nn | grep -i 'Ethernet Controller'
dmesg | grep nvm
dmesg | grep eth
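In addition to the commands above, a quick way to see whether a NIC advertises SR-IOV at the PCI level is to read the kernel's `sriov_totalvfs` attribute for each interface. The sketch below does that; the function name `list_sriov_nics` and the optional sysfs-root argument are illustrative choices, not part of Smart Edge Open. Note that a non-zero value only shows that the device exposes virtual functions; it does not by itself mean the SR-IOV Network Operator treats the NIC as supported.

```shell
#!/usr/bin/env bash
# Sketch only: report, per network interface, how many SR-IOV virtual
# functions the underlying PCI device advertises (sriov_totalvfs).
# The sysfs root defaults to /sys; it is a parameter only to ease testing.
list_sriov_nics() {
    local sysfs_root="${1:-/sys}"
    local dev iface total
    for dev in "$sysfs_root"/class/net/*; do
        # Skip the unexpanded glob if no interfaces are present.
        [ -e "$dev" ] || continue
        iface="${dev##*/}"
        if [ -r "$dev/device/sriov_totalvfs" ]; then
            total="$(cat "$dev/device/sriov_totalvfs")"
            echo "$iface sriov_totalvfs=$total"
        else
            # Virtual interfaces (e.g. lo) and NICs without the SR-IOV
            # capability have no such attribute.
            echo "$iface no-sriov"
        fi
    done
}
```

Running `list_sriov_nics` with no arguments on the target system prints one line per interface, which can be compared against the interfaces the deployment is configured to use.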

Useful Red Hat documentation on SR-IOV:

https://docs.openshift.com/container-platform/4.9/networking/hardware_networks/configuring-sriov-device.html (this link is for version 4.9; you may want to select the latest version on the page)

https://docs.openshift.com/container-platform/4.9/networking/hardware_networks/configuring-sriov-net-attach.html

architag21 commented 2 years ago

Hi @11991585, are you still facing an issue installing DEK, or can we close this ticket?