Closed wenyiz2021 closed 10 months ago
I tried changing module_ignore_errors to False. The terminal shows that the cmd failed, but the output of the shell cmd still says 'failed': False.
(Pdb) out = duthost.shell("sudo config reload -y", executable="/bin/bash", module_ignore_errors=False)
Friday 16 June 2023 23:48:14 +0000 (0:00:46.822) 0:09:26.849 ***********
*** RunAnsibleModuleFail: run module shell failed, Ansible Results =>
{"changed": true, "cmd": "sudo config reload -y", "delta": "0:00:00.448957", "end": "2023-06-16 23:48:15.978370", "failed": true, "msg": "non-zero return code", "rc": 1, "start": "2023-06-16 23:48:15.529413", "stderr": "", "stderr_lines": [], "stdout": "SwSS container is not ready. Retry later or use -f to avoid system checks", "stdout_lines": ["SwSS container is not ready. Retry later or use -f to avoid system checks"], "warnings": ["Consider using 'become', 'become_method', and 'become_user' rather than running sudo"]}
(Pdb) out
{'stderr_lines': [], u'cmd': u'sudo config reload -y', u'end': u'2023-06-16 23:48:04.137881', '_ansible_no_log': False, u'stdout': u'Disabling container monitoring ...\nStopping SONiC target ...\nRunning command: /usr/local/bin/sonic-cfggen -j /etc/sonic/init_cfg.json -j /etc/sonic/config_db.json --write-to-db\nRunning command: /usr/local/bin/db_migrator.py -o migrate\nRunning command: /usr/local/bin/sonic-cfggen -d -y /etc/sonic/sonic_version.yml -t /usr/share/sonic/templates/sonic-environment.j2,/etc/sonic/sonic-environment\nRestarting SONiC target ...\nEnabling container monitoring ...\nReloading Monit configuration ...\nReinitializing monit daemon', u'changed': True, u'rc': 0, u'start': u'2023-06-16 23:47:28.697315', u'stderr': u'', u'delta': u'0:00:35.440566', u'invocation': {u'module_args': {u'creates': None, u'executable': u'/bin/bash', u'_uses_shell': True, u'strip_empty_ends': True, u'_raw_params': u'sudo config reload -y', u'removes': None, u'argv': None, u'warn': True, u'chdir': None, u'stdin_add_newline': True, u'stdin': None}}, 'stdout_lines': [u'Disabling container monitoring ...', u'Stopping SONiC target ...', u'Running command: /usr/local/bin/sonic-cfggen -j /etc/sonic/init_cfg.json -j /etc/sonic/config_db.json --write-to-db', u'Running command: /usr/local/bin/db_migrator.py -o migrate', u'Running command: /usr/local/bin/sonic-cfggen -d -y /etc/sonic/sonic_version.yml -t /usr/share/sonic/templates/sonic-environment.j2,/etc/sonic/sonic-environment', u'Restarting SONiC target ...', u'Enabling container monitoring ...', u'Reloading Monit configuration ...', u'Reinitializing monit daemon'], u'warnings': [u"Consider using 'become', 'become_method', and 'become_user' rather than running sudo"], 'failed': False}
(Pdb) out['stdout']
u'Disabling container monitoring ...\nStopping SONiC target ...\nRunning command: /usr/local/bin/sonic-cfggen -j /etc/sonic/init_cfg.json -j /etc/sonic/config_db.json --write-to-db\nRunning command: /usr/local/bin/db_migrator.py -o migrate\nRunning command: /usr/local/bin/sonic-cfggen -d -y /etc/sonic/sonic_version.yml -t /usr/share/sonic/templates/sonic-environment.j2,/etc/sonic/sonic-environment\nRestarting SONiC target ...\nEnabling container monitoring ...\nReloading Monit configuration ...\nReinitializing monit daemon'
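One possible reading of the Pdb session above, offered as an assumption rather than a confirmed diagnosis: the result stored in out ends at 23:48:04 with rc 0, while the failing run ends at 23:48:15. That is consistent with the call raising RunAnsibleModuleFail before the assignment completed, so out still holds an earlier, successful result. The sketch below reproduces that Python behavior with stand-ins; FakeModuleFail and fake_shell are hypothetical names for illustration, not the sonic-mgmt API.

```python
class FakeModuleFail(Exception):
    """Hypothetical stand-in for the exception raised when a module fails."""


def fake_shell(cmd, rc, module_ignore_errors=False):
    """Hypothetical stand-in for a shell call that returns a result dict."""
    result = {"cmd": cmd, "rc": rc, "failed": rc != 0}
    if result["failed"] and not module_ignore_errors:
        # With errors not ignored, a failure raises instead of returning.
        raise FakeModuleFail("run module shell failed: %r" % result)
    return result


def demo():
    # First call succeeds, so `out` is bound to a successful result.
    out = fake_shell("sudo config reload -y", rc=0)
    try:
        # Second call raises, so the assignment never happens.
        out = fake_shell("sudo config reload -y", rc=1)
    except FakeModuleFail:
        pass
    # `out` still refers to the result of the first call.
    return out
```

Under this reading, inspecting out after the exception says nothing about the failed run; the failure details are only in the exception message.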
cc @arlakshm
expectation is: able to recognize that this cmd failed to execute -> 'failed' = True
I am unsure whether it's an Ansible issue or a hardware issue. @Staphylo @kenneth-arista can you please help confirm?
By default, Ansible only considers the return code when determining whether a command succeeded or failed. From the perspective of the config command, returning success looks appropriate if it didn't detect that the system is still booting.
I'm looking into why the config command isn't detecting that the system is still booting.
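To make the return-code point concrete, here is a minimal sketch of how a caller could classify a shell result dict like the ones shown in this thread. The helper names failed_by_return_code and reload_was_rejected are hypothetical, and matching on the 'Retry later' text is an assumption based on the stdout quoted in this issue, not on the actual test code.

```python
def failed_by_return_code(result):
    """Mirror the default rule: only a non-zero rc counts as failure."""
    return result.get("rc", 0) != 0


def reload_was_rejected(result, marker="Retry later"):
    """Hypothetical check: non-zero rc and the retry hint present in stdout."""
    return failed_by_return_code(result) and marker in result.get("stdout", "")


# Result shapes taken from the two runs quoted in this thread (abbreviated).
rejected = {
    "rc": 1,
    "stdout": "SwSS container is not ready. "
              "Retry later or use -f to avoid system checks",
}
accepted = {"rc": 0, "stdout": "Disabling container monitoring ..."}
```

With these inputs, reload_was_rejected(rejected) is true and reload_was_rejected(accepted) is false, which matches the two behaviors reported here.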
I haven't been able to reproduce this locally, but my theory is that this is caused by either (1) the switch booting up faster than the test can reach this check, so the system is already running, or (2) a service failing during startup, leaving the system in the degraded state, which may be overriding the started state that config reload is looking for.
If you encounter this issue again, it would be useful to run systemctl status on the DUT to rule out (2).
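As a rough aid for that check, the sketch below pulls the overall system state out of captured systemctl status text. It assumes that systemctl status, run without a unit name, prints a line of the form "State: <value>". parse_system_state is a hypothetical helper name, and how the text is collected from the DUT is left to the caller.

```python
import re


def parse_system_state(status_output):
    """Return the value of the first 'State:' line, or None if absent."""
    match = re.search(r"^\s*State:\s*(\S+)", status_output, flags=re.MULTILINE)
    return match.group(1) if match else None
```

A return value of degraded would point toward theory (2), while running would be more consistent with theory (1).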
this is fixed in https://github.com/sonic-net/sonic-mgmt/pull/7953
hi @patrickmacarthur @Staphylo @kenneth-arista
config reload failed to issue the 'Retry later' message when run immediately after a reboot of the DUT. Failure point: https://github.com/sonic-net/sonic-mgmt/blob/0d6fedb76aa3b95ce5b6cbd44c528b0dd7ffbfcd/tests/platform_tests/test_reload_config.py#L102
output of shell cmd on Arista card:
expected:
This happened on all linecards: CL2 and wolverine.