aristanetworks / sonic

Open source drivers and initialization library for Arista platforms running SONiC
GNU General Public License v2.0

[chassis] phy-credo errors seen on linecard on syncd shutdown #40

Closed arlakshm closed 1 year ago

arlakshm commented 2 years ago

The following errors are seen on a 100G linecard when syncd is shut down:

May  6 20:55:58.273862 str2-7804-lc7-1 INFO systemd[1]: syncd.service: Succeeded.
May  6 20:55:58.274245 str2-7804-lc7-1 INFO systemd[1]: Stopped syncd service.
May  6 20:55:58.277717 str2-7804-lc7-1 INFO systemd[1]: swss.service: Succeeded.
May  6 20:55:58.278080 str2-7804-lc7-1 INFO systemd[1]: Stopped switch state service.
May  6 20:55:58.278345 str2-7804-lc7-1 INFO systemd[1]: interfaces-config.service: Succeeded.
May  6 20:55:58.278839 str2-7804-lc7-1 INFO systemd[1]: Stopped Update interfaces configuration.
May  6 20:55:58.278921 str2-7804-lc7-1 INFO systemd[1]: Stopping Update interfaces configuration...
May  6 20:56:28.406783 str2-7804-lc7-1 INFO phy-credo.py[2658]: Traceback (most recent call last):
May  6 20:56:28.406946 str2-7804-lc7-1 INFO phy-credo.py[2658]:   File "/usr/bin/phy-credo.py", line 261, in <module>
May  6 20:56:28.407042 str2-7804-lc7-1 INFO phy-credo.py[2658]:     sys.exit(main())
May  6 20:56:28.407096 str2-7804-lc7-1 INFO phy-credo.py[2658]:   File "/usr/bin/phy-credo.py", line 254, in main
May  6 20:56:28.407142 str2-7804-lc7-1 INFO phy-credo.py[2658]:     while phyd.run():
May  6 20:56:28.407198 str2-7804-lc7-1 INFO phy-credo.py[2658]:   File "/usr/bin/phy-credo.py", line 225, in run
May  6 20:56:28.407248 str2-7804-lc7-1 INFO phy-credo.py[2658]:     intf2medium = self.get_xcvr_medium_map()
May  6 20:56:28.407295 str2-7804-lc7-1 INFO phy-credo.py[2658]:   File "/usr/bin/phy-credo.py", line 194, in get_xcvr_medium_map
May  6 20:56:28.407340 str2-7804-lc7-1 INFO phy-credo.py[2658]:     for key in self.db.keys(self.db.STATE_DB, 'TRANSCEIVER_INFO|*'):
May  6 20:56:28.407388 str2-7804-lc7-1 INFO phy-credo.py[2658]: TypeError: 'NoneType' object is not iterable
May  6 20:56:28.441247 str2-7804-lc7-1 NOTICE systemd[1]: phy-credo-daemon.service: Main process exited, code=exited, status=1/FAILURE
May  6 20:56:28.441386 str2-7804-lc7-1 WARNING systemd[1]: phy-credo-daemon.service: Failed with result 'exit-code'.
May  6 20:57:05.094161 str2-7804-lc7-1 WARNING systemd[1]: hostcfgd.service: State 'stop-sigterm' timed out. Killing.
May  6 20:57:05.094333 str2-7804-lc7-1 NOTICE systemd[1]: hostcfgd.service: Killing process 3984 (hostcfgd) with signal SIGKILL.
May  6 20:57:05.096526 str2-7804-lc7-1 WARNING systemd[1]: hostcfgd.service: Main process exited, code=killed, status=9/KILL
May  6 20:57:05.096626 str2-7804-lc7-1 WARNING systemd[1]: hostcfgd.service: Failed with result 'timeout'.
May  6 20:57:05.097620 str2-7804-lc7-1 INFO systemd[1]: Stopped Host config enforcer daemon.
May  6 20:57:05.099072 str2-7804-lc7-1 INFO systemd[1]: hostcfgd.timer: Succeeded.
May  6 20:57:05.099278 str2-7804-lc7-1 INFO systemd[1]: Stopped Delays hostcfgd daemon until SONiC has started.
May  6 20:57:05.099365 str2-7804-lc7-1 INFO systemd[1]: Stopping Delays hostcfgd daemon until SONiC has started.
May  6 20:57:05.099516 str2-7804-lc7-1 INFO systemd[1]: Started Delays hostcfgd daemon until SONiC has started.
May  6 20:57:05.099936 str2-7804-lc7-1 INFO systemd[1]: updategraph.service: Succeeded.
May  6 20:57:05.101063 str2-7804-lc7-1 INFO systemd[1]: Stopped Update minigraph and set configuration based on minigraph.
May  6 20:57:05.101182 str2-7804-lc7-1 INFO systemd[1]: Stopping Update minigraph and set configuration based on minigraph...
May  6 20:57:05.101420 str2-7804-lc7-1 INFO systemd[1]: config-setup.service: Succeeded.
May  6 20:57:05.102573 str2-7804-lc7-1 INFO systemd[1]: Stopped Config initialization and migration service.
May  6 20:57:05.102698 str2-7804-lc7-1 INFO systemd[1]: Stopping Config initialization and migration service...
Staphylo commented 2 years ago

We will look into this. Likely a case where phy-credo does not gracefully handle the database going down.
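
For illustration, the traceback shows `self.db.keys(self.db.STATE_DB, 'TRANSCEIVER_INFO|*')` returning `None`, which the `for` loop then fails to iterate. A minimal sketch of one possible guard follows. It is not the actual fix applied to `phy-credo.py`: the helper name `safe_keys` and the `FakeDb` stand-in are hypothetical, and only the `keys(db_name, pattern)` call shape is taken from the traceback.

```python
def safe_keys(db, db_name, pattern):
    """Return the matching keys as a list, or [] when the lookup yields None.

    Hypothetical helper: the only thing taken from the traceback is that
    db.keys(db_name, pattern) can return None instead of an iterable.
    """
    keys = db.keys(db_name, pattern)
    return list(keys) if keys is not None else []


class FakeDb:
    """Hypothetical stand-in for the daemon's database connector."""

    STATE_DB = 'STATE_DB'

    def __init__(self, result):
        self.result = result

    def keys(self, db_name, pattern):
        # Return whatever the test configured, including None.
        return self.result


# Iterating the guarded result no longer raises when the lookup returns None.
db = FakeDb(None)
for key in safe_keys(db, db.STATE_DB, 'TRANSCEIVER_INFO|*'):
    print(key)
```

With a guard like this, the loop body is simply skipped when nothing is returned. Whether the daemon should then keep polling or exit cleanly on shutdown is a separate design choice that this sketch does not address.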

byu343 commented 2 years ago

Hi @arlakshm, do you know which command triggered the shutdown: 'reboot', 'config reload', or something else? Thanks.

arlakshm commented 2 years ago

I saw this on reboot.