aristanetworks / sonic

Open source drivers and initialization library for Arista platforms running SONiC
GNU General Public License v2.0

[chassis] Image upgrade on the 7808 chassis wipes out the configuration on the Linecards #105

Open arlakshm opened 5 hours ago

arlakshm commented 5 hours ago

The 7808 chassis in the lab is being upgraded using the following command:

./testbed-cli.sh install-image vms26-t2-7800-1 str2 http://10.201.148.43/pipelines/Networking-acs-buildimage-Official/broadcom/internal-202405/sonic-aboot-broadcom-dnx.swi

vms26-t2-7800-1 is the T2 testbed with one supervisor and 3 Linecards (2 Clearwater2 cards and one wolverine card)

str2 is the inventory file name

This command upgrades the supervisor and all the linecards, then waits for the supervisor and the linecards to come up.

The command errors out because the chassis is still not accessible after 15 minutes.

On logging in from the serial port, I see these errors on the screen for a long time:

[  930.802723] pcieport 0000:06:05.0: DPC: unmasked uncorrectable error detected
[  930.802798] pcieport 0000:06:05.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, (Receiver ID)
[  930.802800] pcieport 0000:06:05.0:   device [11f8:8533] error status/mask=00000020/04400000
[  930.802802] pcieport 0000:06:05.0:    [ 5] SDES
[  930.802951] pcieport 0000:06:05.0: AER: device recovery successful
[  930.853928] pcieport 0000:06:03.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
[  930.953965] pcieport 0000:06:03.0:    [ 5] SDES
[  930.954113] pcieport 0000:06:03.0: AER: device recovery successful
[  931.026989] pcieport 0000:06:03.0:   device [11f8:8533] error status/mask=00000001/0000e000
[  931.432374] pcieport 0000:06:03.0: pciehp: Slot(4): No link
[  931.456587] pcieport 0000:06:03.0:    [ 0] RxErr                  (First)
[  931.556821] pcieport 0000:06:03.0: DPC: containment event, status:0x0001 source:0x0000
[  931.629785] pcieport 0000:00:03.0: AER: Corrected error message received from 0000:06:05.0
[  931.757968] pcieport 0000:06:03.0: DPC: unmasked uncorrectable error detected
[  931.758044] pcieport 0000:06:03.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, (Receiver ID)
[  931.892045] pcieport 0000:06:05.0: DPC: containment event, status:0x0001 source:0x0000
[  931.931107] pcieport 0000:06:03.0:   device [11f8:8533] error status/mask=00000020/04400000
[  931.931109] pcieport 0000:06:03.0:    [ 5] SDES
[  931.931303] pcieport 0000:06:03.0: AER: device recovery successful
[  932.014670] pcieport 0000:06:05.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
[  932.014673] pcieport 0000:06:05.0:   device [11f8:8533] error status/mask=00000001/0000e000
[  932.014676] pcieport 0000:06:05.0:    [ 0] RxErr                  (First)
[  932.014740] pcieport 0000:00:03.0: AER: Corrected error message received from 0000:06:03.0
[  932.014791] pcieport 0000:00:03.0: AER: Multiple Corrected error message received from 0000:06:05.0
[  932.015414] pcieport 0000:00:03.0: AER: Corrected error message received from 0000:06:03.0
[  932.015465] pcieport 0000:00:03.0: AER: Corrected error message received from 0000:06:05.0
[  932.015516] pcieport 0000:00:03.0: AER: Corrected error message received from 0000:06:03.0
[  932.015567] pcieport 0000:00:03.0: AER: Corrected error message received from 0000:06:05.0
[  932.059440] pcieport 0000:06:05.0: DPC: unmasked uncorrectable error detected
[  932.584515] pcieport 0000:00:03.0: AER: Corrected error message received from 0000:06:03.0
[  932.584758] pcieport 0000:06:03.0: DPC: containment event, status:0x0001 source:0x0000
[  932.584760] pcieport 0000:06:03.0: DPC: unmasked uncorrectable error detected
[  932.584836] pcieport 0000:06:03.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, (Receiver ID)
[  932.584838] pcieport 0000:06:03.0:   device [11f8:8533] error status/mask=00000020/04400000
[  932.584841] pcieport 0000:06:03.0:    [ 5] SDES
[  932.584990] pcieport 0000:06:03.0: AER: device recovery successful
[  932.621666] pcieport 0000:06:05.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, (Receiver ID)
[  932.749805] pcieport 0000:06:03.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
[  932.849887] pcieport 0000:06:05.0:   device [11f8:8533] error status/mask=00040020/04400000
[  932.849891] pcieport 0000:06:05.0:    [ 5] SDES
[  932.849893] pcieport 0000:06:05.0:    [18] MalfTLP                (First)
[  932.922960] pcieport 0000:06:03.0:   device [11f8:8533] error status/mask=00000001/0000e000
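
To see which ports report which class of error, the console capture can be tallied per device. This is a minimal sketch and not part of the Arista tooling: the function name `summarize_aer` is hypothetical, and the regular expression assumes the `pcieport <BDF>: PCIe Bus Error: severity=..., type=...` line format shown above.

```python
import re
from collections import Counter

# Matches lines such as:
# [  930.802798] pcieport 0000:06:05.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, (Receiver ID)
AER_LINE = re.compile(
    r"pcieport (?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"PCIe Bus Error: severity=(?P<severity>[^,]+), type=(?P<layer>[^,]+)"
)


def summarize_aer(lines):
    """Count 'PCIe Bus Error' reports per (device, severity, layer).

    `lines` is any iterable of kernel log lines, for example an open file.
    Lines that are not 'PCIe Bus Error' reports are ignored.
    """
    counts = Counter()
    for line in lines:
        match = AER_LINE.search(line)
        if match:
            key = (
                match["bdf"],
                match["severity"].strip(),
                match["layer"].strip(),
            )
            counts[key] += 1
    return counts
```

Passing an open handle on a saved console log (for example the attached dmesg.log, assuming it uses the same line format) returns a `Counter` keyed by `(device, severity, layer)`, which makes it easy to compare `0000:06:03.0` and `0000:06:05.0`.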

Eventually these backtraces are seen and the supervisor reboots:

[  939.096369] INFO: task systemd:1 blocked for more than 311 seconds.
[  939.096373]       Tainted: G           OE      6.1.0-22-2-amd64 #1 Debian 6.1.94-1
[  939.096375] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  939.096376] task:systemd         state:D stack:0     pid:1     ppid:0      flags:0x00000002
[  939.096380] Call Trace:
[  939.096382]  <TASK>
[  939.096385]  __schedule+0x34d/0x9e0
[  939.096392]  schedule+0x5a/0xd0
[  939.096394]  schedule_timeout+0x118/0x150
[  939.096398]  __down_common+0x11a/0x230
[  939.096402]  down+0x43/0x60
[  939.096404]  console_lock+0x21/0x80
[  939.096409]  show_cons_active+0x3f/0x190
[  939.096415]  dev_attr_show+0x18/0x40
[  939.096419]  sysfs_kf_seq_show+0xa3/0xe0
[  939.096424]  seq_read_iter+0x122/0x450
[  939.096430]  vfs_read+0x23c/0x310
[  939.096434]  ksys_read+0x6b/0xf0
[  939.096437]  do_syscall_64+0x55/0xb0
[  939.096443]  ? exit_to_user_mode_prepare+0x44/0x1f0
[  939.096446]  ? syscall_exit_to_user_mode+0x1e/0x40
[  939.096449]  ? do_syscall_64+0x61/0xb0
[  939.096452]  ? exit_to_user_mode_prepare+0x44/0x1f0
[  939.096454]  ? syscall_exit_to_user_mode+0x1e/0x40
[  939.096457]  ? do_syscall_64+0x61/0xb0
[  939.096460]  ? do_syscall_64+0x61/0xb0
[  939.096463]  ? exit_to_user_mode_prepare+0x44/0x1f0
[  939.096465]  ? syscall_exit_to_user_mode+0x1e/0x40
[  939.096468]  ? do_syscall_64+0x61/0xb0
[  939.096471]  ? do_syscall_64+0x61/0xb0
[  939.096474]  ? exit_to_user_mode_prepare+0x44/0x1f0
[  939.096476]  ? syscall_exit_to_user_mode+0x1e/0x40
[  939.096479]  ? do_syscall_64+0x61/0xb0
[  939.096482]  ? do_syscall_64+0x61/0xb0
[  939.096485]  ? do_syscall_64+0x61/0xb0
[  939.096488]  ? do_syscall_64+0x61/0xb0
[  939.096492]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[  939.096497] RIP: 0033:0x7f14f656f1dc
[  939.096500] RSP: 002b:00007ffc865e21b0 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[  939.096502] RAX: ffffffffffffffda RBX: 000055d769862b60 RCX: 00007f14f656f1dc
[  939.096504] RDX: 0000000000001000 RSI: 000055d7699b93c0 RDI: 0000000000000011
[  939.096506] RBP: 00007f14f66465e0 R08: 0000000000000000 R09: 00007f14f6649d30
[  939.096507] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f14f5a0d7f0
[  939.096508] R13: 0000000000000d68 R14: 00007f14f66459e0 R15: 0000000000000d68
[  939.096511]  </TASK>
[  939.096578] INFO: task systemd-logind:10117 blocked for more than 311 seconds.
[  939.096580]       Tainted: G           OE      6.1.0-22-2-amd64 #1 Debian 6.1.94-1
[  939.096581] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  939.096582] task:systemd-logind  state:D stack:0     pid:10117 ppid:1      flags:0x00000002
[  939.096585] Call Trace:
[  939.096586]  <TASK>
[  939.096587]  __schedule+0x34d/0x9e0
[  939.096590]  schedule+0x5a/0xd0
[  939.096592]  schedule_timeout+0x118/0x150
[  939.096595]  __down_common+0x11a/0x230
[  939.096598]  ? device_match_of_node+0x20/0x20
[  939.096601]  down+0x43/0x60
[  939.096603]  console_lock+0x21/0x80
[  939.096606]  con_install+0x1b/0x140
[  939.096609]  tty_init_dev.part.0+0x45/0x220
[  939.096613]  tty_open+0x477/0x690
[  939.096615]  ? reconfigure_single+0x60/0x60
[  939.096619]  ? kobj_lookup+0xf1/0x160
[  939.096623]  chrdev_open+0xc4/0x250
[  939.096626]  ? __unregister_chrdev+0x50/0x50
[  939.096630]  do_dentry_open+0x1e5/0x410
[  939.096635]  path_openat+0xb7d/0x1260
[  939.096638]  ? memcg_list_lru_alloc+0xac/0x3d0
[  939.096643]  ? memcg_slab_post_alloc_hook+0x160/0x220
[  939.096647]  do_filp_open+0xaf/0x160
[  939.096653]  do_sys_openat2+0xaf/0x170
[  939.096655]  __x64_sys_openat+0x6a/0xa0
[  939.096657]  do_syscall_64+0x55/0xb0
[  939.096661]  ? audit_filter_inodes.part.0+0x2e/0x120
[  939.096665]  ? audit_reset_context+0x232/0x300
[  939.096667]  ? exit_to_user_mode_prepare+0x44/0x1f0
[  939.096670]  ? memcg_slab_post_alloc_hook+0x160/0x220
[  939.096672]  ? ep_ptable_queue_proc+0x2b/0x90
[  939.096677]  ? __seccomp_filter+0x32a/0x4e0
[  939.096682]  ? audit_filter_inodes.part.0+0x2e/0x120
[  939.096684]  ? audit_reset_context+0x232/0x300
[  939.096686]  ? exit_to_user_mode_prepare+0x44/0x1f0
[  939.096688]  ? syscall_exit_to_user_mode+0x1e/0x40
[  939.096691]  ? do_syscall_64+0x61/0xb0
[  939.096695]  ? audit_filter_inodes.part.0+0x2e/0x120
[  939.096697]  ? audit_reset_context+0x232/0x300
[  939.096699]  ? exit_to_user_mode_prepare+0x44/0x1f0
[  939.096701]  ? syscall_exit_to_user_mode+0x1e/0x40
[  939.096703]  ? do_syscall_64+0x61/0xb0
[  939.096707]  ? handle_mm_fault+0xdb/0x2d0
[  939.096711]  ? do_user_addr_fault+0x1b0/0x550
[  939.096714]  ? exit_to_user_mode_prepare+0x44/0x1f0
[  939.096716]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[  939.096719] RIP: 0033:0x7ff95ca0ff01
[  939.096721] RSP: 002b:00007fff8acd3a10 EFLAGS: 00000202 ORIG_RAX: 0000000000000101
[  939.096723] RAX: ffffffffffffffda RBX: 0000000000080902 RCX: 00007ff95ca0ff01
[  939.096725] RDX: 0000000000080902 RSI: 000056199b48d0f0 RDI: 00000000ffffff9c
[  939.096726] RBP: 000056199b48d0f0 R08: 0000000000000007 R09: 000056199b48c3e0
[  939.096727] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
[  939.096728] R13: 0000000000000000 R14: 0000000000000000 R15: 000056199b475990
[  939.096731]  </TASK>
[  939.096733] INFO: task runc:[2:INIT]:10258 blocked for more than 311 seconds.
[  939.096735]       Tainted: G           OE      6.1.0-22-2-amd64 #1 Debian 6.1.94-1
[  939.096736] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  939.096736] task:runc:[2:INIT]   state:D stack:0     pid:10258 ppid:10243  flags:0x00000002
[  939.096739] Call Trace:
[  939.096740]  <TASK>
[  939.096741]  __schedule+0x34d/0x9e0
[  939.096743]  schedule+0x5a/0xd0
[  939.096745]  schedule_preempt_disabled+0x11/0x20
[  939.096748]  __mutex_lock.constprop.0+0x399/0x700
[  939.096751]  ptmx_open+0x8b/0x190
[  939.096754]  chrdev_open+0xc4/0x250
[  939.096758]  ? __unregister_chrdev+0x50/0x50
[  939.096761]  do_dentry_open+0x1e5/0x410
[  939.096765]  path_openat+0xb7d/0x1260
[  939.096768]  ? audit_reset_context+0x232/0x300
[  939.096771]  do_filp_open+0xaf/0x160
[  939.096776]  do_sys_openat2+0xaf/0x170
[  939.096778]  __x64_sys_openat+0x6a/0xa0
[  939.096780]  do_syscall_64+0x55/0xb0
[  939.096784]  ? syscall_exit_to_user_mode+0x1e/0x40
[  939.096786]  ? do_syscall_64+0x61/0xb0
[  939.096790]  ? __rseq_handle_notify_resume+0xa9/0x4a0
[  939.096795]  ? call_rcu+0xde/0x6b0
[  939.096800]  ? exit_to_user_mode_prepare+0x44/0x1f0
[  939.096802]  ? syscall_exit_to_user_mode+0x1e/0x40
[  939.096804]  ? do_syscall_64+0x61/0xb0
[  939.096808]  ? syscall_exit_to_user_mode+0x1e/0x40
[  939.096810]  ? do_syscall_64+0x61/0xb0
[  939.096813]  ? syscall_exit_to_user_mode+0x1e/0x40
[  939.096816]  ? do_syscall_64+0x61/0xb0
[  939.096819]  ? do_syscall_64+0x61/0xb0
[  939.096822]  ? exit_to_user_mode_prepare+0x44/0x1f0
[  939.096824]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[  939.096828] RIP: 0033:0x5570a91c42ce
[  939.096829] RSP: 002b:000000c00018f5f0 EFLAGS: 00000202 ORIG_RAX: 0000000000000101
[  939.096831] RAX: ffffffffffffffda RBX: ffffffffffffff9c RCX: 00005570a91c42ce
[  939.096832] RDX: 0000000000080102 RSI: 000000c00020ad60 RDI: ffffffffffffff9c
[  939.096834] RBP: 000000c00018f630 R08: 0000000000000000 R09: 0000000000000000
[  939.096835] R10: 0000000000000000 R11: 0000000000000202 R12: 000000c00020ad60
[  939.096836] R13: 0000000000000000 R14: 000000c0000061a0 R15: ffffffffffffffff
[  939.096838]  </TASK>
[  939.096844] INFO: task systemd-getty-g:11267 blocked for more than 311 seconds.
[  939.096845]       Tainted: G           OE      6.1.0-22-2-amd64 #1 Debian 6.1.94-1
[  939.096846] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  939.096847] task:systemd-getty-g state:D stack:0     pid:11267 ppid:1      flags:0x00000006
[  939.096849] Call Trace:
[  939.096850]  <TASK>
[  939.096851]  __schedule+0x34d/0x9e0
[  939.096853]  schedule+0x5a/0xd0
[  939.096855]  schedule_timeout+0x118/0x150
[  939.096858]  __down_common+0x11a/0x230
[  939.096861]  down+0x43/0x60
[  939.096864]  console_lock+0x21/0x80
[  939.096866]  show_cons_active+0x3f/0x190
[  939.096870]  dev_attr_show+0x18/0x40
[  939.096873]  sysfs_kf_seq_show+0xa3/0xe0
[  939.096875]  seq_read_iter+0x122/0x450
[  939.096880]  vfs_read+0x23c/0x310
[  939.096883]  ksys_read+0x6b/0xf0
[  939.096885]  do_syscall_64+0x55/0xb0
[  939.096889]  ? __vfs_getxattr+0x53/0x70
[  939.096893]  ? get_vfs_caps_from_disk+0x8a/0x220
[  939.096897]  ? audit_copy_inode+0xb3/0xf0
[  939.096899]  ? filename_lookup+0x18f/0x1f0
[  939.096904]  ? _copy_to_user+0x21/0x30
[  939.096908]  ? cp_new_stat+0x135/0x170
[  939.096913]  ? __do_sys_newfstatat+0x4e/0x80
[  939.096917]  ? mntput_no_expire+0x4a/0x250
[  939.096920]  ? audit_reset_context+0x232/0x300
[  939.096923]  ? exit_to_user_mode_prepare+0x44/0x1f0
[  939.096924]  ? syscall_exit_to_user_mode+0x1e/0x40
[  939.096927]  ? do_syscall_64+0x61/0xb0
[  939.096931]  ? syscall_exit_to_user_mode+0x1e/0x40
[  939.096933]  ? do_syscall_64+0x61/0xb0
[  939.096937]  ? exit_to_user_mode_prepare+0x44/0x1f0
[  939.096938]  ? syscall_exit_to_user_mode+0x1e/0x40
[  939.096941]  ? do_syscall_64+0x61/0xb0
[  939.096944]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[  939.096948] RIP: 0033:0x7f23d7f6e19d
[  939.096949] RSP: 002b:00007ffc80af97c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[  939.096951] RAX: ffffffffffffffda RBX: 0000555df4e1b2d0 RCX: 00007f23d7f6e19d
[  939.096952] RDX: 0000000000001000 RSI: 0000555df4e1ba20 RDI: 0000000000000004
[  939.096953] RBP: 00007f23d80455e0 R08: 0000000000000008 R09: 0000000000000001
[  939.096954] R10: 0000000000001000 R11: 0000000000000246 R12: 00007f23d74252f0
[  939.096955] R13: 0000000000000d68 R14: 00007f23d80449e0 R15: 0000000000000d68
[  939.096958]  </TASK>
[  939.096962] INFO: task runc:[2:INIT]:11435 blocked for more than 311 seconds.
[  939.096963]       Tainted: G           OE      6.1.0-22-2-amd64 #1 Debian 6.1.94-1
[  939.096964] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  939.096965] task:runc:[2:INIT]   state:D stack:0     pid:11435 ppid:11389  flags:0x00000002
[  939.096967] Call Trace:
[  939.096968]  <TASK>
[  939.096969]  __schedule+0x34d/0x9e0
[  939.096972]  schedule+0x5a/0xd0
[  939.096974]  schedule_preempt_disabled+0x11/0x20
[  939.096976]  __mutex_lock.constprop.0+0x399/0x700
[  939.096979]  ptmx_open+0x8b/0x190
[  939.096981]  chrdev_open+0xc4/0x250
[  939.096985]  ? __unregister_chrdev+0x50/0x50
[  939.096988]  do_dentry_open+0x1e5/0x410
[  939.096992]  path_openat+0xb7d/0x1260
[  939.096996]  do_filp_open+0xaf/0x160
[  939.097001]  do_sys_openat2+0xaf/0x170
[  939.097003]  __x64_sys_openat+0x6a/0xa0
[  939.097005]  do_syscall_64+0x55/0xb0
[  939.097009]  ? audit_filter_inodes.part.0+0x2e/0x120
[  939.097011]  ? audit_reset_context+0x232/0x300
[  939.097013]  ? exit_to_user_mode_prepare+0x44/0x1f0
[  939.097015]  ? syscall_exit_to_user_mode+0x1e/0x40
[  939.097018]  ? do_syscall_64+0x61/0xb0
[  939.097021]  ? __do_sys_newfstat+0x6b/0x80
[  939.097025]  ? audit_filter_inodes.part.0+0x2e/0x120
[  939.097027]  ? audit_reset_context+0x232/0x300
[  939.097029]  ? exit_to_user_mode_prepare+0x44/0x1f0
[  939.097031]  ? syscall_exit_to_user_mode+0x1e/0x40
[  939.097033]  ? do_syscall_64+0x61/0xb0
[  939.097037]  ? syscall_exit_to_user_mode+0x1e/0x40
[  939.097039]  ? do_syscall_64+0x61/0xb0
[  939.097043]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[  939.097046] RIP: 0033:0x55e98173d2ce
[  939.097047] RSP: 002b:000000c00018f5f0 EFLAGS: 00000202 ORIG_RAX: 0000000000000101
[  939.097049] RAX: ffffffffffffffda RBX: ffffffffffffff9c RCX: 000055e98173d2ce
[  939.097050] RDX: 0000000000080102 RSI: 000000c000208da0 RDI: ffffffffffffff9c
[  939.097052] RBP: 000000c00018f630 R08: 0000000000000000 R09: 0000000000000000
[  939.097053] R10: 0000000000000000 R11: 0000000000000202 R12: 000000c000208da0
[  939.097054] R13: 0000000000000000 R14: 000000c0000061a0 R15: ffffffffffffffff
[  939.097056]  </TASK>
[  939.097057] INFO: task runc:[2:INIT]:11436 blocked for more than 311 seconds.
[  939.097058]       Tainted: G           OE      6.1.0-22-2-amd64 #1 Debian 6.1.94-1
[  939.097059] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  939.097060] task:runc:[2:INIT]   state:D stack:0     pid:11436 ppid:11426  flags:0x00000002
[  939.097062] Call Trace:
[  939.097063]  <TASK>
[  939.097064]  __schedule+0x34d/0x9e0
[  939.097066]  schedule+0x5a/0xd0
[  939.097068]  schedule_preempt_disabled+0x11/0x20
[  939.097070]  __mutex_lock.constprop.0+0x399/0x700
[  939.097073]  ptmx_open+0x8b/0x190
[  939.097075]  chrdev_open+0xc4/0x250
[  939.097079]  ? __unregister_chrdev+0x50/0x50
[  939.097082]  do_dentry_open+0x1e5/0x410
[  939.097085]  path_openat+0xb7d/0x1260
[  939.097089]  do_filp_open+0xaf/0x160
[  939.097094]  do_sys_openat2+0xaf/0x170
[  939.097096]  __x64_sys_openat+0x6a/0xa0
[  939.097098]  do_syscall_64+0x55/0xb0
[  939.097102]  ? get_vfs_caps_from_disk+0x8a/0x220
[  939.097107]  ? ext4_getattr+0x98/0x160 [ext4]
[  939.097145]  ? ovl_getattr+0xa1/0x3d0 [overlay]
[  939.097157]  ? _copy_to_user+0x21/0x30
[  939.097161]  ? cp_new_stat+0x135/0x170
[  939.097166]  ? __do_sys_newfstatat+0x4e/0x80
[  939.097170]  ? mntput_no_expire+0x4a/0x250
[  939.097173]  ? audit_reset_context+0x232/0x300
[  939.097175]  ? exit_to_user_mode_prepare+0x44/0x1f0
[  939.097178]  ? syscall_exit_to_user_mode+0x1e/0x40
[  939.097181]  ? do_syscall_64+0x61/0xb0
[  939.097184]  ? do_syscall_64+0x61/0xb0
[  939.097188]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[  939.097192] RIP: 0033:0x5613ec4ec2ce
[  939.097193] RSP: 002b:000000c0001af5f0 EFLAGS: 00000202 ORIG_RAX: 0000000000000101
[  939.097195] RAX: ffffffffffffffda RBX: ffffffffffffff9c RCX: 00005613ec4ec2ce
[  939.097197] RDX: 0000000000080102 RSI: 000000c000206aa0 RDI: ffffffffffffff9c
[  939.097198] RBP: 000000c0001af630 R08: 0000000000000000 R09: 0000000000000000
[  939.097200] R10: 0000000000000000 R11: 0000000000000202 R12: 000000c000206aa0
[  939.097201] R13: 0000000000000000 R14: 000000c0000061a0 R15: ffffffffffffffff
[  939.097204]  </TASK>
[  939.097207] Kernel panic - not syncing: hung_task: blocked tasks
[  939.097209] CPU: 11 PID: 89 Comm: khungtaskd Tainted: G           OE      6.1.0-22-2-amd64 #1  Debian 6.1.94-1
[  939.097212] Hardware name: Intel Camelback Mountain CRB/Camelback Mountain CRB, BIOS Aboot-norcal7-7.1.4-14169220 11/09/2019
[  939.097213] Call Trace:
[  939.097214]  <TASK>
[  939.097215]  dump_stack_lvl+0x44/0x5c
[  939.097219]  panic+0x118/0x2f4
[  939.097223]  watchdog.cold+0xc/0xbb
[  939.097228]  ? proc_dohung_task_timeout_secs+0x30/0x30
[  939.097233]  kthread+0xda/0x100
[  939.097236]  ? kthread_complete_and_exit+0x20/0x20
[  939.097239]  ret_from_fork+0x22/0x30
[  939.097245]  </TASK>
[  940.130309] Shutting down cpus with NMI
[  940.130315] Kernel Offset: 0xa200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)

Aboot 7.1.4-14169220

Press Control-C now to enter Aboot shell
Booting flash:image-20240531.10/.sonic-boot.swi
Secure Boot disabled, skipping check
TPM: failed to raise GPIO line
SPI flash hardware write protection disabled
19.46: Using previously installed image
19.52: Loading extra kernel parameters from kernel-params
19.58: Next reboot will use flash:image-20240531.10/.sonic-boot.swi
20.16: Kexecing...
[   20.335936] Starting new kernel
[    1.091971] tpm tpm0: [Firmware Bug]: TPM interrupt not working, polling instead
setfacl: /mnt/root-convfs/aquota.user: Operation not permitted
tune2fs 1.47.0 (5-Feb-2023)
Setting reserved blocks percentage to 0% (0 blocks)
Setting reserved blocks count to 0
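
In the hung-task reports above, systemd, systemd-logind and systemd-getty-g are all waiting in `console_lock`, while the three runc tasks are waiting on a mutex in `ptmx_open`. A rough sketch for pulling that grouping out of a saved console log follows. The helper name `hung_tasks` and the `LOCK_HINTS` tuple are hypothetical, chosen only from the symbols visible in this capture, and the parser assumes the khungtaskd report format shown above.

```python
import re

# Header printed by khungtaskd, e.g.
# [  939.096369] INFO: task systemd:1 blocked for more than 311 seconds.
HUNG_TASK = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)

# A reliable stack frame such as "[  939.096404]  console_lock+0x21/0x80".
# Frames prefixed with "?" are unreliable and do not match this pattern.
FRAME = re.compile(r"\]\s+(?P<sym>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")

# Assumption: symbols of interest, taken from the traces in this report.
LOCK_HINTS = ("console_lock", "ptmx_open")


def hung_tasks(lines):
    """Return [(comm, pid, hint)] for each hung-task report in `lines`.

    `hint` is the first LOCK_HINTS symbol found in the task's call trace,
    or None if the trace contains none of them.
    """
    results = []
    current = None
    for line in lines:
        header = HUNG_TASK.search(line)
        if header:
            current = [header["comm"], int(header["pid"]), None]
            results.append(current)
            continue
        if "</TASK>" in line:
            current = None  # end of this task's trace
            continue
        if current is not None and current[2] is None:
            frame = FRAME.search(line)
            if frame and frame["sym"] in LOCK_HINTS:
                current[2] = frame["sym"]
    return [tuple(entry) for entry in results]
```

The output only shows where each task is blocked. It does not identify which task holds the console lock, so it is a triage aid rather than a root-cause analysis.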

After this, the linecards come up with the new image, but the minigraph/config is not retained from the previous image:

admin@str2-7804-sup-1:~$ ssh 127.100.5.1
The authenticity of host '127.100.5.1 (127.100.5.1)' can't be established.
RSA key fingerprint is SHA256:qPr4CpwSSNyWk3aLhJxGXBQ2683BT0xWilzHnH+veE4.
This key is not known by any other names.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '127.100.5.1' (RSA) to the list of known hosts.
Debian GNU/Linux 12 \n \l

admin@127.100.5.1's password:
Linux sonic 6.1.0-22-2-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.94-1 (2024-06-21) x86_64
You are on
  ____   ___  _   _ _  ____
 / ___| / _ \| \ | (_)/ ___|
 \___ \| | | |  \| | | |
  ___) | |_| | |\  | | |___
 |____/ \___/|_| \_|_|\____|

-- Software for Open Networking in the Cloud --

Unauthorized access and/or use are prohibited.
All access and/or use are subject to monitoring.

Help:      https://sonic-net.github.io/SONiC/
Wiki:      https://microsoft.sharepoint.com/teams/WAG/AzureNetworking/Wiki/SONiC.aspx
On-Call:   https://portal.microsofticm.com/imp/v3/oncall/current?serviceId=10045&teamIds=26162
Dashboard: https://aka.ms/sonic-dri
Contact:   sonicdev@microsoft.com

Last login: Sat Nov  2 01:34:30 2024 from 10.1.84.57
admin@sonic:~$ show chassis module status
Key * not found in CHASSIS_MODULE_TABLE table
admin@sonic:~$ ls /etc/sonic/
asic_config_checksum   fast-reboot_order        snmp.yml
constants.yml          frr                      sonic-environment
copp_cfg.json          generated_services.conf  sonic_release
core_analyzer.rc.json  grpc_secrets.json        sonic_version.yml
credentials            init_cfg.json            swss_dependent
default_users.json     macsec_reconcile         warm-reboot_order
dhcp_relay_reconcile   old_config
dhcp_server_reconcile  remote_ctr.config.json
admin@sonic:~$ show vers

SONiC Software Version: SONiC.20240531.10
SONiC OS Version: 12
Distribution: Debian 12.6
Kernel: 6.1.0-22-2-amd64
Build commit: cf7d8848af
Build date: Fri Nov  1 09:34:37 UTC 2024
Built by: azureuser@53ad391cc000000

Platform: x86_64-arista_7800r3a_36dm2_lc
HwSKU: None
ASIC: broadcom
ASIC Count: 2
Serial Number: SGD21190878
Model Number: 7800R3A-36DM2-LC
Hardware Revision: 2a.00
Uptime: 02:16:07 up 5 min,  1 user,  load average: 0.63, 1.32, 0.71
Date: Sat 02 Nov 2024 02:16:07

Looks like the linecards are provisioned again.
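
The `ls /etc/sonic/` output above shows neither a minigraph nor a config DB file. A small check like the one below could be run on each linecard after an upgrade to flag this condition early. It is only a sketch: the function name `missing_config` is hypothetical, and the expected file names (`config_db.json`, `minigraph.xml`) are an assumption that may need extending, for example with per-ASIC config files on multi-ASIC linecards.

```python
from pathlib import Path

# Assumption: files expected to survive an image upgrade.
EXPECTED = ("config_db.json", "minigraph.xml")


def missing_config(sonic_dir="/etc/sonic", expected=EXPECTED):
    """Return the names in `expected` that are not regular files in `sonic_dir`."""
    base = Path(sonic_dir)
    return [name for name in expected if not (base / name).is_file()]
```

An empty list means every expected file is present. Against the directory listing in this report, both names would be returned.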

arlakshm commented 5 hours ago

Adding arista_logs: arista_logs.tar.gz

arlakshm commented 5 hours ago

Adding dmesg.log