aristanetworks / sonic

Open source drivers and initialization library for Arista platforms running SONiC
GNU General Public License v2.0
25 stars 30 forks source link

[wolverine LC] orchagent high CPU usage when startup/shutdown bgp sessions #70

Closed wenyiz2021 closed 1 year ago

wenyiz2021 commented 1 year ago

here are some orchagent CPU usage when starting up/shutting down bgp sessions (bold) :

admin@str2-7804-lc5-1:~$ show processes cpu | grep orch 50337 root 20 0 935320 553568 15432 R 29.4 3.4 5:45.38 orchagent 50327 root 20 0 945600 564060 15520 S 0.0 3.5 6:02.71 orchagent admin@str2-7804-lc5-1:~$ show processes cpu | grep orch 50327 root 20 0 945600 564060 15520 S 5.9 3.5 6:03.16 orchagent 50337 root 20 0 935320 553568 15432 S 0.0 3.4 5:45.71 orchagent admin@str2-7804-lc5-1:~$ show processes cpu | grep orch 50337 root 20 0 935320 553568 15432 S 62.5 3.4 5:45.93 orchagent 50327 root 20 0 945600 564060 15520 S 0.0 3.5 6:03.37 orchagent admin@str2-7804-lc5-1:~$ show processes cpu | grep orch 50337 root 20 0 935320 553568 15432 S 68.8 3.4 5:46.44 orchagent 50327 root 20 0 945600 564060 15520 S 25.0 3.5 6:03.58 orchagent admin@str2-7804-lc5-1:~$ show processes cpu | grep orch 50337 root 20 0 935320 553568 15432 S 5.9 3.4 5:48.07 orchagent 50327 root 20 0 945600 564060 15520 S 0.0 3.5 6:04.09 orchagent admin@str2-7804-lc5-1:~$ show processes cpu | grep orch 50337 root 20 0 935320 553568 15432 S 18.8 3.4 5:48.32 orchagent 50327 root 20 0 945600 564060 15520 S 12.5 3.5 6:04.54 orchagent

while at the same time, syslog shows orchagent is busy:

Jan 25 20:03:08.154752 str2-7804-lc5-1 NOTICE swss0#orchagent: :- addNextHopGroup: Create next hop group 10.0.0.49@Ethernet-IB0,10.0.0.51@Ethernet-IB0,10.0.0.53@Ethernet-IB0,10.0.0.55@Ethernet-IB0
Jan 25 20:03:09.642076 str2-7804-lc5-1 NOTICE swss0#orchagent: :- addNextHopGroup: Create next hop group fc00::56@Ethernet-IB0,fc00::5a@Ethernet-IB0,fc00::5e@Ethernet-IB0,fc00::72@Ethernet-IB0,fc00::76@Ethernet-IB0,fc00::7a@Ethernet-IB0,fc00::7e@Ethernet-IB0
Jan 25 20:03:09.739240 str2-7804-lc5-1 NOTICE swss0#orchagent: :- addNextHopGroup: Create next hop group fc00::72@Ethernet-IB0,fc00::76@Ethernet-IB0,fc00::7a@Ethernet-IB0,fc00::7e@Ethernet-IB0
Jan 25 20:03:09.789258 str2-7804-lc5-1 NOTICE swss0#orchagent: :- addNextHopGroup: Create next hop group fc00::5e@Ethernet-IB0,fc00::72@Ethernet-IB0,fc00::76@Ethernet-IB0,fc00::7a@Ethernet-IB0,fc00::7e@Ethernet-IB0
Jan 25 20:03:09.911725 str2-7804-lc5-1 NOTICE swss0#orchagent: :- addNextHopGroup: Create next hop group fc00::4a@Ethernet-IB0,fc00::4e@Ethernet-IB0,fc00::52@Ethernet-IB0,fc00::56@Ethernet-IB0,fc00::5a@Ethernet-IB0,fc00::5e@Ethernet-IB0,fc00::72@Ethernet-IB0,fc00::76@Ethernet-IB0,fc00::7a@Ethernet-IB0,fc00::7e@Ethernet-IB0
Jan 25 20:03:10.819396 str2-7804-lc5-1 NOTICE swss0#orchagent: :- addNextHopGroup: Create next hop group fc00::5a@Ethernet-IB0,fc00::5e@Ethernet-IB0,fc00::72@Ethernet-IB0,fc00::76@Ethernet-IB0,fc00::7a@Ethernet-IB0,fc00::7e@Ethernet-IB0
Jan 25 20:03:11.059602 str2-7804-lc5-1 NOTICE swss1#orchagent: :- addNextHopGroup: Create next hop group 10.0.0.47@Ethernet184,10.0.0.57@Ethernet224,10.0.0.59@Ethernet232,10.0.0.61@Ethernet240,10.0.0.63@Ethernet248
Jan 25 20:03:11.109587 str2-7804-lc5-1 NOTICE swss1#orchagent: :- addNextHopGroup: Create next hop group 10.0.0.45@Ethernet176,10.0.0.47@Ethernet184,10.0.0.57@Ethernet224,10.0.0.59@Ethernet232,10.0.0.61@Ethernet240,10.0.0.63@Ethernet248

on platform Arista-7800R3-48CQM2-C48, orchagent isn't busy during bgp shutdown:

admin@str2-7804-lc6-1:~$ show processes cpu | grep orch
 421252 root      20   0 1025876 570196  15448 S   0.0   3.5   6:42.53 orchagent
admin@str2-7804-lc6-1:~$ show processes cpu | grep orch
 421252 root      20   0 1025876 570196  15448 S   0.0   3.5   6:42.53 orchagent
admin@str2-7804-lc6-1:~$ 
admin@str2-7804-lc6-1:~$ show processes cpu | grep orch
 421252 root      20   0 1025876 570196  15448 S   0.0   3.5   6:42.53 orchagent
admin@str2-7804-lc6-1:~$ show processes cpu | grep orch
 421252 root      20   0 1025876 570196  15448 S   6.7   3.5   6:42.54 orchagent
admin@str2-7804-lc6-1:~$ show processes cpu | grep orch
 421252 root      20   0 1025876 570196  15448 S   0.0   3.5   6:42.54 orchagent
admin@str2-7804-lc6-1:~$ show processes cpu | grep orch
 421252 root      20   0 1025876 570196  15448 S   0.0   3.5   6:42.54 orchagent
wenyiz2021 commented 1 year ago

@arlakshm @Staphylo @kenneth-arista

wenyiz2021 commented 1 year ago

syslog from healthy linecard:

an 25 20:10:46.889424 str2-7804-lc6-1 INFO gbsyncd#supervisord: syncd [OWL][
Jan 25 20:10:46.944488 str2-7804-lc6-1 NOTICE swss#orchagent: :- removeNextHopGroup: Delete next hop group 10.0.0.101@Ethernet72,10.0.0.129@Ethernet-IB0,10.0.0.133@Ethernet-IB0,10.0.0.137@Ethernet-IB0,10.0.0.141@Ethernet-IB0,10.0.0.145@Ethernet-IB0,10.0.0.149@Ethernet-IB0,10.0.0.153@Ethernet-IB0,10.0.0.157@Ethernet-IB0,10.0.0.161@Ethernet-IB0,10.0.0.163@Ethernet-IB0,10.0.0.165@Ethernet-IB0,10.0.0.167@Ethernet-IB0,10.0.0.169@Ethernet-IB0,10.0.0.171@Ethernet-IB0,10.0.0.173@Ethernet-IB0,10.0.0.175@Ethernet-IB0,10.0.0.177@Ethernet-IB0,10.0.0.179@Ethernet-IB0,10.0.0.181@Ethernet-IB0,10.0.0.183@Ethernet-IB0,10.0.0.185@Ethernet-IB0,10.0.0.187@Ethernet-IB0,10.0.0.189@Ethernet-IB0,10.0.0.191@Ethernet-IB0
Jan 25 20:10:46.992982 str2-7804-lc6-1 NOTICE swss#orchagent: :- removeNextHopGroup: Delete next hop group 10.0.0.101@Ethernet72,10.0.0.157@Ethernet-IB0,10.0.0.163@Ethernet-IB0,10.0.0.165@Ethernet-IB0,10.0.0.167@Ethernet-IB0,10.0.0.169@Ethernet-IB0,10.0.0.171@Ethernet-IB0,10.0.0.173@Ethernet-IB0,10.0.0.175@Ethernet-IB0,10.0.0.177@Ethernet-IB0,10.0.0.179@Ethernet-IB0,10.0.0.181@Ethernet-IB0,10.0.0.183@Ethernet-IB0,10.0.0.185@Ethernet-IB0,10.0.0.187@Ethernet-IB0,10.0.0.189@Ethernet-IB0,10.0.0.191@Ethernet-IB0
Jan 25 20:10:47.004172 str2-7804-lc6-1 NOTICE swss#orchagent: :- removeNextHopGroup: Delete next hop group 10.0.0.129@Ethernet-IB0,10.0.0.133@Ethernet-IB0,10.0.0.137@Ethernet-IB0,10.0.0.141@Ethernet-IB0,10.0.0.145@Ethernet-IB0,10.0.0.149@Ethernet-IB0,10.0.0.153@Ethernet-IB0,10.0.0.157@Ethernet-IB0,10.0.0.161@Ethernet-IB0,10.0.0.163@Ethernet-IB0,10.0.0.165@Ethernet-IB0,10.0.0.167@Ethernet-IB0,10.0.0.169@Ethernet-IB0,10.0.0.171@Ethernet-IB0,10.0.0.173@Ethernet-IB0,10.0.0.175@Ethernet-IB0,10.0.0.177@Ethernet-IB0,10.0.0.179@Ethernet-IB0,10.0.0.181@Ethernet-IB0,10.0.0.183@Ethernet-IB0,10.0.0.185@Ethernet-IB0,10.0.0.187@Ethernet-IB0,10.0.0.189@Ethernet-IB0,10.0.0.191@Ethernet-IB0
Jan 25 20:10:47.020155 str2-7804-lc6-1 NOTICE swss#orchagent: :- removeNextHopGroup: Delete next hop group 10.0.0.157@Ethernet-IB0,10.0.0.163@Ethernet-IB0,10.0.0.165@Ethernet-IB0,10.0.0.167@Ethernet-IB0,10.0.0.169@Ethernet-IB0,10.0.0.171@Ethernet-IB0,10.0.0.173@Ethernet-IB0,10.0.0.175@Ethernet-IB0,10.0.0.177@Ethernet-IB0,10.0.0.179@Ethernet-IB0,10.0.0.181@Ethernet-IB0,10.0.0.183@Ethernet-IB0,10.0.0.185@Ethernet-IB0,10.0.0.187@Ethernet-IB0,10.0.0.189@Ethernet-IB0,10.0.0.191@Ethernet-IB0
Jan 25 20:10:47.085462 str2-7804-lc6-1 NOTICE swss#orchagent: :- addNextHopGroup: Create next hop group 10.0.0.93@PortChannel1019,10.0.0.97@Ethernet48,10.0.0.99@Ethernet60,10.0.0.175@Ethernet-IB0,10.0.0.177@Ethernet-IB0
Jan 25 20:10:47.089709 str2-7804-lc6-1 NOTICE swss#orchagent: :- addNextHopGroup: Create next hop group fc00::82@PortChannel101,fc00::8a@PortChannel103,fc00::92@PortChannel105,fc00::9a@PortChannel107,fc00::a2@PortChannel109,fc00::aa@PortChannel1011,fc00::b2@PortChannel1015,fc00::c2@Ethernet48,fc00::102@Ethernet-IB0,fc00::10a@Ethernet-IB0,fc00::112@Ethernet-IB0,fc00::11a@Ethernet-IB0,fc00::122@Ethernet-IB0,fc00::12a@Ethernet-IB0,fc00::132@Ethernet-IB0,fc00::142@Ethernet-IB0
Jan 25 20:10:47.113532 str2-7804-lc6-1 NOTICE swss#orchagent: :- addNextHopGroup: Create next hop group fc00::ba@PortChannel1019,fc00::c6@Ethernet60,fc00::ca@Ethernet72,fc00::ce@Ethernet76,fc00::d2@Ethernet80,fc00::d6@Ethernet84,fc00::da@Ethernet88,fc00::de@Ethernet92,fc00::e2@Ethernet96,fc00::e6@Ethernet100,fc00::ea@Ethernet104,fc00::ee@Ethernet108,fc00::f2@Ethernet112,fc00::f6@Ethernet116,fc00::fa@Ethernet120,fc00::fe@Ethernet124,fc00::13a@Ethernet-IB0,fc00::146@Ethernet-IB0,fc00::14a@Ethernet-IB0,fc00::14e@Ethernet-IB0,fc00::152@Ethernet-IB0,fc00::156@Ethernet-IB0,fc00::15a@Ethernet-IB0,fc00::15e@Ethernet-IB0,fc00::162@Ethernet-IB0,fc00::166@Ethernet-IB0,fc00::16a@Ethernet-IB0,fc00::16e@Ethernet-IB0,fc00::172@Ethernet-IB0,fc00::176@Ethernet-IB0,fc00::17a@Ethernet-IB0,fc00::17e@Ethernet-IB0
Jan 25 20:10:47.154309 str2-7804-lc6-1 NOTICE swss#orchagent: :- addNextHopGroup: Create next hop group fc00::82@PortChannel101,fc00::8a@PortChannel103,fc00::92@PortChannel105,fc00::9a@PortChannel107,fc00::a2@PortChannel109,fc00::aa@PortChannel1011,fc00::b2@PortChannel1015,fc00::ba@PortChannel1019,fc00::c2@Ethernet48,fc00::c6@Ethernet60,fc00::ca@Ethernet72,fc00::ce@Ethernet76,fc00::d2@Ethernet80,fc00::d6@Ethernet84,fc00::da@Ethernet88,fc00::de@Ethernet92,fc00::e2@Ethernet96,fc00::e6@Ethernet100,fc00::ea@Ethernet104,fc00::ee@Ethernet108,fc00::f2@Ethernet112,fc00::f6@Ethernet116,fc00::fa@Ethernet120,fc00::fe@Ethernet124,fc00::102@Ethernet-IB0,fc00::10a@Ethernet-IB0,fc00::112@Ethernet-IB0,fc00::11a@Ethernet-IB0,fc00::122@Ethernet-IB0,fc00::12a@Ethernet-IB0,fc00::132@Ethernet-IB0,fc00::13a@Ethernet-IB0,fc00::142@Ethernet-IB0,fc00::146@Ethernet-IB0,fc00::14a@Ethernet-IB0,fc00::14e@Ethernet-IB0,fc00::152@Ethernet-IB0,fc00::156@Ethernet-IB0,fc00::15a@Ethernet-IB0,fc00::15e@Ethernet-IB0,fc00::162@Ethernet-IB0,fc00::166@Ethernet-IB0,fc00::16a@Ethernet-IB0,fc00::16e@Ethernet-IB0,fc00::172@Ethernet-IB0,fc00::176@Ethernet-IB0,fc00::17a@Ethernet-IB0,fc00::17e@Ethernet-IB0
Jan 25 20:10:47.262558 str2-7804-lc6-1 NOTICE swss#orchagent: :- removeNextHopGroup: Delete next hop group 10.0.0.129@Ethernet-IB0,10.0.0.133@Ethernet-IB0,10.0.0.137@Ethernet-IB0,10.0.0.141@Ethernet-IB0,10.0.0.145@Ethernet-IB0,10.0.0.149@Ethernet-IB0,10.0.0.153@Ethernet-IB0,10.0.0.161@Ethernet-IB0
Jan 25 20:10:47.266316 str2-7804-lc6-1 NOTICE swss#orchagent: :- removeNextHopGroup: Delete next hop group 10.0.0.175@Ethernet-IB0,10.0.0.177@Ethernet-IB0
Jan 25 20:10:47.267640 str2-7804-lc6-1 NOTICE swss#orchagent: :- removeNextHopGroup: Delete next hop group fc00::102@Ethernet-IB0,fc00::10a@Ethernet-IB0,fc00::112@Ethernet-IB0,fc00::11a@Ethernet-IB0,fc00::122@Ethernet-IB0,fc00::12a@Ethernet-IB0,fc00::132@Ethernet-IB0,fc00::13a@Ethernet-IB0,fc00::142@Ethernet-IB0,fc00::146@Ethernet-IB0,fc00::14a@Ethernet-IB0,fc00::14e@Ethernet-IB0,fc00::152@Ethernet-IB0,fc00::156@Ethernet-IB0,fc00::15a@Ethernet-IB0,fc00::15e@Ethernet-IB0,fc00::162@Ethernet-IB0,fc00::166@Ethernet-IB0,fc00::16a@Ethernet-IB0,fc00::16e@Ethernet-IB0,fc00::172@Ethernet-IB0,fc00::176@Ethernet-IB0,fc00::17a@Ethernet-IB0,fc00::17e@Ethernet-IB0
Jan 25 20:10:47.274007 str2-7804-lc6-1 NOTICE swss#orchagent: :- removeNextHopGroup: Delete next hop group fc00::102@Ethernet-IB0,fc00::10a@Ethernet-IB0,fc00::112@Ethernet-IB0,fc00::11a@Ethernet-IB0,fc00::122@Ethernet-IB0,fc00::12a@Ethernet-IB0,fc00::132@Ethernet-IB0,fc00::142@Ethernet-IB0
Jan 25 20:10:47.276512 str2-7804-lc6-1 NOTICE swss#orchagent: :- removeNextHopGroup: Delete next hop group fc00::13a@Ethernet-IB0,fc00::146@Ethernet-IB0,fc00::14a@Ethernet-IB0,fc00::14e@Ethernet-IB0,fc00::152@Ethernet-IB0,fc00::156@Ethernet-IB0,fc00::15a@Ethernet-IB0,fc00::15e@Ethernet-IB0,fc00::162@Ethernet-IB0,fc00::166@Ethernet-IB0,fc00::16a@Ethernet-IB0,fc00::16e@Ethernet-IB0,fc00::172@Ethernet-IB0,fc00::176@Ethernet-IB0,fc00::17a@Ethernet-IB0,fc00::17e@Ethernet-IB0
Jan 25 20:10:48.092159 str2-7804-lc6-1 INFO gbsyncd#supervisord: syncd OWL_get_fec_statistics]
Jan 25 20:10:48.092159 str2-7804-lc6-1 INFO gbsyncd#supervisord: syncd [Slice 4] [Port query][2] Response 0x0A9E, detail1 0x0593, detail2 0x0114

OA took ~1sec to do add/remove next hop

wenyiz2021 commented 1 year ago

on this linecard:

Jan 25 20:14:20.144283 str2-7804-lc5-1 NOTICE syncd1#syncd: :- inc: 10634 (calls 11) Syncd::syncUpdateRedisBulkQuadEvent op took: 492 ms
Jan 25 20:14:20.707857 str2-7804-lc5-1 NOTICE syncd0#syncd: :- threadFunction: time span 1 ms for 'bulkset:SAI_OBJECT_TYPE_ROUTE_ENTRY:1000'
Jan 25 20:14:20.985526 str2-7804-lc5-1 NOTICE swss1#orchagent: :- removeNextHopGroup: Delete next hop group 10.0.0.25@Ethernet-IB1,10.0.0.37@Ethernet144,10.0.0.39@Ethernet152,10.0.0.41@Ethernet160,10.0.0.43@Ethernet168,10.0.0.45@Ethernet176,10.0.0.47@Ethernet184,10.0.0.57@Ethernet224,10.0.0.59@Ethernet232,10.0.0.61@Ethernet240,10.0.0.63@Ethernet248
Jan 25 20:14:23.563399 str2-7804-lc5-1 NOTICE swss1#orchagent: :- addNextHopGroup: Create next hop group 10.0.0.5@Ethernet-IB1,10.0.0.9@Ethernet-IB1,10.0.0.29@Ethernet-IB1,10.0.0.33@Ethernet-IB1,10.0.0.35@Ethernet-IB1,10.0.0.49@Ethernet192,10.0.0.51@Ethernet200,10.0.0.53@Ethernet208,10.0.0.55@Ethernet216
Jan 25 20:14:23.618707 str2-7804-lc5-1 NOTICE swss0#orchagent: :- addNextHopGroup: Create next hop group 10.0.0.5@PortChannel104,10.0.0.9@PortChannel106,10.0.0.29@PortChannel1016,10.0.0.33@Ethernet96,10.0.0.35@Ethernet120,10.0.0.49@Ethernet-IB0,10.0.0.51@Ethernet-IB0,10.0.0.53@Ethernet-IB0,10.0.0.55@Ethernet-IB0
Jan 25 20:14:23.709719 str2-7804-lc5-1 NOTICE syncd0#syncd: :- threadFunction: time span 68 ms for 'bulkcreate:SAI_OBJECT_TYPE_NEXT_HOP_GROUP_MEMBER:9'
Jan 25 20:14:23.729444 str2-7804-lc5-1 NOTICE swss0#orchagent: :- addNextHopGroup: Create next hop group 10.0.0.39@Ethernet-IB0,10.0.0.41@Ethernet-IB0,10.0.0.43@Ethernet-IB0,10.0.0.45@Ethernet-IB0,10.0.0.47@Ethernet-IB0,10.0.0.57@Ethernet-IB0,10.0.0.59@Ethernet-IB0,10.0.0.61@Ethernet-IB0,10.0.0.63@Ethernet-IB0
Jan 25 20:14:24.707721 str2-7804-lc5-1 NOTICE syncd1#syncd: :- threadFunction: time span 112 ms for 'bulkremove:SAI_OBJECT_TYPE_ROUTE_ENTRY:1000'
Jan 25 20:14:25.111721 str2-7804-lc5-1 NOTICE syncd0#syncd: :- inc: 10209 (calls 14) Syncd::syncUpdateRedisBulkQuadEvent op took: 504 ms
Jan 25 20:14:25.560117 str2-7804-lc5-1 NOTICE syncd1#syncd: :- inc: 10862 (calls 14) Syncd::syncUpdateRedisBulkQuadEvent op took: 598 ms
Jan 25 20:14:25.711381 str2-7804-lc5-1 NOTICE syncd0#syncd: :- threadFunction: time span 61 ms for 'bulkset:SAI_OBJECT_TYPE_ROUTE_ENTRY:1000'
Jan 25 20:14:25.711735 str2-7804-lc5-1 NOTICE syncd1#syncd: :- threadFunction: time span 122 ms for 'bulkset:SAI_OBJECT_TYPE_ROUTE_ENTRY:1000'
Jan 25 20:14:26.440324 str2-7804-lc5-1 NOTICE syncd0#syncd: :- inc: 10000 (calls 10) Syncd::syncUpdateRedisBulkQuadEvent op took: 421 ms
Jan 25 20:14:26.715797 str2-7804-lc5-1 NOTICE syncd0#syncd: :- threadFunction: time span 8 ms for 'bulkset:SAI_OBJECT_TYPE_ROUTE_ENTRY:1000'
Jan 25 20:14:26.949485 str2-7804-lc5-1 NOTICE syncd1#syncd: :- inc: 10000 (calls 10) Syncd::syncUpdateRedisBulkQuadEvent op took: 468 ms
Jan 25 20:14:27.627527 str2-7804-lc5-1 NOTICE swss0#orchagent: :- removeNextHopGroup: Delete next hop group 10.0.0.5@PortChannel104,10.0.0.9@PortChannel106,10.0.0.17@PortChannel1010,10.0.0.21@PortChannel1012,10.0.0.29@PortChannel1016,10.0.0.33@Ethernet96,10.0.0.35@Ethernet120,10.0.0.49@Ethernet-IB0,10.0.0.51@Ethernet-IB0,10.0.0.53@Ethernet-IB0,10.0.0.55@Ethernet-IB0
Jan 25 20:14:27.701213 str2-7804-lc5-1 NOTICE swss1#orchagent: :- removeNextHopGroup: Delete next hop group 10.0.0.1@Ethernet-IB1,10.0.0.5@Ethernet-IB1,10.0.0.9@Ethernet-IB1,10.0.0.17@Ethernet-IB1,10.0.0.21@Ethernet-IB1,10.0.0.29@Ethernet-IB1,10.0.0.33@Ethernet-IB1,10.0.0.35@Ethernet-IB1,10.0.0.49@Ethernet192,10.0.0.51@Ethernet200,10.0.0.53@Ethernet208,10.0.0.55@Ethernet216
Jan 25 20:14:27.715832 str2-7804-lc5-1 NOTICE syncd0#syncd: :- threadFunction: time span 75 ms for 'bulkremove:SAI_OBJECT_TYPE_NEXT_HOP_GROUP_MEMBER:11'
Jan 25 20:14:27.764990 str2-7804-lc5-1 NOTICE swss0#orchagent: :- removeNextHopGroup: Delete next hop group 10.0.0.13@PortChannel108,10.0.0.25@PortChannel1020,10.0.0.37@Ethernet-IB0,10.0.0.39@Ethernet-IB0,10.0.0.41@Ethernet-IB0,10.0.0.43@Ethernet-IB0,10.0.0.45@Ethernet-IB0,10.0.0.47@Ethernet-IB0,10.0.0.57@Ethernet-IB0,10.0.0.59@Ethernet-IB0,10.0.0.61@Ethernet-IB0,10.0.0.63@Ethernet-IB0
Jan 25 20:14:27.851127 str2-7804-lc5-1 NOTICE swss1#orchagent: :- removeNextHopGroup: Delete next hop group 10.0.0.13@Ethernet-IB1,10.0.0.25@Ethernet-IB1,10.0.0.37@Ethernet144,10.0.0.39@Ethernet152,10.0.0.41@Ethernet160,10.0.0.43@Ethernet168,10.0.0.45@Ethernet176,10.0.0.47@Ethernet184,10.0.0.57@Ethernet224,10.0.0.59@Ethernet232,10.0.0.61@Ethernet240,10.0.0.63@Ethernet248
Jan 25 20:14:33.128703 str2-7804-lc5-1 NOTICE swss0#orchagent: :- addNextHopGroup: Create next hop group 10.0.0.49@Ethernet-IB0,10.0.0.51@Ethernet-IB0,10.0.0.53@Ethernet-IB0,10.0.0.55@Ethernet-IB0
Jan 25 20:14:33.192886 str2-7804-lc5-1 NOTICE swss1#orchagent: :- addNextHopGroup: Create next hop group 10.0.0.43@Ethernet168,10.0.0.45@Ethernet176,10.0.0.47@Ethernet184,10.0.0.57@Ethernet224,10.0.0.59@Ethernet232,10.0.0.61@Ethernet240,10.0.0.63@Ethernet248
Jan 25 20:14:33.912641 str2-7804-lc5-1 NOTICE swss1#orchagent: :- addNextHopGroup: Create next hop group 10.0.0.45@Ethernet176,10.0.0.47@Ethernet184,10.0.0.57@Ethernet224,10.0.0.59@Ethernet232,10.0.0.61@Ethernet240,10.0.0.63@Ethernet248
Jan 25 20:14:34.763126 str2-7804-lc5-1 NOTICE swss0#orchagent: :- addNextHopGroup: Create next hop group fc00::a@PortChannel104,fc00::3a@PortChannel1016,fc00::42@Ethernet96,fc00::46@Ethernet120,fc00::62@Ethernet-IB0,fc00::66@Ethernet-IB0,fc00::6a@Ethernet-IB0,fc00::6e@Ethernet-IB0

similar operation took ~14sec

kenneth-arista commented 1 year ago

Did you test with https://github.com/sonic-net/sonic-buildimage/pull/13429 to reduce the MHz of syncd?

wenyiz2021 commented 1 year ago

Did you test with sonic-net/sonic-buildimage#13429 to reduce the MHz of syncd?

will double check with this PR patch

wenyiz2021 commented 1 year ago

@Staphylo @kenneth-arista can you please provide an ETA for this issue? it's still seen with our latest build thanks!

kenneth-arista commented 1 year ago

We need more time to understand what is contributing to the high CPU usage in orchagent and the variation seen between different LC SKUs. We're building debug images to run perf.

ysmanman commented 1 year ago

HI @wenyiz2021, in T2 topo, the LC connected to upstreams ports learns much more ebgp routes than LCs connected to downstream ports (see the output from our testbed below). So if we shutdown bgp on LC connected to upstream ports, the operation should take much longer since there are more routes to process. In your setup, I think lc5 is the one connected to upstream ports, right? If you could swap lc5 & lc6, you will likely observe the same result.

These are the V4 route scale from one of our testbed:

On (multi-asic) LC connected to upstream ports:

admin@cmp227-4:~$ show ip route summary                                                                                   
asic0:                                                                                                                    
Route Source         Routes               FIB  (vrf default)                                                              
kernel               64                   64                                                                              
connected            13                   13                                                                              
ebgp                 50827                50827                                                                           
ibgp                 514                  514                                                                             
------                                                                                                                    
Totals               51418                51418                                                                           

asic1:                                                                                                                    
Route Source         Routes               FIB  (vrf default)                                                              
kernel               59                   59                                                                              
connected            17                   17                                                                              
ebgp                 34063                34063                                                                           
ibgp                 17274                17274                                                                           
------                                                                                                                    
Totals               51413                51413

On (single asic) LC connected to downstream ports:

admin@cmp227-5:$ show ip rout sum                                                                                        
:
Route Source         Routes               FIB  (vrf default)                                                              
kernel               51                   51                                                                              
connected            28                   28                                                                              
ebgp                 409                  409                                                                             
ibgp                 50918                50918                                                                           
------                                                                                                                    
Totals               51406                51406                                                                           
wenyiz2021 commented 1 year ago

hi @ysmanman I could understand the doubled routes on this uplink card and the time. just double checked the other chassis doesn't have this issue on uplink multi-asic linecard (51156+51152 routes on 2 asics). still feel wolverine card takes too much CPU usage (as high as ~68% compared to 10% limit) let me do some experiment on the test and meanwhile would you check any other interesting syslogs when shutting down bgp sessions on wolverine LC?

kartik-arista commented 1 year ago

Hi @wenyiz2021 - This has very little to do with multi-asic. We saw similar behavior with a box that has a single ASIC uplink card, and we could correlate this with redis logs.

kenneth-arista commented 1 year ago

As discussed offline, this issue can be closed. A separate PR will be raised to introduce additional timeout.