aristanetworks / sonic

Open source drivers and initialization library for Arista platforms running SONiC
GNU General Public License v2.0

[chassis] syncd busy on all linecards, taking >100% CPU usage #81

Closed: wenyiz2021 closed this issue 1 year ago

wenyiz2021 commented 1 year ago

Re-opening https://github.com/aristanetworks/sonic/issues/59, as CPU usage is still > 100% on all linecards.

Repro steps: run top on any linecard that is not running any tests.
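
The repro step can also be scripted. Below is a minimal sketch, assuming a procps-style top that supports batch mode (`-b`) and a single iteration (`-n 1`); the helper names `read_top_snapshot` and `parse_top_cpu` are hypothetical and are not part of any SONiC tooling.

```python
import subprocess


def read_top_snapshot():
    """Return one batch-mode snapshot of top as text (assumes procps-style top)."""
    result = subprocess.run(
        ["top", "-b", "-n", "1"], capture_output=True, text=True, check=True
    )
    return result.stdout


def parse_top_cpu(top_output, command):
    """Return the %CPU value of every row whose COMMAND column equals `command`."""
    cpu_values = []
    columns = None
    for line in top_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        # The header row tells us where the %CPU and COMMAND columns are.
        if fields[0] == "PID" and "%CPU" in fields:
            columns = fields
            continue
        # Skip the summary lines before the header and any short rows.
        if columns is None or len(fields) < len(columns):
            continue
        if fields[columns.index("COMMAND")] == command:
            # Some locales print a decimal comma; normalize before converting.
            raw = fields[columns.index("%CPU")].replace(",", ".")
            cpu_values.append(float(raw))
    return cpu_values


# Usage on a linecard (not executed here):
#   print(parse_top_cpu(read_top_snapshot(), "syncd"))
```

A single snapshot is noisy, so in practice it is probably worth sampling several times before drawing conclusions about syncd.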

rlhui commented 1 year ago

Hi, any update on this one? Can we include this in the weekly discussion? Thanks.

wenyiz2021 commented 1 year ago

@arlakshm @kenneth-arista for viz

wenyiz2021 commented 1 year ago

This causes test_cpu_memory_usage to fail.

wenyiz2021 commented 1 year ago

When the linecard is not running any tests, top shows syncd at ~40% CPU on a clearwater2 LC, which is fine. While test_cpu_memory_usage.py is running, syncd is consistently above the 50% threshold:

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 730102 root      20   0 5047072   1.8g 183048 S  87.5  11.5   1070:51 syncd
 721852 root      20   0  112376  62352   7652 S  12.5   0.4 200:08.35 redis-server
1423952 admin     20   0   11096   4128   3256 R   6.2   0.0   0:00.02 top
      1 root      20   0  168212  14892   8976 S   0.0   0.1   3:44.26 systemd

07/03/2023 22:38:10 test_cpu_memory_usage.test_cpu_memory_us L0064 DEBUG  | ------ Iteration 23 ------
07/03/2023 22:38:10 test_cpu_memory_usage.check_cpu_usage    L0227 DEBUG  | process syncd(730102) cpu usage exceeds 50%.
07/03/2023 22:38:10 test_cpu_memory_usage.analyse_monitoring L0090 ERROR  | processes that persistently exceeds cpu usage 50%: [[u'/usr/bin/syncd', u'--diag', u'-u', u'-s', u'-p', u'/etc/sai.d/sai.profile', u'-b', u'/tmp/break_before_make_objects']]

wenyiz2021 commented 1 year ago

Closing this issue: the test case needs an updated CPU threshold. Nokia chassis also fail this test.
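
For illustration, here is a rough sketch of what a per-process threshold in such a check could look like. It is not the actual test_cpu_memory_usage implementation: the function name, the `min_consecutive` rule for "persistently", and the 90% override for syncd are all assumptions made for the example. Only the 50% default comes from the log above.

```python
from collections import defaultdict

DEFAULT_CPU_THRESHOLD = 50.0  # percent; the value reported in the test log


def persistent_offenders(samples, overrides=None, min_consecutive=3,
                         default=DEFAULT_CPU_THRESHOLD):
    """Return sorted process names that exceed their CPU threshold in at
    least `min_consecutive` consecutive samples.

    samples:   list of {process_name: cpu_percent}, one dict per iteration
    overrides: optional {process_name: threshold} (hypothetical knob)
    """
    overrides = overrides or {}
    streak = defaultdict(int)  # current run of over-threshold samples
    offenders = set()
    for sample in samples:
        for name, cpu in sample.items():
            if cpu > overrides.get(name, default):
                streak[name] += 1
                if streak[name] >= min_consecutive:
                    offenders.add(name)
            else:
                streak[name] = 0
        # A process missing from this sample breaks its streak.
        for name in list(streak):
            if name not in sample:
                streak[name] = 0
    return sorted(offenders)


# Example with the readings from this thread repeated over three iterations.
readings = [{"syncd": 87.5, "redis-server": 12.5}] * 3
print(persistent_offenders(readings))                             # default 50%
print(persistent_offenders(readings, overrides={"syncd": 90.0}))  # illustrative override
```

With the default threshold the example flags syncd. With the illustrative override it flags nothing, which is the kind of per-platform adjustment the closing comment points to. The right threshold value is a question for the test owners.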