aristanetworks / sonic

Open source drivers and initialization library for Arista platforms running SONiC
GNU General Public License v2.0

[Chassis][202405] Random ports go down after reboot on Wolverine card #102

Closed arlakshm closed 5 days ago

arlakshm commented 2 weeks ago

Random ports do not come up after a reboot on the 202405 image on a Wolverine linecard.

After boot-up with the new image

admin@str2-7804-lc5-1:~$ docker ps
CONTAINER ID   IMAGE                             COMMAND                  CREATED          STATUS          PORTS     NAMES
df15c0306e45   docker-sonic-telemetry:latest     "/usr/local/bin/supe…"   32 minutes ago   Up 10 minutes             telemetry
2d3c5b84099b   docker-snmp:latest                "/usr/local/bin/supe…"   32 minutes ago   Up 10 minutes             snmp
eea94d762e7c   docker-lldp:latest                "/usr/bin/docker-lld…"   32 minutes ago   Up 10 minutes             lldp1
ff7d10b1f32d   docker-lldp:latest                "/usr/bin/docker-lld…"   33 minutes ago   Up 10 minutes             lldp0
bbd4cb3858d5   docker-sonic-gnmi:latest          "/usr/local/bin/supe…"   33 minutes ago   Up 10 minutes             gnmi
8deb86338be3   docker-platform-monitor:latest    "/usr/bin/docker_ini…"   35 minutes ago   Up 13 minutes             pmon
e9df4797e8f6   64ba6175a968                      "/usr/local/bin/supe…"   36 minutes ago   Up 13 minutes             macsec1
f723cace274e   64ba6175a968                      "/usr/local/bin/supe…"   36 minutes ago   Up 13 minutes             macsec0
0d19f5170395   docker-router-advertiser:latest   "/usr/bin/docker-ini…"   36 minutes ago   Up 13 minutes             radv
2ba56a17f425   docker-syncd-brcm-dnx:latest      "/usr/local/bin/supe…"   37 minutes ago   Up 13 minutes             syncd1
0de2eb4846a4   docker-syncd-brcm-dnx:latest      "/usr/local/bin/supe…"   37 minutes ago   Up 13 minutes             syncd0
1953086af561   docker-fpm-frr:latest             "/usr/bin/docker_ini…"   37 minutes ago   Up 13 minutes             bgp1
50cd5e82ae7f   docker-teamd:latest               "/usr/local/bin/supe…"   37 minutes ago   Up 13 minutes             teamd0
05eccb6a9a8d   docker-fpm-frr:latest             "/usr/bin/docker_ini…"   37 minutes ago   Up 13 minutes             bgp0
a0aee7c22a7f   docker-teamd:latest               "/usr/local/bin/supe…"   37 minutes ago   Up 13 minutes             teamd1
c5ddf9eadc2f   docker-orchagent:latest           "/usr/bin/docker-ini…"   37 minutes ago   Up 13 minutes             swss0
eb1598095c3c   docker-orchagent:latest           "/usr/bin/docker-ini…"   37 minutes ago   Up 13 minutes             swss1
1f440092b7b0   docker-sonic-restapi:latest       "/usr/local/bin/supe…"   37 minutes ago   Up 14 minutes             restapi
66831ca4c775   docker-eventd:latest              "/usr/local/bin/supe…"   37 minutes ago   Up 14 minutes             eventd
9d54e12a608c   docker-acms:latest                "/usr/local/bin/supe…"   37 minutes ago   Up 14 minutes             acms
a61e183ea985   docker-database:latest            "/usr/local/bin/dock…"   37 minutes ago   Up 14 minutes             database1
76aefa9a1b36   docker-database:latest            "/usr/local/bin/dock…"   37 minutes ago   Up 14 minutes             database0
46f525dca856   docker-database:latest            "/usr/local/bin/dock…"   37 minutes ago   Up 14 minutes             database
admin@str2-7804-lc5-1:~$ ps aux | grep orch
root        5065  0.7  0.2 579004 36460 pts/0    Sl   19:53   0:06 /usr/bin/orchagent -d /var/log/swss -b 1024 -s -i 07:00.0 -f swss.asic1.rec -j sairedis.asic1.rec -m 2c:dd:e9:6c:cc:7d
root        6552  0.6  0.2 578648 36444 pts/0    Sl   19:53   0:05 /usr/bin/orchagent -d /var/log/swss -b 1024 -s -i 06:00.0 -f swss.asic0.rec -j sairedis.asic0.rec -m 2c:dd:e9:6c:cc:7d
admin      20225  0.0  0.0   6972  2084 pts/0    S+   20:07   0:00 grep orch

admin@str2-7804-lc5-1:~$ show int status
      Interface            Lanes    Speed    MTU    FEC         Alias             Vlan    Oper    Admin             Type    Asym PFC
---------------  ---------------  -------  -----  -----  ------------  ---------------  ------  -------  ---------------  ----------
      Ethernet0      72,73,74,75     100G   9100     rs   Ethernet1/1   PortChannel102    down       up  QSFP28 or later         off
      Ethernet8      80,81,82,83     100G   9100     rs   Ethernet2/1   PortChannel102    down       up  QSFP28 or later         off
     Ethernet16      88,89,90,91     100G   9100     rs   Ethernet3/1   PortChannel104    down       up  QSFP28 or later         off
     Ethernet24      96,97,98,99     100G   9100     rs   Ethernet4/1   PortChannel104    down       up  QSFP28 or later         off
     Ethernet32  104,105,106,107     100G   9100     rs   Ethernet5/1   PortChannel106    down       up  QSFP28 or later         off
     Ethernet40  112,113,114,115     100G   9100     rs   Ethernet6/1   PortChannel106    down       up  QSFP28 or later         off
     Ethernet48  120,121,122,123     100G   9100     rs   Ethernet7/1   PortChannel108    down       up  QSFP28 or later         off
     Ethernet56  128,129,130,131     100G   9100     rs   Ethernet8/1   PortChannel108    down       up  QSFP28 or later         off
     Ethernet64  136,137,138,139     100G   9100     rs   Ethernet9/1  PortChannel1010    down       up  QSFP28 or later         off
     Ethernet72      64,65,66,67     100G   9100     rs  Ethernet10/1  PortChannel1010    down       up  QSFP28 or later         off
     Ethernet80      56,57,58,59     100G   9100     rs  Ethernet11/1  PortChannel1012    down       up  QSFP28 or later         off
     Ethernet88      48,49,50,51     100G   9100     rs  Ethernet12/1  PortChannel1012    down       up  QSFP28 or later         off
     Ethernet96      40,41,42,43     100G   9100     rs  Ethernet13/1           routed    down       up  QSFP28 or later         off
    Ethernet104      32,33,34,35     100G   9100     rs  Ethernet14/1  PortChannel1016    down       up  QSFP28 or later         off
    Ethernet112      24,25,26,27     100G   9100     rs  Ethernet15/1  PortChannel1016    down       up  QSFP28 or later         off
    Ethernet120      16,17,18,19     100G   9100     rs  Ethernet16/1           routed    down       up  QSFP28 or later         off
    Ethernet128        8,9,10,11     100G   9100     rs  Ethernet17/1  PortChannel1020    down       up  QSFP28 or later         off
    Ethernet136          0,1,2,3     100G   9100     rs  Ethernet18/1  PortChannel1020    down       up  QSFP28 or later         off
    Ethernet144      72,73,74,75     100G   9100     rs  Ethernet19/1           routed      up       up  QSFP28 or later         off
    Ethernet152      80,81,82,83     100G   9100     rs  Ethernet20/1           routed      up       up  QSFP28 or later         off
    Ethernet160      88,89,90,91     100G   9100     rs  Ethernet21/1           routed      up       up  QSFP28 or later         off
    Ethernet168      96,97,98,99     100G   9100     rs  Ethernet22/1           routed      up       up  QSFP28 or later         off
    Ethernet176  104,105,106,107     100G   9100     rs  Ethernet23/1           routed      up       up  QSFP28 or later         off
    Ethernet184  112,113,114,115     100G   9100     rs  Ethernet24/1           routed      up       up  QSFP28 or later         off
    Ethernet192  120,121,122,123     100G   9100     rs  Ethernet25/1           routed      up       up  QSFP28 or later         off
    Ethernet200  128,129,130,131     100G   9100     rs  Ethernet26/1           routed      up       up  QSFP28 or later         off
    Ethernet208  136,137,138,139     100G   9100     rs  Ethernet27/1           routed      up       up  QSFP28 or later         off
    Ethernet216      64,65,66,67     100G   9100     rs  Ethernet28/1           routed      up       up  QSFP28 or later         off
    Ethernet224      56,57,58,59     100G   9100     rs  Ethernet29/1           routed      up       up  QSFP28 or later         off
    Ethernet232      48,49,50,51     100G   9100     rs  Ethernet30/1           routed      up       up  QSFP28 or later         off
    Ethernet240      40,41,42,43     100G   9100     rs  Ethernet31/1           routed      up       up  QSFP28 or later         off
    Ethernet248      32,33,34,35     100G   9100     rs  Ethernet32/1           routed    down       up  QSFP28 or later         off
    Ethernet256      24,25,26,27     100G   9100     rs  Ethernet33/1           routed    down     down  QSFP28 or later         off
    Ethernet264      16,17,18,19     100G   9100     rs  Ethernet34/1           routed    down     down  QSFP28 or later         off
    Ethernet272        8,9,10,11     100G   9100     rs  Ethernet35/1           routed    down     down  QSFP28 or later         off
    Ethernet280          0,1,2,3     100G   9100     rs  Ethernet36/1           routed    down     down  QSFP28 or later         off
 PortChannel102              N/A     200G   9100    N/A           N/A           routed    down       up              N/A         N/A
 PortChannel104              N/A     200G   9100    N/A           N/A           routed    down       up              N/A         N/A
 PortChannel106              N/A     200G   9100    N/A           N/A           routed    down       up              N/A         N/A
 PortChannel108              N/A     200G   9100    N/A           N/A           routed    down       up              N/A         N/A
PortChannel1010              N/A     200G   9100    N/A           N/A           routed    down       up              N/A         N/A
PortChannel1012              N/A     200G   9100    N/A           N/A           routed    down       up              N/A         N/A
PortChannel1016              N/A     200G   9100    N/A           N/A           routed    down       up              N/A         N/A
PortChannel1020              N/A     200G   9100    N/A           N/A           routed    down       up              N/A         N/A

After reboot

admin@str2-7804-lc5-1:~$ docker ps
CONTAINER ID   IMAGE                             COMMAND                  CREATED       STATUS             PORTS     NAMES
df15c0306e45   docker-sonic-telemetry:latest     "/usr/local/bin/supe…"   2 hours ago   Up About an hour             telemetry
2d3c5b84099b   docker-snmp:latest                "/usr/local/bin/supe…"   2 hours ago   Up About an hour             snmp
eea94d762e7c   docker-lldp:latest                "/usr/bin/docker-lld…"   2 hours ago   Up About an hour             lldp1
ff7d10b1f32d   docker-lldp:latest                "/usr/bin/docker-lld…"   2 hours ago   Up About an hour             lldp0
bbd4cb3858d5   docker-sonic-gnmi:latest          "/usr/local/bin/supe…"   2 hours ago   Up About an hour             gnmi
8deb86338be3   docker-platform-monitor:latest    "/usr/bin/docker_ini…"   2 hours ago   Up About an hour             pmon
e9df4797e8f6   64ba6175a968                      "/usr/local/bin/supe…"   2 hours ago   Up About an hour             macsec1
f723cace274e   64ba6175a968                      "/usr/local/bin/supe…"   2 hours ago   Up About an hour             macsec0
0d19f5170395   docker-router-advertiser:latest   "/usr/bin/docker-ini…"   2 hours ago   Up About an hour             radv
2ba56a17f425   docker-syncd-brcm-dnx:latest      "/usr/local/bin/supe…"   2 hours ago   Up About an hour             syncd1
0de2eb4846a4   docker-syncd-brcm-dnx:latest      "/usr/local/bin/supe…"   2 hours ago   Up About an hour             syncd0
1953086af561   docker-fpm-frr:latest             "/usr/bin/docker_ini…"   2 hours ago   Up About an hour             bgp1
50cd5e82ae7f   docker-teamd:latest               "/usr/local/bin/supe…"   2 hours ago   Up About an hour             teamd0
05eccb6a9a8d   docker-fpm-frr:latest             "/usr/bin/docker_ini…"   2 hours ago   Up About an hour             bgp0
a0aee7c22a7f   docker-teamd:latest               "/usr/local/bin/supe…"   2 hours ago   Up About an hour             teamd1
c5ddf9eadc2f   docker-orchagent:latest           "/usr/bin/docker-ini…"   2 hours ago   Up About an hour             swss0
eb1598095c3c   docker-orchagent:latest           "/usr/bin/docker-ini…"   2 hours ago   Up About an hour             swss1
1f440092b7b0   docker-sonic-restapi:latest       "/usr/local/bin/supe…"   2 hours ago   Up About an hour             restapi
66831ca4c775   docker-eventd:latest              "/usr/local/bin/supe…"   2 hours ago   Up About an hour             eventd
9d54e12a608c   docker-acms:latest                "/usr/local/bin/supe…"   2 hours ago   Up About an hour             acms
a61e183ea985   docker-database:latest            "/usr/local/bin/dock…"   2 hours ago   Up About an hour             database1
76aefa9a1b36   docker-database:latest            "/usr/local/bin/dock…"   2 hours ago   Up About an hour             database0
46f525dca856   docker-database:latest            "/usr/local/bin/dock…"   2 hours ago   Up About an hour             database
admin@str2-7804-lc5-1:~$ ps aux | grep orch
root        6643  0.5  0.2 578916 38668 pts/0    Sl   20:14   0:26 /usr/bin/orchagent -d /var/log/swss -b 1024 -s -i 06:00.0 -f swss.asic0.rec -j sairedis.asic0.rec -m 2c:dd:e9:6c:cc:7d
root       23932  0.5  0.2 578912 36560 pts/0    Sl   20:16   0:26 /usr/bin/orchagent -d /var/log/swss -b 1024 -s -i 07:00.0 -f swss.asic1.rec -j sairedis.asic1.rec -m 2c:dd:e9:6c:cc:7d
admin      85870  0.0  0.0   6972  2040 pts/0    S+   21:31   0:00 grep orch
admin@str2-7804-lc5-1:~

admin@str2-7804-lc5-1:~$ show int status
      Interface            Lanes    Speed    MTU    FEC         Alias             Vlan    Oper    Admin             Type    Asym PFC
---------------  ---------------  -------  -----  -----  ------------  ---------------  ------  -------  ---------------  ----------
      Ethernet0      72,73,74,75     100G   9100     rs   Ethernet1/1   PortChannel102      up       up  QSFP28 or later         off
      Ethernet8      80,81,82,83     100G   9100     rs   Ethernet2/1   PortChannel102      up       up  QSFP28 or later         off
     Ethernet16      88,89,90,91     100G   9100     rs   Ethernet3/1   PortChannel104      up       up  QSFP28 or later         off
     Ethernet24      96,97,98,99     100G   9100     rs   Ethernet4/1   PortChannel104      up       up  QSFP28 or later         off
     Ethernet32  104,105,106,107     100G   9100     rs   Ethernet5/1   PortChannel106      up       up  QSFP28 or later         off
     Ethernet40  112,113,114,115     100G   9100     rs   Ethernet6/1   PortChannel106      up       up  QSFP28 or later         off
     Ethernet48  120,121,122,123     100G   9100     rs   Ethernet7/1   PortChannel108      up       up  QSFP28 or later         off
     Ethernet56  128,129,130,131     100G   9100     rs   Ethernet8/1   PortChannel108      up       up  QSFP28 or later         off
     Ethernet64  136,137,138,139     100G   9100     rs   Ethernet9/1  PortChannel1010      up       up  QSFP28 or later         off
     Ethernet72      64,65,66,67     100G   9100     rs  Ethernet10/1  PortChannel1010      up       up  QSFP28 or later         off
     Ethernet80      56,57,58,59     100G   9100     rs  Ethernet11/1  PortChannel1012      up       up  QSFP28 or later         off
     Ethernet88      48,49,50,51     100G   9100     rs  Ethernet12/1  PortChannel1012      up       up  QSFP28 or later         off
     Ethernet96      40,41,42,43     100G   9100     rs  Ethernet13/1           routed      up       up  QSFP28 or later         off
    Ethernet104      32,33,34,35     100G   9100     rs  Ethernet14/1  PortChannel1016      up       up  QSFP28 or later         off
    Ethernet112      24,25,26,27     100G   9100     rs  Ethernet15/1  PortChannel1016      up       up  QSFP28 or later         off
    Ethernet120      16,17,18,19     100G   9100     rs  Ethernet16/1           routed      up       up  QSFP28 or later         off
    Ethernet128        8,9,10,11     100G   9100     rs  Ethernet17/1  PortChannel1020      up       up  QSFP28 or later         off
    Ethernet136          0,1,2,3     100G   9100     rs  Ethernet18/1  PortChannel1020      up       up  QSFP28 or later         off
    Ethernet144      72,73,74,75     100G   9100     rs  Ethernet19/1           routed      up       up  QSFP28 or later         off
    Ethernet152      80,81,82,83     100G   9100     rs  Ethernet20/1           routed      up       up  QSFP28 or later         off
    Ethernet160      88,89,90,91     100G   9100     rs  Ethernet21/1           routed      up       up  QSFP28 or later         off
    Ethernet168      96,97,98,99     100G   9100     rs  Ethernet22/1           routed      up       up  QSFP28 or later         off
    Ethernet176  104,105,106,107     100G   9100     rs  Ethernet23/1           routed    down       up  QSFP28 or later         off
    Ethernet184  112,113,114,115     100G   9100     rs  Ethernet24/1           routed      up       up  QSFP28 or later         off
    Ethernet192  120,121,122,123     100G   9100     rs  Ethernet25/1           routed      up       up  QSFP28 or later         off
    Ethernet200  128,129,130,131     100G   9100     rs  Ethernet26/1           routed    down       up  QSFP28 or later         off
    Ethernet208  136,137,138,139     100G   9100     rs  Ethernet27/1           routed      up       up  QSFP28 or later         off
    Ethernet216      64,65,66,67     100G   9100     rs  Ethernet28/1           routed    down       up  QSFP28 or later         off
    Ethernet224      56,57,58,59     100G   9100     rs  Ethernet29/1           routed    down       up  QSFP28 or later         off
    Ethernet232      48,49,50,51     100G   9100     rs  Ethernet30/1           routed    down       up  QSFP28 or later         off
    Ethernet240      40,41,42,43     100G   9100     rs  Ethernet31/1           routed      up       up  QSFP28 or later         off
    Ethernet248      32,33,34,35     100G   9100     rs  Ethernet32/1           routed    down       up  QSFP28 or later         off
    Ethernet256      24,25,26,27     100G   9100     rs  Ethernet33/1           routed    down     down  QSFP28 or later         off
    Ethernet264      16,17,18,19     100G   9100     rs  Ethernet34/1           routed    down     down  QSFP28 or later         off
    Ethernet272        8,9,10,11     100G   9100     rs  Ethernet35/1           routed    down     down  QSFP28 or later         off
    Ethernet280          0,1,2,3     100G   9100     rs  Ethernet36/1           routed    down     down  QSFP28 or later         off
 PortChannel102              N/A     200G   9100    N/A           N/A           routed    down       up              N/A         N/A
 PortChannel104              N/A     200G   9100    N/A           N/A           routed    down       up              N/A         N/A
 PortChannel106              N/A     200G   9100    N/A           N/A           routed    down       up              N/A         N/A
 PortChannel108              N/A     200G   9100    N/A           N/A           routed    down       up              N/A         N/A
PortChannel1010              N/A     200G   9100    N/A           N/A           routed    down       up              N/A         N/A
PortChannel1012              N/A     200G   9100    N/A           N/A           routed    down       up              N/A         N/A
PortChannel1016              N/A     200G   9100    N/A           N/A           routed    down       up              N/A         N/A
PortChannel1020              N/A     200G   9100    N/A           N/A           routed    down       up              N/A         N/A
admin@str2-7804-lc5-1:~$

arlakshm commented 2 weeks ago

Using the latest 202405 image

admin@str2-7804-lc5-1:~$ show vers

SONiC Software Version: SONiC.internal-202405.102772879-4017d43a71
SONiC OS Version: 12
Distribution: Debian 12.5
Kernel: 6.1.0-11-2-amd64
Build commit: 4017d43a71
Build date: Tue Sep 10 05:20:55 UTC 2024
Built by: azureuser@c00dd463c000000

Platform: x86_64-arista_7800r3a_36dm2_lc
HwSKU: Arista-7800R3A-36DM2-C36
ASIC: broadcom
ASIC Count: 2
Serial Number: SGD21190878
Model Number: 7800R3A-36DM2-LC
Hardware Revision: 2a.00
Uptime: 21:32:09 up  1:19,  1 user,  load average: 1.50, 1.63, 1.63
Date: Tue 10 Sep 2024 21:32:09

arlakshm commented 2 weeks ago

@kenneth-arista for viz

arista-nwolfe commented 2 weeks ago

This appears to be the same as https://github.com/sonic-net/sonic-buildimage/issues/19892. Because the media_settings.json file for the Wolverine SKU is corrupted (it contains negative values), there is a race between PMON and portsorch for adding ports into ASIC_DB. Now that PMON startup is no longer delayed on SpineRouters (https://github.com/sonic-net/sonic-buildimage/pull/19657), PMON can write the corrupted media_settings.json values into APPL_DB before portsorch has had a chance to process a port. When that happens, the port never comes up, because processing is always short-circuited by this error:

parsePortSerdes: Failed to parse field(pre1): Invalid argument: '-0x6'
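For anyone who wants to check a media_settings.json for this kind of value, here is a minimal Python sketch that walks any nested JSON structure and reports leaves that look negative. The sample layout (`PORT_MEDIA_SETTINGS`, `Default`, `lane0`, and so on) is an assumption made up for illustration; only the `pre1` field name and the `-0x6` value come from the error message above. The helper name `find_negative_values` is hypothetical and not part of SONiC.

```python
import json


def find_negative_values(node, path=()):
    """Yield (path, value) for every leaf that looks like a negative number."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from find_negative_values(child, path + (str(key),))
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from find_negative_values(child, path + (str(index),))
    elif isinstance(node, str):
        # Hex strings such as "-0x6" are the case reported in the error.
        if node.strip().startswith("-"):
            yield "/".join(path), node
    elif isinstance(node, (int, float)) and not isinstance(node, bool):
        if node < 0:
            yield "/".join(path), node


# Hypothetical sample; the real file layout may differ.
sample = json.loads("""
{
  "PORT_MEDIA_SETTINGS": {
    "1": {
      "Default": {
        "pre1": {"lane0": "-0x6", "lane1": "0x4"},
        "main": {"lane0": "0x50", "lane1": "0x50"}
      }
    }
  }
}
""")

bad = list(find_negative_values(sample))
for location, value in bad:
    print(f"{location}: {value}")
```

To run it against a real file, replace the `json.loads(...)` call with `json.load(open(path))`, where `path` points at the platform's media_settings.json. The walker makes no assumption about key names, so it should work whatever the nesting is.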

arlakshm commented 2 weeks ago

Thanks @arista-nwolfe for triaging this issue. Is there a PR with the new media settings?

kenneth-arista commented 5 days ago

https://github.com/sonic-net/sonic-buildimage/pull/20308

kenneth-arista commented 5 days ago

The PR with the fix has been merged to master. The PR to 202405 still has to merge. Closing this issue.