Closed jimthedj65 closed 1 year ago
This is an odd error message to see.
It's a message that usually happens when trying to initialize the drivers twice in a row without deinitializing them in between.
However during normal operations this is not supposed to be happening.
Did you try to see if a cold reboot solves the issue?
I currently don't have enough logs to properly understand what might be causing this. I would need the full syslog since boot and the associated `arista.log`.
Hi Staphylo,
The cold boot helped and pcieinfo now shows properly. What's the best way to reinitialise the Arista setup to see if that is still an issue?
From what I can see, the Tofino/Barefoot platform stopped updating about 10 months ago and I am thinking the latest master may be an issue. Should I cycle back to 202205 or 202111?
The drivers may have got initialised twice by mistake, that's a good call.
I can send you the logs; they are very large, over 130MB. Let me know where to send them.
One more thing: when I run show ip int the first time it shows all the devices, but the second time it only shows the management interface?
@jimthedj65 to reinitialize the platform drivers you'll need to run `systemctl restart platform-arista.target`.
However be aware that this command might reset your dataplane.
There are some manual steps that could prevent this, but they are subject to change so I prefer that folks do not have expectations there.
Depending on your needs I would advocate using release branches as they tend to be more stable than master. `202205` and `202211` would be good choices. I would probably recommend looking at the branch that receives the most attention based on commit frequency and using that one.
What you're describing is usually a symptom of swss/syncd crash. It's going to initialize, create the host interfaces for the ASIC, do some stuff, crash, remove the host interfaces. You should check for worrying messages tied to syncd in your syslog (segfault, ERR messages, ...).
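As a rough aid for that check, here is a minimal sketch that filters syslog lines mentioning syncd for the kinds of messages named above (segfaults and `ERR` entries). The function name `find_syncd_issues` and the exact patterns are assumptions for illustration, not part of SONiC.

```python
import re

# Assumed patterns: the thread only says to look for segfaults and ERR
# messages tied to syncd, so this regex is a guess at a useful filter.
SUSPICIOUS = re.compile(r"segfault|\bERR\b")


def find_syncd_issues(lines):
    """Return the lines that mention syncd and look like errors or crashes."""
    return [
        line.rstrip("\n")
        for line in lines
        if "syncd" in line and SUSPICIOUS.search(line)
    ]
```

It can be fed an open file handle, for example `find_syncd_issues(open("/var/log/syslog", errors="replace"))`, assuming syslog lives at that path on your image.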
Thanks, that's helpful. dmesg on Aboot doesn't throw any errors, so it must be a software issue. I will load 202205 and see where we go; my feeling is that Barefoot stopped 10 months ago and it may well be 202111 that is stable.
When installing new images I see the messages below. Can they be ignored, or is there a setting somewhere to allow TPM boot from flash?
Error: TPM_BAD_PARAMETER TPM: failed to deassert physical presence Error: TPM_BAD_PARAMETER TPM: failed to lock physical presence command unzip: short read
Aboot doesn't have much to do with SONiC, it's just a combo BIOS/bootloader.
Trying a stable branch is a good option. FYI, our internal testing is however doing well on master from a few weeks ago.
On this platform the TPM messages are harmless, you can safely ignore them. The TPM mostly matters for secureboot which this product does not support.
Thanks, Staphylo that's great; once I get these running I can start my encryption acceleration dev. Thanks for all your help.
As a sanity check, I am installing in Aboot by doing the following:
Aboot:~# cd /mnt/flash
Aboot:~# cp -a ./* /mnt/usb1/
Aboot:~# cp /mnt/usb1/sonic-aboot-barefoot.swi .
Aboot:~# boot /mnt/flash/sonic-aboot-barefoot.swi
Am I missing any steps? It should be this straightforward, and it creates the boot-config for the image once it loads, right?
Your solution works but you have to leave `/mnt/flash` before running `boot`. Otherwise the boot process will not be able to umount `/mnt/flash` and will exit early.
The way I usually do it when I have to is:
cd /mnt/flash
wget http://some.url/to/sonic-aboot-barefoot.swi
echo SWI=flash:sonic-aboot-barefoot.swi > boot-config
sync
reboot
The reason I prefer `reboot` over just `boot` is that `boot` doesn't work on secureboot platforms, so I prefer having only one workflow.
Thanks, I will document your process; it seems more robust. This explains my earlier post: my first attempt was not saving the config because I was using boot only, so it all makes sense now.
Just a quick note, if you have a DHCP server you can run `udhcpc` to get a management IP address in Aboot.
Otherwise you'll need to configure your network statically via the console.
Oh, that's really helpful, and then I could just scp it over, right? I was about to come over and ask if that was possible.
I appear to get the interfaces below and tried udhcpc -q -i ma1 with no joy from my dnsmasq server, any settings I need to set in dnsmasq?
```
1: lo: <LOOPBACK> mtu 16436 qdisc noop state DOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ma2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN qlen 1000
    link/ether 0e:c9:53:be:84:d9 brd ff:ff:ff:ff:ff:ff
3: ma1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether fc:bd:67:62:bc:d0 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::febd:67ff:fe62:bcd0/64 scope link
       valid_lft forever preferred_lft forever
```
My switch seems to corrupt the image size when the drive is inserted into the USB slot. I have tried umounting and mounting. It constantly shows up as HPFS/NTFS. Does it have to be formatted with that type? I have been formatting with FAT.
Yes you should be able to ssh/scp as a client from Aboot. There's no server however.
I've only ever run `udhcpc` without further parameters.
I'm not sure if there's any specific setup required, I would have expected things to just work with basic DHCP options (ip, mask, gw, nameservers).
If you have networking devices in between your management port and your server you'll need to set up your acls/dhcp_relay properly.
ok let me try the relay option
I believe vfat or ext2-3 should be fine. ext4 might be a bit problematic if your mkfs.ext4 uses recent options that might be unsupported by Aboot. But technically SONiC uses the flash as ext4 so that should be fine.
Hi Staphylo, I configured KexAlgorithms diffie-hellman-group1-sha1 on my ssh server side and added an IP with ifconfig on ma1 on the switch, and boom, I was able to scp the file over and it loaded 202205. Great, no more USBs lol
thanks for your help.
admin@sonic:~$ show platform pcieinfo
==============================Display PCIe Device===============================
bus:dev.fn 00:1f.6 - dev_id=0x8c24, Intel Corporation 8 Series Chipset Family Thermal Management Controller 8086:8c24
bus:dev.fn 00:1c.0 - dev_id=0x8c10, Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #1 8086:8c10
bus:dev.fn 06:00.0 - dev_id=0x0001, Arista Networks, Inc. (Device) [3475:0001]
bus:dev.fn 00:1c.4 - dev_id=0x8c18, Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #5 8086:8c18
bus:dev.fn 07:00.0 - dev_id=0x0010, (Vendor) (Device) [1d1c:0010]
bus:dev.fn ff:0b.3 - dev_id=0x0001, Arista Networks, Inc. (Device) [3475:0001]
success a working switch
I loaded 202205, ran sudo config hostname myhostname, and then ran sudo config interface ip add eth0 192.168.1.2/24 192.168.1.254. Then I ran sudo aristasetup, which ran fine apart from complaining about my PSU not being available, and then ran sudo config save and rebooted.
After reboot I ran sudo show int status
Interface Lanes Speed MTU FEC Alias Vlan Oper Admin Type Asym PFC
----------- ------- ------- ----- ----- ------- ------ ------ ------- ------ ----------
Now my interfaces don't show and the error on pcieinfo is back. Looks like I need to load back 202111?
show platform pcieinfo
Traceback (most recent call last):
File "/usr/local/bin/pcieutil", line 8, in <module>
sys.exit(cli())
File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 764, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 717, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 1134, in invoke
Command.invoke(self, ctx)
File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 956, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 555, in invoke
return callback(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/pcieutil/main.py", line 78, in cli
load_platform_pcieutil()
File "/usr/local/lib/python3.9/dist-packages/pcieutil/main.py", line 52, in load_platform_pcieutil
platform_path, _ = device_info.get_paths_to_platform_and_hwsku_dirs()
File "/usr/local/lib/python3.9/dist-packages/sonic_py_common/device_info.py", line 266, in get_paths_to_platform_and_hwsku_dirs
hwsku_path = os.path.join(platform_path, hwsku)
File "/usr/lib/python3.9/posixpath.py", line 90, in join
genericpath._check_arg_types('join', a, *p)
File "/usr/lib/python3.9/genericpath.py", line 152, in _check_arg_types
raise TypeError(f'{funcname}() argument must be str, bytes, or '
TypeError: join() argument must be str, bytes, or os.PathLike object, not 'NoneType'
Before I ran the Arista setup, pcieinfo was fine. My config_db.json has changed after the setup, and all ports have been deleted.
OK, this is down to me not understanding when to hard code the config and when to use the config CLI command.
Just to be sure, you should never use the `arista` binary under regular operations.
This is mostly a tool for us to develop and debug during escalation.
Our platform is initialized automatically at boot up via the systemd services `platform-arista*`, and all platform information is reported via the SONiC platform API.
The first time you boot SONiC it enters a particular mode based on `/host/image-XXX/platform/firsttime`.
This mode will generate a default configuration for your product and then delete that aforementioned file.
However I don't believe SONiC saves the configuration so you need to run `config save`, so that's probably your issue.
Also you can provide your own configuration the first time you boot by putting either `minigraph.xml` or `config_db.json` on your flash.
Regarding the `eth0` management IP, I'm not sure if you can use this CLI to configure it. (Maybe you can)
If you want to configure a static mgmt IP you need to edit the `MGMT_INTERFACE` and `MGMT_PORT` tables in `config_db.json`.
You can also do this dynamically by using `sonic-db-cli CONFIG_DB hset 'MGMT_INTERFACE|eth0|<ip>' gwaddr <gwip> forced_mgmt_routes@ "<subnet1>,<subnet2>,..."`
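For the static `config_db.json` route, a minimal sketch of what those two tables might look like, generated from Python so the key format is explicit. The `eth0|<ip>` key and the `gwaddr` field come from the command above; the `alias` and `admin_status` fields under `MGMT_PORT` and the helper name `mgmt_tables` are assumptions, so compare against a config generated by your own image before relying on it.

```python
import json


def mgmt_tables(ip_prefix, gateway, port="eth0"):
    """Build MGMT_PORT / MGMT_INTERFACE fragments for config_db.json."""
    return {
        # Assumed fields: check them against a config your image generated.
        "MGMT_PORT": {port: {"alias": port, "admin_status": "up"}},
        # Key format "<port>|<ip/prefix>" follows the hset command above.
        "MGMT_INTERFACE": {f"{port}|{ip_prefix}": {"gwaddr": gateway}},
    }


# Example addresses taken from later in this thread.
print(json.dumps(mgmt_tables("192.168.1.2/24", "192.168.1.254"), indent=4))
```

The printed fragment would be merged into the top level of `config_db.json` alongside the existing tables.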
Hi Staphylo, thanks for the updates. I got all that done, and now I have two stable switches, one on 202205 and one on a recent master. Do you have any guidelines on breaking out a port on the Arista? I have two ports to break out from 100GbE to 2 x 50GbE.
I tried sudo config interface breakout Ethernet8 '2x50G'
Do you want to Breakout the port, continue? [y/N]: y
[ERROR] Breakout feature is not available without platform.json file
Aborted!
I have confirmed that there is a platform.json file in my device directory.
thanks for all your help
Also, for brevity, I did exactly this and it worked, although the CLI sometimes stalls after I issue this command.
sonic-db-cli CONFIG_DB hset "PORT|Ethernet256" alias Ethernet65/1 fec none index 65 lanes 256 mtu 9100 pfc_asym off speed 10000 admin_status up
sonic-db-cli CONFIG_DB hset "PORT|Ethernet257" alias Ethernet66/1 fec none index 66 lanes 257 mtu 9100 pfc_asym off speed 10000 admin_status up
I ran sudo config interface ip add eth0 192.168.1.2/24 192.168.1.254 and it allowed me local access via ssh
We did not implement DPB (Dynamic Port Breakout) for this product so you won't be able to use the helpers for that purpose. It's not in our plans to invest time doing this. It essentially requires 2 things:
- `platform.json` should have the `interfaces` section filled in with the list of possible breakouts
- The HwSku folder should have a `hwsku.json` configuration file with the default breakouts.

`config interface breakout` will read `platform.json` to check the breakout parameter and will update the `BREAKOUT_CFG` table in `CONFIG_DB`.
However on barefoot duts, you should be able to perform the breakouts you want manually by populating the correct information in the `PORT` table of `CONFIG_DB` and reloading your configuration.
To convert a 1x100G to 2x50G you need to split the lanes and keep the index value:
Ethernet0:
lanes: 1,2,3,4
index: 1
alias: Ethernet1/1
speed: 100000
would become
Ethernet0:
lanes: 1,2
index: 1
alias: Ethernet1/1
speed: 50000
Ethernet2:
lanes: 3,4
index: 1
alias: Ethernet1/3
speed: 50000
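The same transformation can be sketched in a few lines of Python. This is only an illustration of the lane split described above: the helper name `split_port` is hypothetical, and it assumes the lanes and the speed divide evenly across the sub-ports.

```python
def split_port(name, entry, subports):
    """Split one PORT entry into equal lane groups, keeping the index."""
    lanes = entry["lanes"].split(",")
    if len(lanes) % subports:
        raise ValueError("lanes do not divide evenly across sub-ports")
    per_port = len(lanes) // subports
    base = int(name[len("Ethernet"):])
    result = {}
    for i in range(subports):
        offset = i * per_port
        new_entry = dict(entry)  # index and any other fields are kept as-is
        new_entry["lanes"] = ",".join(lanes[offset:offset + per_port])
        # The alias suffix is the first lane of the group, counted from 1.
        new_entry["alias"] = f"Ethernet{entry['index']}/{offset + 1}"
        new_entry["speed"] = str(int(entry["speed"]) // subports)
        result[f"Ethernet{base + offset}"] = new_entry
    return result


before = {"lanes": "1,2,3,4", "index": "1",
          "alias": "Ethernet1/1", "speed": "100000"}
for port, fields in split_port("Ethernet0", before, 2).items():
    print(port, fields)
```

Run on the `Ethernet0` entry above, this yields `Ethernet0` with lanes `1,2` and `Ethernet2` with lanes `3,4`, both at speed 50000 and index 1.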
Glad to know that `config interface ip add eth0` works, thanks for sharing.
Thanks for your help. That's great. I was working through this.
/usr/share/sonic/device/x86_64-arista_7170_64c
```
"Ethernet8": {
    "index": "1,1,1,1",
    "lanes": "8,9,10,11",
    "breakout_modes": {
        "1x100G[40G]": ["Eth1"],
        "2x50G": ["Eth1/1", "Eth1/2"],
        "4x25G[10G]": ["Eth1/1", "Eth1/2", "Eth1/3", "Eth1/4"],
        "2x25G(2)+1x50G(2)": ["Eth1/1", "Eth1/2", "Eth1/3"],
        "1x50G(2)+2x25G(2)": ["Eth1/1", "Eth1/2", "Eth1/3"]
    }
},
"Ethernet28": {
    "index": "1,1,1,1",
    "lanes": "28,29,30,31",
    "breakout_modes": {
        "1x100G[40G]": ["Eth1"],
        "2x50G": ["Eth1/1", "Eth1/2"],
        "4x25G[10G]": ["Eth1/1", "Eth1/2", "Eth1/3", "Eth1/4"],
        "2x25G(2)+1x50G(2)": ["Eth1/1", "Eth1/2", "Eth1/3"],
        "1x50G(2)+2x25G(2)": ["Eth1/1", "Eth1/2", "Eth1/3"]
    }
}
```
Thank you for all the responses; I had the 7170 looped to my layer 2 unifi switch, so it allowed me local access at least. I will post my port breakout updates and good to know barefoot seems simpler.
Hi Staphylos, I am getting a bit lost on the alias
```
"Ethernet8": {
    "admin_status": "up",
    "alias": "Ethernet3/1",
    "index": "3",
    "lanes": "8,9",
    "mtu": "9100",
    "speed": "50000"
},
"Ethernet9": {
    "admin_status": "up",
    "alias": "Ethernet3/2",
    "index": "3",
    "lanes": "10,11",
    "mtu": "9100",
    "speed": "50000"
},
```
Do I have that correct, or is it sequential after the lanes like below?
```
"Ethernet8": {
    "admin_status": "up",
    "alias": "Ethernet3/1",
    "index": "3",
    "lanes": "8,9",
    "mtu": "9100",
    "speed": "50000"
},
"Ethernet9": {
    "admin_status": "up",
    "alias": "Ethernet3/10",
    "index": "3",
    "lanes": "10,11",
    "mtu": "9100",
    "speed": "50000"
},
```
So the alias column used to be the vendor-specific interface name. I believe it was intended to give a smoother transition from the proprietary vendor NOS (in our case EOS) to SONiC. That's still mostly true; however, the host interface renaming work that is happening is using the alias column to help with the transition.
So you could pretty much do what you want in the alias column as long as it doesn't conflict with a host interface. The EOS naming convention is:
- `EthernetX` for ports with only 1 lane (e.g. SFP, RJ45)
- `EthernetX/Y` for ports with multiple lanes (e.g. QSFP, OSFP)

In these examples X is the front panel port index starting from 1 (so pretty much the value of the index column) and Y is the lane of this port. So for a 2x50G breakout on a QSFP you use 2 lanes per 50G and you end up with `EthernetX/1` and `EthernetX/3`. On a QSFP-DD in 2x200G you would end up with `EthernetX/1` and `EthernetX/5`.
In your example you want `Ethernet3/1` and `Ethernet3/2`.
And for the host interfaces you want to use `Ethernet8` and `Ethernet10`, because otherwise, if you later want to break out in 4x25G, it'll be a problem because Ethernet9 is already allocated.
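The host interface numbering argument can be checked with a short sketch. The helper name `host_interface_names` is hypothetical, and it assumes host interfaces are numbered by the first lane offset of each sub-port, as in the `Ethernet8`/`Ethernet10` suggestion above.

```python
def host_interface_names(base, total_lanes, subports):
    """Name each sub-port after the offset of its first lane from `base`."""
    step = total_lanes // subports
    return [f"Ethernet{base + i * step}" for i in range(subports)]


two_way = host_interface_names(8, 4, 2)
four_way = host_interface_names(8, 4, 4)
print(two_way)   # ['Ethernet8', 'Ethernet10']
print(four_way)  # ['Ethernet8', 'Ethernet9', 'Ethernet10', 'Ethernet11']
```

With this numbering, every name used by the 2x50G layout is also a valid name in the 4x25G layout, so moving between the two never reuses `Ethernet9` for a different lane group.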
Thanks for the sanity check, all makes sense now.
I updated config_db.json to break out port 3 into 2 x 50G, editing the ports section as below:
"Ethernet4": {
"admin_status": "up",
"alias": "Ethernet2/1",
"index": "2",
"lanes": "4,5,6,7",
"mtu": "9100",
"speed": "100000"
},
"Ethernet8": {
"admin_status": "up",
"alias": "Ethernet3/1",
"index": "3",
"lanes": "8,9",
"mtu": "9100",
"speed": "50000"
},
"Ethernet10": {
"admin_status": "up",
"alias": "Ethernet3/2",
"index": "3",
"lanes": "10,11",
"mtu": "9100",
"speed": "50000"
},
"Ethernet12": {
"admin_status": "up",
"alias": "Ethernet4/1",
"index": "4",
"lanes": "12,13,14,15",
"mtu": "9100",
"speed": "100000"
And added the necessary port commands for their breakout, and whilst the config loads, the interfaces don't come up.
sudo vi /etc/sonic/config_db.json
sync
sudo reboot
Am I missing a step or do you think the platform.json and HWSKU files need to be populated on the 7170?
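A hand-edited `config_db.json` is easy to break with a stray or trailing comma, so it may be worth ruling out a syntax error before rebooting. The sketch below uses only Python's standard `json` module; the helper name `check_json` is hypothetical.

```python
import json


def check_json(text):
    """Return None if text parses as JSON, else a short error description."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return f"line {exc.lineno}, column {exc.colno}: {exc.msg}"
    return None
```

For example, `check_json(open("/etc/sonic/config_db.json").read())` returns `None` for a syntactically valid file and a line/column hint otherwise. It only checks syntax and says nothing about whether the port definitions themselves are accepted by the platform.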
Hi All,
When I run the setup for the 7170-64c platform, we have the following errors.
Platform Info
Platform: x86_64-arista_7170_64c
HwSKU: Arista-7170-64C
ASIC: barefoot
ASIC Count: 1
Serial Number: SGD19451927
Model Number: DCS-7170-64C
Hardware Revision: 02.00
ASY: ASY0279504A0
HwApi: 02.00
HwRev: 11.01
KVN: 111
MAC: fc:bd:67:62:bc:d0
MfgTime: 20191204003043
PCA: PCA0100004A0
SID: Alhambra
SKU: DCS-7170-64C
SerialNumber: SGD19451927
show platform barefoot profile
Error response from daemon: Container 235f5ca1f92e31e92513f8930de7e9d163a3e602ee2acb7c4c9ab2438764ccc4 is not running
Current profile: default
show platform pcieinfo
==============================Display PCIe Device===============================
bus:dev.fn 00:1f.6 - dev_id=0x8c24, Intel Corporation 8 Series Chipset Family Thermal Management Controller 8086:8c24
bus:dev.fn 00:1c.0 - dev_id=0x8c10, Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #1 8086:8c10
bus:dev.fn 06:00.0 - dev_id=0x0001, Arista Networks, Inc. (Device) [3475:0001]
bus:dev.fn 00:1c.4 - dev_id=0x8c18, Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #5 8086:8c18
bus:dev.fn 07:00.0 - dev_id=0x0010, (Vendor) (Device) [1d1c:0010]
bus:dev.fn ff:0b.3 - dev_id=0x0001, Arista Networks, Inc. (Device) [3475:0001]