Open ggiamarchi opened 6 years ago
Hi - two points:
1) We did a bunch of work with ONL to make sure that a lot of the base line setup work was done for you, so check out http://opennetlinux.org/wedge .
2) A lot of the stuff is poorly documented and honestly more complicated than it needs to be, so a bunch of us are trying to both better document things and clean them up as we go. Ultimately this is a work in progress and I'll leave this issue open as we improve.
Do check out what comes with the wedge ONL build and let me know if there's more specific things I can describe.
The JSON configuration file is particularly not well documented: it contains some implicit dependencies and some of the fields are now vestigial. Was there a specific change you were trying to make? Perhaps I can help you with that while the larger refactoring/documentation is happening.
FYI: @aeckert because you had a recent diff to clean up some of the config terms.
Following on @ggiamarchi's question, I am also in need of a bit more documentation on the JSON file. One particular need is to be able to configure the 40G ports of the Wedge 40 as either 40G or 4x10G. It seems that out of the box everything is 4x10G. Strangely enough, I see only 63 ports instead of the expected 64, but that's another debate...
Hi both.
This is definitely a big issue with the current config. There is no good documentation for it as of this moment but there is an effort underway to clean up the config as well as better document it.
For context, we use the config internally as the output of another script, so there hasn't been any effort made to make it easy for humans to consume :-( I've intentionally left this issue open so that I can post documentation updates here as things get better.
That said, I can try to answer specific questions if you have them. Right now, you need to specify your break out cables (e.g., 4x10g or 1x40g) in the configuration at start time.
Also, FYI @sonoble
thanks @capveg for keeping the thread alive. I am ok so far editing the json file "manually".
I would need more details to actually understand what you mean by configuring the breakout cables at start time. I've tried unsuccessfully to set the speed to 40000, as I found in the thrift code switch_config.thrift. I've seen there is a concept of aggregate_ports in the code. Trying to make sense of code I don't fully understand (thrift and cpp...), I wonder if I should add a section like the following to my config.json file (the following doesn't work):
    "aggregate_port": [
      {
        "key": 1,
        "name": "port1",
        "description": "description 1",
        "memberPorts": [
          {
            "memberPortID": 53,
            "priority": 1,
            "rate": 1,
            "activity": 1
          },
          {
            "memberPortID": 54,
            "priority": 1,
            "rate": 1,
            "activity": 1
          },
          {
            "memberPortID": 55,
            "priority": 1,
            "rate": 1,
            "activity": 1
          },
          {
            "memberPortID": 56,
            "priority": 1,
            "rate": 1,
            "activity": 1
          }
        ]
      }
    ],
And btw, do you also know why list_ports returns only 63 ports on a wedge40? Is that a bug? When I enable port 64 with fboss_route.py, it doesn't complain but still the port is not listed when calling list_ports afterwards.
Sorry - missed this comment.
Aggregate ports are different (I know.. it's confusing.. but that's networking) than breakout cables.
So, are you trying to get one 40G port to show up as 1x40G or 4x10G?
FYI: additional documentation and motivation can be found in our new Sigcomm paper - https://dl.acm.org/authorize?N666958
yes. trying to have the QSFP ports behave as either 4x10G (to the compute nodes) or 1x40G (to compute or spines)
So, honestly right now - this is a bit of a mess :-(
A few things that make this complicated.
1) There is a Broadcom config, typically named config.bcm, which configures the chip's port layout, either statically per port or dynamically ("flexports"). This config needs to be in sync with the agent config, as the agent config doesn't actually control the mapping but must be aware of it.
2) Depending on the version of opennsl you're using, the config.bcm is an explicit file (with newer versions) or implicitly built into the binary (with older versions, e.g., 8e0b499f02dcef751a3703c9a18600901374b28a, which fboss uses by default). If it's implicit in the binary, you have to change it differently.
3) Every four logical ports can combine into one 40G port: e.g., port 1 can either be a 10G port (with ports 2-4 also being 10G), or ports 2-4 can effectively "go away" and port 1 then becomes 40G. The ports that can do this are fixed: only ports whose number (mod 4) = 1 can be combined into a 40G port.
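The grouping rule in point 3 can be sketched in a few lines of Python. This is only an illustration of the arithmetic, assuming logical ports are numbered from 1; the function names (group_lead, can_lead_40g, absorbed_ports) are hypothetical and are not part of FBOSS or its tools.

```python
from typing import List

PORTS_PER_GROUP = 4  # four logical ports share one 40G-capable group


def group_lead(port: int) -> int:
    """Return the first port of the 4-port group containing `port`.

    Assumes 1-indexed logical port numbers (an assumption, not FBOSS API).
    """
    if port < 1:
        raise ValueError("logical ports are assumed to be numbered from 1")
    return ((port - 1) // PORTS_PER_GROUP) * PORTS_PER_GROUP + 1


def can_lead_40g(port: int) -> bool:
    """Only ports whose number (mod 4) equals 1 can become the 40G port."""
    return port % PORTS_PER_GROUP == 1


def absorbed_ports(lead: int) -> List[int]:
    """Ports that effectively 'go away' when `lead` runs as 1x40G."""
    if not can_lead_40g(lead):
        raise ValueError(f"port {lead} cannot lead a 40G group")
    return [lead + offset for offset in range(1, PORTS_PER_GROUP)]
```

For example, ports 53 through 56 form one group under this rule: group_lead(56) returns 53, and absorbed_ports(53) returns the three ports that would disappear if port 53 were run as 1x40G.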
@mimizone : to your question about "why only 63 ports"; there are 16 front panel ports and each one can be broken out 4 ways, so we get 64. I'm assuming there's a zero indexed port so your '63' is really '64', but hopefully that helps. Now, a separate question is : if the chip is capable of 32x40G ports, why are there only 16 ports used on the front panel? Answer: the initial wedge40 was designed to replace a 16 port x 40G switch and this was (I'm told) the easiest way to create a drop in replacement.
@mimizone If you need help getting what @capveg said above running, have a look at https://github.com/dhtech/fboss/pull/4/files where I managed to pull in OpenNSL 3.5.0.1, and see this for how to run it.
I have not yet hit the opennsl_pkt_alloc crash mentioned in getdeps.sh but I guess I'll cross that bridge when I come to it.
Personally, what we're looking at is running a 1G SFP module in one of the ports using a QSFP+->SFP+ adapter, so that's my current goal. But the 40G is of course also interesting. I raised a question at https://github.com/Broadcom-Switch/OpenNSL/issues/37 and I've found some BCM configs lying around here and there for ideas, as well as the official docs, which are somewhat useful. I have not succeeded in setting up anything viable, however.
@bluecmd thanks for the link (and the pointers), that's incredibly helpful.
@capveg is this the right issue to be reporting issues related to the packages / binaries for fboss and the Wedge 100 on ONL, or should those be reported elsewhere? Since they're related to the hardware that seems built for FBOSS, seems like it could go either way.
Hate to revive a dead conversation, but... "implicitly built into the binary" ... "If it's implicit in the binary, you have to change it differently." Any suggestions on what that "change it differently" is? I've got an old Wedge-16x that I'm trying to resurrect - should I just load a new image on it, or can I easily configure it to run the ports in 40g mode? Thanks!
@pjd-nu Thanks for reviving this as a lot of the SDK internals have changed - particularly Broadcom has open sourced its SDK and so we've moved the fboss dependency from OpenNSL to directly depending on the open source'd SDK: https://github.com/Broadcom-Network-Switching-Software/OpenBCM .
Once you follow the new build instructions, then it should be just a matter of setting up an agent.conf file that configures the ports as 40G (as opposed to 4x10G ports in a breakout).
The format of the agent.conf file is specified in the switch_config.thrift file here: https://github.com/facebook/fboss/blob/master/fboss/agent/switch_config.thrift and is in JSON. Here are some example configs but they have not been kept up to date and may not work perfectly out of the box.
https://github.com/facebook/fboss/tree/master/fboss/agent/configs
Hope this helps.
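As a rough sketch of what editing the config by hand amounts to, the snippet below sets the speed of one port in a JSON config loaded as a Python dict. The field names "ports", "logicalID" and "speed" are assumptions about the layout and should be checked against switch_config.thrift; 40000 is the speed value mentioned earlier in this thread. The helper set_port_speed is hypothetical and is not an FBOSS tool. As noted above, the Broadcom config must stay in sync with the agent config, so this change alone may not be sufficient.

```python
import copy
import json

FORTY_G = 40000  # speed value mentioned in this thread for 40G


def set_port_speed(config: dict, logical_id: int, speed: int = FORTY_G) -> dict:
    """Return a copy of `config` with one port's speed changed.

    Assumes a top-level "ports" list whose entries carry "logicalID"
    and "speed" keys (assumed names; verify against switch_config.thrift).
    """
    updated = copy.deepcopy(config)  # leave the caller's config untouched
    for port in updated.get("ports", []):
        if port.get("logicalID") == logical_id:
            port["speed"] = speed
            return updated
    raise KeyError(f"no port with logicalID {logical_id} in config")


# Tiny made-up config, for illustration only.
example = {
    "ports": [
        {"logicalID": 1, "speed": 10000},
        {"logicalID": 2, "speed": 10000},
    ]
}
updated = set_port_speed(example, 1)
print(json.dumps(updated, indent=2))
```

Working on a copy keeps the original config available for comparison, which helps when checking the result against the chip-level config.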
One correction: FBOSS does not use OpenBCM. FBOSS uses OpenNSA: https://github.com/facebook/fboss/blob/master/build/fbcode_builder/manifests/OpenNSA#L5
@pjd-nu one way to approach this would be to follow the instructions here: https://github.com/facebook/fboss/blob/master/installer/centos-7-x86_64/README.md
and get to the point of being able to run the tests. Running the FBOSS agent is a superset of that, and the bulk of the work towards running the agent is done once it is possible to run the tests.
[sorry for the delay - there was a big paper submission deadline last week...]
So I just want to double check something in the directions shri-khare pointed me to - I should just install a normal copy of Centos 7 on the box as if it were some random generic server? No Open Network Linux or anything?
I was a bit surprised because all the stuff I'd found so far talked about ONL as sort of a fundamental part, and I guess subconsciously I was thinking it was something more than yet another small linux distribution. And I guess it's just a server with a few PCIe devices that we want to control...
Open Network Linux (which I helped create) is super useful for switches that don't have a Board Management Controller (BMC). However, if you have a wedge40 or similar FB designed switch, it does have a BMC which runs OpenBMC with all of the environmentals running on it so the main OS doesn't need any of those extra drivers.
So short story - yes stock Centos will do.
(re: paper submission - OSDI?)
Yup, OSDI.
So now I'm in the middle of build hell, trying to figure out why folly can't find libatomic. I'll file a bug on installer/centos-7-x86_64/README.md - there are a couple of dependencies (including I think devtoolset-8-gcc, since folly requires gcc 5.0) that aren't mentioned.
Did you follow the README steps in that order? 1.4 install-tools installs the compiler and 1.5 sets the right path.
weird - not everything got installed the first time. (but it must have run through to the end, because devtoolset-8-libasan-devel was there...)
Now I'm getting: gcc version 8.3.1 20190311 (Red Hat 8.3.1-3) (GCC)
Thanks!
All set now. Thanks!
BTW, for some reason at the very beginning of the build script it consistently got a 403 trying to download openNSA from Broadcom, but it downloaded fine manually with curl and I copied it to the indicated name in the download directory.
Hi there, is there reference documentation for FBOSS somewhere? I need to test FBOSS on Wedge 16x (40Gbps) switches. Primarily, I'd love to find reference documentation for the fboss_wedge_agent configuration JSON input file.