facebook / fboss

Facebook Open Switching System Software for controlling network switches.

issues on porting FBOSS to an OpenNSL supported platform #60

Open Lewis-Kang opened 6 years ago

Lewis-Kang commented 6 years ago

Hi,

I am trying to run the FBOSS agent on an AS6812-32X switch running ONL (Open Network Linux).

First, I linked fboss_agent against the opennsl.so that works for the AS6812-32X and built the wedge_agent executable.

Second, I installed all the libraries wedge_agent needs (folly, glog, gflags, ...) on the switch.

Third, I configured the switch to have more than 64 ports during opennsl driver initialization so that I could use /etc/fboss/sample1.json (copied from fboss/agent/configs/sample1.json) directly.

Then I ran wedge_agent -mgmt_if=ma1 -can_warm_boot=false -mode=wedge -config=/etc/fboss/sample1.json, and the console printed the following two kinds of errors:

  1. E0101 01:05:35.657263 2394 WedgeProductInfo.cpp:131] json parse error on line 0: expected json value
     E0101 01:05:35.658212 2394 WedgeProductInfo.cpp:66] json parse error on line 0: expected json value

  2. E0101 02:11:02.600317 4870 WedgePort.cpp:104] Error retrieving info for transceiver 0 Exception: apache::thrift::transport::TTransportException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused): Connection refused
     E0101 02:11:02.618465 4870 WedgePort.cpp:104] Error retrieving info for transceiver 0 Exception: apache::thrift::transport::TTransportException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused): Connection refused


Could someone give me guidance on fixing these two errors?

By the way, is there a porting guide we can follow to port FBOSS to a new platform?

Thanks in advance.

-Lewis
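
Editor's note: the two errors above can be narrowed down before touching any FBOSS code. "expected json value" on line 0 is what a JSON parser typically reports for an empty or unreadable input, and errno 111 means nothing was listening at the address the agent tried to reach. The sketch below is a minimal triage helper under those assumptions. The thread does not say which file WedgeProductInfo.cpp reads or which host and port the transceiver lookup connects to, so both are left as parameters you would have to fill in from your own build; the function names are made up for this example.

```python
import json
import socket


def check_json_file(path):
    """Return None if `path` holds valid JSON, else a short problem description."""
    try:
        with open(path, "r") as f:
            json.load(f)
    except OSError as e:
        # Missing file, permission problem, etc.
        return "cannot read %s: %s" % (path, e)
    except ValueError as e:
        # json.JSONDecodeError is a subclass of ValueError; an empty file lands here.
        return "invalid JSON in %s: %s" % (path, e)
    return None


def probe_tcp(host, port, timeout=1.0):
    """Attempt a TCP connect and classify the result.

    Returns "open" if something accepted the connection, "refused" if the
    kernel answered with ECONNREFUSED (errno 111 on Linux), or "error: ..."
    for anything else (timeout, unreachable host, ...).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except OSError as e:
        return "error: %s" % e
```

If check_json_file reports invalid JSON for the product-info file, the first error is a data problem rather than a code problem. If probe_tcp returns "refused" for the service the agent expects, that service is simply not running on the switch yet.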

Lewis-Kang commented 6 years ago

Some more information (the two errors above still occur):

fboss_route.py list_ports shows the correct port numbers, link status, and admin-enabled status, but the present field is wrong. (I guess this requires extra porting work for the platform-specific i2c slave address and offset.)

e.g.,
root@localhost:~# fboss_route.py list_ports
Port 1: [enabled=True, up=True, present=None]
Port 2: [enabled=True, up=False, present=None]
Port 3: [enabled=True, up=False, present=None]
...

root@localhost:~# fboss_route.py disable_port 1

root@localhost:~# fboss_route.py list_ports
Port 1: [enabled=False, up=False, present=None]
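
Editor's note: when checking a port of this kind across many ports, it can help to parse the list_ports output instead of reading it by eye. The following is a small sketch that assumes the output format is exactly the one pasted above; the function names are invented for this example and are not part of fboss_route.py.

```python
import re

# Matches entries such as: Port 1: [enabled=True, up=True, present=None]
PORT_ENTRY = re.compile(
    r"Port (\d+): \[enabled=(\w+), up=(\w+), present=(\w+)\]"
)

# "True"/"False" map to booleans; anything else (e.g. "None") maps to None.
_FLAG = {"True": True, "False": False}


def parse_list_ports(text):
    """Turn list_ports output into {port_number: {"enabled", "up", "present"}}."""
    ports = {}
    for match in PORT_ENTRY.finditer(text):
        number, enabled, up, present = match.groups()
        ports[int(number)] = {
            "enabled": _FLAG.get(enabled),
            "up": _FLAG.get(up),
            "present": _FLAG.get(present),
        }
    return ports


def ports_missing_presence(ports):
    """Return the sorted port numbers whose transceiver presence is unknown."""
    return sorted(n for n, info in ports.items() if info["present"] is None)
```

Because the pattern is applied with finditer, it works whether the entries arrive one per line or run together on a single line.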


Adding route entries also fails.

e.g.,
root@localhost:~# fboss_route.py list_routes
Route 0.0.0.0/0 -->
Route ::/0 -->

root@localhost:~# fboss_route.py add 172.31.0.0/24 172.16.1.1
Traceback (most recent call last):
  File "/usr/bin/fboss_route.py", line 254, in <module>
    args.func(args)
  File "/usr/bin/fboss_route.py", line 48, in add_route
    args.client, [UnicastRoute(dest=prefix, nextHopAddrs=nexthops)])
  File "/usr/local/lib/python2.7/dist-packages/neteng/fboss/ctrl/FbossCtrl.py", line 11295, in addUnicastRoutes
    self.recv_addUnicastRoutes()
  File "/usr/local/lib/python2.7/dist-packages/neteng/fboss/ctrl/FbossCtrl.py", line 11317, in recv_addUnicastRoutes
    raise result.error
neteng.fboss.ttypes.FbossBaseError: FbossBaseError( message='switch is still initializing, FIB not synced yet', message='switch is still initializing, FIB not synced yet')

capveg commented 6 years ago

Hi Lewis,

There's a bunch of work that has to be done to port to another platform. You need a new config.bcm (which you probably have), but you also need to create fboss/agent/platform/PLATFORM.cpp code and link it into the right places to get the phy programming and other sundry things correct. Let me write up a proper document for how to do this.

Lewis-Kang commented 6 years ago

Hi Rob,

Yes, I have the needed config.bcm. I am using our own build of opennsl.so that works for the AS6812-32X (running ONL), so the hardware port LED link status and color are already correct.

I also installed the CLI tool and the Python files it needs on the switch. The cli.py port status command runs and shows the correct admin state, link state, and speed.

e.g.,
root@localhost:~# cli.py port status
Port  Admin State  Link State  Transceiver  Speed
-----------------------------------------------------------
1     Enabled      Up          Unknown      10G
...
13    Enabled      Up          Unknown      40G

I think all I need now is to figure out how to add/modify FBOSS code to support AS6812-32X.

Looking forward to your guidance.

Thanks in advance.

-Lewis

capveg commented 6 years ago

So writing a formal doc for this has taken me longer than I wanted - apologies. Let me give you a slapdash answer that might unblock you.

For each new platform, you need the config.bcm (that gets passed as a command line option) and you also need to implement platform drivers in the agent (see the code in ./fboss/agent/platform/) as well as a platform driver for the qsfp_service (see ./fboss/qsfp_service/platform/).

There are a bunch of example platforms in those directories so hopefully the code will provide enough context.
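
Editor's note: a quick way to see which example platforms a checkout actually contains is to list the subdirectories of the two locations named above. This is only a convenience sketch. The directory names are taken verbatim from this comment, and a later comment in this thread writes the agent path as fboss/agent/platforms/..., so pass whichever spelling exists in your tree. The function name is made up for this example.

```python
import os

# Directory names as written in the comment above; adjust to your checkout.
PLATFORM_DIRS = ("fboss/agent/platform", "fboss/qsfp_service/platform")


def list_example_platforms(repo_root, rel_dirs=PLATFORM_DIRS):
    """Map each relative directory to its sorted subdirectory names.

    A directory that does not exist under repo_root maps to None, so a
    misspelled or moved path is easy to spot.
    """
    found = {}
    for rel in rel_dirs:
        path = os.path.join(repo_root, rel)
        if not os.path.isdir(path):
            found[rel] = None
            continue
        found[rel] = sorted(
            name
            for name in os.listdir(path)
            if os.path.isdir(os.path.join(path, name))
        )
    return found
```

Calling list_example_platforms(".") from the root of a checkout returns one entry per directory, which gives a starting list of existing drivers to read.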

Please let me know if you have more questions and I'll keep trying to get time to work on the platform.

bluecmd commented 6 years ago

Hijacking this issue a bit. In general, how open is Facebook to merging additional platforms? I'm interested in porting AS5712-54X, and I will probably do it anyhow - but I'm happy to upstream it if it's something FBOSS generally wants.

capveg commented 6 years ago

@bluecmd Sorry - missed this one. At one level, we'd love to accept the code. At another level, if it's a platform we're not actually using, we don't really have any decent way to test it. There's constant development going on with fboss and the likelihood that code for an untested platform would break is fairly high. We'd want to come up with some sort of external hardware-tested CI system before doing something like that and ... that just seems like a lot of work from where we are now.

Does that help?

bluecmd commented 6 years ago

Sure. What I'm hearing is that I'm probably better off just maintaining a fork of fboss right now for my HW support, until such a time comes where you're ready to accept and maintain new platforms.

capveg commented 6 years ago

You say "fork" - I say "pull request" that might take some time to merge :-) They're effectively the same thing in git and a pull request will be easier for others to find.

I'm curious what your larger interest is in FBOSS and which device(s) you're thinking of porting it to? You mentioned AS5712-54X - any others? If you are going to support different platforms, it might be worth writing an FBOSS platform driver (fboss/agent/platforms/...) that uses ONL's ONLP abstraction layer where possible. That way, you can leverage some of the existing work in terms of number of ports etc.

Thoughts?
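
Editor's note: the idea of one platform driver sitting on top of a hardware abstraction layer can be pictured as a thin adapter. The sketch below only illustrates that shape. Every class and method name in it is hypothetical, none of them are FBOSS or ONLP APIs, and the dict-backed backend merely stands in for whatever a real ONLP binding would provide.

```python
from abc import ABC, abstractmethod


class HardwareBackend(ABC):
    """Hypothetical interface: what a generic driver asks of the hardware layer."""

    @abstractmethod
    def ports(self):
        """Return the front-panel port numbers this switch exposes."""

    @abstractmethod
    def is_present(self, port):
        """Return True if a transceiver is plugged into `port`."""


class DictBackend(HardwareBackend):
    """Stand-in backend fed from a dict; a real one would query the platform."""

    def __init__(self, presence):
        self._presence = dict(presence)

    def ports(self):
        return sorted(self._presence)

    def is_present(self, port):
        return self._presence[port]


class GenericPlatform:
    """Hypothetical driver that contains no per-switch knowledge of its own."""

    def __init__(self, backend):
        self._backend = backend

    def presence_report(self):
        # Port count and presence both come from the backend, so supporting a
        # new switch means supplying a backend, not a new driver.
        return {p: self._backend.is_present(p) for p in self._backend.ports()}
```

With this split, the per-switch details (port count, presence lookup) live behind the backend, which is the part an abstraction layer like ONLP would be expected to cover.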

bluecmd commented 6 years ago

The difference to me is the state they are in - my fork would be pretty experimental and not ready for merging until such a time comes when you folks know what kind of standard you want new platforms to adhere to. Nothing wrong with that, and I'm fine with having a long lived PR open if nothing else as an index of other platforms being worked on.

Nice idea with ONLP. I'd been thinking of adding Wedge support for it, so maybe an ONLP port could replace some Wedge specific code. Very interesting thought.

My larger picture is that I'm a fortunate hobbyist who has access to a couple of Wedges and an AS5712-54X. I'm also a bit tired of there being no real go-to solution for whitebox hobbyists. I think FBOSS+ONL could be that combination and I'd like to try to help out if I can.

However, I already did that once with the old OpenSwitch 1.0 by HPE, which died, so I'm just doing my due diligence to check that Facebook appears committed to the open source version of FBOSS.