RoboCupAtHome / RuleBook

Rulebook for RoboCup @Home 2024
https://robocupathome.github.io/RuleBook/

Network Setup 2019 | The Evergreen Topic #363

Closed warp1337 closed 1 year ago

warp1337 commented 6 years ago

Hi all, I would like to discuss/improve/chart the network setup for Montreal. Our advantage over the last years is definitely that Nagoya's LOC already deployed a smoothly working and incredibly well documented setup. So, who needs to be added to this discussion?

Let's start the discussion by incorporating the knowledge from 2017 #173

@LoyVanBeek @kyordhel @balkce

warp1337 commented 6 years ago

Can someone tell me who I need to add to reach the LOC? @LoyVanBeek @balkce @kyordhel

moriarty commented 6 years ago

@JeffCousineau could you tag the LOC here?

maximest-pierre commented 6 years ago

It is quite early to determine the network, but we have someone who is in charge of it. We can either transmit all your requirements to him or give you his contact.

warp1337 commented 6 years ago

Thanks @moriarty @maximest-pierre. You are right, it is indeed quite early ... BUT the network setup is usually something that can either make people VERY happy on location or, in contrast, make them very indignant --- leading to constant complaints directed towards the LOC ;) Trust me, I've been there.

However, I have taken part in RoboCup since 2010 (OPL, and last year both SSPL and OPL) and I 'witnessed' setups with extremely different performance over the years. The points I am trying to make here:

a) I would not undervalue the importance of a working network infrastructure.
b) The LOC in Nagoya did an extremely good job; I strongly recommend adopting their setup.
c) Especially for the SSPL and DSPL, stable WiFi is essential because these leagues make heavy use of external (local network!) computing.

WRT c), I'd like to clarify that I am not talking about web APIs or any other cloud computing (actually, I personally don't care about the uplink to the www) --- I am talking about a robot with limited computing capabilities talking to a laptop (or whatever) via WiFi. For that, a working network is essential (Am I right, @justinhart? What about the DSPL wifi experience/requirements?).

That being said, I would like to make a few more adjustments to the 'Nagoya setup' and the rules that were enforced on location (e.g. own wifis were strictly prohibited, as well as wireless keyboards or any other kind of wireless transmitter ... you name it), but that's something we could discuss later.

Please don't get me wrong. I know organizing this is looooooots of hard work. However, I believe that, by adopting the Nagoya setup, one can actually make the LOC's life easier because it already implements a great deal of best practices.

So here's the Nagoya 2017 network guideline https://github.com/RoboCupAtHome/RuleBook/files/993340/RoboCup2017_WirelessGuideline_03.20170425._extracted.pdf

I would also strongly suggest reading through our 2017 discussion in #173

Thanks!

warp1337 commented 6 years ago

Oh, BTW, @HideakiNagano21 was part of the LOC in Nagoya, I guess he's the guy your sys ops should talk to.

justinhart commented 6 years ago

In my conversations with Toyota, they have emphasized that the HSR is designed around the idea that a good LAN is important to provide offboard compute capabilities to the robot. I agree on this point. The internal network seemed adequate for last year's competition. I think that, looking forward, we should consider expanding it even further if possible. Northeastern had problems, and, yes, there are engineering solutions to their problems, but I only see the amount of network communication going up. The HSR has 4 potential color feeds, a lidar that talks internally over ethernet, and depth images, with no way to beef up the compute infrastructure inside the robot, in an academic environment that is pushing more compute-heavy approaches like deep learning. I say, look at last year's internal network as a victory, but turn a cautious eye to the amount of network bandwidth potentially required for next year.

I know that this conversation primarily concerns the internal network, but the external network really needs discussion. We were pulling something like 1kbps to the outside world last year, and this was a huge problem for us. We need to carefully consider that the HSR and Pepper are both managed partly off of central software repos. My team owns 2 HSRs. Last year when we wanted to sync something that had been synced on our other HSR, an operation that would have been trivial on my home network, it became an all-day ordeal. If I recall, the solution ended up being going offsite and copying files to a USB drive. I know that next year, we'll probably be trying to send training data to a cluster back at UT. So, definitely agreed on the point that you're discussing, but I'd like to see bandwidth to the internet improved.

maximest-pierre commented 6 years ago

I will take these points to our next LOC meeting. I understand that the SSPL and DSPL are going to need a lot of bandwidth. I will update this thread when I get more information from our network guy.

warp1337 commented 6 years ago

@maximest-pierre @justinhart . Perfect. Thank y'all.

One last comment. If you talk to your network guy: this year we had one SSID per team, which was a good choice. The SSID/net was used for testing and also during the tasks. Each team was isolated in its own VLAN [a must-have; you don't want to connect to someone's ROSCORE by accident ;)].

However, another problem we experienced was the varying bandwidth availability over the day. In the morning and afternoon, when everybody was testing or hacking stuff in the team area, not that much bandwidth was used. You tune your system at one point, e.g., "okay, I have ~1 MB/s, let's stream 5 FPS color images, 10 Hz laser, etc.". The closer the upcoming task, the lower the available bandwidth, because suddenly everybody was streaming images, laser scans, you name it. Now it was not ~1 MB/s but maybe 200 KB/s (I made those numbers up, I cannot remember the actual ones).
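The tuning exercise described above ("~1 MB/s, 5 FPS color, 10 Hz laser") can be sketched as a quick budget check. All stream sizes below are illustrative assumptions, not measured values:

```python
# Back-of-the-envelope bandwidth budget for the kind of streams mentioned
# above. Frame and scan sizes are assumptions for illustration only.

def stream_rate_mbps(payload_bytes: int, hz: float) -> float:
    """Application-layer rate of one periodic stream in Mb/s."""
    return payload_bytes * 8 * hz / 1e6

# Hypothetical streams: compressed 640x480 color at 5 FPS, 10 Hz laser scans.
jpeg_frame_bytes = 40_000        # ~40 KB per JPEG frame (assumed)
laser_scan_bytes = 1081 * 4      # 1081 float32 ranges per scan (assumed)

budget = stream_rate_mbps(jpeg_frame_bytes, 5) + stream_rate_mbps(laser_scan_bytes, 10)
print(f"{budget:.2f} Mb/s")      # prints "1.95 Mb/s"
```

Under these assumptions the robot fits comfortably into ~1 MB/s (8 Mb/s) but no longer into 200 KB/s (1.6 Mb/s), which is exactly the failure mode described above.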

We also observed quite a difference in the general off-loading approach. My team put lots of work into compressing streams and making the communication as efficient as possible, while other teams simply streamed everything uncompressed (the brute-force approach), even if not required for the current task. Eventually, that made everyone's life harder because people kept streaming things even when it was not their test slot or the robot was resting (!) beside the arena... You can tell people in the team leader meeting: "Please keep in mind that there are 8 teams; if you don't need the bandwidth, stop streaming after the task." That's okay, but in the heat of the moment people tend to forget these things. IMHO.

I am proposing the following for 2018:

**Fixed** bandwidth per team = TOTAL_AVAILABLE_BANDWIDTH / NUMBER_OF_TEAMS.

Limiting the bandwidth per team (ideally at the hardware level) would be the ideal case because that's simply fair, and your system setup becomes way more predictable.
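A minimal sketch of that split, with a made-up total capacity (the real number would come from the LOC):

```python
# Fixed per-team share as proposed above; the total is a hypothetical number.
TOTAL_AVAILABLE_BANDWIDTH_MBPS = 100.0   # assumed arena uplink capacity
NUMBER_OF_TEAMS = 8

per_team_mbps = TOTAL_AVAILABLE_BANDWIDTH_MBPS / NUMBER_OF_TEAMS
print(f"{per_team_mbps:.1f} Mb/s per team")  # prints "12.5 Mb/s per team"
```

The point of the fixed split is predictability: each team can tune its streams against a number that does not change over the day.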

LoyVanBeek commented 6 years ago

Limiting bandwidth per team is a good idea, for the rest I would like to keep the Nagoya2017 setup with perhaps more bandwidth going externally.

justinhart commented 6 years ago

I actually liked the idea of having exclusive access to the arena network during competition, and having teams on practice networks when not in competition. If, however, we could extend this to simply having one ESSID for each team, used for both practice and competition, that seems even better. Then there's no hurried reconfiguration of the network between rounds.

Justin

LoyVanBeek commented 6 years ago

Ideally, there's exclusive bandwidth for one team during challenges and evenly distributed bandwidth during training. The hassle of arranging this is not manageable though, so I would opt to keep the bandwidth limited at all times, also during competition.

maximest-pierre commented 6 years ago

Last night, we had an LOC meeting, I took all the points that were written in this thread and presented them to the rest of the LOC. We are going to have another more technical meeting with the venue for everything networking and power. This meeting will be in about 2 weeks.

warp1337 commented 6 years ago

Awesome. Thanks. Please keep us posted.

kyordhel commented 6 years ago

Off topic so I'll delete this later.

@maximest-pierre, could you please send me the details (name, affiliation, email) of the @Home LOC to add them to the OC? It would be handy if they are up to date on what is happening in the OC (number of accepted teams, requirements, etc).

Thanks in advance

kyordhel commented 6 years ago

I was wondering whether this can be done with a RADIUS server or not.

During my bachelor's I set up a small RADIUS server for, in theory, 128 machines (we never used more than 12 in practice, all downloading torrents through the uni gateway). It had RADIUS MAC filtering and a scheduler, so it was able to dynamically disconnect certain clients, keep them off, and allow them to reconnect later. I also remember reading (that part was not used) that it was possible to set a bandwidth quota and limit for each MAC address.

The setback is that we would need tight coordination between the OC and the network people, so they could whitelist the competitor while banning the rest. I think this can be done with a web interface and some Python/bash scripting.

warp1337 commented 6 years ago

bump Any updates from the LOC?

@kyordhel Nice idea, but I agree, hard to enforce and maintain during the actual competition. Limiting the bandwidth at the hardware level for each subnet should also be possible, though.

I'd like to remind all of you that switching networks on the Pepper, including the tablet, is not really reliable/robust; e.g., if you have two networks in your list, sometimes Pepper switches between them randomly ... I bet this will cause serious panic attacks on site ;)

maximest-pierre commented 6 years ago

I have no update on the @Home network right now, but tomorrow we are discussing the network in general.

RemiFabre commented 6 years ago

Hi, any news on this topic? Thanks,

warp1337 commented 6 years ago

As discussed earlier in this thread I am more interested in the local connection bandwidth, i.e., from the robot to an external computing device. Again, have a look at this document

warp1337 commented 6 years ago

What's the escalation of bump?

LoyVanBeek commented 6 years ago

Bump++ @maximest-pierre any updates?

maximest-pierre commented 6 years ago

We are currently having problems with the venue on the networking side. The venue is not used to having this much network traffic on its infrastructure. I hope this issue will be resolved soon.

warp1337 commented 6 years ago

Thank you for the feedback! Fingers crossed.

kyordhel commented 6 years ago

No venue is, ever. When Jesus Savage organized RoboCup in 2012, the contact person for the venue (the biggest convention centre in Mexico City) went WTF after checking the Internet requirements: 4 times higher than the max capacity of the venue's infrastructure. They are used to hosting suits and tuxedos, not over 4000 tech freaks much hungrier for bandwidth than for food.

warp1337 commented 6 years ago

Besides the outbound connection, @maximest-pierre can you make a comment on the local network? Did you have a look at the Nagoya setup?

warp1337 commented 6 years ago

bump

maximest-pierre commented 6 years ago

I finally got some news. The local network will be switched at 1 Gb/s. The outbound connection is going to be 10 Gb/s. We are also finalizing the plan for the wifi, IP assignment, and VLAN assignment.

Also, is somebody bringing a server with them? I would like to know beforehand to set up electrical and networking for it. (I will probably send an email to the mailing list later)

justinhart commented 6 years ago

Probably several teams will bring servers. I would count on almost 1 per team.

kyordhel commented 6 years ago

@maximest-pierre please define "server". Hardware-wise, many gamer computers won't qualify as servers, but their sustained power consumption would (e.g. an i7 with two NVIDIA Quadros in bridge). I remember at least team Tech United Eindhoven and team AUPAIR having several desktops on their tables.

And servers won't be our only electrical problem. We need support for

@justinhart Can you give us the power consumption of the HSR?

justinhart commented 6 years ago

It's written in Japanese. I'll email Toyota.

Justin

maximest-pierre commented 6 years ago

What I mean by server is the bulky, power-hungry kind: 1000 W and more.

The planned available power per team is 120 V 15 A 60 Hz (1800 W) (might change). Wireless is not allowed in the venue.

Right now we are planning to give only 1 Gbit/s per table.

That's all the information I have for now. Also, things might change.

warp1337 commented 6 years ago

@maximest-pierre Thanks for the update! Will you be able to limit the bandwidth on the arena network per team/VLAN? That info would be great! I would also vote for a hard limit per team.

kyordhel commented 6 years ago

For the record, this is doable with freeRADIUS and MikroTik RouterOS:

  1. Set up the user/pass in the radcheck table, e.g. `INSERT INTO radcheck VALUES ('','warp1337','Cleartext-Password',':=','warpass');`
  2. Set up the up/down bandwidth limit (speed) in radreply, using the same user name as in radcheck, e.g. `INSERT INTO radreply VALUES ('','warp1337','Mikrotik-Rate-Limit',':=','56M/100M');`

The example above should limit the hard bandwidth usage of @warp1337 to 56M up and 100M down (on the server side, excluding noise; the channel could use slightly more). There are also means to set a max overall usage quota.

It is important to note that the router's OS/firmware must support setting a bandwidth limit. If this is not the case, the RADIUS server will only grant (or deny) access, but no limit will be imposed.
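Assuming the default freeRADIUS SQL schema (columns username, attribute, op, value), the pair of INSERTs above could be generated for every team by a small helper. The helper itself is hypothetical, not part of any existing tooling:

```python
# Hypothetical generator for the radcheck/radreply rows shown above,
# one pair of INSERT statements per team. Column names follow the
# default freeRADIUS SQL schema.

def radius_inserts(user: str, password: str, rate_limit: str) -> list:
    return [
        "INSERT INTO radcheck (username, attribute, op, value) "
        f"VALUES ('{user}', 'Cleartext-Password', ':=', '{password}');",
        "INSERT INTO radreply (username, attribute, op, value) "
        f"VALUES ('{user}', 'Mikrotik-Rate-Limit', ':=', '{rate_limit}');",
    ]

# Example: one team, limited to 56M up / 100M down as in the text above.
for stmt in radius_inserts("warp1337", "warpass", "56M/100M"):
    print(stmt)
```

Feeding this a list of (team, password, limit) tuples would give the OC the "web interface plus scripting" workflow mentioned earlier in the thread.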

warp1337 commented 6 years ago

Thanks, good to know.

warp1337 commented 6 years ago

@maximest-pierre Can you give us an update? Actually, we are currently testing a setup for the German Open. If you are interested in the topology and the hardware we bought let me know.

maximest-pierre commented 6 years ago

It's going well; the network topology is being finalized right now and should be ready in the next week or so. Yes, I am interested in the topology and the hardware, to compare them to what we have.

warp1337 commented 6 years ago

Alright! We bought a Ubiquiti EdgeSwitch (24 ports) that hosts 9 VLANs. Additionally, we bought 3 UniFi AP Pros (https://store.ubnt.com/products/unifi-ap-pro). Each AP provides 3 wifi SSIDs, each connected to a dedicated VLAN (one per team). We limit the bandwidth per SSID to 10 Mb/s up and down. All SSIDs have internet access, routed by a Cisco consumer-level router (also connected to the switch, acting as the default gateway).

We did some tests today and are confident that the setup will do just fine. I will gather some data wrt total inter-/intranet traffic during the German Open (9 Teams at OPL/SSPL).

Edit: I forgot to mention, this setup is just for the OPL/SSPL

warp1337 commented 6 years ago

Here's a quick update: we deployed the setup described above at the German Open 2018 in Magdeburg. 8 teams used it for 5 days, and we didn't have a single outage or major complaint during the competition. We got three channels [36, 52, 56] assigned by the LOC on day 1, which was quite important, IMHO, for the quality and robustness of the wifi. The maximum number of clients connected to the three APs at the same time was ~40 machines.

I may post some more detailed info in the future, but the overall traffic after 5 days was > 600 GB (!). Having talked to the teams, most of the traffic was related to scp'ing compiled code onto the robots and streaming images (RGB and depth, in our case). If desired, I can also put some effort into splitting up the traffic into, e.g., cloud API calls vs. local traffic.

As usual, we found several 'unofficial' wifi networks, and I am not talking about one or two, but more like 10. Even after putting up flyers saying "setting up your own wifi is not allowed", those networks kept on broadcasting. Luckily, those 'wifi pirates' seemed to be at least aware of channel planning, since they didn't use any of the channels already in use, but you definitely cannot rely on that... Okay, that's it for now, I guess.

@maximest-pierre @LoyVanBeek @kyordhel @balkce @justinhart @fabricejumel

warp1337 commented 6 years ago

@maximest-pierre Can you give us an update on the network situation?

Atine commented 6 years ago

Hi, do we have updates on this issue?

warp1337 commented 6 years ago

bump

warp1337 commented 5 years ago

I guess we have all experienced how important this issue actually was. Please also look at the initial post date of this issue (Sep. 2017). I guess we can close this issue now. However, I strongly suggest writing down the lessons learned somewhere.

LoyVanBeek commented 5 years ago

Go ahead with writing those lessons. I'll start:

I don't like to complain without also proposing some solutions:

RemiFabre commented 5 years ago

We could set up a 4G plan for 2019.

Do we have someone from the LOC to talk to?

There seem to be 3 network providers: https://www.whistleout.com.au/MobilePhones/Guides/Best-phone-plans-for-travellers-in-Australia

Telstra seems to be the best? 59 dollars for 4GB for a prepaid phone. https://www.telstra.com.au/mobile-phones/prepaid-mobiles/plus-packs#freedom-plus

warp1337 commented 5 years ago

We should write this down in a dedicated document, including best-practices and lessons learned targeted at OC, LOC etc ...

fabricejumel commented 5 years ago

Florian, could you send your email address to fabrice.jumel@gmail.com?

Regards

moriarty commented 5 years ago

@warp1337 Close this issue? Or just edit the title to 2019?

warp1337 commented 5 years ago

Yo! @johaq just informed me that we have seen ~290 GB of traffic after 2 days at the German Open 2019. I believe there are 9 teams.

LoyVanBeek commented 5 years ago

Some 20GB per team per day...
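For the record, the arithmetic behind that round figure (290 GB, 2 days, 9 teams):

```python
# Sanity check of the German Open 2019 numbers reported above.
total_gb, days, teams = 290, 2, 9
per_team_per_day = total_gb / days / teams
print(f"~{per_team_per_day:.0f} GB per team per day")  # prints "~16 GB per team per day"
```

That works out to roughly 16 GB per team per day, i.e. on the order of the 20 GB ballpark quoted above.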

johaq commented 5 years ago

I will post stats for the whole competition after the competition next week.