RENCI-NRIG / orca5

ORCA5 Software
Eclipse Public License 1.0

Interdomain modify problems #63

Closed ibaldin closed 8 years ago

ibaldin commented 8 years ago

Paul @paul-ruth please fill this in.

paul-ruth commented 8 years ago

I tried to recreate this bug, but that is no longer possible because the recent update introduced much more fundamental bugs. The new bugs seem to be mostly related to broadcast links, but they are not limited to interdomain links.

Even the most basic broadcast link modify requests do not work.

Try the following:

  1. Start a slice with a node on site A connected to a broadcast link.
  2. Add a second node to site A connected to that broadcast link.

The new node will not be added. The request silently disappears.

This bug blocks all progress on the Panorama work. We have been developing Panorama with intradomain slices while waiting for the interdomain bugs to be fixed. Now we cannot even use intradomain slices to make progress on Panorama.

I have also noticed that all interdomain broadcast links are embedded with paths that go through departure drive. This is true even if the broadcast link only contains nodes from 2 sites. This was not the case before last week's update. Although this is technically OK from the perspective of the user abstraction, it is a problem because circuits through departure drive incur higher latency than necessary, and we can only support a few circuits through departure drive.

ibaldin commented 8 years ago

Paul, thanks for filing this, we will look at it next week to get it fixed.

Regarding the departure drive paths quoted below: if I understand correctly, you want a two-party broadcast link to be a direct path, while a broadcast link with three or more parties naturally has to go through the departure drive. The problem is that, in that case, modifying a two-party broadcast link into one with three or more parties becomes a break/make operation: we have to atomically tear down the existing direct path and replace it with one that goes through the departure drive, so that it can then be modified to add more parties. That’s a bit too complex for the controller to support natively right now. At this point my inclination is not to worry about it on the controller side, as this should be something that can be programmed via AHAB.

On Jul 7, 2016, at 8:59 AM, paul-ruth <notifications@github.com> wrote:

I have also noticed that all interdomain broadcast links are embedded with paths that go through departure drive. This is true even if the broadcast link only contains nodes from 2 sites. This was not the case before last week's update. Although this is technically OK from the perspective of the user abstraction, it is a problem because circuits through departure drive incur higher latency than necessary, and we can only support a few circuits through departure drive.
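
The break/make concern in the comment above can be illustrated with a minimal Python sketch. It assumes the embedding rule under discussion (a link spanning exactly two sites uses a direct path, and a link spanning three or more sites goes through departure drive). The names `ModifyKind` and `classify_modify` are hypothetical and are not part of the ORCA controller or AHAB.

```python
from enum import Enum


class ModifyKind(Enum):
    """Hypothetical labels for how a modify request affects the existing path."""
    IN_PLACE = "in-place"      # the existing path can be kept and extended
    BREAK_MAKE = "break/make"  # the direct path must be torn down and replaced


def classify_modify(current_sites: set[str], new_node_site: str) -> ModifyKind:
    """Classify adding one node to an interdomain broadcast link.

    Assumed model (not ORCA code): a link spanning exactly two sites is
    embedded as a direct path, and a link spanning three or more sites is
    embedded through departure drive.
    """
    if len(current_sites) < 2:
        raise ValueError("this sketch only models links spanning two or more sites")

    sites_after = current_sites | {new_node_site}

    # Growing a two-site link to a third site invalidates the direct path.
    if len(current_sites) == 2 and len(sites_after) > 2:
        return ModifyKind.BREAK_MAKE

    # Same set of sites, or a link that already goes through departure drive.
    return ModifyKind.IN_PLACE
```

Under this model, adding a node at a site that the link already reaches never requires break/make. Only the change from two sites to three does.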

paul-ruth commented 8 years ago

Please note that I said broadcast links that involve 2 sites (not 2 nodes). A very common broadcast link would involve many VMs at only two sites. An efficient embedding of this case should not go through departure drive.

However, this should also apply to interdomain broadcast links that involve only 2 nodes. Another very common use case is adding nodes to or removing nodes from an interdomain broadcast link, where nodes are only added at sites where the broadcast link already exists. An efficient embedding of this case should also not go through departure drive.

Fundamentally, I don't think that the lack of the ability to modify a broadcast link requires a limitation of always using departure drive. If anything, this lack of ability enables the embedding algorithm to embed in any way it wants because an attempt to modify would result in a warning/error to the user that says that type of modification is not yet possible. One of the primary goals of the embedding algorithm should be the efficient use of resources. Since departure drive resources are among the most valuable, the embedding algorithm should avoid their use whenever possible.

I think this is yet another issue that supports separating the embedding of networks/links from the embedding of nodes. Interdomain networks/links connect sites, not nodes. The embedding of an interdomain network/link should depend only on the number of sites it connects. The number of nodes at each site should not affect the embedding of the network.
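
As a rough illustration of this proposal, here is a minimal Python sketch in which the embedding decision looks only at the set of sites a link connects. The function name `choose_embedding`, the input format, and the returned labels are all hypothetical. This is not the ORCA embedding algorithm.

```python
def choose_embedding(node_sites: dict[str, str]) -> str:
    """Pick an embedding for a broadcast link from the sites it connects.

    `node_sites` maps a node name to the site hosting it (assumed format).
    Only the set of distinct sites matters; the node count per site is ignored.
    """
    sites = set(node_sites.values())

    if len(sites) <= 1:
        # All nodes share one site, so no interdomain path is needed.
        return "intradomain"
    if len(sites) == 2:
        # Two sites can be joined directly, sparing departure drive resources.
        return "direct"
    # Three or more sites go through departure drive.
    return "via-departure-drive"
```

With this rule, a link with many VMs at two sites gets the same embedding as a link with one VM at each of those sites.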

ibaldin commented 8 years ago

I think these are all very good points, thank you for putting it on paper so we can discuss in more detail.

YufengXin commented 8 years ago

I checked in the fix and tested it in the emulator to some extent.

Please test it further.

Yufeng Xin, PhD, RENCI, UNC at Chapel Hill, 1-919-445-9633, yxin@renci.org

On Jul 7, 2016, at 11:29 AM, Ilya Baldin <notifications@github.com> wrote:

I think these are all very good points, thank you for putting it on paper so we can discuss in more detail.


ibaldin commented 8 years ago

Paul, how would Mert test it in emulation using your code - any instructions?

paul-ruth commented 8 years ago

I can show him. Anirban is already using it. The only change would be the URL of the targeted controller.

Mert, do you have time tomorrow?

ibaldin commented 8 years ago

@mcevik0 Mert, take a look at this thread. Yufeng made some code changes, and it would be great to test them some more in the emulation VM before deploying. Paul has some test code that exercises what he wants.

mcevik0 commented 8 years ago

I'll see Paul tomorrow (07/13) in the morning and do the testing.

YufengXin commented 8 years ago

The particular use case from Paul that Ilya described to me yesterday actually worked:

  1. Start with a point-to-point inter-rack link.
  2. Add additional VMs (bound to the same rack) hanging off an xNet switch.

However, a problem with the manifest prevents the result from displaying correctly in Flukes.

Adding VMs from a different rack should work in principle, but it needs more testing and may still have minor bugs.

Yufeng Xin, PhD, RENCI, UNC at Chapel Hill, 1-919-445-9633, yxin@renci.org

On Jul 12, 2016, at 5:25 PM, mcevik0 <notifications@github.com> wrote:

I'll see Paul tomorrow (07/13) in the morning and do the testing.


ibaldin commented 8 years ago

Intradomain works.