anuket-project / anuket-specifications

Anuket specifications
https://docs.anuket.io

[RM] Long Term requirements for SDN, Fabric and Networking #1037

Closed ASawwaf closed 4 years ago

ASawwaf commented 4 years ago

As discussed in the RM call (Feb 5th), we need to have very solid requirements for the SDN, Fabric and Networking part that should be included in the next stable release, as we believe it is mandated in our RAs.

So I will amend what was discussed over email (just a brain dump) and let's get the ball rolling from here:

https://etherpad.opnfv.org/p/ZoopMgLK61

ASawwaf commented 4 years ago

Some terminologies should be defined (please add whatever you see I have missed):

  1. IP Fabric
  2. Layer 2 Fabric
  3. Fabric Management
  4. Overlay
  5. Underlay
  6. Overlay Protocols
  7. Underlay Protocols
  8. VxLAN
  9. EVPN
  10. SRv6
  11. SFC
  12. NSH
  13. NETCONF (see the sketch after this list)
  14. YANG
  15. OVS / VR
  16. XMPP
  17. OpenFlow
  18. Data Center SDN
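
To make the NETCONF/YANG items above concrete, here is a minimal sketch (not part of the terminology list itself) showing how a YANG-modelled configuration could be retrieved from a fabric device over NETCONF using the Python ncclient library; the host, credentials and subtree filter are placeholders.

```python
# Minimal NETCONF/YANG illustration (assumes the ncclient package is installed
# and a NETCONF-capable device is reachable; host/credentials are placeholders).
from ncclient import manager

# Subtree filter against the standard ietf-interfaces YANG model.
INTERFACES_FILTER = """
<filter>
  <interfaces xmlns="urn:ietf:params:xml:ns:yang:ietf-interfaces"/>
</filter>
"""

with manager.connect(
    host="192.0.2.10",          # placeholder fabric device
    port=830,                   # default NETCONF-over-SSH port
    username="admin",
    password="admin",
    hostkey_verify=False,
) as m:
    # Retrieve the running configuration for the filtered YANG subtree.
    reply = m.get_config(source="running", filter=INTERFACES_FILTER)
    print(reply.xml)
```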
walterkozlowski commented 4 years ago

@ASawwaf, @kedmison, @sukhdevkapur, @rabiabdel, @BernardTsai-DT, @pgoyal01, @TFredberg, @tomkivlin , @petorre

Please find my first draft for the Networking section in the RM. Comments please!

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

The purpose of this section is to provide a generic architectural pattern for CNTT networking and hence to define common networking infrastructure components and their interactions with other parts of the CNTT Reference Model such as Compute and Storage. CNTT is working with other standards bodies such as ONF (Open Networking Foundation) to agree on a common approach towards CNTT networking.

We consider CNTT networking, also referred to as Network Fabric, as the evolution of the SDN (Software Defined Network) concept in line with the evolution of virtualised/containerised networking. In particular, the term Programmable Fabric refers to a Network Fabric that allows the full stack, all the way down to the forwarding pipeline, to be programmed or defined in software. Programmable Fabrics are the inevitable next step in the next generation of networks as telecommunications service providers look to make their networks more efficient, more open, multi-tenanted, traffic-engineered, INT (In-Band Telemetry) enabled and VNF-offload enabled.

We base our approach on the CUPS (Control and User Plane Separation) methodology and hence we introduce the following fundamental notions:

SDN - Control Plane (SDN-CP) The SDN-CP (control plane) layer is responsible for configuring and managing the data plane and is typically more centrally located (e.g. one per PoP or region), although it could also live on the switch or compute hardware.

SDN - enabled Data Plane (SDN-eDP) The data plane does the bulk of the network traffic forwarding, sending exception or control packets up to the control plane for processing (e.g. DHCP for a new IPoE session). SDN-eDP may be deployed in many different ways. While the data plane might normally be thought of as a standalone network switch, it could also be a fully virtualised function running within the compute core, or a SmartNIC in a compute server that allows the fabric to be extended up into the server (for instance, in the case of a programmable fabric it would use the P4 language to define a pipeline in an FPGA SmartNIC).
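
As an illustration only (not part of the draft text), the CUPS split described above can be sketched in a few lines of Python: a data-plane element forwards what it can locally and punts exception packets (here, a hypothetical DHCP discover) up to the control plane, which installs new forwarding state back into the data plane. All class and method names are invented for this sketch.

```python
# Toy sketch of Control / User Plane Separation (CUPS); all names are hypothetical.

class ControlPlane:
    """SDN-CP: centrally handles exception traffic and programs forwarding state."""

    def handle_punt(self, data_plane, packet):
        if packet["type"] == "dhcp_discover":
            # e.g. start of a new IPoE session: allocate state and push a rule down.
            rule = {"match": {"src_mac": packet["src_mac"]}, "action": "forward_uplink"}
            data_plane.install_rule(rule)


class DataPlane:
    """SDN-eDP: forwards in the fast path, punts exceptions to the control plane."""

    def __init__(self, control_plane):
        self.control_plane = control_plane
        self.rules = []

    def install_rule(self, rule):
        self.rules.append(rule)

    def receive(self, packet):
        for rule in self.rules:
            if rule["match"]["src_mac"] == packet["src_mac"]:
                return rule["action"]                      # fast-path forwarding decision
        self.control_plane.handle_punt(self, packet)       # exception -> control plane
        return "punted"


cp = ControlPlane()
dp = DataPlane(cp)
pkt = {"type": "dhcp_discover", "src_mac": "aa:bb:cc:dd:ee:ff"}
print(dp.receive(pkt))  # "punted" (first packet goes to the control plane)
print(dp.receive(pkt))  # "forward_uplink" (state now programmed in the data plane)
```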

The following diagram presents the scope of CNTT Networking, shown as a blue box against the standard ETSI NFV model. The diagram, in conjunction with the CNTT Reference Model, will identify integration points relevant from the CNTT Networking perspective. The CNTT RM, RA and RI documents will specify the detailed requirements for the CNTT Networking standard.

[Diagram: scope of CNTT Networking shown against the ETSI NFV model]

The next diagram presents the high-level model for CNTT Networking.

[Diagram: CNTT Networking high-level architecture]

SDN – Control Plane layer

The Fabric Controller controls the loading of the data plane pipeline into the Forwarding Engine (FE) using the Network Fabric Interface. In the case of a Programmable Fabric, the Fabric Controller also controls programming of the data plane pipeline, using the P4Runtime interface to communicate with the data plane's Programmable Forwarding Engine (PFE).

The Telemetry Controller allows applications (e.g. Fault Management) to collect telemetry on the network elements in the fabric using the Network Fabric Interface. It is expected that other applications will use machine learning to provide more intelligent control-loop feedback into the Fabric Controller applications, enabling pre-emptive service configuration and repair.

The Configuration and Management Controller provides applications with common northbound interfaces and models for the configuration and management of the Network Fabric.
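
Purely as an illustration of the Telemetry Controller role (none of this is mandated by the draft), the sketch below polls interface counters from data plane nodes through a fabric interface and feeds them to a subscriber such as a Fault Management application; the client class is a hypothetical stand-in for a real gNMI/streaming-telemetry library.

```python
# Hypothetical sketch of a Telemetry Controller; FabricTelemetryClient stands in
# for a real gNMI / streaming-telemetry client and is not a real library.

class FabricTelemetryClient:
    def __init__(self, node_address):
        self.node_address = node_address

    def get_counters(self, path):
        # A real implementation would issue a gNMI Get/Subscribe for this path.
        return {"in-octets": 0, "out-octets": 0, "in-errors": 0}


class TelemetryController:
    """Collects telemetry from fabric nodes and fans it out to applications."""

    def __init__(self, nodes):
        self.clients = [FabricTelemetryClient(n) for n in nodes]
        self.subscribers = []          # e.g. Fault Management, ML analytics

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def poll_once(self, path="/interfaces/interface/state/counters"):
        for client in self.clients:
            sample = client.get_counters(path)
            for notify in self.subscribers:
                notify(client.node_address, sample)


def fault_management(node, counters):
    if counters["in-errors"] > 0:
        print(f"alarm: {node} reports input errors")


controller = TelemetryController(["leaf1", "leaf2"])
controller.subscribe(fault_management)
controller.poll_once()
```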

SDN – enabled Data Plane layer

The Data Plane Node (DPN) describes the hardware that houses the data plane forwarding function. This could be a stand-alone network switch with an FE such as Tofino, or a compute server with an FPGA-based SmartNIC.

The Data Plane Agent (DP-Agent) provides the standardised northbound fabric interfaces (e.g. in the case of a Programmable Fabric: P4Runtime, gNMI and gNOI) that allow the control plane controllers to communicate with the data plane. An example implementation of the DP-Agent is ONF's Stratum project.

The Forwarding Engine (FE) represents the actual hardware that does the packet forwarding (also called the packet pipeline). In the case of a Programmable Fabric, the FE is referred to as a PFE (Programmable Forwarding Engine). Examples of a PFE are a P4-based switch chipset such as Intel/Barefoot's Tofino, or an FPGA-based SmartNIC using P4 to define the packet forwarding pipeline.

The FE Pipeline describes the forwarding logic of packets in the data plane nodes of the Network Fabric. The most widely used and standardised packet pipeline language in the network domain today is the P4 language (see https://p4.org for detailed information). At a very high level, the P4 language enables programmers to define the logic used to process packets received by the Programmable Forwarding Engine (PFE), normally using the concept of tables, allowing this state to be programmed by control plane programs (see the Fabric Controller definition above) dynamically and as needed (e.g. when a new subscriber is instantiated into the network).
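
To illustrate the table-programming idea described above (this is a sketch only; the wrapper class is hypothetical and does not reproduce the real P4Runtime protobuf API), a Fabric Controller might load a compiled pipeline and then insert a match-action entry when, say, a new subscriber appears:

```python
# Hypothetical thin wrapper around a P4Runtime-style interface; the class and its
# methods are invented for illustration and are not the real p4runtime API.

class FabricPipelineSession:
    def __init__(self, dp_agent_address):
        self.dp_agent_address = dp_agent_address
        self.entries = []

    def load_pipeline(self, p4info_path, device_config_path):
        # A real controller would push the compiled pipeline (P4Info + target
        # binary) to the DP-Agent, which loads it into the PFE.
        print(f"loading {device_config_path} described by {p4info_path}")

    def insert_table_entry(self, table, match, action, params):
        # A real controller would encode this as a P4Runtime TableEntry message.
        self.entries.append({"table": table, "match": match,
                             "action": action, "params": params})


session = FabricPipelineSession("leaf1:9559")   # address and port are placeholders
session.load_pipeline("fabric.p4info.txt", "fabric.bin")

# New subscriber instantiated: steer its traffic towards the right next hop.
session.insert_table_entry(
    table="ingress.subscriber_table",
    match={"hdr.ethernet.src_addr": "aa:bb:cc:dd:ee:ff"},
    action="set_next_hop",
    params={"port": 3},
)
print(session.entries)
```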

Deployment - Overlay and Underlay Networks

The typical deployment issue with NFV is that NFV solutions tend to leverage overlay networks and lack proper visibility into the underlay networks. This usually has a negative impact on performance (typically latency and jitter) and on troubleshooting capability, leading to the necessity of hardware pinning and the use of various acceleration techniques such as SR-IOV, which in turn undermines the principles of virtualisation. The evolving technologies that allow offloading the data plane components of network functions into a Programmable Fabric appear to be the natural way of preserving the benefits of virtualisation for complex control plane components, eliminating the performance bottlenecks of traditional NFV solutions, and promoting technology innovation and hardware re-use.
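
As a small, self-contained illustration of the overlay/underlay distinction discussed above (assuming the scapy package; addresses and the VNI are placeholders), the snippet below builds a VXLAN-encapsulated frame: the outer Ethernet/IP/UDP headers belong to the underlay routed by the physical fabric, while the inner frame is the overlay traffic the workload actually sees.

```python
# Overlay vs. underlay illustration using scapy (pip install scapy); all
# addresses and the VNI are placeholder values.
from scapy.layers.l2 import Ether
from scapy.layers.inet import IP, UDP
from scapy.layers.vxlan import VXLAN

# Underlay: headers routed by the physical fabric between VTEPs.
underlay = (
    Ether(src="02:00:00:00:00:01", dst="02:00:00:00:00:02")
    / IP(src="10.0.0.1", dst="10.0.0.2")       # VTEP addresses in the underlay
    / UDP(sport=49152, dport=4789)             # 4789 = IANA VXLAN port
)

# Overlay: the tenant frame carried inside the VXLAN tunnel (VNI 5000).
overlay = (
    VXLAN(vni=5000, flags=0x08)
    / Ether(src="de:ad:be:ef:00:01", dst="de:ad:be:ef:00:02")
    / IP(src="192.168.1.10", dst="192.168.1.20")
)

frame = underlay / overlay
frame.show()   # prints the full underlay + overlay header stack
```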

ASawwaf commented 4 years ago

Many thanks @walterkozlowski for this great effort to formulate the initial content.

I have made small changes to make it more generic rather than ONF (ONOS) specific, and I will work more on it.

++++++++

I attached a document containing the changes

The purpose of this section is to provide a generic architectural pattern for the CNTT networking and hence to define common networking infrastructure components and their interactions with other parts of the .docx

please have a look @walterkozlowski @sukhdevkapur @rabiabdel @karinesevilla @peterwoerndle

walterkozlowski commented 4 years ago

Thank you @ASawwaf for your comments. Good points, I have accepted them with some minor editorial changes. The text is below. Comments please from: @kedmison @sukhdevkapur @rabiabdel @kagreenwell @peterwoerndle @petorre @BernardTsai-DT @TFredberg @tomkivlin @pgoyal01

++++++++++++++++++++++

The purpose of this section is to provide a generic architectural pattern for CNTT Networking and hence to define common networking infrastructure components and their interactions with other parts of the CNTT Reference Model such as Compute and Storage. CNTT is working with other standards bodies such as ONF (Open Networking Foundation) and LFN (ODL / Tungsten Fabric) to agree on a common approach towards CNTT networking.

We consider CNTT Networking, also referred to as Network Fabric, as the evolution of the SDN (Software Defined Network) concept in line with the evolution of virtualised/containerised networking. In particular, the term Programmable Fabric refers to a Network Fabric that allows the full stack, all the way down to the forwarding pipeline, to be programmed or defined in software. Programmable Fabrics are the inevitable next step in the next generation of networks, where CSPs look to make their networks intent-based, more efficient, more open, multi-tenanted, traffic-engineered and INT (In-Band Telemetry) enabled, achieving seamless onboarding of any VNFs/CNFs.

One of the key SDN principles is the separation of the control and forwarding planes. We base our approach on the CUPS (Control and User Plane Separation) methodology and hence we introduce the following fundamental notions:

SDN - Control Plane (SDN-CP)

The SDN-CP (control plane) layer is responsible for the full automation of configuration and management of the data plane and is typically more centrally located (e.g. one per PoP or region), although it could also live on the switch or compute hardware.

SDN - enabled Data Plane (SDN-eDP)

The data plane does the bulk of the network traffic forwarding, sending exception or control packets up to the control plane for processing (e.g. DHCP for a new IPoE session). SDN-eDP may be deployed in many different ways. While the data plane might normally be thought of as a standalone network switch, it could also be a fully virtualised function running within the compute core, or a SmartNIC in a compute server that allows the fabric to be extended up into the server (for instance, in the case of a programmable fabric it would use the P4 language to define a pipeline in an FPGA SmartNIC).

The following diagram presents the scope of CNTT Networking, shown as a blue box against the standard ETSI NFV model. The diagram, in conjunction with the CNTT Reference Model, will identify integration points relevant from the CNTT Networking perspective. The CNTT RM, RA and RI documents will specify the detailed requirements for the CNTT Networking standard.

image

The next diagram presents the high-level model for CNTT Networking.

image

SDN – Control Plane layer

The Fabric Controller controls the loading of the data plane pipeline into the Forwarding Engine (FE) using the Network Fabric Interface. In the case of a Programmable Fabric, the Fabric Controller also controls programming of the data plane pipeline, using protocols such as OpenFlow, XMPP or the P4Runtime interface to communicate with the data plane's Programmable Forwarding Engine (PFE).

The Telemetry Controller allows applications (e.g. Fault Management) to collect telemetry on the network elements in the fabric using the Network Fabric Interface. It is expected that other applications will use machine learning to provide more intelligent control-loop feedback into the Fabric Controller applications, enabling pre-emptive service configuration and repair.

The Configuration and Management Controller provides applications with common northbound interfaces and models for the configuration and management of the Network Fabric.

SDN – enabled Data Plane layer

The Data Plane Node (DPN) describes the hardware that houses the data plane forwarding function. This could be a stand-alone network switch with an FE such as Tofino, a compute server hosting an OVS or VR, or a compute server with an FPGA-based SmartNIC.

The Data Plane Agent (DP-Agent) provides the standardised northbound fabric interfaces (e.g. in the case of a Programmable Fabric: P4Runtime, gNMI and gNOI) that allow the control plane controllers to communicate with the data plane. An example implementation of the DP-Agent is ONF's Stratum project.

The Forwarding Engine (FE) represents the actual hardware that does the packet forwarding (also called the packet pipeline). In the case of a Programmable Fabric, the FE is referred to as a PFE (Programmable Forwarding Engine). Examples of a PFE are a P4-based switch chipset such as Intel/Barefoot's Tofino, or an FPGA-based SmartNIC using P4 to define the packet forwarding pipeline.

The FE Pipeline describes the forwarding logic of packets in the data plane nodes of the Network Fabric. The most widely used and standardised packet pipeline language in the network domain today is the P4 language (see https://p4.org for detailed information). At a very high level, the P4 language enables programmers to define the logic used to process packets received by the Programmable Forwarding Engine (PFE), normally using the concept of tables, allowing this state to be programmed by control plane programs (see the Fabric Controller definition above) dynamically and as needed (e.g. when a new subscriber is instantiated into the network).
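
To give a feel for what "forwarding logic defined as tables" means (a conceptual sketch only; a real pipeline would be written in P4 and compiled for the PFE, not expressed in Python), the snippet below models a two-table match-action pipeline whose entries would be populated at runtime by the Fabric Controller:

```python
# Conceptual model of a match-action pipeline; a real FE Pipeline would be
# written in P4, this Python sketch only mirrors the table/action structure.

# Tables are empty at pipeline load time; the control plane fills them at runtime
# (e.g. when a new subscriber is instantiated).
subscriber_table = {}   # match on source MAC -> action
routing_table = {}      # match on destination /24 prefix -> egress port


def process_packet(pkt):
    """Apply the tables in sequence, the way a compiled pipeline would."""
    action = subscriber_table.get(pkt["src_mac"])
    if action != "permit":
        return "drop"                      # default action: unknown subscriber
    prefix = ".".join(pkt["dst_ip"].split(".")[:3]) + ".0/24"
    port = routing_table.get(prefix)
    return f"forward:{port}" if port is not None else "punt_to_controller"


# The control plane programs the state (compare the Fabric Controller description).
subscriber_table["aa:bb:cc:dd:ee:ff"] = "permit"
routing_table["198.51.100.0/24"] = 7

print(process_packet({"src_mac": "aa:bb:cc:dd:ee:ff", "dst_ip": "198.51.100.25"}))
# -> forward:7
print(process_packet({"src_mac": "11:22:33:44:55:66", "dst_ip": "198.51.100.25"}))
# -> drop
```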

Deployment - Overlay and Underlay Networks

The typical deployment issue with NFV is that NFV solutions tend to leverage overlay networks and lack proper visibility into the underlay networks. This usually has a negative impact on performance (typically latency and jitter) and on troubleshooting capability, leading to the necessity of hardware pinning and the use of various acceleration techniques such as SR-IOV, which in turn undermines the principles of virtualisation. The evolving technologies that allow offloading the data plane components of network functions into a Programmable Fabric appear to be the natural way of preserving the benefits of virtualisation for complex control plane components, eliminating the performance bottlenecks of traditional NFV solutions, and promoting technology innovation and hardware re-use.

ASawwaf commented 4 years ago

Thanks @walterkozlowski

oyayan commented 4 years ago

Thanks @walterkozlowski @ASawwaf. I think it would be good to have some modelling for the physical network fabric. The references to Network Fabric above could be defined and separated clearly into physical underlay and overlay. I have seen the physical fabric topology presented under RA1 Networking; it could be moved under modelling, as the physical underlay should be common for both RA1 and RA2. Then the standards, such as how to peer Neutron and NSX with the physical underlay, can be architected in the RA Networking sections for both technologies. What are your thoughts?

ASawwaf commented 4 years ago

@oyayan , Did you refer to " https://github.com/cntt-n/CNTT/blob/master/doc/ref_arch/openstack/chapters/chapter03.md#3422-network "

If yes, then I totally agree to push it to the RM as common for both RAs.

and for "how to peer neutron and NSX with the physical underlay " it is a tough question as neturon can't do it right now

@pgoyal01 @rabi-abdel do you have any comments on moving it?

oyayan commented 4 years ago

@ASawwaf, yes, that is the chapter I am referring to.

oyayan commented 4 years ago

@ASawwaf, @walterkozlowski, also, some parts of section 4.2.3 in ch4 could be moved to and completed in the RM. Thoughts?

ASawwaf commented 4 years ago

@oyayan, that is the OpenStack networks mapping to the fabric, but we need to take a deeper look at it.

But if you don't mind, can we work together to figure out the clauses that are RA-agnostic and should be reflected in the RM, as above?

walterkozlowski commented 4 years ago

@oyayan @ASawwaf @BernardTsai-DT @peterwoerndle @petorre @kedmison @sukhdevkapur I agree with Oya regarding more content being needed on the physical topology side of things (e.g. use of CLOS architecture for scaling) and will add a few paragraphs. At this stage we do not want to be too prescriptive though; this is the high-level model establishing the major entities. It should lead to requirements when, for example, scalability is identified. We will need to decide how many of the requirements we need in the RM versus RA1 and RA2. My suggestion, however, is to focus first on the high-level model and major entities. So please send your comments on the content previously posted here (so far Ahmed and Oya have done so).

pgoyal01 commented 4 years ago

@ASawwaf @oyayan I have no issues moving " https://github.com/cntt-n/CNTT/blob/master/doc/ref_arch/openstack/chapters/chapter03.md#3422-network " -- but I would like to echo @walterkozlowski: please remember that the purpose of the RM is a model (not the implementation aspects). @walterkozlowski mentioned CLOS, and that is possible to specify as a hierarchy of network elements and the nature of the transport (E-W, N-S, etc.). The model so far mentions FG, but I don't think it covers the resultant "path", including the components that make up that path (sorry if I missed it).

TFredberg commented 4 years ago

I would like to introduce Networking in the RM at an even higher level of abstraction than the current proposal.

I feel it's important to view the NFVI as a layered model that allows multiple Virtualization Layer domains on top of a shared HW Infrastructure; at the least, the RM shall allow implementations of multiple separated SW Virtualization layers, and it should be possible for all of these to be managed by separate organizations.

With this said, I suggest that we first agree on a couple of cornerstones; below I suggest a few so that we can discuss them in this forum. For the purpose of this discussion I have also made an ETSI NFV inspired figure that depicts a plausible deployment in a Telco organization that is on a rather long migration journey from VNFs to Cloud Native CNFs.

The implementation of Networking inside the HW Layer should not be visible to the VNF/CNF and should preferably not even be visible to the IaaS/CaaS.

In cases where a VNF/CNF requires HW layer resources, it should be under the control of the Virtualization Layer.

It is important that the HW Infrastructure Manager, each VIM and each VNF/CNF could be managed by separate organizations.

The Responsibilities of each layer in NFVI are:

An example of the above layering is a Virtualization layer managing the Overlay Networking (e.g. through VLAN allocation) and the HW Infrastructure managing the Underlay networking (e.g. through VxLAN VNI range allocations).
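
A toy sketch of that split (purely illustrative; the class names, ranges and method names are made up): the HW Infrastructure Manager hands each Virtualization Layer a disjoint VNI range for the underlay, and each Virtualization Layer then allocates overlay segments only within its own range.

```python
# Illustrative only: layered allocation of underlay VNI ranges vs. overlay segments.

class HwInfraManager:
    """Owns the underlay: hands out disjoint VxLAN VNI ranges per virtualization domain."""

    def __init__(self):
        self._next = 10000
        self.ranges = {}

    def allocate_vni_range(self, domain, size=1000):
        rng = range(self._next, self._next + size)
        self._next += size
        self.ranges[domain] = rng
        return rng


class VirtualizationLayer:
    """Owns its overlay: allocates segment IDs only from the range it was given."""

    def __init__(self, name, vni_range):
        self.name = name
        self._available = list(vni_range)

    def create_overlay_network(self, tenant):
        vni = self._available.pop(0)
        return {"tenant": tenant, "vni": vni, "owner": self.name}


hw = HwInfraManager()
caas_a = VirtualizationLayer("caas-a", hw.allocate_vni_range("caas-a"))
iaas_b = VirtualizationLayer("iaas-b", hw.allocate_vni_range("iaas-b"))

print(caas_a.create_overlay_network("tenant-1"))   # VNI from the caas-a range
print(iaas_b.create_overlay_network("tenant-9"))   # VNI from the disjoint iaas-b range
```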

image

petorre commented 4 years ago

Beyond the more strict layering that @TFredberg describes, I would like the RM Networking text to:

  1. Allow for NF components to be deployed in many different ways depending on target use case requirements, workload characteristics (different algorithms implementing pipeline steps) and available platforms.
  2. Not combine into one node what are (a) server nodes with programmable acceleration, which are already in the RM+RAs, and (b) programmable switches, which should be an incremental focus.
  3. Model which functionality and management interfaces those layers expose towards reaching a Programmable Fabric, while being specific that layers are not bypassed, or else an exception needs to be written.
  4. Describe fewer example protocols or products covering how this can be implemented.
markshostak commented 4 years ago

@walterkozlowski @ASawwaf, For the initial Baldy content, I think @TFredberg and @petorre are on the right track, in that they have started to define the Objectives and Requirements. It's important we define the problem, before we design the solution. :-) Some or all of that content may be appropriate for the RM, or may even be Tech Steering purview (@rabi-abdel , can you comment?). I'm on the fence about it.

@walterkozlowski To your point, it's too soon to get prescriptive. The material you wrote, which for me seems more informative, puts it in Tech Steering purview, so we should consider targeting it for Tech (@rabi-abdel, do you see Walter's material in your Tech doc?), and to Tomas' point, thereby makes the RM more abstract. Do you concur?

Again, highly suggest we define what we're trying to accomplish/deliver, and why, first. Then work on the how.

markshostak commented 4 years ago

All, As discussed on the call, the following 3 categories are excerpted from today's live RM mtg minutes:

  1. Executive Summary
  2. Initial Objectives
  3. Initial Mechanics

Let's start fleshing them out.

Does anyone not believe we can deliver a summary and two initial lists in time for Baldy? I think between the content in this thread and today's agenda, we already have a lot of it. Does anyone want to add additional scope for Baldy?

walterkozlowski commented 4 years ago

Hi All, I missed the RM meeting this time (2am in Melbourne...), but I agree we can do this for Baldy. I will keep working on the content base (taking into consideration the very pertinent comments from @ASawwaf, @petorre, @TFredberg, @oyayan), but will focus on the Summary, Initial Objectives and Mechanics. I will send out the next version over the weekend.

@markshostak, @TFredberg, @petorre as I presented in Prague, I have Objectives and Drivers and general layers defined as well, so I will incorporate them into this initial content. They are known! I have included here some technical details and examples of protocols etc. to ground the discussion in real facts (these things really exist and can be implemented). But obviously I agree that in the final RM content and in the Baldy presentations we will talk at a slightly more generic level.

markshostak commented 4 years ago

Sounds good, Walter. I've also started aggregating the objectives raised today and some narrative, and I'll push those out for your/people's feedback, as well.

walterkozlowski commented 4 years ago

That will be great, @markshostak. The more thoughts we collectively collect (pun intended), the better...

walterkozlowski commented 4 years ago

A lot of similar information was in my Prague PowerPoint pack: CNTT_Networking Fabric 27 Jan.pptx. I am revising all of this and will produce a consolidated and coherent view, including comments from the team here (@markshostak @oyayan @ASawwaf @TFredberg @petorre).

TFredberg commented 4 years ago

I have added the ETSI NFV reference points and the rest of its existing boxes to the exemplification picture I submitted earlier, and also added the two deployment models of RA2 that should be possible, i.e. on top of an RA1 (in a VM) and bare metal directly on the HW layer.

I would suggest we start our definitions of the RM from the ETSI reference points and interfaces, where we should honor the interfaces that go between the layers as depicted. Consequently, we should not draw any linkage from a virtualization function in the (SW) Virtualization Layer directly to a HW resource inside the HW Infrastructure Layer, which could be the interpretation when drawing a vertical bounding box around the boxes in two separate layers.

With this layered approach, the positive effect is that the RM (deliberately) cannot express whether a HW acceleration is implemented on a server blade through a SmartNIC or on a switch box through a Programmable Switch Fabric.

The colors are added to give an indication of the possible different administrative domains, i.e. what parts could be managed by separate organizational groups. Sorry for the plethora of colors, which do not come out well in a B/W print; if anyone has a better way of showing this dimension, that would be most welcome.

The ability to have multiple administrative domains could be represented in the RM model as cardinality on the reference points, e.g. Vl-Ha could have a cardinality of 1 on the HW layer and n on the Virtualization layer, whilst Nf-Vi would have a cardinality of 1 on both ends.

I would hope these things at least partly address the additional good comments from @petorre .

image

markshostak commented 4 years ago

All, I've created PR #1143 w/ initial content for Baldy. It's intended to reflect the material discussed in the RM mtg, but just a starting point. Feel free to add to, and enhance it.

@petorre & @TFredberg I believe I've incorporated most, if not all of the Objectives and Requirements you came up with. Tomas, may be missing some of yours, as I wrote it before you posted your 2nd set. However, I did include your updated diagram. -M

walterkozlowski commented 4 years ago

All, for now I have stopped working on the content above, as we do not want to duplicate what was done in PR #1143. The latest content we developed within this issue: CNTT Networking draft 2020 02 20.docx. Comments welcome, as always. My opinion is that this work should be incorporated into the plan as it develops in the RM, where PR #1143 has already been merged. The focus at this stage should be, in my view, on the presentation in LA, where we have 60 minutes for networking. I think we should present a long-term view and also an MVP plan for the next release. Most importantly, we need to use the time in LA to collect the community's views on requirements. Any other thoughts?

@ASawwaf @kedmison: since we are to develop and present the 60-minute Networking Fabric session in LA, I suggest we get together and discuss how to approach this, keeping in mind that we need to engage larger community participation in the networking discussion.

ASawwaf commented 4 years ago

@walterkozlowski, for the slides, let's work on them.

@markshostak @TFredberg, what you mentioned is great indeed, but the initial RM doesn't contain any solid networking part, so we are trying to figure out some baseline to start with; also, what @walterkozlowski started to draft is just initial content to be part of the RM (to be enhanced), as the target for networking/SDN/Fabric is not the Baldy release, as agreed on RM/TSC calls before.

markshostak commented 4 years ago

@markshostak @TFredberg, what you mentioned is great indeed, but the initial RM doesn't contain any solid networking part, so we are trying to figure out some baseline to start with...

@ASawwaf Hi Ahmed, when you say the RM doesn't contain a solid networking part and that we need a baseline, are you referring to an implementation, or can you clarify what level you're envisioning? The material you reference from Walter contains a mix of specific implementation content and high-level content.

walterkozlowski commented 4 years ago

@markshostak I really do not agree that the initial material I started had anything to do with specific implementations (except perhaps some examples which I used for illustration, clearly saying "for example a typical deployment" and similar); it presented the major logical entities that play a role in next-generation SDN, as the open networking projects see it. For example, the DPN may be in a server (for example when using a SmartNIC) or in a switch itself. The important part is a standardised (and potentially programmable) interface between the Control and User Planes. I am planning to provide clarifying statements about independence from the actual deployment. We can start at a slightly higher level, but not much, because otherwise it will become a bit meaningless. I am happy to join this material with what Tomas proposed (a good starting level, I think), and keep it at a logical level at this stage (as it actually is anyhow).

markshostak commented 4 years ago

@walterkozlowski I merely said it was a mix, but not sure why you focused on that. The intent of the comment was to have Ahmed clarify what he was envisioning.

walterkozlowski commented 4 years ago

Also, if CNTT wants to be relevant for 5G Core (many projects around the world), we need to address next-gen SDN in view of the critical CUPS methodology. I do not think many operators will deploy the UPF in VMs. @ASawwaf is right that networking is a critical part of the NFVI. This is very much visible in Edge use cases as well (take for example driverless trains in Australian mining...). So we need to progress this work, in line with open networking and industry evolution. I hope everybody agrees.

walterkozlowski commented 4 years ago

@markshostak I focused on that to clarify the intent of the content I wrote.

ASawwaf commented 4 years ago

@walterkozlowski @markshostak what I meant is that we all agreed the networking part is fluffy in our Rx documents, and it is crucial to have a solid RM that can help us get to complete RAs.

And when I mention "networking", there are a lot of aspects that we should cover under "Networking" in terms of Fabric, Overlay, Underlay, SDN, ...

@markshostak what you mentioned is indeed meant as outlines and principles, and what @walterkozlowski contributed is indeed meant as initial content that can be modified/iterated down the road, but at least we need a starting point.

Nevertheless, as mentioned, there are many networking aspects that we should work together to define.

TFredberg commented 4 years ago

I would like to further our networking model discussion here and align our views before venturing into descriptive text or overly detailed examples in the RM.

I certainly agree with @walterkozlowski and @ASawwaf that networking is fundamental, but I respect that it is rather complex and hard to get the layering correct, since it has almost infinite flexibility but a very varying degree of characteristic differences in many aspects beside the simplistic speed vector. I agree with the @walterkozlowski view of a logical control plane separated from the abstracted data plane or forwarding plane. I believe it is quite hard to find useful long-term stable interfaces between the control and forwarding planes, although the transport mechanisms of these interfaces might be specified.

I find it imperative to show which administrative domains have access rights to which parts of the control and forwarding pipeline layers, to ensure an unbroken chain of trusted and robust separation, encapsulation or other methods of virtualization of available resources. These administrative domains are often operated by separate groups with different types of operational goals and could even be divided across different companies' responsibilities.

I also find it important that the model does not imply or suggest that a separated control plane is necessarily centralized in its implementation, even though it might be modeled as a single logical endpoint, e.g. through a movable IP address or similar.

I agree with the @walterkozlowski view that a Telco Function (CNF/VNF) should not be aware of whether an NFVI-internal supporting function is implemented in an underlay switch fabric or a SmartNIC, and I would actually like to go one step further, suggesting that the preference is that a Telco Function should not know whether a function is accelerated or how it is accelerated. This could, however, be problematic for some types of existing acceleration that today need to have code in the OS or the VNF/CNF to be used as internal "slow-path" code when acceleration is not available. The dimensioning will also be an area that must be handled, e.g. through Service Level Objective requests from each VNF/CNF with Service Level Agreements from the infrastructure levels. But at least a host-kernel or DPDK-implemented acceleration should be functionally and semantically equal to a HW-implemented acceleration for the VNF/CNF, i.e. the method and placement of acceleration shall be transparent to the VNF/CNF. The decision of which of the available acceleration methods to use should preferably be up to an orchestrator on some suitable layer and not up to a potentially greedy VNF/CNF.

To better understand what the CNTT RM networking should prioritize, I'd like to bring out a number of aspects that I would like to discuss further and get, specifically, the Operators' view on here.

  1. Will most of the networking-demanding VNFs/CNFs (i.e. those requiring high bandwidth throughput, multiple physical ports and/or direct network access without any type of NAT in the way) be deployed as standalone appliances on their own SW virtualization layer (IaaS/CaaS) and on separate dedicated HW including its switches (i.e. a HW appliance built on Cloud principles)?
  2. Will the networking-demanding VNF/CNF be trusted to place its own data plane (forwarding plane) functions and code inside the shared physical switch fabric in the NFVI when there are other VNFs/CNFs that depend on it?
  3. What will be the requirements on a shared switch fabric to safeguard that a specific IaaS or CaaS and/or a VNF/CNF cannot affect the other non-related infrastructure users, e.g. underlay networking separation on reachability and QoS, separation of control and management domains, access control, ...?
  4. Who / what layer / what operations group will manage and orchestrate the safeguarding between the separate network-demanding VNFs/CNFs or separate IaaS/CaaS environments?

My questions come from a viewpoint and suspicion that it is likely quite a large can of worms to agree and standardize how generally accepted interfaces, security and robustness would need to look inside the physical underlay network switch fabric. I do not dispute the technical merits and what can be done when you are a hyperscaler or Telco Operator that designs, implements, tests and operates according to its internal rules in its own infrastructure from top to bottom. My point is that a programmable shared switch fabric, having several complex areas to address, likely has a rather long way to go to be industry standardized and accepted, and is in need of some kind of fabric manager that supports multi-tenancy.

My own conclusion, which I would like challenged here, is also that it is a faster journey to standardize how infrastructure and application acceleration can be done in SmartNICs, since they are naturally confined to the server that runs a specific instance of an IaaS/CaaS. They will probably only carry one network-demanding group of VNFs/CNFs at any point in time. It allows a possibility for the underlay network switch fabric to restrict and confine the server's networking when needed (in the same way as many do today when SR-IOV is used) until the point where the SmartNIC itself can be separately managed/orchestrated and trusted to do the underlay networking separation. This makes the blast radius smaller, limiting the fault domain of an error (or malicious attack) to the single VNF/CNF and not the entire infrastructure it sits on.

I would advocate that, in addition to a shared target vision at a very high level, we also create a CNTT networking "roadmap", or at least a prioritized backlog list, that enables us to focus our efforts on the areas we jointly feel can be described and released within CNTT. This is likely something that requires a debate and is hard to do before the Baldy release, but maybe a few items for the list could be inserted initially without priority. Below I am making a first stab at a few of them to start the debate and solicit further additions/modifications:

walterkozlowski commented 4 years ago

Great input from @TFredberg. I think that we are all largely in agreement. I personally think that programmability will need to come earlier in our roadmap, as it is happening in the industry as we speak. Btw: as I already expressed in Prague, programmability also covers SmartNIC-based solutions (and other technologies, some of them unknown yet). We are in agreement with @TFredberg on the necessity of a standardized layer abstracting all "acceleration technologies" from Telco Functions. In my view it will be a programmable layer allowing a swap from one technology to another, depending on the use case and the available infrastructure, which can be reprogrammed for a particular technology. And, agreeing again with @TFredberg, such decisions will most likely be made at the MANO layer. Regarding the questions to operators posed by @TFredberg, I think that the LA conference will be a great opportunity to accelerate such a discussion. In my view, which I am happy to expand in such a discussion, the answers will depend on the use cases. For instance, where I can see a natural way of introducing programmable switch fabric acceleration (using Tomas's term) is SA 5G use cases for enterprise customers. Such use cases can simplify the answers to important questions like security and safeguarding.

markshostak commented 4 years ago

=========== NOTICE =========== Discussion of this topic will continue on Etherpad, at: https://etherpad.opnfv.org/p/ZoopMgLK61

Conclusions from Etherpad will be copied back to this issue for historical/tracking purposes.

sukhdevkapur commented 4 years ago

@ASawwaf @walterkozlowski @TFredberg @rabi-abdel @pgoyal01 - Folks, this is a great (and very long) discussion. I am glad to see it. We can continue to discuss and express our views theoretically, or take a pragmatic and practical approach, hit this issue head on, and get this included and addressed. With that spirit in mind, here is my proposal:

  1. From the short-term point of view (i.e. the next release after Baldy), we should make a point of including SDN in the RM, RA, RI and RC. How do we do this? Go to step 2.
  2. Take existing, available, and deployed open source SDNs (I can think of two - ODL and Tungsten Fabric; both are open source under the Linux Foundation). We look at the common functionality they offer (i.e. overlay management and underlay management - they both offer both). I am familiar with Tungsten Fabric and can speak for it. We take its API set (and the features offered by it) as the base of the API set in the CNTT RM.
  3. We work with the OpenStack community to include its full API set as an extension to the Neutron API.
  4. Now the long-term point of view - we continue to look into the development of the Network Fabric and enhance the SDN functionality as appropriate, building on the base which we would have already adopted in steps 1-3 above. Most of these SDN deployments will keep evolving as is. As an example, Tungsten Fabric is already looking into the Network Fabric and other advanced features. By the time the CNTT WS catches up, SDN controllers would have already caught up - hence, this will become a smooth sail.

So, the question is - do we want to keep debating theoretically, or take what is out there, adopt it and move forward? As an example, we are taking the Neutron API and using it - we are not debating about it - so why debate about SDN APIs? They are out there in Neutron as well (as plugins).
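
To make the "take the Neutron API as the base" point concrete, here is a minimal, hedged example (assuming the openstacksdk package and a configured cloud named "mycloud" in clouds.yaml; all names are placeholders) of creating an overlay network and subnet through the same Neutron API that an SDN plugin such as ODL or Tungsten Fabric would back:

```python
# Minimal Neutron API usage via openstacksdk (pip install openstacksdk).
# Assumes a clouds.yaml entry named "mycloud"; network/subnet names are placeholders.
import openstack

conn = openstack.connect(cloud="mycloud")

# Create a tenant (overlay) network; the backing SDN plugin (ML2/OVS, ODL,
# Tungsten Fabric, ...) decides how it is realised on the fabric.
network = conn.network.create_network(name="cntt-demo-net")

subnet = conn.network.create_subnet(
    name="cntt-demo-subnet",
    network_id=network.id,
    ip_version=4,
    cidr="192.0.2.0/24",
)

print(network.id, subnet.cidr)
```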

TFredberg commented 4 years ago

Unfortunately I believe we have to think through and discuss the RM networking model a bit more, rather than build it based on one or a few of its RA-based components. I think it is important to understand the layering, both when it comes to the CNTT specifications and to Networking in CNTT.

The CNTT specification "layering" has the RM to ensure that each RA can specify its SW Virtualization without having to look deeply into each of the other RAs.

In the RM we have the requirement to ensure that an operator can make or buy their infrastructure system and its components in a way that enables multiple implementations based on the same or different RAs. This was not really targeted or even needed when OpenStack was incepted, but the need will grow larger now that operators, over a rather long period of time (several years), will have to migrate/transform from VNFs to CNFs. The cloud native paradigm is unlikely to converge on a single SW Virtualization layer (CaaS) as the OpenStack deployments (IaaS) often had, making it mandatory to have multiple parallel CaaS on the same HW Infrastructure in the cloud paradigm.

The RM Networking model needs to specify how the shared underlay networking could host multiple unrelated and untrusted IaaS and CaaS environments simultaneously. It is therefore not really possible to look into OpenStack or other existing single SW Virtualization layers for a solution to start building on, as @sukhdevkapur suggests.

I think we need to agree on the way forward and get ahead of the development of shared infrastructure as a base for the different RAs. A start on concepts and layering is being discussed in the newly formed RM Networking Focus Group.

SaadUSheikh commented 4 years ago

@sukhdevkapur, do we think that mapping and matching networking with each component of the RA is a wise idea?

My suggestion is to define the RM Networking as a separate model that can cover both virtualized and containerized workloads. Further, it should include other initiatives which are coming to the industry faster than expected, e.g. disaggregated networks. We also had a good discussion in ETSI NOC, and many folks shared the view that, with the SDN function moving into the orchestration framework, standardization in CNTT may not need to be linked with controllers and APIs, but needs closer stitching of networking with hybrid components. This is my view on it.

So my question to @sukhdevkapur is: for Tungsten Fabric, can we have uniform specs handling both the OpenStack and container specs?

sukhdevkapur commented 4 years ago

@SaadUllahSheikh - Tungsten Fabric handles OpenStack as well as containers. So, this fits well into the RM as well as both RA1 and RA2. If the team agrees, I will be happy to offer support to get it into the RI as well, in the OPNFV test bed, so that it is included in the RC as well. @TFredberg - while I made my comment about OpenStack, it really applies to K8s as well.

While we are working on the details of the SDN verbiage in the Networking sub-group from the long-term point of view, we can have SDN functionality included and working, and I will be happy to get this done. This will address OpenStack as well as K8s.

SaadUSheikh commented 4 years ago

@sukhdevkapur agreed on this. I think you should first cascade the work here to RI2; as per my understanding, RC2 is not there yet to take K8s VNF requirements, and for this it will make more sense to refer to results from the CNCF test bed instead of defining the RC from the ground up.

ASawwaf commented 4 years ago

@walterkozlowski @TFredberg @kedmison

Shall we close this based on the content developed under the Approach?

TFredberg commented 4 years ago

I think we can close it, based on the substantial additions to the approach and the lack of questions on this issue lately.

walterkozlowski commented 4 years ago

@ASawwaf I agree with @TFredberg that we can close this issue now. I still have my content, which I am planning to use later on when we write more details about programmability.