[RM/RA/RI/RC] Document Hardware Objectives, Guidelines and Approach

Issue: As discussed in multiple forums, CNTT currently specifies hardware in multiple documents (e.g., RM, RA and RI), making alignment challenging. Further, when RI/CIRV (or CSPs) are procuring hardware, there needs to be an objective mechanism to ensure the candidate configuration(s) will support the CNTT designs.

Due to business realities (and focus on VNF abstraction), CNTT cannot specify exact h/w requirements (e.g. SKU)
Due to rate of change in technology, even if CNTT did specify SKUs, the spec would rapidly become stale and maintenance would be impractical
Therefore, as discussed on RM call, we propose creating a set of Guidelines that
- i. Meet the documented CNTT objectives for specifying h/w
- ii. Are decoupled from the technology de jour and therefore do not require constant updating
- iii. Can assist users in identifying h/w that will support CNTT infra designs
- iv. Provide validation (i.e. objectively assess if proposed SKUs are aligned w/ CNTT platform needs)

Objectives/Drivers: Drivers are the key to this process, and will be considered carefully. Please add your proposed objectives.

Deterministic and predictable performance
YOUR PROPOSED DRIVER GOES HERE
Reduced operational complexity of infrastructure [Trevor]

Approach: The following has been discussed independently in various forums. The intent here, is to bridge all of the forums to achieve a CNTT-level consensus. This is just a starting point to drive the discussion. It is expected to be modified/enhanced (i.e. this is not edict, it's catalyst).

As previously discussed, the current thinking is to:

Identify CNTT's objectives/drivers for specifying h/w in the first place (i.e. what are we trying to achieve by detailing the h/w)
Derive a set of clear, concise and usable Guidelines from the Objectives
Identify where the Guidelines should live and PR them to those locations
Gain consensus on a series of related questions (see below)

Related Questions: These questions have come up in RI/CIRV activities. They may be fully CSP discretionary, a hard CNTT requirement or somewhere in between. Irrespective, a supporting the reason, justification or rationale should be identified for each conclusion. In no particular order:
A. What/can node types/functions can be combined in production (and why)

Compute?
Storage?
SDN Controller?

B. How to best accommodate hetero and homogeneous server farms C. Should CNTT recommend interface speed guidelines? (if so, what are they?) D. Should CNTT recommend server farm dimensioning guidelines? (if so, what are they?) E. RC test bed requirements

Can/which nodes be combined by RI/CIRV in support of RC? (and for which type of testing)
What scale is required to ensure adequate functional testing?
What scale is required to ensure adequate scalability testing? (does not imply we will test scale)
What bandwidth is required for physical test bed interfaces?

F. Insert additional questions here

This is a large Issue, and it is expected to spawn actionable supporting issues that can be then be assigned/adopted by individuals.

@markshostak My suggestion is the following:

Document the answers in the RM Chapter 1.
- between 1.4 and 1.5 (we can call it: Approach)
- one issue/PR is needed for this.
- Let us create the PR and start the review process.

Quick Answers to some questions from me (to be discussed and agreed on while reviewing the PR)

RM should contain minimum capabilities.
- CPU speed (as per Chapter 5) , aggregated throughput for VM (as part of the flavour per Ch04), and any vendor agnostic capabilities.
- Vendor specific capabilities needs to be abstracted if to be presented.
- Answer the questions: what would this vendor specific capability allow me to do/Add? and use the answer to present this capability in agnostic way.
RA should contain more specific details.
- on the platform level: (go as specific as it needs to)
- number of sockets, number of NIC cards for each node type, sizing of each node, speed of ports, etc.
RI/Labs will specify exact model, and specification.
- expected to be 100% match to RA.
- Vendor hardware/implementation expected to also match the spec.
- if vendor decide to exceed or fall short of the spec and still manage to get the same level of determinism in terms of capabilities and performance, etc (passes all certification expected), it is up to the operator to assist the risk of deploying such solutions and the impact this might have on forward compatibility and cost.

I hope that gives initial direction. let us discuss more in TSC and more importantly when we create the PR.

UNH has proposed the following configuration for the UNH IOL LaaS lab for 2019 procurement. This is an agenda item for the RM call, as CNTT needs to be aligned in time for UNH to take delivery of this h/w in 2019. Each server shall meet the following minimum specifications:

CPU:

2x CPU sockets, providing 24 cores each, at 2.2 GHz

Memory:

512 GB RAM

Storage:

3.2 TB SSD via SATA 6 Gbps
Storage should present as at least 3 or more disks to the OS, allowing for usage as CEPH storage nodes, or similar.

Network Interfaces (note 1)

4x 10Gbps Ethernet Ports
Out-of-band Management Port

Note 1: At least 1 network interface must be capable of performing PXE boot and that network must be available to both the Jump / Test Host and each bare-metal server.

In regards to the RM part of the conversation on 590 I have a couple of comments to former posts:

@rabi-abdel

RM should only specify the abstracted models and be technology agnostic
- that RA and RI shall include the further detailing of
- that VNFs should be able to state their minimum requirements (Objectives) towards any minimum requirements on actual HW and then it should be left for the RI level or possibly addendums of the RA models to select and dimension the actual HW

Maybe these comments do not belong in #590, but they are at least examples of what of #590 that I would consider important to have in the RM (in the abstracted way):

RM HW model (objects, attributes) missing
- ISA to denote if there are specific ISA dependencies
  - Optionally a sub specification of ISA generation i.e. denoting the new instructions that come with newer generations
- Speed in some form of abstracted way
  - e.g. expressed as equivalence to suitable set of benchmarks (like SpecINT)
  - Preferably not GHz that is very ISA, generation and implementation specific
- CPU attached (Look-aside) accelerators e.g. Crypto (like Intel QuickAssist), GPU and FPGA
- With a set of functions and benchmarking type of characteristics
- CPU attached Non Volatile Memories in parallel to RAM
- In-line attached accelerators e.g. SmartNIC, Crypto, HostBasedAdapters, …
- With a set of capabilities, functions and benchmarking type of characteristics
  - e.g. the already expressed Crypto and Compression
  - but it could also be programmable devices such as NPU, FPGA, DSP, GPU
  - and the networking pipeline could be programmable e.g. with P4 or similar
- NIC functionalities, capabilities and configurabilities
- e.g. support for PF, VF, SR-IoV and PCI-Pass-Through
- For SmartNICs this will be a larger list
- NUMA aspects of CPU socket connected accelerators and NICs i.e. to what socket or sockets they are connected
Functions should not be prescriptively placed in a specific category (Compute, Storage, Networking) of components in the RM model if there is not a well-supported/accepted singular placement of that function
- One example would be where network encapsulation (VTEPs) shall be done which could be done in the CPU e.g. by OVS data plane or in a SmartNIC or in the switch fabric (Leaf-Spine complex)
- Another example is the usage of Storage servers in contrast to generalized servers and JBODs/JBOFs

As a comment to the RM meeting today (20 Nov) about support and focus of Heterogenous computer systems, I think that Heterogenous systems must be supported in a proper way both from functionality and dimensioning point of view to allow gradual and sliding HW upgrade schemes.

@TFredberg wrote:

Speed in some form of abstracted way

e.g. expressed as equivalence to suitable set of benchmarks (like SpecINT)

Preferably not GHz that is very ISA, generation and implementation specific

The challenge with this is that benchmarks like specINT and specFP are generally collections of tests that are executed, each with their own result, and that a single number is often being used to try to simplify the view of these different performance results. This chart, https://images.anandtech.com/doci/15009/Tremont%20-%20Stephen%20Robinson%20-%20Linley%20-%20Final-page-012.jpg, showing relative performance of a new Tremont architecture relative to Goldmont Plus. While these Atom-class chips may not be completely relevant, the process of comparing them is, and this chart shows a line drawn at about 32% uplift from Goldmont Plus to Tremont... but several individual tests falling well below or significantly exceeding that 32% number. Depending on which of the individual tests a VNF workload most looks like, it may experience a performance boost that differs widely from the 32% average uplift.

This is relevant to CNTT because an attempt to use a generic measure like SPEC or other performance number may run into the same situation; being wildly inaccurate for some types of workloads.

in essence, this is my rationale for wanting to add

CPU Microarchitecture (Haswell/Broadwell, or Cascade Lake, or Naples, or ARM v8, etc)
minimum CPU Speed in GHz

to the compute portions of the profile specification.

@kedmison I agree to that SpecINT and other benchmarks are not 100% accurate, and welcome if someone knows about more representative abstracted ways for a set of applications (VNFs).

The core speed is not very accurate either specifically if the core architecture (generation) is not given. It is counterproductive for cloud deployment where HW/SW decoupling is important, to have to match an application into a specific core architecture and frequency. This would require applications to test with all possible core architectures and possibly also at different core frequencies and deliver some performance graphs from that. Still there are other potentially disturbing elements in cloud deployments where the socket is shared e.g. by noisy neighbors that might disturb the caches, IO and host scheduling.

I also agree that 1 value is not giving enough information, but that is just the same problem with giving the frequency of the cores. Specifically now with the modern sockets having features like Intel Speed Select (from Cascade Lake) and the rather old Turbo functionality. Speed Select enables the user (that is in control of these registers) to select how many cores that should be used and what base frequency each individual core should be enabled with. These are all great features but with a single speed figure (in whatever format) no orchestrator can make use it.

I guess we agree on that 1 value is not enough.

Are there someone that have some good suggestions of some sets of representative benchmarks valid for our industry?

Not only one value but one workload or method may not be enough. We might consider various "performance profiles" using different benchmarks. e.g. a candidate for network intensive is ETSI GS NFV-TST 009 - "Specification of Networking Benchmarks and Measurement Methods for NFVI" https://www.etsi.org/deliver/etsi_gs/NFV-TST/001_099/009/03.02.01_60/gs_nfv-tst009v030201p.pdf We are working on a PoC to explore this approach in the Intel OPNFV Community Lab ... i.e. deploying an example reference implementation on a variety of Xeon-SP and Xeon-D platforms and measuring performance using open source tools deployed in VMs. While starting with network intensive we would also like to explore compute and storage next.

@trevgc This form of benchmark tests seems to be on the right path for compute and communication HW services, but they are in these specifications expressed as complete NVFi benchmarks with the VM or Container hosts included in the DUT/SUT.

What we are after here would more be that the set of different "benchmark application" also include the behavior and needs of the SW virtualization layer and only measure the goodness of the pure HW infrastructure, but in the same types of characteristics i.e. Throughput, Latency, Delay Variation and Loss maybe with some additional characteristics added.

The only other characteristic that I can envisage today, in regards to compute and communication, that I can not see modelled in the above characteristics would possibly be the benchmark applications internal "built up knowledge" that enable it to perform its task with some improved characteristics or quality measure. This would likely grow in importance with increased level of included ML/AI functionality.

We would also have to add storage access oriented benchmarks characteristics e.g. IOPS@R/W(ratio), Read-Latency, Write-Latency, etc. on some application level like Blocks, Files and Objects possibly also amended with characteristics for systems like Resiliently Stored where that is required.

For all the above usages we should probably over time add the characteristics Energy Consumption per Throughput (and built up knowledge + storage).

With a number of well selected "benchmark applications", likely dependent on the RA in which they are deployed, and a number of different "loading points" or "traffic mixes" I would think it would be possible to also cover included HW accelerators, since after all they are only there to accelerate some of these characteristics.

Then I realize we are not there today and these benchmarks might not exist in the wanted form, but we need to start breaking up the prescriptive SW definition of the HW as being a very detailed prescription of what components it have to have and how they should be interconnected. Without this we will never get an efficient enough HW with relevant acceleration.

I also fully agree with @kedmison that a single figure comparison will leave the floor open for a simplistic evaluations of HW purchase price without proper considerations into other values that also needs to be evaluated.

anuket-project / anuket-specifications

[RM/RA/RI/RC] Document Hardware Objectives, Guidelines and Approach #590