anuket-project / anuket-specifications

Anuket specifications
https://docs.anuket.io

[RM ch02] profiles language update needed #571

Closed xavier-grall closed 5 years ago

xavier-grall commented 5 years ago

From https://github.com/cntt-n/CNTT/pull/555#issuecomment-550338405: Kelvin and Xavier to update language for profiles.

There are 2 pending issues related to the language for profiles, one for Compute Intensive and another for Basic:

A/ For Compute Intensive profile

Concluded issue: should the CI profile have a low network latency requirement?

@kedmison what do you think?


B/ For Basic profile

What are your opinions?

pgoyal01 commented 5 years ago

@xavier-grall Some minor suggestions and typo corrections:

@xavier-grall refining proposal (#555 (comment)): "VNFCs that perform compute intensive operations (but do not have neither high network throughput nor low network latency requirements)" Suggest that you delete "do not".

B/ For Basic profile

@kedmison proposal (#555 (comment)): "VNFCs that perform basic compute operations and can tolerate oversubscription the variable compute latency"

@xavier-grall suggestion: "VNFCs that perform basic compute operations without any specific requirement"

Suggestion: "VNFCs that perform compute operations that can tolerate resource over-subscription and variable latency."

ASawwaf commented 5 years ago

@xavier-grall Thanks for your effort. I like your summarization for CI; I will go with my definition :)

kedmison commented 5 years ago

@xavier-grall Yes, I think CI should have a low latency requirement. The tables we are generating in chapter 5 indicate that it will have SRIOV. In fact, there is little to no feature differentiation in the networking space between network-optimized and compute-optimized.

We could create some differentiation by specifying that the network-optimized flavours have more tenant bandwidth per VM. Profile descriptions taking this into account might look like this:

  • Basic: VNFCs that can tolerate resource over-subscription and variable latency.
  • Network Intensive: VNFCs that require high network throughput and low network latency.
  • Compute Intensive: VNFCs that require low network latency.
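The three descriptions above can be read as a small requirements table. The following is a minimal Python sketch of that reading, not part of the specification: the field names, the `PROFILES` ordering, and the `select_profile` helper are all hypothetical, introduced only to illustrate how a VNFC's stated needs would map onto the proposed wording.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """One infrastructure profile, as described in the proposed wording.

    The boolean fields are assumptions that paraphrase the prose
    descriptions; they are not taken from the reference model.
    """
    name: str
    tolerates_oversubscription: bool
    low_network_latency: bool
    high_network_throughput: bool


# Ordered from least to most demanding (an assumption of this sketch).
PROFILES = (
    Profile("Basic", True, False, False),
    Profile("Compute Intensive", False, True, False),
    Profile("Network Intensive", False, True, True),
)


def select_profile(needs_low_latency: bool, needs_high_throughput: bool) -> Profile:
    """Return the least demanding profile that satisfies the stated needs."""
    for profile in PROFILES:
        if needs_low_latency and not profile.low_network_latency:
            continue
        if needs_high_throughput and not profile.high_network_throughput:
            continue
        return profile
    raise ValueError("no profile satisfies the stated requirements")


if __name__ == "__main__":
    # A VNFC that only needs low network latency.
    print(select_profile(needs_low_latency=True, needs_high_throughput=False).name)
```

Under this reading, Compute Intensive and Network Intensive differ only in the throughput requirement, which is the lack of differentiation pointed out above.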

I'd further argue that this definition of Compute intensive is in fact a 'Balanced' configuration, and that IT or ML workloads that we plan to address in future would need a separate 'compute intensive' profile that has faster clock speeds (i.e. faster single-thread speeds) and/or more cores per VM. So, it may be worthwhile looking at the naming of this particular profile now, to ensure we don't create nomenclature challenges for ourselves when the time comes to address these common IT and ML needs.

xavier-grall commented 5 years ago

@kedmison @pgoyal01 @ASawwaf Thank you for your clear opinions and proposals. The essential remaining difference in our opinions is about the network latency for CI. As I said in the PR #555 discussion, I am not so sure of my own opinion on that point, so I agree to add the low network latency requirement. For the language, I think the description should be as accurate as possible, especially for future readers, so I would propose to keep the level of compute-related operations:

@kedmison Regarding the solution for addressing the CI latency requirement, I would rather think about OVS-DPDK with only a few cores for PMD threads (e.g. 1 per NUMA node), since SR-IOV requires the VNF to have implemented a NIC-specific driver, which may be very (too) constraining. But, if necessary, it will have to be discussed in another issue related to another chapter ;-). Regarding IT/ML/AI, I fully agree that we should consider these workloads in a future release, and also that they may require a new profile (possibly including specific hardware offloading).
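To make the "one PMD core per NUMA node" idea concrete, here is a minimal Python sketch that builds the hexadecimal CPU mask Open vSwitch expects in its `other_config:pmd-cpu-mask` setting. The topology dictionary, the choice of the last core of each node, and both helper names are assumptions for illustration; a real host's core layout would have to be read from the system.

```python
def pick_pmd_cores(topology):
    """Pick one core per NUMA node (here: the last core listed for each node).

    `topology` maps a NUMA node id to the list of core ids on that node.
    Choosing the last core is an arbitrary convention of this sketch.
    """
    return [cores[-1] for _, cores in sorted(topology.items()) if cores]


def pmd_cpu_mask(core_ids):
    """Return a hex bitmask with one bit set per selected core id."""
    mask = 0
    for core in core_ids:
        if core < 0:
            raise ValueError("core ids must be non-negative")
        mask |= 1 << core
    return hex(mask)


if __name__ == "__main__":
    # Hypothetical two-socket host with four cores per NUMA node.
    numa_cores = {0: [0, 1, 2, 3], 1: [4, 5, 6, 7]}
    cores = pick_pmd_cores(numa_cores)
    print(cores, pmd_cpu_mask(cores))
```

The resulting value could then be applied with something like `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=<mask>`. Keeping the PMD core count this low leaves the remaining cores available to tenant workloads, at the cost of less switching capacity than a larger PMD allocation would give.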

kedmison commented 5 years ago

"Regarding IT/ML/AI, I fully agree that we should consider these workloads in a future release, and also that they may require a new profile (possibly including specific hardware offloading)."

I was trying to leave some room in the nomenclature to migrate to something like the following:

  • Basic: VNFCs that can tolerate resource over-subscription and variable latency.
  • Network Intensive: VNFCs that require high network throughput and low network latency.
  • Balanced (our current definition of Compute-intensive): VNFCs that require low network latency.
  • Future: Compute Intensive: VNFCs that require high core counts and high single-threaded performance
  • Future: Storage Intensive: VNFCs that require large amounts of locally attached storage and/or high storage IOPS
  • Future: Graphics-accelerated: VNFCs that require GPU acceleration

Right now, our 'compute intensive' is essentially only different from 'basic' in that it's not overbooked. Otherwise, it's not that special except in the networking terms.

If we adopt this sort of framework above, then we have

It is in thinking about this sort of nomenclature that I have concerns about our current definitions of network-intensive and compute-intensive and calling them both appropriate for compute-intensive operations.

xavier-grall commented 5 years ago

OK, I missed your previous point about a possible future naming challenge... I think it would be great to be able to keep the current naming, and thus to try to find another name for a possible future ML/AI profile: why not just Enhanced CI (CI+)? And we could also have Enhanced NI. Regarding GPU, it could be a feature of CI+, as crypto acceleration could be for NI+. Concerning compute intensive operations, I think they refer to a requirement for predictable/deterministic compute performance (or dedicated compute resources). So, trying to keep high-level requirements (and to avoid technical solution options), I propose:

Possible future profiles:

kedmison commented 5 years ago

@xavier-grall I like it; I think that's a very good compromise between keeping the existing naming and creating space for the future workloads, and a good observation about 'predictable' compute.
(The only quibble I have is that I'd disagree on the GPU as part of CI+ as CI is about general-purpose compute resource (CPUs), not GPUs... but that's for future RM discussions and 'issues'.)

xavier-grall commented 5 years ago

OK, great. Do you think the figures should also be reviewed accordingly? Currently, they are: https://github.com/cntt-n/CNTT/blob/xavier-grall-patch-ch02/doc/ref_model/figures/ch02_infra_profiles.PNG

@pgoyal01 @ASawwaf @ulikleber What do you think about the proposed naming and descriptions?

kedmison commented 5 years ago

Yes, I think so. The bold text in the diagram is not quite aligned with the proposed profile descriptions.

ASawwaf commented 5 years ago

@xavier-grall, I am OK with the workload profile examples. My only comment, like @kedmison's, is that the bold description does not match. Thanks

ulikleber commented 5 years ago

I think it looks good. I don't think we will find many VNFs in Basic in the end. But let's go for it.

xavier-grall commented 5 years ago

@ulikleber Right, we will not find many entire VNFs in Basic in the end, but we should find many VNFCs, especially those related to VNF management planes.

@kedmison @ASawwaf For the figures, I suggest replacing the bold text of the NI and CI profiles like this:

xavier-grall commented 5 years ago

PR #585 created, including figures update proposal

markshostak commented 5 years ago

@xavier-grall Ready to close this one? If so, I'd like to get it off tomorrow's gov report. :-)

xavier-grall commented 5 years ago

Of course. The PR is merged, so the issue can be closed (done just now).