guesslin opened this issue 5 years ago
When we run a docker container we specify --privileged
and -v /sys/kernel/mm/hugepages:/sys/kernel/mm/hugepages
among other options. Can you set up huge pages for your application?
Currently NFF-Go relies heavily on DPDK memory management libraries, so even if you don't use DPDK ports, it is still necessary to initialize DPDK.
@guesslin How critical is the dependency on DPDK for your app? Can you use what Gregory suggested? Is this because of usage inside K8s?
@gshimansky about the hugepages: we're trying to make it work inside K8s now.
@aregm the major reason we want to try raw sockets is that we expect them to run without any DPDK dependency, so we can run our app on any vanilla K8s cluster.
At the moment raw sockets allow working without network ports bound to DPDK, which avoids exclusive use of the system's network card by a single container, so a Kubernetes pod may receive individual packets NATed to it. DPDK initialization cannot be avoided yet.
@aregm Yes, it's about running nff-go in K8S without DPDK. The problem is that hugepages management was introduced in K8S with version 1.8, but with that version you have to configure kubelet to enable it, which most managed K8S installations AFAIK don't allow (e.g. AKS). This effectively leaves us requiring K8S version 1.10, which was released around one year ago.
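For reference, on K8S 1.8/1.9 hugepages support sat behind the `HugePages` feature gate, which had to be enabled on the kubelet (a sketch; how the flag is wired in depends on how your kubelet is launched, which is exactly what managed offerings like AKS don't expose):

```shell
# Enable the hugepages feature gate on kubelet (K8S 1.8/1.9;
# the feature became beta and on by default in 1.10).
--feature-gates=HugePages=true
```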
@gshimansky Do you have a K8S YAML with a running hugepages configuration for nff-go?
@marcusschiesser Have you tried the section 6.3.4 of this document https://builders.intel.com/docs/networkbuilders/adv-network-features-in-kubernetes-app-note.pdf ?
@marcusschiesser @guesslin Any update? Were you able to run it on K8s?
@aregm we have that on hold for now as we're building our own raw-socket-based implementation (without hugepages dependencies) - we'll send you our findings this week
@aregm we tried the raw socket in k8s but there were some errors, and we can't get packets from it:
------------***-------- Initializing DPDK --------***------------
EAL: Detected 2 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: No free hugepages reported in hugepages-1048576kB
EAL: Probing VFIO support...
EAL: WARNING: cpu flags constant_tsc=yes nonstop_tsc=no -> using unreliable clock cycles !
------------***------ Initializing scheduler -----***------------
DEBUG: Scheduler can use cores: [0 1]
------------***---------- Creating ports ---------***------------
------------***------ Starting FlowFunctions -----***------------
DEBUG: Start SCHEDULER at 0 core
DEBUG: Start STOP at scheduler 0 core
DEBUG: Start new instance for OS receiver1
DEBUG: Start new clone for OS receiver1 instance 0 at 1 core
DEBUG: Start new instance for segment1
WARNING: Can't start new clone for segment1 instance 0
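The `No free hugepages reported in hugepages-1048576kB` line only concerns 1 GiB pages; the default-size (usually 2 MiB) pool may still be fine or may also be empty. One way to sanity-check from inside the container is to read `/proc/meminfo`; the helper below is an illustrative sketch, not part of nff-go:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// hugePages extracts HugePages_Total and HugePages_Free (counts of
// default-size huge pages) from /proc/meminfo-style text.
func hugePages(meminfo string) (total, free int) {
	sc := bufio.NewScanner(strings.NewReader(meminfo))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) != 2 {
			continue
		}
		n, err := strconv.Atoi(fields[1])
		if err != nil {
			continue
		}
		switch fields[0] {
		case "HugePages_Total:":
			total = n
		case "HugePages_Free:":
			free = n
		}
	}
	return total, free
}

func main() {
	data, err := os.ReadFile("/proc/meminfo")
	if err != nil {
		fmt.Println("cannot read /proc/meminfo:", err)
		return
	}
	total, free := hugePages(string(data))
	fmt.Printf("default-size hugepages: total=%d free=%d\n", total, free)
}
```

If both counters are zero inside the pod, the hugepages mount or resource request is not taking effect.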
I set up the hugepages in the DaemonSet like this:
resources:
  limits:
    cpu: "4"
    memory: 200Mi
    hugepages-2Mi: 100Mi
  requests:
    cpu: 200m
    memory: 200Mi
    hugepages-2Mi: 100Mi
volumeMounts:
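The `volumeMounts:` section is cut off in the snippet above. For a hugepages pod, the usual wiring pairs the `hugepages-2Mi` resource with an `emptyDir` volume of `medium: HugePages`; a sketch with illustrative names (the actual mount path and volume name in the original config are unknown):

```yaml
volumeMounts:
- mountPath: /hugepages   # illustrative path
  name: hugepage          # illustrative name
volumes:
- name: hugepage
  emptyDir:
    medium: HugePages
```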
It looks like your scheduler is limited to just 2 CPU cores, which are not enough to run all of the flow functions. Did you specify anything in the CPUList field of the initialization parameters?
@gshimansky no, I don't specify anything in CPUList. Should I specify a minimum of 4 cores? I tried specifying CPUList in flow.Config:
log.Println("Initiating nff-go flow system")
flowConfig := &flow.Config{
    CPUList:      "0,1,2,3", // https://github.com/intel-go/nff-go/blob/v0.8.0/common/common.go#L32
    HWTXChecksum: true,
}
// SystemInit returns an error that should not be dropped.
flow.CheckFatal(flow.SystemInit(flowConfig))
log.Println("Initiated nff-go flow system")
and fixed my DaemonSet deployment YAML to request 4 CPUs from the k8s host:
resources:
  limits:
    cpu: "4"
    memory: 200Mi
    hugepages-2Mi: 100Mi
  requests:
    cpu: "4"
    memory: 200Mi
    hugepages-2Mi: 100Mi
But then all my pods stay in Pending status; describing a pod shows: Warning FailedScheduling 54s (x41 over 25m) default-scheduler 0/2 nodes are available: 1 node(s) didn't match node selector, 2 Insufficient CPU.
We run our application in an AKS cluster with the standard VM size Standard DS2 v2 (see the Azure VM size reference), on which we expected the application to run.
@gshimansky I think the issue here is that AKS, without any configuration, uses VMs that have 2 CPUs, each with 2 cores, see:
$ cat /proc/cpuinfo | grep 'cpu cores'
cpu cores : 2
cpu cores : 2
So the expectation would be that the raw socket, being the fallback, works on the AKS standard configuration.
The current implementation of send/receive flow functions via raw sockets is not very different from the DPDK send/receive implementations. The scheduler treats them the same way and allocates an entire CPU core for each flow function. So the scheduler needs to be changed so that flow functions can share one CPU core. Maybe @ifilippov can add more information.
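To illustrate why two cores are not enough: the scheduler itself occupies one core, and each send/receive or processing flow function needs a whole core of its own. A toy model of that allocation (not the real scheduler code; names taken from the log above):

```go
package main

import "fmt"

// allocate gives each flow function its own dedicated core, with core 0
// reserved for the scheduler. This loosely mimics how the nff-go
// scheduler hands out cores; it is a toy model, not real nff-go code.
func allocate(totalCores int, flowFunctions []string) (placed map[string]int, failed []string) {
	placed = map[string]int{}
	next := 1 // core 0 is taken by the scheduler
	for _, ff := range flowFunctions {
		if next >= totalCores {
			failed = append(failed, ff)
			continue
		}
		placed[ff] = next
		next++
	}
	return placed, failed
}

func main() {
	// With 2 cores total, the OS receiver takes core 1 and segment1
	// has nowhere to go, matching the
	// "Can't start new clone for segment1" warning in the log.
	placed, failed := allocate(2, []string{"OS receiver1", "segment1"})
	fmt.Println("placed:", placed)
	fmt.Println("failed:", failed)
}
```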
Has anyone succeeded in initializing an NFF-Go app on K8s?
Hi, I'm trying to run nff-go with raw socket devices in a Kubernetes DaemonSet. I follow the example here, first calling SystemInit, then using SetReceiverOS and SetSenderOS. But I have a problem using it inside the DaemonSet. Checking the code, inside SystemInit there is a low.InitDPDK call that causes these error messages: https://github.com/intel-go/nff-go/blob/048b92a8284baee035a75c57c9f86923d3e71208/flow/flow.go#L621
Is there another way to init the flow system without InitDPDK?