ciao-project / ciao

Ciao - Cloud Integrated Advanced Orchestrator
Apache License 2.0

Ensure that restarted CNCI instances are rescheduled to an NN in the same network segment. #1044

Open markdryan opened 7 years ago

markdryan commented 7 years ago

The scheduler needs to ensure that the CNCI is rescheduled to an NN in the same network segment. This can be done by checking the Compute Net IP of the physical node: the scheduler should choose an NN whose IP is on the same Compute Subnet.
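The subnet check described above can be sketched in Go with the standard `net` package. This is only an illustration, not Ciao code: the `networkNode` type, the `pickNodeInSubnet` function, and the addresses are hypothetical names and values chosen for the example.

```go
package main

import (
	"fmt"
	"net"
)

// networkNode is a hypothetical record of a network node (NN) and the IP
// it holds on the compute network. It is not a type from the Ciao source.
type networkNode struct {
	ID        string
	ComputeIP string
}

// pickNodeInSubnet returns the ID of the first node whose compute IP lies
// inside computeSubnet (given in CIDR notation).
func pickNodeInSubnet(nodes []networkNode, computeSubnet string) (string, error) {
	_, subnet, err := net.ParseCIDR(computeSubnet)
	if err != nil {
		return "", err
	}
	for _, n := range nodes {
		ip := net.ParseIP(n.ComputeIP)
		if ip != nil && subnet.Contains(ip) {
			return n.ID, nil
		}
	}
	return "", fmt.Errorf("no network node on compute subnet %s", computeSubnet)
}

func main() {
	// Example data only; a real scheduler would learn these from its nodes.
	nodes := []networkNode{
		{ID: "nn-a", ComputeIP: "10.0.1.5"},
		{ID: "nn-b", ComputeIP: "10.0.2.7"},
	}
	id, err := pickNodeInSubnet(nodes, "10.0.2.0/24")
	if err != nil {
		fmt.Println("cannot reschedule CNCI:", err)
		return
	}
	fmt.Println("reschedule CNCI on", id)
}
```

Returning an error when no node matches leaves it to the caller to decide whether to wait for a suitable NN or fail the restart.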

markdryan commented 7 years ago

@tpepper Is this something that the scheduler already does? If not, would it be much work to do?

tpepper commented 7 years ago

Currently the scheduler does not track this. If it needs to be tracked, it won't be hard. I would see a network node's network segment as yet another trackable resource that its launcher would report, and we can schedule based on it if a workload start request specifies one.

Do we support multiple network segments today?

markdryan commented 7 years ago

> Do we support multiple network segments today?

Not sure. @mcastelino Any idea?

mcastelino commented 7 years ago

@markdryan @tpepper Yes, we do support multiple network segments. That is the reason why the configuration of Ciao supports multiple "Compute Networks". The networking layer will scan the machine and attach to the first one it sees.

So we do not support a single machine connected to multiple active network segments that serve the same function (management/compute). We just pick the first one. https://github.com/01org/ciao/blob/master/networking/libsnnet/cn.go#L190

Here cn.ComputeAddr & cn.ComputeLink will provide the details of the segment you attached to.

Along the same lines, we have https://github.com/01org/ciao/blob/master/networking/libsnnet/cnci.go#L96

In the case of the CNCI, it will always be on the same Network Compute segment as the NN (due to our use of macvtap).
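The "attach to the first one it sees" behaviour can be illustrated with a small Go sketch. It is not the libsnnet implementation: the function name `firstComputeAddr`, the choice to walk local addresses in the order given, and the sample addresses are all assumptions made for this example.

```go
package main

import (
	"fmt"
	"net"
)

// firstComputeAddr walks localAddrs in order and returns the first address
// that falls inside any of the configured compute networks, together with
// the matching network in CIDR form. Invalid entries are skipped.
// Hypothetical helper; the real logic lives in libsnnet and may differ.
func firstComputeAddr(localAddrs, computeNets []string) (string, string, bool) {
	var subnets []*net.IPNet
	for _, c := range computeNets {
		_, s, err := net.ParseCIDR(c)
		if err != nil {
			continue
		}
		subnets = append(subnets, s)
	}
	for _, a := range localAddrs {
		ip := net.ParseIP(a)
		if ip == nil {
			continue
		}
		for _, s := range subnets {
			if s.Contains(ip) {
				return a, s.String(), true
			}
		}
	}
	return "", "", false
}

func main() {
	// Assumed example: a machine with addresses on two configured segments.
	local := []string{"127.0.0.1", "10.0.2.9", "10.0.1.4"}
	configured := []string{"10.0.1.0/24", "10.0.2.0/24"}

	addr, segment, ok := firstComputeAddr(local, configured)
	if !ok {
		fmt.Println("no compute network found")
		return
	}
	fmt.Printf("attached to %s via %s\n", segment, addr)
}
```

Because the search stops at the first match, a machine with addresses on two configured segments only ever reports one of them, which is the limitation described above.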

mcastelino commented 7 years ago

@markdryan @tpepper To elaborate a little more on what I mean by "we do not support multiple active segments": when we create tunnels, we pick the IP of the first Compute Segment we see as the Tunnel Src IP. https://github.com/01org/ciao/blob/master/networking/libsnnet/cn.go#L747 Hence, if the machine has multiple active segments, the migration will not succeed unless both sides of the tunnel are set up again.

tpepper commented 7 years ago

@mcastelino I think that, in Mark's initial description in the issue, "Compute Net" refers to the cluster-configured "compute_net". That and "mgmt_net" are configurables under "launcher", as per the example in https://github.com/01org/ciao/blob/master/configuration/README.md. I believe today there is only one compute_net for the cluster, and each compute and network node is required to be on it.

mcastelino commented 7 years ago

@tpepper As specified in the spec, "compute_net: list [The launcher compute network(s)]" is a list of compute networks.

The compute network the CN or NN is on (out of the list of valid networks specified in the configuration) is reported back to the launcher. I do not know whether that information is carried back to the scheduler, but I assume it can be sent to the controller as part of the stats message https://github.com/01org/ciao/blob/master/payloads/stats.go#L100

tpepper commented 7 years ago

I never knew that was a list. As things stand today, the scheduler / SSNTP server doesn't care, because any frames dealing with this field are simply passed through. I will extend the scheduler to record and track it, and enable CNCI placement on a correct node if a net is requested in the start frame.