cloudfoundry / eirini

Pluggable container orchestration for Cloud Foundry, and a Kubernetes backend
Apache License 2.0

Routing enhancements #72

Closed (ndhanushkodi closed this issue 4 years ago)

ndhanushkodi commented 5 years ago

We're from the Networking team and are investigating how we might make Routing work better for Eirini. Things such as:

We want to check some assumptions we have and ask for guidance about where we might make enhancements.

Things we think are true:

Is this correct?

Options we're considering:

  1. Enhance Cloud Controller and/or Eirini to put the full Diego Routes data in the annotations
  2. Enhance Cloud Controller and/or Eirini to put Routing info (e.g. CAPI Route and RouteMapping) into K8s API as Custom Resources
  3. ???
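To make the two options above a bit more concrete, here's a rough sketch of each. Every annotation key, API group, and field name below is hypothetical, not an existing contract:

```yaml
# Option 1 (sketch): full Diego-style routes JSON carried on the StatefulSet
# that Eirini already creates, under an assumed annotation key.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: my-app
  annotations:
    cloudfoundry.org/routes: |
      [{"hostname": "my-app.example.com", "port": 8080}]
---
# Option 2 (sketch): routing info written as first-class custom resources.
apiVersion: networking.cloudfoundry.org/v1alpha1
kind: Route
metadata:
  name: my-app-route
spec:
  host: my-app
  domain: example.com
  destinations:            # roughly what a CAPI RouteMapping would carry
  - appGuid: my-app-guid
    process: web
    port: 8080
```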

Do you have opinions about which of these options we should pursue? Any pointers about where we might start if we wanted to open a pull-request?

cc @rosenhouse

cf-gitbot commented 5 years ago

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/168069103

The labels on this github issue will be updated when the story is started.

akshaymankar commented 5 years ago

Hi @ndhanushkodi && @rosenhouse

Thanks for starting to look into this. The things that you think are true are indeed true!

About the options, here are our opinions:

  1. We think CRDs are a "more-native" way of doing things in K8s
  2. We are not sure about who is going to do what in both of the options. As in, does the networking team want to own handling of routing completely? Or do y'all want to help Eirini get into better shape and let the Eirini team own it? Maybe @julz can also opine here.
  3. If the networking team intends to own routing completely, it might be better if CAPI directly talked to K8s to create routes, and the routing components and Eirini had some contract about where/how to look for pods.
  4. If it is the other case, we are not sure what we'd prefer, since we have less understanding of the routing requirements, so the best first step might be to help us understand them.

Thanks, Akshay && @JulzDiverse

julz commented 5 years ago

> We are not sure about who is going to do what in both of the options. As in, does the networking team want to own handling of routing completely? Or do y'all want to help Eirini get into better shape and let the Eirini team own it? Maybe @julz can also opine here.

My fairly strong personal feeling on this is I'd rather Eirini not end up needing networking context. Having a separate team own networking rather than the Garden team ended up being a super good move, and I'd like to repeat that trick ;)

> If the networking team intends to own routing completely, it might be better if CAPI directly talked to K8s to create routes, and the routing components and Eirini had some contract about where/how to look for pods.

I think my personal instinct here, fwiw, is that CAPI is already complicated enough, since it should really just be an API server. We should have Eirini create some standard high-level CRDs and expect networking to action them (e.g. by converging them to Istio-or-whatever). Over time maybe CAPI will create the CRDs rather than Eirini (and Eirini will provide the controller for them), but that seems like an orthogonal refactor for later to me.
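Purely for illustration, assuming Istio as the backend, the networking components might converge such a high-level CRD into something roughly like the VirtualService below. The gateway and Service names are made up, and the real mapping could look quite different:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: my-app-routes          # derived from the high-level route CRD (hypothetical)
spec:
  hosts:
  - my-app.example.com
  gateways:
  - cf-ingress-gateway         # assumed shared ingress gateway
  http:
  - route:
    - destination:
        host: my-app           # assumed Service fronting the app's pods
        port:
          number: 8080
```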

akshaymankar commented 5 years ago

@julz I strongly agree that we shouldn't take more context into Eirini. But I see a contradiction between your two comments; if I may paraphrase them:

  1. You'd prefer eirini team to not take on networking context
  2. You'd prefer eirini team to create networking information as CRDs instead of CAPI.

I think the second point brings networking context into eirini. I'd rather have the networking team decide the standard for the CRDs and even create them.

But if CAPI is too complex to also create CRDs, maybe we can split out the networking component(s) the way we've split out native staging: we call the component(s) with whatever we get from CAPI, and they create the CRDs and do whatever they need to make routing work.

Also, let me know if I have misunderstood you and gone on a completely useless train of thought.

julz commented 5 years ago

@akshaymankar tl;dr: I think we can probably own creating a super-high-level declarative top-level CRD (better us than CAPI); I just don't think we should have any context on translating that into Istio etc.

The shape in my head is something roughly like CAPI -> Eirini-as-k8s-adapter -> Top-level CRD -> [ Eirini-as-controller -> Statefulset | Networking -> Istio etc ]. This would also give us a nice top-level App CRD that people could use directly if they'd like (Eirini is then one mapping, but an operator could do a mapping to e.g. Knative if they liked). If you see what I mean?
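A minimal sketch of what that top-level CRD could look like, with every group, kind, and field name below hypothetical:

```yaml
apiVersion: eirini.cloudfoundry.org/v1alpha1
kind: App                       # hypothetical top-level resource written by the adapter
metadata:
  name: my-app
spec:
  instances: 3
  image: registry.example.com/my-app:latest
  env:
    FOO: bar
  # Eirini-as-controller converges instances/image/env into a StatefulSet;
  # the networking components converge spec.routes into Istio (or whatever) config.
  routes:
  - hostname: my-app.example.com
    port: 8080
```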

akshaymankar commented 5 years ago

> CAPI -> Eirini-as-k8s-adapter -> Top-level CRD -> [ Eirini-as-controller -> Statefulset | Networking -> Istio etc ]

This plan makes sense. @ndhanushkodi @rosenhouse, please let us know what you think.

ndhanushkodi commented 5 years ago

Thanks, that makes sense. Currently, Eirini only receives DesiredLRP info from CAPI. However, for routing/networking we'd need some additional information from CAPI, specifically CF Routes and CF Route Mappings. We think there are two options for how we could get this information, each with implications for the architecture, and we'd like to hear your thoughts, @akshaymankar @julz.

Option 1: Eirini is the one and only translation layer from CAPI to K8s API. CAPI begins to tell it about Routes and Route Mappings and it writes those as Custom Resources to K8s API in addition to writing StatefulSets. If we follow this pattern for other system components & features, we may end up with all CC info flowing through Eirini to K8s API.

Option 2: Eirini is responsible only for scheduling, and doesn't acquire additional scope. A separate program (e.g. copilot) learns about Routes and Route Mappings from CAPI and writes Custom Resources to K8s API. If we follow this pattern then different features may end up with different translation layers from CAPI to K8s.


We think Option 2 would be lower-cost for us (because the CAPI → Copilot path already exists). But perhaps there are bigger-picture architectural reasons to pull all of this into a single component?

rosenhouse commented 5 years ago

cc @zrob

selzoc commented 5 years ago

cc @cwlbraa

cwlbraa commented 5 years ago

The CAPI->Copilot path wasn't particularly expensive to build, nor is it super battle-tested at scale as far as I'm aware.

From a CC perspective, my only real concern about option 2 is partial-failure cases (whether due to network partition or ill-formed input): if we've got 2 separate paths to message compute/routes into k8s and one or the other fails, how do we recover? It's not so simple as a transaction rollback. In battle-tested diego-land, this all happens via one big DesiredLRP and we don't have to think about these things.

Outside of the cc, I'm also not sure how much ephemeral state copilot needs to store in order to do the translation. If it's stateless, option 2 seems reasonable. If it's building the route table from "actual" pod ips+ports and then sending that to k8s I'd be more concerned.

All that said, option 2 does seem much easier to build on from an organizational perspective. It may be the smart thing in the short-term to sacrifice ideal behavior under partial failures to gain development-time speedups.

Eventually, I think we'd all like the CC to put its processes and routes directly into k8s as CRs without translation layers. Will having multiple separate translation layers get us there faster?

zrob commented 5 years ago

Hey this is exciting to see!

I think the real value here is the CR itself rather than the path the translation takes (or doesn't, via direct CC -> k8s). As a data point, I did a spike on having CC create riff and Istio CRDs to expose functions through the CF API (living in orgs/spaces and using routes), and I didn't find the direct k8s conversation especially troublesome. I figure y'all can spike a few options and figure it out.

What was troublesome was the routing-definition bits. I think having some API that is CF/Eirini runtime-definition aware would be a great boon. By that I mean I would like to express "route to this process of this app" and have it understand the "where from" and the "where to", vs something like "route from this ingress to these pods". The latter requires the requester to know the ingress identity as well as the specific pods, even though those pods are generated as an implementation detail of things like processes and sidecars of apps.

I also think it would be awesome to have routing be able to own as much of that definition/impl as possible as they'll be closest to the use-cases.

Another point of interest is that the v3 CC API does not have a route mapping construct. Route mappings were flattened onto route objects as a list of destinations (as a sort of sub-resource, which I find a bit weird, but that's neither here nor there), so you may want to consider that first as we look at building out the CR.
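To illustrate both points (app/process-centric routing and the v3 destinations model), a Route CR might look something like the sketch below; the API group and field names are just placeholders:

```yaml
apiVersion: networking.cloudfoundry.org/v1alpha1
kind: Route                     # hypothetical CR mirroring a v3 CC Route
metadata:
  name: my-app-route
spec:
  host: my-app
  path: /
  domain: example.com
  # v3-style destinations replace the old RouteMapping construct:
  # each one names an app process, not an ingress or a set of pods.
  destinations:
  - app:
      guid: app-guid-from-capi
      process: web
    port: 8080
    weight: 100
```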

rosenhouse commented 5 years ago

Thanks for the feedback all. @ndhanushkodi and I are drafting a proposal with example YAMLs for custom resources, much along the lines of what @zrob describes (app and route centric, and using the v3 Route w/ Destinations model). We're also trying to account for routing to system components and other VMs that express routing desires via Route Registrar. We'll share a link to it in the next few days.

julz commented 5 years ago

FWIW, I think having CAPI write CRDs, with Eirini as essentially just the top-level convergence loop/controller to {kube/knative/riff/istio} etc., has always been the goal state at the back of my head.

I think it's nicer for CAPI to write one top-level thing that we can map to all the other stuff (rather than CAPI creating a network CRD, an app CRD, etc.), since that's the most flexible arrangement and keeps the maximum amount of k8s knowledge out of CAPI itself. It's also the most k8s-native way, if most of the convergence logic lives in a controller inside k8s.

My assumption has been that, short term, that'd happen via CAPI -> eirini-as-adapter -> CRD -> eirini-as-controller, and then simplify longer-term to CAPI -> CRD -> eirini-as-controller. But moving more aggressively to that end state sounds great, if it doesn't push the schedule out as much as I'd guessed it might.

This is all super cool, either way :).

ndhanushkodi commented 5 years ago

Hey everyone, here's a design doc with some rough ideas. We'd love to hear your feedback.

rosenhouse commented 5 years ago

We'd love any feedback on this doc ^^^. We're ready to execute on it...

rosenhouse commented 5 years ago

FYI, our team now has a public backlog.

ndhanushkodi commented 5 years ago

We've also got a repo with instructions to deploy networking on an Eirini environment here

julz commented 4 years ago

Closing this since there's now a public backlog (https://github.com/cloudfoundry-incubator/eirini/issues/72#issuecomment-547603073) and instructions for deploying (https://github.com/cloudfoundry-incubator/eirini/issues/72#issuecomment-548456756) the enhanced routing!