cloudfoundry / cf-for-k8s

The open source deployment manifest for Cloud Foundry on Kubernetes
Apache License 2.0

Bump Istio to 1.8 #622

Closed: braunsonm closed this issue 3 years ago

braunsonm commented 3 years ago

Is your feature request related to a problem? Please describe.

Istio 1.7 reaches EOL at the end of next week. We are eager to use an Istio 1.9 feature that recently went GA. From what I understand, Istio should not be upgraded across multiple minor versions at once, so moving to 1.8 in the next cf-for-k8s release and then to 1.9 in a future cf-for-k8s release would be appreciated.

Describe the solution you'd like

A tested upgrade to Istio 1.8, with a future update to 1.9.

Additional context

Istio 1.7 support officially ends on February 19th.
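
To make the one-minor-at-a-time constraint concrete, here is a minimal sketch that lists the intermediate versions to step through. The function name `minor_upgrade_path` is hypothetical and not part of any Istio or cf-for-k8s tooling; it assumes plain `major.minor` version strings.

```python
from typing import List


def minor_upgrade_path(current: str, target: str) -> List[str]:
    """Return the minor versions to step through, one minor at a time.

    Hypothetical helper for illustration only; assumes "major.minor" strings.
    """
    cur_major, cur_minor = (int(part) for part in current.split(".")[:2])
    tgt_major, tgt_minor = (int(part) for part in target.split(".")[:2])
    if cur_major != tgt_major:
        raise ValueError("this sketch only handles upgrades within one major version")
    if tgt_minor < cur_minor:
        raise ValueError("target version is older than the current version")
    # Each entry is one upgrade hop; the starting version is not included.
    return [f"{cur_major}.{minor}" for minor in range(cur_minor + 1, tgt_minor + 1)]


print(minor_upgrade_path("1.7", "1.9"))  # ['1.8', '1.9']
```

For the request above, this gives two hops: 1.7 to 1.8, then 1.8 to 1.9.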

cf-gitbot commented 3 years ago

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/176916441

The labels on this github issue will be updated when the story is started.

jamespollard8 commented 3 years ago

Thanks @braunsonm for the callout here. We'll bring this up to the team but we may not be able to complete this for several weeks.

Out of curiosity, which 1.9 feature(s) are you looking forward to?

cc my pair @jspawar

braunsonm commented 3 years ago

Understandable, more so just putting it on the radar.

As for the feature, some of our applications delegate authentication to an OAuth proxy, one for each app. We created a tool which adds C2C internal networking to cf-for-k8s for this purpose. The idea would be to use the new custom AuthorizationPolicy action to replace the need for a proxy. Apps would no longer each need their own proxy; instead, all traffic to the cluster (or a selected subset of it) would automatically go through the OAuth flow if a bearer token is not provided.

Info here: https://istio.io/latest/blog/2021/better-external-authz/
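
For readers unfamiliar with the feature, the following is a rough sketch of what such a policy could look like, built as a Python dict and printed as JSON (which `kubectl` also accepts). The provider name `oauth2-proxy`, the policy name, the `istio: ingressgateway` selector label, and the path pattern are all assumptions for illustration; the provider would additionally have to be registered as an extension provider in the mesh config, as the linked post describes. This is not the configuration used in the setup discussed here.

```python
import json


def custom_authz_policy(provider_name, paths, namespace="istio-system"):
    """Build an AuthorizationPolicy that delegates matching requests to an external authorizer."""
    return {
        "apiVersion": "security.istio.io/v1beta1",
        "kind": "AuthorizationPolicy",
        # Policy name is a placeholder chosen for this example.
        "metadata": {"name": "ext-authz", "namespace": namespace},
        "spec": {
            # Assumed label of the ingress gateway pods; adjust to the actual deployment.
            "selector": {"matchLabels": {"istio": "ingressgateway"}},
            # CUSTOM hands the decision to the named external provider.
            "action": "CUSTOM",
            "provider": {"name": provider_name},
            # Only requests matching these paths are sent to the external authorizer.
            "rules": [{"to": [{"operation": {"paths": list(paths)}}]}],
        },
    }


policy = custom_authz_policy("oauth2-proxy", ["/*"])
print(json.dumps(policy, indent=2))
```

Narrowing `paths` (or adding host matches) is one way to apply the flow selectively rather than to all traffic.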

bkrannich commented 3 years ago

@braunsonm: Stumbled over your comment. Out of curiosity: Is it fair to say that you are trying to bring back a subset of CF Route Service use cases to cf-for-k8s? If so, could you share a bit more of how you do that?

braunsonm commented 3 years ago

@bkrannich Yeah, it's fair to say this would accomplish something similar to what route services do. Since Istio can already validate JWT tokens, this extra functionality allows it to also send users through the OAuth flow if they don't already have a token. We have some custom-written code that adjusts some of what the Route Controller in cf-for-k8s does when apps spin up. Right now that means we allow apps to opt into internal communication and/or block internet access for specific apps, so that unsecured apps can only be reached through a proxy.

Istio 1.9 gives us a lot more flexibility to do this without rolling our own proxy and gives every app a single method for securing traffic before hitting the workload.
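
As a point of reference for the existing JWT validation mentioned above, here is a minimal sketch of a RequestAuthentication resource built in Python. The issuer, JWKS URL, resource name, and selector label are placeholders, not values from this setup. Note that a RequestAuthentication on its own rejects invalid tokens but still admits requests that carry no token at all, which is the gap the external authorization flow would cover.

```python
import json


def jwt_request_authentication(issuer, jwks_uri, namespace="istio-system"):
    """Build a RequestAuthentication that validates JWTs from a single issuer."""
    return {
        "apiVersion": "security.istio.io/v1beta1",
        "kind": "RequestAuthentication",
        # Resource name is a placeholder chosen for this example.
        "metadata": {"name": "jwt-validation", "namespace": namespace},
        "spec": {
            # Assumed label of the ingress gateway pods; adjust to the actual deployment.
            "selector": {"matchLabels": {"istio": "ingressgateway"}},
            # Tokens must come from this issuer and verify against its published keys.
            "jwtRules": [{"issuer": issuer, "jwksUri": jwks_uri}],
        },
    }


request_auth = jwt_request_authentication(
    "https://issuer.example.com",                     # placeholder issuer
    "https://issuer.example.com/.well-known/jwks.json",  # placeholder JWKS endpoint
)
print(json.dumps(request_auth, indent=2))
```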

jamespollard8 commented 3 years ago

@braunsonm and all,

As a heads up, our expectation is that upgrading to Istio v1.8 will likely break cf-for-k8s platform upgrades. Meaning your current foundation will need to be torn down (kapp delete'd) and then kapp deployed fresh, and all apps would need to be pushed anew. Our understanding is that these upgrade-breaking changes should stop at Istio v1.8.

Is that your understanding as well? What expectations would you have from us / the product on top of bumping Istio and calling out the breaking changes and upgrade instructions via release notes?

loewenstein commented 3 years ago

@jamespollard8 would you mind elaborating on the nature of the upgrade-breaking changes through Istio 1.8, and why they are going to stop after 1.8?

Is this solely about upstream Istio breaking the 1.7 -> 1.8 upgrade, or is it about the specific usage of Istio in the context of cf-for-k8s?

jamespollard8 commented 3 years ago

Yeah, I should have done a better job qualifying what I wrote. I'm very uncertain about how the upgrade to Istio 1.8 is going to go. My worry about it breaking platform upgrades came mostly from:

1) historical trends (we had upgrade issues for the platform when jumping to Istio v1.6 and v1.7, but unfortunately we don't have great notes about the specific issues, and the people who worked on them have rotated to other teams/projects)
2) whispers/"water cooler chat" from the networking team several months ago. I remember hearing that major breaking changes were expected in Istio minor versions until 1.8, but some googling around hasn't helped me find anything to support that.

Tagging @XanderStrike @kauana @rodolfo2488 in case they have any context and/or can share any insight on how they might expect upgrading to istio v1.8 to go.

XanderStrike commented 3 years ago

The strategy for upgrading istio was to identify if anything in the Upgrade Notes might apply to us (I don't see anything, I think we've already disabled Mixer), then test it repeatedly and automate anything special we might've had to do. There's always gonna be a small amount of downtime as requests get routed to ingress gateways that are rolling, but well within the error budget we had.

Our goal was always to avoid at all costs reinstalling your foundation. I haven't looked at the 1.7 -> 1.8 upgrade personally, but you should be able to generate new Istio control plane configuration, kapp deploy it, and then kubectl rollout restart all workloads and system components with a job (see here). Istio has historically had good backwards compatibility with older sidecar versions which allows you to upgrade the control plane first, then the sidecars, without incurring downtime within the cluster.

That's the theory anyway, it's only by testing it repeatedly that you'll discover issues.

Edit: It's worth noting we've upgraded istio across multiple minor versions before and gotten it to work with pretty low downtime. It's not officially tested or supported by Istio, but we were fully testing and automating the upgrade ourselves anyway. It wasn't fun but the idea was to get on the current Istio version and upgrade ASAP so we wouldn't end up in the situation where we're forever running from the EOL dates. As you said though, we've been scattered to the wind so that commitment didn't get kept.
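
The control-plane-first sequence described above could be sketched roughly as follows. This only prints the commands rather than running them. The kapp app name `cf`, the rendered manifest filename, the namespace list, and the workload kinds are assumptions for illustration, and the original automation ran the restarts from a Kubernetes job rather than from a script like this.

```python
import shlex


def istio_upgrade_plan(rendered_manifest, app_name="cf",
                       namespaces=("cf-system", "cf-workloads")):
    """Return the commands for a control-plane-first upgrade as argv lists."""
    # Step 1: deploy the manifest containing the new Istio control plane config.
    plan = [["kapp", "deploy", "-a", app_name, "-f", rendered_manifest, "-y"]]
    # Step 2: restart workloads so their pods pick up the new sidecar version.
    for namespace in namespaces:
        for kind in ("deployment", "statefulset", "daemonset"):
            plan.append(["kubectl", "rollout", "restart", kind, "-n", namespace])
    return plan


plan = istio_upgrade_plan("rendered.yml")
for command in plan:
    # Print only; nothing is executed against a cluster.
    print(" ".join(shlex.quote(arg) for arg in command))
```

Keeping the plan as data makes it easy to review or dry-run before anything touches a cluster, which fits the "test it repeatedly" approach described above.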

matt-royal commented 3 years ago

Thanks, @XanderStrike. It sounds like the next steps on our side are to try using Istio 1.8+ and see if it works for us out of the box. If not, we'll pair with you to understand how to get past those issues.