GoogleCloudPlatform / anthos-service-mesh-packages

Packaged configuration for setting up a Kubernetes cluster with Anthos Service Mesh features enabled
https://cloud.google.com/anthos/service-mesh
Apache License 2.0

Migrate Istio-on-gke to Anthos Service Mesh doesn't work on 1.6 #1147

Open lucasmpr opened 2 years ago

lucasmpr commented 2 years ago

Hello, I'm just passing by because I had a bad time with the migration script from Istio on GKE to Anthos Service Mesh.

I've come to the conclusion that it doesn't work on clusters running Istio on GKE version 1.6.

I've found two problems here.

configure_mesh_ca() {
  configure_mesh_ca_14
  configure_mesh_ca_16
}

configure_mesh_ca_14 doesn't work if version 1.6 is installed. The script got stuck in the 1.4 step, so configure_mesh_ca_16 never ran. I had two issues with it:

A) I had created a new namespace after Citadel was already down, so the istio.default secret was never created in it and the script failed.

B) Even after I deleted that namespace, the script hung forever on "Waiting to pick up the new certificate". I'm not sure why.
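Issue A) comes down to a namespace that never received the istio.default secret. A pre-flight check along these lines could list such namespaces before running the migration. This is a minimal sketch, not part of the migration script: the function name find_missing_istio_default is hypothetical, and it assumes every namespace is expected to carry that secret while the 1.4 Citadel is running.

```bash
# Hypothetical pre-flight helper (not part of the official migration script):
# print every namespace in the given kubectl context that has no istio.default secret.
find_missing_istio_default() {
  local ctx="$1"
  local ns
  # "-o name" prints entries like "namespace/default"; strip the prefix below.
  for ns in $(kubectl --context="${ctx}" get namespaces -o name); do
    ns="${ns#namespace/}"
    # A non-zero exit status means the secret could not be fetched in this namespace.
    if ! kubectl --context="${ctx}" -n "${ns}" get secret istio.default >/dev/null 2>&1; then
      echo "${ns}"
    fi
  done
}
```

For example, find_missing_istio_default "${CLUSTER_1_CTX}" prints one namespace name per line; an empty result means every namespace has the secret.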

I was able to continue with the tutorial after commenting out configure_mesh_ca_14, but I soon hit another problem. The tutorial says to "test my application" before continuing, but the gateway hadn't been updated at that point, so I was really confused that nothing was working.

I decided to roll back, but to my surprise that didn't work either. The documented rollback command is:

kubectl --context=${CLUSTER_1_CTX} label namespace ${NAMESPACE} istio.io/rev- istio-injection=enabled --overwrite

The problem is that Istio on GKE 1.6 uses the revision label istio.io/rev=istio-1611, not istio-injection=enabled.
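One quick way to spot this kind of mismatch is to print the injection-related labels of every namespace as extra columns. A minimal sketch using kubectl's -L (--label-columns) flag; the wrapper name show_injection_labels is hypothetical.

```bash
# Hypothetical wrapper: list namespaces with their sidecar-injection labels.
# -L adds one output column per label key, so a namespace labelled
# istio-injection=enabled and one bound to a revision via istio.io/rev
# are easy to tell apart.
show_injection_labels() {
  local ctx="$1"
  kubectl --context="${ctx}" get namespaces -L istio-injection -L istio.io/rev
}
```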

I lost many hours trying to understand why nothing was working; the namespace simply had the wrong label. The command that actually rolls back on 1.6 (at least for me) is:

kubectl --context=${CLUSTER_1_CTX} label namespace ${NAMESPACE} istio.io/rev=istio-1611 --overwrite

Posting this to warn others.

richardwxn commented 2 years ago

Thanks for the feedback.

For issue A), that was expected in your case: this function migrates the 1.4 CA, and since you had already removed Citadel, it could not distribute the new certificate. I can update the script to add a safety check for the existence of the 1.4 control plane before proceeding.
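A safety check of that kind could look roughly like this. It is a sketch under assumptions, not the actual change to the script: it treats an istio-citadel Deployment in istio-system as the sign of a 1.4 control plane, reads the kubectl context from a CTX variable, and expects configure_mesh_ca_14 and configure_mesh_ca_16 to be the script's existing functions; has_legacy_citadel is a hypothetical helper name.

```bash
# Assumption: a legacy 1.4 control plane is detected by its istio-citadel
# Deployment in istio-system. CTX holds the kubectl context (hypothetical).
has_legacy_citadel() {
  kubectl --context="${CTX}" -n istio-system get deployment istio-citadel >/dev/null 2>&1
}

# Guarded version of configure_mesh_ca: only run the 1.4 CA migration when
# Citadel is still present, then always run the 1.6 step as before.
configure_mesh_ca() {
  if has_legacy_citadel; then
    configure_mesh_ca_14
  else
    echo "No 1.4 Citadel found; skipping configure_mesh_ca_14" >&2
  fi
  configure_mesh_ca_16
}
```

Skipping only the 1.4 step keeps the original call order for clusters that still run Citadel, while a 1.6 cluster goes straight to configure_mesh_ca_16.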

For the gateway update, you'll need to proceed to https://cloud.google.com/istio/docs/istio-on-gke/migrate-to-anthos-service-mesh#complete-migration to migrate the gateways. We put it in a separate section because this step may impact your existing legacy control plane, and we want users to verify that requests still succeed at this point before proceeding. The old gateway can still work until this step, even after you migrate the CA and the proxies.

The rollback instructions are not clear; as you pointed out, they should cover both 1.4 and 1.6. I will update the doc.
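The two rollback commands from this thread can be wrapped in one version-aware helper. A minimal sketch: rollback_namespace_label is a hypothetical name, and the 1.6 revision (istio-1611 in this thread) is passed in rather than hard-coded because it may not be the same on every cluster.

```bash
# Hypothetical helper: relabel a namespace back to the legacy Istio on GKE
# control plane, using the command that matches the legacy version.
#   $1 kubectl context, $2 namespace, $3 legacy version (1.4 or 1.6),
#   $4 legacy revision, required for 1.6 (for example istio-1611)
rollback_namespace_label() {
  local ctx="$1" ns="$2" legacy_version="$3" legacy_rev="${4:-}"
  case "${legacy_version}" in
    1.4)
      # 1.4 uses the default injector: drop the revision label, re-enable injection.
      kubectl --context="${ctx}" label namespace "${ns}" \
        istio.io/rev- istio-injection=enabled --overwrite
      ;;
    1.6)
      # 1.6 binds the namespace to a specific control-plane revision.
      if [ -z "${legacy_rev}" ]; then
        echo "a legacy revision is required for 1.6 (e.g. istio-1611)" >&2
        return 1
      fi
      kubectl --context="${ctx}" label namespace "${ns}" \
        "istio.io/rev=${legacy_rev}" --overwrite
      ;;
    *)
      echo "unsupported legacy version: ${legacy_version}" >&2
      return 1
      ;;
  esac
}
```

For example: rollback_namespace_label "${CLUSTER_1_CTX}" "${NAMESPACE}" 1.6 istio-1611. Changing the label only affects pods created afterwards; running pods keep their current sidecar until they are recreated.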
