Scorpiion opened this issue 3 years ago
Hey @Scorpiion, creating a new GKE cluster on the regular channel will today install version 1.23.0 of Config Connector. Add-on updates are triggered by every master node upgrade, which on the regular channel should certainly have happened between when the cluster looks to have been created and now.
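(For reference, one way to check which master versions each release channel is currently serving in a given location is the standard gcloud call below; the --format projection is only there to trim the output.)
gcloud container get-server-config --region europe-north1 --format 'yaml(channels)'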
Could you share the gcloud container clusters describe output for that particular cluster? In addition, could you share the output of the following call:
kubectl get po -n configconnector-operator-system configconnector-operator-0 -o jsonpath='{.metadata.annotations.cnrm\.cloud\.google\.com/operator-version}'
We do unfortunately have a time lag between our standalone release and our add-on release on the order of ~3 weeks, which we are looking to reduce in the future. However, the lag you're describing is way too long, so this must be some sort of issue either with your cluster configuration or internally in the Google API.
You described two problems in this thread, though; the title is about the lag being too long, but the VM problem seems separate. Could you confirm on a separate cluster on 1.23.0 whether the issue still remains? Or let's wait until we understand your add-on upgrade issue, and then see if it is still an issue at that point in time.
To your questions #3 and #4:
And sorry for the friction. We really appreciate your investment in Config Connector and want to ensure we are a good fit for your (and your clients') use cases.
Hi @kibbles-n-bytes, thanks for the quick reply. I should also add that I have two clusters (dev/prod) that are both in the same state, with the old Config Connector version. Both of these clusters had a manual install of Config Connector earlier (I had an earlier issue related to this that can be seen here: #287).
Here is the output from gcloud container clusters describe:
addonsConfig:
  configConnectorConfig:
    enabled: true
  httpLoadBalancing: {}
  kubernetesDashboard:
    disabled: true
  networkPolicyConfig: {}
autoscaling: {}
binaryAuthorization: {}
clusterIpv4Cidr: 172.24.0.0/18
createTime: '2020-04-16T12:56:39+00:00'
currentMasterVersion: 1.17.9-gke.1504
currentNodeCount: 6
currentNodeVersion: 1.17.9-gke.1504
databaseEncryption:
  state: DECRYPTED
defaultMaxPodsConstraint:
  maxPodsPerNode: '110'
endpoint: 35.228.133.114
initialClusterVersion: 1.14.10-gke.27
initialNodeCount: 1
instanceGroupUrls:
- https://www.googleapis.com/compute/v1/projects/PROJECT_ID/zones/europe-north1-b/instanceGroupManagers/gke-PROJECT_NAME-default-pool-33323955-grp
- https://www.googleapis.com/compute/v1/projects/PROJECT_ID/zones/europe-north1-c/instanceGroupManagers/gke-PROJECT_NAME-default-pool-1d45531e-grp
- https://www.googleapis.com/compute/v1/projects/PROJECT_ID/zones/europe-north1-a/instanceGroupManagers/gke-PROJECT_NAME-default-pool-30bad168-grp
- https://www.googleapis.com/compute/v1/projects/PROJECT_ID/zones/europe-north1-b/instanceGroupManagers/gke-PROJECT_NAME--shared-gvisor-no-22d9f3e7-grp
- https://www.googleapis.com/compute/v1/projects/PROJECT_ID/zones/europe-north1-c/instanceGroupManagers/gke-PROJECT_NAME--shared-gvisor-no-e9ccd380-grp
- https://www.googleapis.com/compute/v1/projects/PROJECT_ID/zones/europe-north1-a/instanceGroupManagers/gke-PROJECT_NAME--shared-gvisor-no-826aeaf0-grp
ipAllocationPolicy:
  clusterIpv4Cidr: 172.24.0.0/18
  clusterIpv4CidrBlock: 172.24.0.0/18
  clusterSecondaryRangeName: vnet-172-24-0-0-18-pod-range
  servicesIpv4Cidr: 172.24.192.0/20
  servicesIpv4CidrBlock: 172.24.192.0/20
  servicesSecondaryRangeName: vnet-172-24-192-0-20-service-range
  useIpAliases: true
labelFingerprint: 9cc782fd
legacyAbac: {}
location: europe-north1
locations:
- europe-north1-b
- europe-north1-c
- europe-north1-a
loggingService: logging.googleapis.com/kubernetes
maintenancePolicy:
  resourceVersion: e3b0c442
masterAuth:
  clusterCaCertificate: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
masterAuthorizedNetworksConfig:
  cidrBlocks:
  - cidrBlock: xxxxxxxxxx
    displayName: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
  - cidrBlock: xxxxxxxxxx
    displayName: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
  - cidrBlock: xxxxxxxxxx
    displayName: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
  - cidrBlock: xxxxxxxxxx
    displayName: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
  enabled: true
monitoringService: monitoring.googleapis.com/kubernetes
name: CLUSTER_NAME
network: NETWORK_NAME
networkConfig:
  defaultSnatStatus: {}
  enableIntraNodeVisibility: true
  network: projects/NETWORK_NAME/global/networks/NETWORK_NAME
  subnetwork: projects/NETWORK_NAME/regions/europe-north1/subnetworks/vnet-172-24-240-0-26-kubenet
networkPolicy:
  enabled: true
nodeConfig:
  diskSizeGb: 100
  diskType: pd-standard
  imageType: COS
  machineType: n1-standard-2
  metadata:
    disable-legacy-endpoints: 'true'
  oauthScopes:
  - https://www.googleapis.com/auth/monitoring
  - https://www.googleapis.com/auth/devstorage.read_only
  - https://www.googleapis.com/auth/logging.write
  - https://www.googleapis.com/auth/service.management.readonly
  - https://www.googleapis.com/auth/servicecontrol
  - https://www.googleapis.com/auth/trace.append
  serviceAccount: default
  shieldedInstanceConfig:
    enableIntegrityMonitoring: true
  workloadMetadataConfig:
    mode: GKE_METADATA
nodePools:
- config:
    diskSizeGb: 100
    diskType: pd-standard
    imageType: COS
    machineType: n1-standard-2
    metadata:
      disable-legacy-endpoints: 'true'
    oauthScopes:
    - https://www.googleapis.com/auth/monitoring
    - https://www.googleapis.com/auth/devstorage.read_only
    - https://www.googleapis.com/auth/logging.write
    - https://www.googleapis.com/auth/service.management.readonly
    - https://www.googleapis.com/auth/servicecontrol
    - https://www.googleapis.com/auth/trace.append
    serviceAccount: default
    shieldedInstanceConfig:
      enableIntegrityMonitoring: true
    workloadMetadataConfig:
      mode: GKE_METADATA
  initialNodeCount: 1
  instanceGroupUrls:
  - https://www.googleapis.com/compute/v1/projects/PROJECT_ID/zones/europe-north1-b/instanceGroupManagers/gke-PROJECT_NAME-default-pool-33323955-grp
  - https://www.googleapis.com/compute/v1/projects/PROJECT_ID/zones/europe-north1-c/instanceGroupManagers/gke-PROJECT_NAME-default-pool-1d45531e-grp
  - https://www.googleapis.com/compute/v1/projects/PROJECT_ID/zones/europe-north1-a/instanceGroupManagers/gke-PROJECT_NAME-default-pool-30bad168-grp
  locations:
  - europe-north1-b
  - europe-north1-c
  - europe-north1-a
  management:
    autoRepair: true
    autoUpgrade: true
  maxPodsConstraint:
    maxPodsPerNode: '110'
  name: default-pool
  podIpv4CidrSize: 24
  selfLink: https://container.googleapis.com/v1/projects/PROJECT_ID/locations/europe-north1/clusters/CLUSTER_NAME/nodePools/default-pool
  status: RUNNING
  upgradeSettings:
    maxSurge: 1
  version: 1.17.9-gke.1504
- config:
    diskSizeGb: 100
    diskType: pd-standard
    imageType: COS_CONTAINERD
    labels:
      sandbox.gke.io/runtime: gvisor
    machineType: e2-standard-4
    metadata:
      disable-legacy-endpoints: 'true'
    oauthScopes:
    - https://www.googleapis.com/auth/monitoring
    - https://www.googleapis.com/auth/devstorage.read_only
    - https://www.googleapis.com/auth/logging.write
    - https://www.googleapis.com/auth/service.management.readonly
    - https://www.googleapis.com/auth/servicecontrol
    - https://www.googleapis.com/auth/trace.append
    sandboxConfig:
      type: GVISOR
    serviceAccount: default
    shieldedInstanceConfig:
      enableIntegrityMonitoring: true
    taints:
    - effect: NO_SCHEDULE
      key: sandbox.gke.io/runtime
      value: gvisor
    workloadMetadataConfig:
      mode: GKE_METADATA
  initialNodeCount: 1
  instanceGroupUrls:
  - https://www.googleapis.com/compute/v1/projects/PROJECT_ID/zones/europe-north1-b/instanceGroupManagers/gke-PROJECT_NAME--shared-gvisor-no-22d9f3e7-grp
  - https://www.googleapis.com/compute/v1/projects/PROJECT_ID/zones/europe-north1-c/instanceGroupManagers/gke-PROJECT_NAME--shared-gvisor-no-e9ccd380-grp
  - https://www.googleapis.com/compute/v1/projects/PROJECT_ID/zones/europe-north1-a/instanceGroupManagers/gke-PROJECT_NAME--shared-gvisor-no-826aeaf0-grp
  locations:
  - europe-north1-b
  - europe-north1-c
  - europe-north1-a
  management:
    autoRepair: true
    autoUpgrade: true
  maxPodsConstraint:
    maxPodsPerNode: '110'
  name: shared-gvisor-node-pool-1
  podIpv4CidrSize: 24
  selfLink: https://container.googleapis.com/v1/projects/PROJECT_ID/locations/europe-north1/clusters/CLUSTER_NAME/nodePools/shared-gvisor-node-pool-1
  status: RUNNING
  upgradeSettings:
    maxSurge: 1
  version: 1.17.9-gke.1504
privateClusterConfig:
  enablePrivateNodes: true
  masterIpv4CidrBlock: 172.24.240.192/28
  peeringName: gke-nda9853dcb96357d4233-8a67-dd27-peer
  privateEndpoint: 172.24.240.194
  publicEndpoint: 35.228.133.114
releaseChannel:
  channel: REGULAR
resourceLabels:
  cnrm-lease-expiration: '1603438961'
  cnrm-lease-holder-id: btl7v7gqo9cmjt3dh2s0
  managed-by-cnrm: 'true'
selfLink: https://container.googleapis.com/v1/projects/PROJECT_ID/locations/europe-north1/clusters/CLUSTER_NAME
servicesIpv4Cidr: 172.24.192.0/20
shieldedNodes: {}
status: RUNNING
subnetwork: vnet-172-24-240-0-26-kubenet
workloadIdentityConfig:
  workloadPool: PROJECT_ID.svc.id.goog
zone: europe-north1
The output of that command was also mentioned in my initial post, but I'll restate it here:
kubectl get po -n configconnector-operator-system configconnector-operator-0 -o jsonpath='{.metadata.annotations.cnrm\.cloud\.google\.com/operator-version}'
1.15.1
Regarding the triggering of updates: are they triggered both by manual upgrades (clicking the upgrade button in the UI) and by automated upgrades? I have upgraded manually a bit because of announced Kubernetes vulnerabilities (this one specifically: https://cloud.google.com/kubernetes-engine/docs/security-bulletins#gcp-2020-012)
Hi again @kibbles-n-bytes, I started to reproduce this on a brand new cluster, and I still get 1.15.1 as the installed Config Connector version.
Steps to reproduce:
1. Create a new GKE cluster on the Regular channel and enable Workload Identity and the Config Connector add-on.
2. Run:
kubectl get po -n configconnector-operator-system configconnector-operator-0 -o jsonpath='{.metadata.annotations.cnrm\.cloud\.google\.com/operator-version}'
I still get 1.15.1 when doing this. So maybe this is the core problem here... I'll hold off on trying to recreate the VM config until we have solved this Config Connector version issue.
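(For reference, the CLI equivalent of that cluster creation would be roughly the following; I created the test cluster through the Cloud Console, so take the exact flags as a sketch, and the ConfigConnector add-on flag may still require the beta track of gcloud.)
gcloud container clusters create cluster-1 \
    --project config-connector-debug-1 \
    --zone europe-north1-a \
    --release-channel regular \
    --workload-pool config-connector-debug-1.svc.id.goog \
    --addons ConfigConnector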
Here is also the full gcloud container clusters describe output from this new test cluster:
gcloud container clusters describe --project=config-connector-debug-1 --zone=europe-north1-a cluster-1
addonsConfig:
  configConnectorConfig:
    enabled: true
  dnsCacheConfig: {}
  horizontalPodAutoscaling: {}
  httpLoadBalancing: {}
  kubernetesDashboard:
    disabled: true
  networkPolicyConfig:
    disabled: true
authenticatorGroupsConfig: {}
autoscaling: {}
clusterIpv4Cidr: 10.0.0.0/14
createTime: '2020-10-23T07:38:20+00:00'
currentMasterVersion: 1.17.9-gke.1504
currentNodeCount: 3
currentNodeVersion: 1.17.9-gke.1504
databaseEncryption:
  state: DECRYPTED
defaultMaxPodsConstraint:
  maxPodsPerNode: '110'
endpoint: 35.228.1.253
initialClusterVersion: 1.17.9-gke.1504
instanceGroupUrls:
- https://www.googleapis.com/compute/v1/projects/config-connector-debug-1/zones/europe-north1-a/instanceGroupManagers/gke-cluster-1-default-pool-15c97236-grp
ipAllocationPolicy:
  clusterIpv4Cidr: 10.0.0.0/14
  clusterIpv4CidrBlock: 10.0.0.0/14
  clusterSecondaryRangeName: gke-cluster-1-pods-2f860077
  servicesIpv4Cidr: 10.4.0.0/20
  servicesIpv4CidrBlock: 10.4.0.0/20
  servicesSecondaryRangeName: gke-cluster-1-services-2f860077
  useIpAliases: true
labelFingerprint: a9dc16a7
legacyAbac: {}
location: europe-north1-a
locations:
- europe-north1-a
loggingService: logging.googleapis.com/kubernetes
maintenancePolicy:
  resourceVersion: e3b0c442
masterAuth:
  clusterCaCertificate: XXXXXXXXXXXXXXXXXXXXXXXXXXX
masterAuthorizedNetworksConfig: {}
monitoringService: monitoring.googleapis.com/kubernetes
name: cluster-1
network: default
networkConfig:
  network: projects/config-connector-debug-1/global/networks/default
  subnetwork: projects/config-connector-debug-1/regions/europe-north1/subnetworks/default
networkPolicy: {}
nodeConfig:
  diskSizeGb: 100
  diskType: pd-standard
  imageType: COS
  machineType: e2-medium
  metadata:
    disable-legacy-endpoints: 'true'
  oauthScopes:
  - https://www.googleapis.com/auth/devstorage.read_only
  - https://www.googleapis.com/auth/logging.write
  - https://www.googleapis.com/auth/monitoring
  - https://www.googleapis.com/auth/servicecontrol
  - https://www.googleapis.com/auth/service.management.readonly
  - https://www.googleapis.com/auth/trace.append
  serviceAccount: default
  shieldedInstanceConfig:
    enableIntegrityMonitoring: true
  workloadMetadataConfig:
    mode: GKE_METADATA
nodePools:
- autoscaling: {}
  config:
    diskSizeGb: 100
    diskType: pd-standard
    imageType: COS
    machineType: e2-medium
    metadata:
      disable-legacy-endpoints: 'true'
    oauthScopes:
    - https://www.googleapis.com/auth/devstorage.read_only
    - https://www.googleapis.com/auth/logging.write
    - https://www.googleapis.com/auth/monitoring
    - https://www.googleapis.com/auth/servicecontrol
    - https://www.googleapis.com/auth/service.management.readonly
    - https://www.googleapis.com/auth/trace.append
    serviceAccount: default
    shieldedInstanceConfig:
      enableIntegrityMonitoring: true
    workloadMetadataConfig:
      mode: GKE_METADATA
  initialNodeCount: 3
  instanceGroupUrls:
  - https://www.googleapis.com/compute/v1/projects/config-connector-debug-1/zones/europe-north1-a/instanceGroupManagers/gke-cluster-1-default-pool-15c97236-grp
  locations:
  - europe-north1-a
  management:
    autoRepair: true
    autoUpgrade: true
  maxPodsConstraint:
    maxPodsPerNode: '110'
  name: default-pool
  podIpv4CidrSize: 24
  selfLink: https://container.googleapis.com/v1/projects/config-connector-debug-1/zones/europe-north1-a/clusters/cluster-1/nodePools/default-pool
  status: RUNNING
  upgradeSettings:
    maxSurge: 1
  version: 1.17.9-gke.1504
releaseChannel:
  channel: REGULAR
selfLink: https://container.googleapis.com/v1/projects/config-connector-debug-1/zones/europe-north1-a/clusters/cluster-1
servicesIpv4Cidr: 10.4.0.0/20
shieldedNodes: {}
status: RUNNING
subnetwork: default
workloadIdentityConfig:
  workloadPool: config-connector-debug-1.svc.id.goog
zone: europe-north1-a
Hey @Scorpiion, I checked our GKE<->Config Connector version association, and the Kubernetes version you are currently at, 1.17.9-gke.1504, is in fact intended to be on Config Connector 1.15.0. However, this is because this GKE master version is quite out of date at this point. I attempted to emulate your environment but am only able to get 1.17.12-gke.1504 as the default version for the regular release channel, which is on Config Connector 1.23.0 (the intended regular channel version). I checked our telemetry and can confirm yours seems to be the only cluster that has Config Connector at 1.15.0 through the add-on; 1.19.0 is the next-lowest version.
This doesn't seem to be a Config Connector issue; it's a generic question about the default GKE master version on the regular channel in your environment. As a mitigation, can you attempt to manually trigger a master upgrade to 1.17.12-gke.1504? And then follow up with GKE support as to why your master version is so outdated?
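(Something like the following standard gcloud call should do it; the cluster name and location below are placeholders taken from your describe output.)
gcloud container clusters upgrade CLUSTER_NAME \
    --region europe-north1 \
    --master \
    --cluster-version 1.17.12-gke.1504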
Hi @kibbles-n-bytes, and sorry for the slow reply; I was sick last week but am back to work now.
After your comment here, a new GKE version became available on the Regular channel for me. In my post above I created two brand-new clusters and they both got the old GKE master version. Now I have a newer GKE master and hence also a newer Config Connector add-on, so that part is solved. I now have 1.23.0.
Now, on to my issues: they did not go away. As a workaround, however, I managed to edit the YAML files so that I now have no errors. I still think these are bugs worth fixing, though. I repeat the core error message here:
Update call failed: the desired mutation for the following field(s) is invalid: [networkInterface.0.NetworkIp bootDisk.0.InitializeParams.0.Image]
networkInterface.0.NetworkIp
When I created the VM I used an external reference to an internal IP that I had created/reserved. That worked and the VM got the correct IP, but then it stops working (external references don't work after creation). If I instead replace it with the hardcoded IP later, the error goes away.
So this worked on creation (leaving out other fields):
networkInterface:
  networkIp: https://www.googleapis.com/compute/v1/projects/xxxxxxx/regions/europe-north1/addresses/xxxxx-internal-ip
But after VM creation it stops working and gives the error above; when replacing it like this, the error goes away:
networkInterface:
  networkIp: 172.22.0.2
I think this is a bug with the external reference for networkIp: it does not resolve the external value into an IP somehow; it seems like it compares the external URL with the actual IP, or something along those lines, maybe?
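(As a stopgap that avoids eyeballing the console: the reserved address can be resolved to its literal IP with a standard gcloud call, e.g. for the address above:)
gcloud compute addresses describe xxxxx-internal-ip \
    --region europe-north1 \
    --format 'value(address)'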
bootDisk.0.InitializeParams.0.Image
When I created the VM I used the LTS version of Google's Container-Optimized OS. I referred to it like this (leaving out other fields):
bootDisk:
  initializeParams:
    sourceImageRef:
      external: cos-cloud/cos-81-lts
It worked; it created the VM with the correct image. However, on updates it fails and complains with the error above.
If I replace the value with cos-cloud/cos-81-12871-1196-0 after VM creation, then the error goes away (I got that value from looking in the GCP console). I don't remember if I deleted the Config Connector entry in between or just updated the value; I might have deleted the Config Connector object in between (with cnrm.cloud.google.com/deletion-policy: abandon, so the VM stayed).
The error goes away if I replace cos-cloud/cos-81-lts with cos-cloud/cos-81-12871-1196-0:
bootDisk:
  initializeParams:
    sourceImageRef:
      external: cos-cloud/cos-81-12871-1196-0
I'm thinking this is also a bug to fix: it should either not work at all with external: cos-cloud/cos-81-lts, or it should work fully, I think. Or what do you guys think? =)
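(Related tip, assuming standard gcloud behavior: the family can be resolved to the concrete image name it currently points at, which gives the value to hardcode without digging through the console:)
gcloud compute images describe-from-family cos-81-lts \
    --project cos-cloud \
    --format 'value(name)'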
Update: the VM is now in a restart loop...
The networkIp workaround did not actually work. It works for one cycle, then Config Connector thinks something has changed and restarts the whole VM, and it basically continues doing that forever. Luckily I tried this on a development server first. I get this from the activity logs in the GCP console:
This pattern repeats about every 10 minutes, continuously:
Completed: Start VM svc-cnrm-project-name@PROJECT_ID.iam.gserviceaccount.com started VM app-name-mysql
Start VM svc-cnrm-project-name@PROJECT_ID.iam.gserviceaccount.com started VM app-name-mysql
Completed: Set machine type on VM svc-cnrm-project-name@PROJECT_ID.iam.gserviceaccount.com set machine type on VM app-name-mysql
Set machine type on VM svc-cnrm-project-name@PROJECT_ID.iam.gserviceaccount.com set machine type on VM app-name-mysql
Completed: Stop VM svc-cnrm-project-name@PROJECT_ID.iam.gserviceaccount.com stopped VM app-name-mysql
Stop VM svc-cnrm-project-name@PROJECT_ID.iam.gserviceaccount.com stopped VM app-name-mysql
Completed: Add access config to VM svc-cnrm-project-name@PROJECT_ID.iam.gserviceaccount.com added access config to VM app-name-mysql
Add access config to VM svc-cnrm-project-name@PROJECT_ID.iam.gserviceaccount.com added access config to VM app-name-mysql
Completed: Delete access config from VM svc-cnrm-project-name@PROJECT_ID.iam.gserviceaccount.com deleted access config from VM app-name-mysql
Delete access config from VM svc-cnrm-project-name@PROJECT_ID.iam.gserviceaccount.com deleted access config from VM app-name-mysql
Update bucket svc-cnrm-project-name@PROJECT_ID.iam.gserviceaccount.com updated pro-mehi-project-name-mysql-backups
Completed: Start VM svc-cnrm-project-name@PROJECT_ID.iam.gserviceaccount.com started VM app-name-mysql
Start VM svc-cnrm-project-name@PROJECT_ID.iam.gserviceaccount.com started VM app-name-mysql
Completed: Set machine type on VM svc-cnrm-project-name@PROJECT_ID.iam.gserviceaccount.com set machine type on VM app-name-mysql
Set machine type on VM svc-cnrm-project-name@PROJECT_ID.iam.gserviceaccount.com set machine type on VM app-name-mysql
Completed: Stop VM svc-cnrm-project-name@PROJECT_ID.iam.gserviceaccount.com stopped VM app-name-mysql
Stop VM svc-cnrm-project-name@PROJECT_ID.iam.gserviceaccount.com stopped VM app-name-mysql
Completed: Add access config to VM svc-cnrm-project-name@PROJECT_ID.iam.gserviceaccount.com added access config to VM app-name-mysql
Add access config to VM svc-cnrm-project-name@PROJECT_ID.iam.gserviceaccount.com added access config to VM app-name-mysql
Completed: Delete access config from VM svc-cnrm-project-name@PROJECT_ID.iam.gserviceaccount.com deleted access config from VM app-name-mysql
Delete access config from VM svc-cnrm-project-name@PROJECT_ID.iam.gserviceaccount.com deleted access config from VM app-name-mysql
Completed: beta.compute.instances.setLabels svc-cnrm-project-name@PROJECT_ID.iam.gserviceaccount.com has executed beta.compute.instances.setLabels on app-name-mysql
beta.compute.instances.setLabels svc-cnrm-project-name@PROJECT_ID.iam.gserviceaccount.com has executed beta.compute.instances.setLabels on app-name-mysql
beta.compute.addresses.setLabels svc-cnrm-project-name@PROJECT_ID.iam.gserviceaccount.com has executed beta.compute.addresses.setLabels on app-name-db-internal-ip
beta.compute.addresses.setLabels svc-cnrm-project-name@PROJECT_ID.iam.gserviceaccount.com has executed beta.compute.addresses.setLabels on app-name-db-external-ip
Set labels of disk svc-cnrm-project-name@PROJECT_ID.iam.gserviceaccount.com set labels of disk mysql-data
Completed: Start VM svc-cnrm-project-name@PROJECT_ID.iam.gserviceaccount.com started VM app-name-mysql
Start VM svc-cnrm-project-name@PROJECT_ID.iam.gserviceaccount.com started VM app-name-mysql
Completed: Set machine type on VM svc-cnrm-project-name@PROJECT_ID.iam.gserviceaccount.com set machine type on VM app-name-mysql
Set machine type on VM svc-cnrm-project-name@PROJECT_ID.iam.gserviceaccount.com set machine type on VM app-name-mysql
Completed: Stop VM svc-cnrm-project-name@PROJECT_ID.iam.gserviceaccount.com stopped VM app-name-mysql
Stop VM svc-cnrm-project-name@PROJECT_ID.iam.gserviceaccount.com stopped VM app-name-mysql
Completed: Add access config to VM svc-cnrm-project-name@PROJECT_ID.iam.gserviceaccount.com added access config to VM app-name-mysql
Add access config to VM svc-cnrm-project-name@PROJECT_ID.iam.gserviceaccount.com added access config to VM app-name-mysql
Completed: Delete access config from VM svc-cnrm-project-name@PROJECT_ID.iam.gserviceaccount.com deleted access config from VM app-name-mysql
Delete access config from VM svc-cnrm-project-name@PROJECT_ID.iam.gserviceaccount.com deleted access config from VM app-name-mysql
Update bucket svc-cnrm-project-name@PROJECT_ID.iam.gserviceaccount.com updated pro-mehi-project-name-mysql-backups
beta.compute.addresses.setLabels svc-cnrm-project-name@PROJECT_ID.iam.gserviceaccount.com has executed beta.compute.addresses.setLabels on app-name-db-internal-ip
Completed: Start VM svc-cnrm-project-name@PROJECT_ID.iam.gserviceaccount.com started VM app-name-mysql
Start VM svc-cnrm-project-name@PROJECT_ID.iam.gserviceaccount.com started VM app-name-mysql
....
These rows caught my attention:
Completed: Add access config to VM svc-cnrm-project-name@PROJECT_ID.iam.gserviceaccount.com added access config to VM app-name-mysql
Add access config to VM svc-cnrm-project-name@PROJECT_ID.iam.gserviceaccount.com added access config to VM app-name-mysql
Completed: Delete access config from VM svc-cnrm-project-name@PROJECT_ID.iam.gserviceaccount.com deleted access config from VM app-name-mysql
Delete access config from VM svc-cnrm-project-name@PROJECT_ID.iam.gserviceaccount.com deleted access config from VM app-name-mysql
I think it is the same as this CLI command: https://cloud.google.com/sdk/gcloud/reference/compute/instances/delete-access-config
So it's about deleting and adding network config, which sounds related to the networkIp setting that I changed in the workaround above...
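(In CLI terms, the loop looks roughly like the pair below being replayed every cycle; the zone and access-config name here are my assumptions, since the activity log doesn't show them.)
gcloud compute instances delete-access-config app-name-mysql \
    --zone europe-north1-b --access-config-name 'external-nat'
gcloud compute instances add-access-config app-name-mysql \
    --zone europe-north1-b --access-config-name 'external-nat'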
If I try to change back to using the external URL format, then I get this same error again. This config:
networkIp: https://www.googleapis.com/compute/v1/projects/xxxxxxxx/regions/europe-north1/addresses/xxxxx-internal-ip
gives this error:
status:
  conditions:
  - lastTransitionTime: "2020-11-02T19:28:57Z"
    message: 'Update call failed: the desired mutation for the following field(s)
      is invalid: [networkInterface.0.NetworkIp]'
    reason: UpdateFailed
    status: "False"
    type: Ready
Something that might be related to this is my network setup: the internal IPs that I have problems with look like this (gcloud JSON output; note both the PROJECT_A and PROJECT_B references):
{
  "address": "172.22.0.2",
  "addressType": "INTERNAL",
  "creationTimestamp": "2020-09-22T19:19:30.555-07:00",
  "description": "Static internal ip",
  "id": "xxxxxxxxx",
  "kind": "compute#address",
  "name": "xxxxxxxxx-internal-ip",
  "networkTier": "PREMIUM",
  "purpose": "GCE_ENDPOINT",
  "region": "https://www.googleapis.com/compute/v1/projects/PROJECT_B/regions/europe-north1",
  "selfLink": "https://www.googleapis.com/compute/v1/projects/PROJECT_B/regions/europe-north1/addresses/xxxxxxxxx-internal-ip",
  "status": "IN_USE",
  "subnetwork": "https://www.googleapis.com/compute/v1/projects/PROJECT_A/regions/europe-north1/subnetworks/vnet-172-22-0-0-22-xxxxxxxxx",
  "users": [
    "https://www.googleapis.com/compute/v1/projects/PROJECT_B/zones/europe-north1-b/instances/xxxxxxxxx"
  ]
}
And here is the internal IP's Config Connector YAML:
apiVersion: compute.cnrm.cloud.google.com/v1beta1
kind: ComputeAddress
metadata:
  annotations:
    cnrm.cloud.google.com/deletion-policy: abandon
  name: xxxxxxxxxxx-internal-ip
  namespace: PROJECT_B
spec:
  address: 172.22.0.2
  addressType: INTERNAL
  description: Static internal ip
  ipVersion: IPV4
  location: europe-north1
  networkRef:
    external: https://compute.googleapis.com/compute/v1/projects/PROJECT_A/global/networks/mehivpc
  subnetworkRef:
    external: https://compute.googleapis.com/compute/v1/projects/PROJECT_A/regions/europe-north1/subnetworks/vnet-172-22-0-0-22-xxxxxx
And yeah, the Config Connector logs say nothing helpful; they just go on reporting regular reconciles, and everything looks good there:
# kubectl logs -f -n cnrm-system cnrm-controller-manager-xxxxxxxxxxxxxx-0 manager
{"severity":"info","logger":"computedisk-controller","msg":"starting reconcile","resource":{"namespace":"NAMESPACE_NAME","name":"mysql-data"}}
{"severity":"info","logger":"computedisk-controller","msg":"creating/updating underlying resource","resource":{"namespace":"NAMESPACE_NAME","name":"mysql-data"}}
{"severity":"info","logger":"computedisk-controller","msg":"successfully finished reconcile","resource":{"namespace":"NAMESPACE_NAME","name":"mysql-data"}}
{"severity":"info","logger":"computeaddress-controller","msg":"starting reconcile","resource":{"namespace":"NAMESPACE_NAME","name":"db-external-ip"}}
{"severity":"info","logger":"computeaddress-controller","msg":"creating/updating underlying resource","resource":{"namespace":"NAMESPACE_NAME","name":"db-external-ip"}}
{"severity":"info","logger":"computeaddress-controller","msg":"successfully finished reconcile","resource":{"namespace":"NAMESPACE_NAME","name":"db-external-ip"}}
{"severity":"info","logger":"computeaddress-controller","msg":"starting reconcile","resource":{"namespace":"NAMESPACE_NAME","name":"db-internal-ip"}}
{"severity":"info","logger":"computeaddress-controller","msg":"creating/updating underlying resource","resource":{"namespace":"NAMESPACE_NAME","name":"db-internal-ip"}}
{"severity":"info","logger":"computeaddress-controller","msg":"successfully finished reconcile","resource":{"namespace":"NAMESPACE_NAME","name":"db-internal-ip"}}
{"severity":"info","logger":"computeinstance-controller","msg":"starting reconcile","resource":{"namespace":"NAMESPACE_NAME","name":"app-mysql"}}
{"severity":"info","logger":"computeinstance-controller","msg":"creating/updating underlying resource","resource":{"namespace":"NAMESPACE_NAME","name":"app-mysql"}}
{"severity":"info","logger":"computeinstance-controller","msg":"successfully finished reconcile","resource":{"namespace":"NAMESPACE_NAME","name":"app-mysql"}}
{"severity":"info","logger":"storagebucket-controller","msg":"starting reconcile","resource":{"namespace":"NAMESPACE_NAME","name":"PROJECT_NAME-mysql-backups"}}
{"severity":"info","logger":"storagebucket-controller","msg":"creating/updating underlying resource","resource":{"namespace":"NAMESPACE_NAME","name":"PROJECT_NAME-mysql-backups"}}
{"severity":"info","logger":"storagebucket-controller","msg":"successfully finished reconcile","resource":{"namespace":"NAMESPACE_NAME","name":"PROJECT_NAME-mysql-backups"}}
{"severity":"info","logger":"iampolicymember-controller","msg":"Starting reconcile","resource":{"namespace":"NAMESPACE_NAME","name":"gsa-mysql-secretmanager-secret-accessor"}}
{"severity":"info","logger":"tfiamclient","msg":"underlying resource is already up to date","resource":{"namespace":"NAMESPACE_NAME","name":"gsa-mysql-secretmanager-secret-accessor"}}
{"severity":"info","logger":"iampolicymember-controller","msg":"Finished reconcile","resource":{"namespace":"NAMESPACE_NAME","name":"gsa-mysql-secretmanager-secret-accessor"}}
{"severity":"info","logger":"iampolicymember-controller","msg":"Starting reconcile","resource":{"namespace":"NAMESPACE_NAME","name":"gsa-mysql-logging-log-writer"}}
{"severity":"info","logger":"tfiamclient","msg":"underlying resource is already up to date","resource":{"namespace":"NAMESPACE_NAME","name":"gsa-mysql-logging-log-writer"}}
{"severity":"info","logger":"iampolicymember-controller","msg":"Finished reconcile","resource":{"namespace":"NAMESPACE_NAME","name":"gsa-mysql-logging-log-writer"}}
{"severity":"info","logger":"iampolicymember-controller","msg":"Starting reconcile","resource":{"namespace":"NAMESPACE_NAME","name":"gsa-mysql-monitoring-metric-writer"}}
{"severity":"info","logger":"tfiamclient","msg":"underlying resource is already up to date","resource":{"namespace":"NAMESPACE_NAME","name":"gsa-mysql-monitoring-metric-writer"}}
{"severity":"info","logger":"iampolicymember-controller","msg":"Finished reconcile","resource":{"namespace":"NAMESPACE_NAME","name":"gsa-mysql-monitoring-metric-writer"}}
{"severity":"info","logger":"iampolicymember-controller","msg":"Starting reconcile","resource":{"namespace":"NAMESPACE_NAME","name":"gsa-mysql-secretmanager-viewer"}}
{"severity":"info","logger":"tfiamclient","msg":"underlying resource is already up to date","resource":{"namespace":"NAMESPACE_NAME","name":"gsa-mysql-secretmanager-viewer"}}
{"severity":"info","logger":"iampolicymember-controller","msg":"Finished reconcile","resource":{"namespace":"NAMESPACE_NAME","name":"gsa-mysql-secretmanager-viewer"}}
{"severity":"info","logger":"iampolicymember-controller","msg":"Starting reconcile","resource":{"namespace":"NAMESPACE_NAME","name":"gsa-mysql-storage-object-admin"}}
{"severity":"info","logger":"tfiamclient","msg":"underlying resource is already up to date","resource":{"namespace":"NAMESPACE_NAME","name":"gsa-mysql-storage-object-admin"}}
{"severity":"info","logger":"iampolicymember-controller","msg":"Finished reconcile","resource":{"namespace":"NAMESPACE_NAME","name":"gsa-mysql-storage-object-admin"}}
I can add that I have also tried to put the internal IP inside the shared VPC host project (I don't want to do it that way because of IAM rules/access etc.), but I can confirm that it is not working either, so I guess I did it correctly by having the internal IP inside the client project (the subnet is part of the host project).
This is the output when trying to have the internal IP in the host project:
status:
  conditions:
  - lastTransitionTime: "2020-11-03T13:40:08Z"
    message: 'Update call failed: error applying desired state: Error creating instance:
      googleapi: Error 400: Invalid value for field ''resource.networkInterfaces[0].networkIP'':
      ''https://compute.googleapis.com/compute/v1/projects/xxxxxxxx/regions/europe-north1/addresses/tmp-test-vm-internal-ip-2''.
      IP address ''projects/xxxxxxxx/regions/europe-north1/addresses/tmp-test-vm-internal-ip-2''
      (172.22.0.22) is reserved by another project., invalid'
    reason: UpdateFailed
    status: "False"
    type: Ready
I can also confirm that I have reproduced this same error on a new VM.
Hi @kibbles-n-bytes, is there anything I can do to help progress this issue? It blocks some of our work, and it would be very helpful if we could find a solution or workaround other than to stop using Config Connector.
Hey @Scorpiion, thanks for the incredibly detailed debug information!
@kibbles-n-bytes is on vacation for a bit, but I'm catching up and will get you a reply by tomorrow at the latest.
At this point your description is clear to me. I'm doing some work to see if I can repro the situation on our side (starting with the NetworkIP external reference not working as expected); we don't want you to have to work around Config Connector either.
Random quick question: are you using Config Connector in namespace mode? It's a relatively new feature so I presume no, but just checking.
Hi! Quick update: I can repro scenario 1. We're having a discussion internally about the right resolution there.
Hi @toumorokoshi, thanks for filling in, and I hope I did not disturb @kibbles-n-bytes on his vacation.
We do use Config Connector in namespaced mode; I would say we are a very early adopter of Config Connector and have used it since early this year.
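(For context, our namespaced-mode setup looks roughly like this; the names below are placeholders and the fields are from memory, so treat it as a sketch.)
apiVersion: core.cnrm.cloud.google.com/v1beta1
kind: ConfigConnector
metadata:
  # the operator requires exactly this name
  name: configconnector.core.cnrm.cloud.google.com
spec:
  mode: namespaced
---
apiVersion: core.cnrm.cloud.google.com/v1beta1
kind: ConfigConnectorContext
metadata:
  # one context per namespace managed by Config Connector
  name: configconnectorcontext.core.cnrm.cloud.google.com
  namespace: PROJECT_B
spec:
  googleServiceAccount: svc-cnrm-project-name@PROJECT_ID.iam.gserviceaccount.com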
Great to hear that you were able to reproduce scenario 1. Thanks for moving this along, and let me know if there is anything I can do to help! (I'm also open to doing a Hangouts/Google Meet session if it would help.)
Hi! To start with, I wanted to provide some more information on the situation:
I can replicate two out of the three issues at play here:
Notes:
Unfortunately, fixing networkIP to not error on a selflink is non-trivial. I'm looking into some ways to get this fixed, but the best choice for now is to hard-code the value. I can't give a good ETA on the fix.
I'm actually curious how you came to use a selflink: our documentation states that this must be an IP address; the fact that a selflink is supported at all is an implementation detail, not a feature.
I presume the main reason you did this is that we don't support a ComputeAddress resourceRef for networkIP. That's a bit easier, so I'm looking into that first. I can come back with an answer in the next few days on whether this is a possibility.
This one is also a bit tricky for reasons similar to networkIP, so I can't give a good ETA. I would recommend using the fully qualified image for now, rather than the family.
I apologize, since I'm sure this isn't the answer you wanted, but I am exploring options. I'll update if anything is doable for the networkIP / sourceImageRef external errors, but can you confirm or deny that a ComputeAddressRef would be helpful if it existed as an option for networkIP?
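(To make that question concrete, the hypothetical reference, which does not exist in the current API, would presumably look something like this inside the ComputeInstance spec:)
networkInterface:
  # hypothetical field for illustration only; today only a literal IP works here
  networkIpRef:
    name: xxxxx-internal-ip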
Describe the bug
We have two problems with Config Connector right now:
The VM got set up correctly when the YAML was initially applied, but then Config Connector started giving these errors, and now we are in a state where we can't do updates. This is quite a big blocker right now (some things I have manually updated in the UI and "backported" to the YAML, or the other way around, but it's an accident waiting to happen). I have had this for about a month now, thinking that a GKE add-on update would come soon and might resolve it, but so far no luck. Initially we used the "stable" GKE release channel, but I moved us to the "regular" channel so we would be able to use the GKE add-on (previously we had script-based automation of the Config Connector install/update). Now we seem to be "stuck" with Config Connector version 1.15.1, which was released on 2020-03-19 (https://github.com/GoogleCloudPlatform/k8s-config-connector/releases/tag/1.15.1); that is more than 7 months ago. And to be honest, with the high development pace of Config Connector, that is ages...
One of our problems above, with networkInterface.0.NetworkIp, I think is resolved in 1.16.1 as described here. But even if it was fixed in April, I don't seem to be able to get that update with the add-on.
The second error above, about the boot disk: when I tested earlier, it seemed to happen only when using the cos-cloud/cos-81-lts value as bootDisk.initializeParams.sourceImageRef.external; when I used the Debian family of images, the error did not appear, I think (it was some time ago I tested, but I'm 95% sure it did not appear with Debian). So I've been thinking that this has maybe also been resolved in newer versions of Config Connector, but I'm not completely sure.
ConfigConnector Version
1.15.1
To Reproduce
Install GKE with the regular channel and the Config Connector add-on. Check that the version is 1.15.1. I don't think reproducing the error I got makes sense time-wise at the moment; I think it's more important to update the Config Connector add-on so I can check with a later version.
Questions
I have also tried to delete the Config Connector object (with cnrm.cloud.google.com/deletion-policy: abandon) and then recreate it, but it gives the same error and no updates of, for example, the metadata user-data value that I want to update.
If I have missed some information above, please just ask and I will try to provide it ASAP.