chadkouse opened this issue 1 year ago
@chadkouse CloudSQLInstance is a complex resource, and the underlying GCP API can also be tricky in certain cases.
Some suggestions you can experiment with:
Try setting the annotation cnrm.cloud.google.com/state-into-spec to absent [1] in your CloudSQLInstance YAML. This will change the behavior of the KCC controller and stop it from populating unspecified values back into the K8s spec. This feature can be helpful when working with some non-standard API behaviors.

[1] https://cloud.google.com/config-connector/docs/concepts/ignore-unspecified-fields
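For illustration, the annotation in context looks something like this (a minimal sketch with placeholder names, not a config from this thread):

```yaml
apiVersion: sql.cnrm.cloud.google.com/v1beta1
kind: SQLInstance
metadata:
  name: my-instance  # placeholder name
  annotations:
    # Tell the KCC controller not to write unspecified fields back into spec
    cnrm.cloud.google.com/state-into-spec: "absent"
spec:
  databaseVersion: POSTGRES_13
  region: us-central1
  settings:
    tier: db-custom-1-3840
```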
@diviner524 Thanks for the info, I just tested this. It looks like with a minimal config it still set the instance to "Updating", but only for around 30 seconds or so (see image). I didn't get a chance to test whether connectivity was lost during that update. I'll try to test that and report back soon, but this may be the answer.
Trying the state-into-spec annotation set to absent resulted in the following error message:

kind 'SQLInstance' does not support having annotation 'cnrm.cloud.google.com/state-into-spec' set to value 'absent'

so maybe that's not a viable option for SQLInstance.
@chadkouse
If using the minimal config still gives you "Updating", it might be related to some specific fields/config in your YAML.
This feature has been supported in SQLInstance for a while (since v1.94.0). It sounds like you are using an old version of Config Connector, which also implies there might be bugs in the CloudSQLInstance resource that have already been fixed. Can you try with the latest version?
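As a side note, one quick way (not mentioned in this thread) to confirm which Config Connector version is actually running is to read the controller image tag; this assumes the default cnrm-system namespace:

```sh
# Print the controller container images; the image tag is the
# Config Connector version (assumes the default cnrm-system namespace)
kubectl get pods -n cnrm-system \
  -o jsonpath='{.items[*].spec.containers[*].image}'
```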
We also noticed that our SQLInstance keeps getting updated every 10 minutes with KCC v1.120.1 and stateIntoSpec=absent. Among ~50 kinds of resources we use, we also see this behavior with StorageTransferJob.
Is there anything we can do to find out what field is causing this?
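One debugging idea (my own suggestion, not an answer from the maintainers; names assume a default cluster-mode install and INSTANCE_NAME is a placeholder) is to compare the controller's activity with the update operations Cloud SQL actually receives:

```sh
# List the most recent operations Cloud SQL received for the instance
gcloud sql operations list --instance=INSTANCE_NAME --limit=10

# Check the resource's events and conditions for reconcile details
kubectl describe sqlinstance INSTANCE_NAME

# Tail the Config Connector controller logs
# (statefulset name assumes a default cluster-mode install)
kubectl logs -n cnrm-system statefulset/cnrm-controller-manager --tail=100
```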
This is still happening with v1.124.0. Is there anything we can do?
@tsawada can you please share the SQLInstance CR YAML you are using?
Thanks @jasonvigil. Here's what we use:
```yaml
apiVersion: sql.cnrm.cloud.google.com/v1beta1
kind: SQLInstance
metadata:
  name: postgres-db-1
  annotations:
    cnrm.cloud.google.com/deletion-policy: "abandon"
    cnrm.cloud.google.com/state-into-spec: "absent"
spec:
  databaseVersion: POSTGRES_13
  region: asia-northeast1
  settings:
    backupConfiguration:
      backupRetentionSettings:
        retainedBackups: 100
        retentionUnit: COUNT
      enabled: true
      pointInTimeRecoveryEnabled: true
      startTime: '0:00'
      transactionLogRetentionDays: 7
    databaseFlags:
    - name: cloudsql.iam_authentication
      value: "1"
    diskAutoresize: true
    diskType: PD_HDD
    deletionProtectionEnabled: true
    ipConfiguration:
      ipv4Enabled: false
      privateNetworkRef:
        name: "my-private-network"
      sslMode: "TRUSTED_CLIENT_CERTIFICATE_REQUIRED"
    tier: db-custom-1-3840
```
Ok, there appears to be a combination of a few issues going on here @tsawada. I just made a fix for one of the code issues: https://github.com/GoogleCloudPlatform/k8s-config-connector/pull/3106.

However, there are a couple of issues with the YAML you posted:

- The cloudsql.iam_authentication database flag should be "on", not "1". Ref: https://cloud.google.com/sql/docs/postgres/flags#postgres-c
- requireSsl: true needs to be set (because the instance type is postgres, and sslMode: "TRUSTED_CLIENT_CERTIFICATE_REQUIRED" is specified). Ref: https://cloud.google.com/sql/docs/mysql/admin-api/rest/v1/instances#ipconfiguration

After you make those updates, the fix above lands, and we enable the new version of the controller code (todo in a future release, perhaps v1.126), this issue should be resolved.
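Putting those two corrections together, the relevant parts of the spec would look something like this (a sketch assembled from the comments above, not YAML posted in the thread):

```yaml
spec:
  settings:
    databaseFlags:
    - name: cloudsql.iam_authentication
      value: "on"  # postgres expects "on", not "1"
    ipConfiguration:
      requireSsl: true  # required alongside the sslMode below
      sslMode: "TRUSTED_CLIENT_CERTIFICATE_REQUIRED"
```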
@chadkouse, for the original issue, could you please also share the CR YAML you are using?
@tsawada the fixes are now in. Once you are running a release that includes them, the "constantly re-updating" issue should be fixed.
@jasonvigil Thank you so much for fixing this quickly! I'll try it next week and will let you know how it goes.
Describe your question
I used the output from gcloud sql instances describe {INSTANCE_NAME} to build a configuration for a Cloud SQL instance, making sure all of the settings were the same: the tier, the network, etc. When I applied the config to my cluster, it did acquire the resource, but its status in k8s was Updating, and the Google Cloud console also showed it updating. The server actually even became unavailable for a few minutes. I wasn't able to get information about why the resource was being updated.
I would like to perform this process again in our production environment, but I am wondering if there is a zero-downtime version of this process?
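For reference, a small sketch of how one might verify an acquisition in a staging project first (my own illustration, not a process confirmed in this thread; INSTANCE_NAME is a placeholder):

```sh
# Capture the live settings the CR needs to match
gcloud sql instances describe INSTANCE_NAME

# After applying the CR, watch the resource status from the K8s side...
kubectl get sqlinstance INSTANCE_NAME -w

# ...and check which update operations (if any) Cloud SQL actually received
gcloud sql operations list --instance=INSTANCE_NAME --limit=5
```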