Open n1koo opened 2 years ago
Hi @n1koo , thank you for reporting the issue! I tried to create a RedisInstance in Config Connector 1.61.0
and then upgraded the version to 1.74.0
, but didn't encounter the error you described (see my yaml snippet and steps in the end of the comment). Could you share more details about this bug following the template below?
## Additional Diagnostic Information
### Kubernetes Cluster Version
<!--
Run the following command to get the Kubernetes cluster version:
kubectl version --short
-->
### Config Connector Version
<!--
Run the following command to get the installed Config Connector version:
kubectl get ns cnrm-system -o jsonpath='{.metadata.annotations.cnrm\.cloud\.google\.com/version}'
-->
### Config Connector Mode
<!--
Run the following command to get the Config Connector Mode:
kubectl get ConfigConnector "configconnector.core.cnrm.cloud.google.com" -o=jsonpath="{@.spec.mode}"
-->
## Log Output
<!--
Include any relevant log output. See
https://cloud.google.com/config-connector/docs/troubleshooting#logging
If the issue is for a specific resource, please include those logs.
-->
## Steps to Reproduce
### Steps to reproduce the issue
<!--- Steps needed to reproduce the issue. --->
### YAML snippets
<!--- YAML snippets needed to reproduce the issue. See the following as an example. --->
```yaml
apiVersion: pubsub.cnrm.cloud.google.com/v1beta1
kind: PubSubTopic
metadata:
labels:
label-one: "value-one"
name: pubsubtopic-sample
====== YAML and steps I tried ======
The YAML snippet I used:
apiVersion: redis.cnrm.cloud.google.com/v1beta1 kind: RedisInstance metadata: name: redisinstance-test namespace: [namespace] spec: displayName: Sample Redis Instance region: us-central1 tier: BASIC memorySizeGb: 1
Steps:
1. Installed Config Connector `1.61.0` in cluster mode.
2. Created a `RedisInstance` using the YAML snippet above and waited for for it to be ready.
3. Upgraded Config Connector to `1.74.0`.
4. Run `kubectl describe` to check the status of the RedisInstance and it is `UpToDate`.
@maqiuyujoyce the problem exists with older redis instances. In our case they were atleast 270days old.
Otherwise your steps are correct, but you require an older instance to reproduce. I don't think going through the template has much value here as its not a k8s issue but just an issue how those values are in GCP backend
For example comparing a new and old instance via gcloud
:
Old (as in older than 9months)
‽ gcloud redis instances describe <new-instance> --region <region> --project <project>|grep read
(nothing matched)
In console showing Read Replicas: NA
New(as in newer than 9months)
‽ gcloud redis instances describe <new-instance> --region <region> --project <project>|grep read
readReplicasMode: READ_REPLICAS_DISABLED
Hi @n1koo , thanks for sharing the context.
Have you tried to abandon the RedisInstance and acquire it with readReplicasMode
explicitly set to NA
?
Sorry @maqiuyujoyce not yet. We reverted the upgrade for now. Doing that dance for our instances would be somewhat painful though as its not just a handful.
We could write that automation, but I was a little bit hoping config-connector could handle it better internally?
This change is not mentioned even as breaking change
Hi @n1koo, this is indeed strange behavior.
Can you get the state of your Redis instances by hitting the API directly? Use "Try It!" here: https://cloud.google.com/memorystore/docs/redis/reference/rest/v1/projects.locations.instances/get
I'm particularly interested in the readReplicasMode
value returned by the API.
Also: can you verify that your RedisInstance
resources don't specify a readReplicasMode
after the KCC upgrade?
It may be worth noting that "NA" is not a valid readReplicasMode
value, but this should be irrelevant as long as there is no diff between the resource configuration that KCC tries to apply and the resource state that KCC reads from the API.
As far as I can tell, we are not modifying the readReplicasMode
value we read back from the API at all even if the value is technically invalid.
Unfortunately, this is also going to be a bit challenging to debug for us since we do not have "old" Redis instances to work with.
Ok, I suspect the API is actually returning readReplicasMode: ""
for "old" Redis instances -- can you confirm this @n1koo?
@jcanseco sorry, i dont have access to that due to corp policies but just checking it via google-cloud-redis and eg simple script
credentials, project = google.auth.default()
client = redis_v1.CloudRedisClient(credentials=credentials)
project_id = "xxx"
location = "us-east1"
memorystore_instance_name = "xxx"
parent = f"projects/{project_id}/{location}"
memorystore_instance_path = client.instance_path(
project_id, location, memorystore_instance_name)
try:
memorystore_instance = client.get_instance(
request={'name': memorystore_instance_path})
log.info(memorystore_instance)
the whole entry for read_replicas_mode
was missing (not just value being empty or nil)
eg. a 2 year old redis:
display_name: "xxx"
location_id: "us-east1-d"
redis_version: "REDIS_4_0"
reserved_ip_range: "xxx"
host: "xxx"
port: 6379
current_location_id: "us-east1-d"
create_time {
seconds: 1585590006
nanos: 241136792
}
state: READY
tier: BASIC
memory_size_gb: 1
authorized_network: "xxx"
persistence_iam_identity: "xxx"
connect_mode: DIRECT_PEERING
transit_encryption_mode: DISABLED
maintenance_policy {
create_time {
seconds: 1628336720
nanos: 787160000
}
update_time {
seconds: 1628336720
nanos: 787160000
}
weekly_maintenance_window {
day: SATURDAY
start_time {
hours: 5
}
duration {
seconds: 3600
}
}
}
nodes {
id: "node-0"
zone: "us-east1-d"
}
vs a recent'ish redis (2months old)
display_name: "xxx"
location_id: "us-east1-c"
redis_version: "REDIS_4_0"
reserved_ip_range: "xxx"
host: "xxx"
port: 6379
current_location_id: "us-east1-c"
create_time {
seconds: 1643979721
nanos: 613829957
}
state: READY
tier: BASIC
memory_size_gb: 1
authorized_network: "xxx"
persistence_iam_identity: "xxx"
connect_mode: DIRECT_PEERING
transit_encryption_mode: DISABLED
maintenance_policy {
create_time {
seconds: 1643979932
nanos: 571797000
}
update_time {
seconds: 1643979932
nanos: 571797000
}
weekly_maintenance_window {
day: SATURDAY
start_time {
hours: 5
}
duration {
seconds: 3600
}
}
}
nodes {
id: "node-0"
zone: "us-east1-c"
}
read_replicas_mode: READ_REPLICAS_DISABLED
Hi @n1koo. Gotcha, I'll check in with the Redis team internally, but if you can, it would really help us if you can confirm what value for readReplicasMode
is returned by the API. Checking what a client library returns is insufficient unfortunately since they don't necessarily fully reflect what the API actually returns.
Hi @n1koo, can you please report this to GCP support so that the Redis team can take a look?
I chatted with the Redis team internally and it seems that old Redis instances should still also have a value for readReplicasMode
(READ_REPLICAS_DISABLED
).
If you are seeing that your instances are not reporting a value for readReplicasMode
(even if it is through google-cloud-redis
), I think that might be worth reporting since that appears to be unintended behavior.
Nevermind! It seems Terraform users have also reported facing this issue: https://github.com/hashicorp/terraform-provider-google/issues/11410
And Terraform team is able to reproduce the issue as they have an old Redis instance that predates the read replicas feature. Let me work with the Terraform and Redis teams internally to see what we can do.
Awesome thank you! I also finally got the access so heres the raw api response;
{
"name": "projects/xxx/locations/xxx/instances/xxx",
"displayName": "xx",
"labels": {
//xx
},
"locationId": "us-east1-b",
"redisVersion": "REDIS_4_0",
"reservedIpRange": "xxx",
"host": "xxx",
"port": 6379,
"currentLocationId": "us-east1-b",
"createTime": "2020-04-22T19:51:04.021313435Z",
"state": "READY",
"tier": "BASIC",
"memorySizeGb": 8,
"authorizedNetwork": "projects/xxx/global/networks/xxx",
"persistenceIamIdentity": "serviceAccount:xxx-compute@developer.gserviceaccount.com",
"connectMode": "DIRECT_PEERING",
"transitEncryptionMode": "DISABLED",
"maintenancePolicy": {
"createTime": "2021-08-07T11:01:09.836707Z",
"updateTime": "2021-08-07T11:01:09.836707Z",
"weeklyMaintenanceWindow": [
{
"day": "SATURDAY",
"startTime": {
"hours": 5
},
"duration": "3600s"
}
]
},
"nodes": [
{
"id": "node-0",
"zone": "us-east1-b"
}
]
}
Seems this is already patched in terraform-provider so 🤞 fix will land soon to your dependency
Yep that is correct. We'll post an update once this has been fixed in KCC :)
We faced with the same problem. JFYI: Our work around without rolling back config connector
cnrm.cloud.google.com/deletion-policy: abandon
to redisinstnace k8s object spec.readReplicasMode: ""
to yaml Not sure that it's the best solution and how it will be behave in future after bug will be resolved.
@Veres72 thank you for sharing! Yes that workaround makes sense and is what we recommend for now.
We are looking to release a new Config Connector version with a fix which we intend to also roll out to GKE 1.23+ clusters using the KCC add-on.
We are also asking the Redis team to implement a server-side fix to accommodate users who may not be able to upgrade for whatever reason, but we have not received updates from them yet.
Checklist
Bug Description
After upgrading to
1.74
from1.61
all our old Redis instances started showing'Update call failed: cannot make changes to immutable field(s): [readReplicasMode]'
THis issue was introduced in
1.71
This is due to the instance being created before this feature was a thing. It seems its
NA
for old instances and cannot be changed.