crossplane-contrib / provider-upjet-aws

Official AWS Provider for Crossplane by Upbound.
https://marketplace.upbound.io/providers/upbound/provider-aws
Apache License 2.0

[Bug]: Replicationgroup.elasticache.aws.upbound.io in async after upgrade provider from 1.1.0 to 1.1.4 #1351

Open mihaelabalas84 opened 3 weeks ago

mihaelabalas84 commented 3 weeks ago

Is there an existing issue for this?

Affected Resource(s)

ReplicationGroup.elasticache.aws.upbound.io/v1beta2

Resource MRs required to reproduce the bug

The following replication group was created using the provider version 1.1.0

apiVersion: elasticache.aws.upbound.io/v1beta2
kind: ReplicationGroup
metadata:
  name: ***-staging-redis
spec:
  deletionPolicy: Orphan
  forProvider:
    applyImmediately: true
    autoMinorVersionUpgrade: "true"
    automaticFailoverEnabled: true
    description: '***-staging Redis cache '
    engine: redis
    engineVersion: "7.1"
    ipDiscovery: ipv4
    maintenanceWindow: fri:05:00-fri:06:00
    multiAzEnabled: true
    networkType: ipv4
    nodeType: cache.t3.small
    numNodeGroups: 1
    parameterGroupName: default.redis7
    port: 6379
    region: eu-west-1
    replicasPerNodeGroup: 1
    securityGroupIdRefs:
    - name: ***-staging-redis-security-group
    securityGroupIds:
    - ****
    snapshotWindow: 03:30-04:30
    subnetGroupName: ***-staging-redis-csg
    subnetGroupNameRef:
      name: ***-staging-redis-csg
  providerConfigRef:
    name: provider-aws-upbound-elasticache

Steps to Reproduce

Using the manifest above, create a replication group with all Upbound providers and the AWS family provider at version 1.1.0. Then upgrade the ElastiCache provider to 1.1.4 (all providers were upgraded, including provider-family-aws).

What happened?

All replication groups went into Async state.

Relevant Error Output Snippet

conditions:
  - lastTransitionTime: "2024-06-10T09:15:08Z"
    message: "update failed: async update failed: failed to update the resource: [{0
      changing auth_token for ElastiCache Replication Group (***-staging-redis):
      InvalidParameterValue: The AUTH token modification is only supported when encryption-in-transit
      is enabled.\n\tstatus code: 400, request id: daa20ded-8655-41bd-a278-f2bed79877b6
      \ []}]"
    reason: ReconcileError
    status: "False"
    type: Synced
  - lastTransitionTime: "2024-06-10T09:15:08Z"
    message: "async update failed: failed to update the resource: [{0 changing auth_token
      for ElastiCache Replication Group (***-staging-redis): InvalidParameterValue:
      The AUTH token modification is only supported when encryption-in-transit is
      enabled.\n\tstatus code: 400, request id: daa20ded-8655-41bd-a278-f2bed79877b6
      \ []}]"
    reason: AsyncUpdateFailure
    status: "False"
    type: LastAsyncOperation
  - lastTransitionTime: "2024-06-06T07:59:48Z"
    reason: Available
    status: "True"
    type: Ready

Crossplane Version

1.15.2

Provider Version

1.1.4

Kubernetes Version

1.28.1

Kubernetes Distribution

EKS

Additional Info

I understand where this comes from: the terraform-provider-aws change https://github.com/hashicorp/terraform-provider-aws/pull/34460, which now forces auth_token_update_strategy to be set. For replication groups where in-transit encryption is not enabled, AWS rejects this update, so all our replication groups remain unsynced. So far the only workarounds are to downgrade the provider or to recreate the cache with the new version.

turkenf commented 3 weeks ago

Hi @mihaelabalas84,

Thank you for raising this issue. Kindly consider the following:

mihaelabalas84 commented 3 weeks ago

Hi @mihaelabalas84,

Thank you for raising this issue. Kindly consider the following:

  • please add a title, briefly state the problem/bug, and indicate which family provider is causing the problem
  • check the versions and make sure you wrote it correctly
  • add explicit reproduction steps so we can reproduce the issue again

Done. Sorry for the mess.

caiofralmeida commented 1 week ago

The same issue occurs with version 1.3.1:

async update failed: failed to update the resource: [{0 changing auth_token for ElastiCache Replication Group (kafka-operator): InvalidParameterValue: The AUTH token modification is only supported when encryption-in-transit is enabled.

I noticed that version v1.6.1 adds a new field, autoGenerateAuthToken, to disable this behavior.

chlunde commented 1 week ago

I wonder if it is related to the introduction of https://github.com/hashicorp/terraform-provider-aws/commit/0b7e4ba24b465e4c0e595edd6bda0d3989ec38c7#diff-5d55dcf3aa8ffba3437fb3ff6b7a96b74c9f9196d47dbb4bb63369259cc083bc a few releases back.

I think if someone can install the old provider (1.1.4?) in a lab cluster, set up a cluster without auth, run kubectl get -o yaml --show-managed-fields=true, then upgrade to >= 1.3.1 and run the same command again, we might get a hint.
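To make that before/after comparison easier to read, here is a minimal sketch of a helper that diffs the two YAML dumps and keeps only the lines that were added or removed. It is an illustration, not part of any provider tooling: the file names before.yaml and after.yaml and the function name changed_lines are hypothetical, and the script only compares text, it does not parse YAML.

```python
import difflib
import sys


def changed_lines(before: str, after: str) -> list[str]:
    """Return only the added ("+ ") and removed ("- ") lines between two dumps.

    Uses a plain line-based text diff; unchanged lines and difflib's
    "? " hint lines are dropped.
    """
    diff = difflib.ndiff(before.splitlines(), after.splitlines())
    return [line for line in diff if line.startswith(("+ ", "- "))]


# Hypothetical usage: python diff_mr.py before.yaml after.yaml
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as f_before, open(sys.argv[2]) as f_after:
        for line in changed_lines(f_before.read(), f_after.read()):
            print(line)
```

The two input files could be produced by redirecting the kubectl get -o yaml --show-managed-fields=true output for the ReplicationGroup to a file once before and once after the provider upgrade. Fields that appear only in the second dump, or whose managed-fields owner changed, would be the first candidates to look at.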

You are running without any auth, right? The AWS API has an explicit field for that, but the Terraform and Crossplane providers do not: https://docs.aws.amazon.com/AmazonElastiCache/latest/red-ug/in-transit-encryption-disable.html

Same as #1370