crossplane-contrib / provider-upjet-aws

Official AWS Provider for Crossplane by Upbound.
https://marketplace.upbound.io/providers/upbound/provider-aws
Apache License 2.0
138 stars 117 forks source link

provider-aws-ecs: taskDefinition is not updated #1061

Closed denzhel closed 6 months ago

denzhel commented 6 months ago

What happened?

I've created a task definition and then updated one the container-definition values and expected crossplane to create a new revision, instead it tries to modify the existing revision and fails with this error:

 Message:               async update failed: refuse to update the external resource: cannot change the value of the argument "container_definitions" from "[{\"cpu\":10,\"environment\":[],\"essential\":true,\"image\":\"service-first\",\"memory\":512,\"mountPoints\":[],\"name\":\"first\",\"portMappings\":[],\"volumesFrom\":[]}]" to "[{\"cpu\":11,\"essential\":true,\"image\":\"service-first\",\"memory\":512,\"name\":\"first\"}]"

How can we reproduce it?

  1. Apply the following file using kubectl apply -f task.yaml:
    apiVersion: ecs.aws.upbound.io/v1beta1
    kind: TaskDefinition
    metadata:
    name: dennis
    spec:
    forProvider:
    containerDefinitions: |-
      [
        {
          "name": "first",
          "image": "service-first",
          "cpu": 10,
          "memory": 512,
          "essential":true
        }
      ]
    family: dennis
    region: us-central-1
    providerConfigRef:
    name: "dev-eu-central-1-cs-account"
  2. Change cpu to 20
  3. Re-apply the file using kubectl apply -f task.yaml
  4. Run kubectl describe taskdefinition.ecs.aws.upbound.io/dennis
  5. See the error

What environment did it happen in?

turkenf commented 6 months ago

Hi @denzhel, thank you for raising this issue.

The issue can be reproduced with the information provided, I also get the same error when I add a second definition block.

Additionally, when I try with provider version 0.43, I get the following error:

    message: 'observe failed: cannot run plan: plan failed: Instance cannot be destroyed:
      Resource aws_ecs_task_definition.sample-task1 has lifecycle.prevent_destroy
      set, but the plan calls for this resource to be destroyed. To avoid this error
      and continue with the plan, either disable lifecycle.prevent_destroy or reduce
      the scope of the plan using the -target flag.'
    reason: ReconcileError
    status: "False"
denzhel commented 6 months ago

Hi @turkenf ,

I see that this issue was already known in v0.38.0. https://github.com/upbound/provider-aws/issues/823

Is there an ETA for a fix ?

jeanduplessis commented 6 months ago

@denzhel

We've discussed this issue, and we don't feel we should allow the Terraform layer to replace (destroy & recreate the external resource with the updated cpu (core) count) due to the immutable fields policy of the XRM.

It is suggested that you delete the corresponding MR and recreate a new MR with the updated parameters as a path forward to configure an existing TaskDefinition.ecs with an updated cpu count, if needed.


Going forward we should be properly treating this field as an immutable field, as captured by the following two issues:

We can also:

denzhel commented 6 months ago

@denzhel

We've discussed this issue, and we don't feel we should allow the Terraform layer to replace (destroy & recreate the external resource with the updated cpu (core) count) due to the immutable fields policy of the XRM.

It is suggested that you delete the corresponding MR and recreate a new MR with the updated parameters as a path forward to configure an existing TaskDefinition.ecs with an updated cpu count, if needed.


Going forward we should be properly treating this field as an immutable field, as captured by the following two issues:

We can also:

  • Add a reference to the XRM explaining the rationale for blocking Terraform from replacing the resource when the cpu count is updated

  • Acknowledge the missing immutability feature, e.g., generating the CRD validation rules to block updates to the forceNew fields (in TF plugin SDK) or the "requires replace" plan modifiers.

Hi @jeanduplessis ,

Thank you for the reply !

This behavior happens not only with cpu field but with all fields. You can check the same with the image field where we wanted to bump the version when updating the micro service version.

I"m not sure how to implement your suggestion as we use CRD, Compositions and Claims. The claim is templated using Helm and ArgoCD.

Help will be appreciated.

mbbush commented 6 months ago

This behavior happens not only with cpu field but with all fields.

This makes sense, as an ECS task definition is supposed to be immutable once created.

One trick you could try is to make the metadata.name field of the managed resource dynamic in your composition. One way to do that might be hashing all the properties in spec.forProvider and using the hash as part of the name.

denzhel commented 6 months ago

Hi @mbbush,

So I have this claim:

apiVersion: bigpanda.io/v1alpha1
kind: EcsCluster
metadata:
  name: {{ printf "%s-%s" .Values.global.env .Values.ecs.name }}
  annotations:
    crossplane.io/external-name: {{ printf "%s-%s" .Values.global.env .Values.ecs.name }}
spec:
  taskDefinition:
    containerDefinitions: {{ .Values.ecs.taskDefinition.containerDefinitions | toJson | quote }}

This composition:

apiVersion: apiextensions.crossplane.io/v1
kind: Composition
metadata:
  name: ecscluster
spec:
  compositeTypeRef:
    apiVersion: bigpanda.io/v1alpha1
    kind: CompositeEcsCluster
  resources:
    # creation of the task definition
    - base:
        apiVersion: ecs.aws.upbound.io/v1beta1
        kind: TaskDefinition
        spec:
          forProvider: {}
          providerConfigRef:
            name: ""
      patches:
        # create an hash from the containerDefinition object
        - type: ToCompositeFieldPath
          fromFieldPath: spec.taskDefinition.containerDefinitions
          toFieldPath: status.containerDefinitionsHash
          transforms:
            - type: string
              string:
                type: Convert
                convert: "ToSha256"
         # define the name of the object
        - type: CombineFromComposite
          combine:
            variables:
              - fromFieldPath: spec.name
              - fromFieldPath: status.containerDefinitionsHash
            strategy: string
            string:
              fmt: "%s-task-%s"
          toFieldPath: metadata.name

Added this to the XRD:

status:
              containerDefinitionsHash:
                type: string

But I cannot get the hashed value to appear inside the composition. I don't see it in the status block:

k get composite dev-us-east-1-inbound-alert-persistence-v4hsz -o json | jq .status
{
  "clusterArn": "arn:aws:ecs:us-east-1:123456:cluster/dev-us-east-1-inbound-alert-persistence-cluster",
  "clusterName": "dev-us-east-1-inbound-alert-persistence-cluster",
  "conditions": [
    {
      "lastTransitionTime": "2024-01-08T20:18:41Z",
      "reason": "ReconcileSuccess",
      "status": "True",
      "type": "Synced"
    },
    {
      "lastTransitionTime": "2024-01-08T20:21:07Z",
      "reason": "Available",
      "status": "True",
      "type": "Ready"
    }
  ],
  "taskArn": "arn:aws:ecs:us-east-1:123456:task-definition/dev-us-east-1-inbound-alert-persistence-task:1"
}
denzhel commented 6 months ago

HI @mbbush ,

I was able to get it working but now I have another issue. When I first create the resources using the claim, two tasks are created:

  1. Without the hash
  2. With the hash I expected both of them to include a hashed value, different hashed value.
k get managed
NAME                                                                                                                                              READY   SYNCED   EXTERNAL-NAME                                  AGE
taskdefinition.ecs.aws.upbound.io/dev-us-east-1-inbound-alert-persistence-jx7kc-x9rp4                                                             True    True     dev-us-east-1-inbound-alert-persistence-task   12s
taskdefinition.ecs.aws.upbound.io/dev-us-east-1-inbound-alert-persistence-task-cb1dfba189efa313836effc0489847dfe17b5478b5413643a301396be8a3b1ab   True    True     dev-us-east-1-inbound-alert-persistence-task   12s
denzhel commented 6 months ago

Hi @jeanduplessis and @turkenf ,

Help would be much appreciated <3

denzhel commented 6 months ago

Wanted to share that I was able to solve the above using this in the claim file:

containerDefinitionsHash: {{ sha256sum (.Values.ecs.taskDefinition.containerDefinitions | toJson | quote) }}

and addition to the CRD