aws-controllers-k8s / community

AWS Controllers for Kubernetes (ACK) is a project enabling you to manage AWS services from Kubernetes
https://aws-controllers-k8s.github.io/community/
Apache License 2.0
2.41k stars 254 forks source link

ReplicationGroup Creation invokes ModifyReplicationGroup immediately after creation. #228

Closed nmvk closed 4 years ago

nmvk commented 4 years ago

Describe the bug Elasticache uses status to determine if a ReplicationGroup is created. Once we create ReplicationGroup, status of RG is creating and it can few minutes to move to active status. During this time we do not allow ModifyReplicationGroup call to be invoked as status is not active.

Upon creation, ModifyReplication group gets invoked continuously and results in error.

2020-08-20T12:02:26.686-0700    DEBUG   reconciler.sync created new resource    {"arn": null}
2020-08-20T12:02:26.686-0700    DEBUG   controller-runtime.controller   Successfully Reconciled {"controller": "replicationgroup", "request": "default/test-rg"}
2020-08-20T12:02:27.015-0700    ERROR   controller-runtime.controller   Reconciler error        {"controller": "replicationgroup", "request": "default/test-rg", "error": "InvalidReplicationGroupState: Replication group must be in available state to modify.\n\tstatus code: 400, request id: ecbe27cd-acec-44e6-acd2-34540c4630af"}
github.com/go-logr/zapr.(*zapLogger).Error
        /Users/raghavnm/go/pkg/mod/github.com/go-logr/zapr@v0.1.0/zapr.go:128
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
        /Users/raghavnm/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.0/pkg/internal/controller/controller.go:258
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
        /Users/raghavnm/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.0/pkg/internal/controller/controller.go:232
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker

Steps to reproduce

apiVersion: elasticache.services.k8s.aws/v1alpha1
kind: ReplicationGroup
metadata:
  name: test-rg
spec:
    engine: redis 
    replicationGroupID: test-k8
    replicationGroupDescription: test
    automaticFailover: true
    cacheNodeType: cache.t3.micro
    numNodeGroups: 3
    replicasPerNodeGroup: 2

Expected outcome No invocation of ModifyReplicationGroup if status is not active

Environment

nmvk commented 4 years ago

We can provide a field called precondition, In this we can specify the the resource attribute and list of allowed values.

resources:
  CacheSubnetGroup:
    exceptions:
      codes:
        404: CacheSubnetGroupNotFoundFault
operations:
  ModifyReplicationGroup:
    override_values:
      ApplyImmediately: true
    precondition:
      status:
        - Available
        - Snapshotting

In generated code before invoking the specified API, we would Describe and check if precondition is satisfied, if not we would return sdkFind. Though reconciler would still invoke the Modify API, we would not invoke ElastiCache API.

Elasticache wait link - https://docs.aws.amazon.com/cli/latest/reference/elasticache/wait/index.html

This might be useful for other services like EC2 which also have a wait logic https://docs.aws.amazon.com/cli/latest/reference/ec2/wait/index.html

jaypipes commented 4 years ago

Hmm, yeah, this is indeed an interesting idea @nmvk. A couple thoughts in response, though...

1) Instead of precondition, though, how about we call it assert? In other words, you would read the generator.yaml contents like this "for ModifyReplicationGroup operation, assert that Status field value is 'Available' or 'Snapshotting'".

2) Instead of returning the result of sdkFind if the assertions fail, we should add a Condition object to the CR's Status.Conditions collection that indicates the CR has a pending modification to its actual state that is waiting for a field assertion to assert truthfully. That way, Kubernetes users can query the Status.Conditions field and see what the status of any desired state changes are.

Thoughts?

nmvk commented 4 years ago

Thanks! @jaypipes for the suggestion. I liked the idea that K8 users can query the Status.Conditions field and see if resource is in pending modifying state. Will use assert in generator.yaml instead of precondition