haarchri opened this issue 2 years ago.
Hey, I saw that thread in the Slack group. I wonder if there is something deeper going on here; I've been debugging a similar resource-duplication problem for a while.
I've been seeing resources fall out of sync and then start duplicating in AWS until:
I have tried:
Semi-reproducible example using Crossplane v1.9.0 and provider-jet-aws v0.5.0. Unfortunately it needs a lot of resources, but having that many makes the issue appear faster. I applied the following (fill in the resource IDs as they are created):
1. Created VPCs
---
apiVersion: ec2.aws.jet.crossplane.io/v1alpha2
kind: VPC
metadata:
  name: crossplane-vpc-single-us-east-1
spec:
  forProvider:
    cidrBlock: 192.168.0.0/16
    enableDnsHostnames: true
    enableDnsSupport: true
    instanceTenancy: default
    region: us-east-1
    tags:
      Name: crossplane-vpc-single-us-east-1
---
apiVersion: ec2.aws.jet.crossplane.io/v1alpha2
kind: VPC
metadata:
  name: crossplane-vpc-single-us-east-2
spec:
  forProvider:
    cidrBlock: 192.168.0.0/16
    enableDnsHostnames: true
    enableDnsSupport: true
    instanceTenancy: default
    region: us-east-2
    tags:
      Name: crossplane-vpc-single-us-east-2
---
apiVersion: ec2.aws.jet.crossplane.io/v1alpha2
kind: VPC
metadata:
  name: crossplane-vpc-single-us-west-1
spec:
  forProvider:
    cidrBlock: 192.168.0.0/16
    enableDnsHostnames: true
    enableDnsSupport: true
    instanceTenancy: default
    region: us-west-1
    tags:
      Name: crossplane-vpc-single-us-west-1
---
apiVersion: ec2.aws.jet.crossplane.io/v1alpha2
kind: VPC
metadata:
  name: crossplane-vpc-single-us-west-2
spec:
  forProvider:
    cidrBlock: 192.168.0.0/16
    enableDnsHostnames: true
    enableDnsSupport: true
    instanceTenancy: default
    region: us-west-2
    tags:
      Name: crossplane-vpc-single-us-west-2
2. Created EIPs
apiVersion: ec2.aws.jet.crossplane.io/v1alpha2
kind: EIP
metadata:
  name: crossplane-elastic-ip-single-us-west-2
spec:
  forProvider:
    region: us-west-2
    vpc: true
    tags:
      Name: crossplane-elastic-ip-single-us-west-2
3. Created InternetGateways
apiVersion: ec2.aws.jet.crossplane.io/v1alpha2
kind: InternetGateway
metadata:
  name: crossplane-internet-gateway-single-us-west-2
spec:
  forProvider:
    region: us-west-2
    vpcId:
4. Created SecurityGroups
apiVersion: ec2.aws.jet.crossplane.io/v1alpha2
kind: SecurityGroup
metadata:
  name: crossplane-security-group-single-us-east-1
spec:
  forProvider:
    description: Crossplane resource communication
    egress:
---
apiVersion: ec2.aws.jet.crossplane.io/v1alpha2
kind: SecurityGroup
metadata:
  name: crossplane-security-group-single-us-east-2
spec:
  forProvider:
    description: Crossplane resource communication
    egress:
---
apiVersion: ec2.aws.jet.crossplane.io/v1alpha2
kind: SecurityGroup
metadata:
  name: crossplane-security-group-single-us-west-1
spec:
  forProvider:
    description: Crossplane resource communication
    egress:
---
apiVersion: ec2.aws.jet.crossplane.io/v1alpha2
kind: SecurityGroup
metadata:
  name: crossplane-security-group-single-us-west-2
spec:
  forProvider:
    description: Crossplane resource communication
    egress:
5. Created Subnets
apiVersion: ec2.aws.jet.crossplane.io/v1alpha2
kind: Subnet
metadata:
  name: crossplane-private-subnet-single-us-west-2
spec:
  forProvider:
    availabilityZone: us-west-2a
    cidrBlock: 192.168.1.0/24
    mapPublicIpOnLaunch: false
    region: us-west-2
    vpcId:
6. Created RouteTables
apiVersion: ec2.aws.jet.crossplane.io/v1alpha2
kind: RouteTable
metadata:
  name: crossplane-public-route-table-single-us-east-1
spec:
  forProvider:
    region: us-east-1
    route:
---
apiVersion: ec2.aws.jet.crossplane.io/v1alpha2
kind: RouteTable
metadata:
  name: crossplane-public-route-table-single-us-east-2
spec:
  forProvider:
    region: us-east-2
    route:
---
apiVersion: ec2.aws.jet.crossplane.io/v1alpha2
kind: RouteTable
metadata:
  name: crossplane-public-route-table-single-us-west-1
spec:
  forProvider:
    region: us-west-1
    route:
---
apiVersion: ec2.aws.jet.crossplane.io/v1alpha2
kind: RouteTable
metadata:
  name: crossplane-public-route-table-single-us-west-2
spec:
  forProvider:
    region: us-west-2
    route:
Initially everything comes up fine, and then issues begin to occur:
For this part, it may help to also download the attached provider debug logs.
The issues vary between resources. For example, crossplane-public-route-table-single-us-east-2 has the following in its workspace /tmp/2c4571c9-a887-4b95-a86c-362b4f5ed2d7:
Running terraform plan -> Instance cannot be destroyed
main.tf.json:
{
  "provider": {
    "aws": {
      "access_key": "<REMOVED>",
      "region": "us-east-2",
      "secret_key": "<REMOVED>",
      "token": ""
    }
  },
  "resource": {
    "aws_route_table": {
      "crossplane-public-route-table-single-us-east-2": {
        "lifecycle": {
          "prevent_destroy": true
        },
        "route": [
          {
            "carrier_gateway_id": null,
            "cidr_block": "0.0.0.0/0",
            "destination_prefix_list_id": null,
            "egress_only_gateway_id": null,
            "gateway_id": "igw-064b0f7fc8c23c375",
            "instance_id": null,
            "ipv6_cidr_block": null,
            "local_gateway_id": null,
            "nat_gateway_id": null,
            "network_interface_id": null,
            "transit_gateway_id": null,
            "vpc_endpoint_id": null,
            "vpc_peering_connection_id": null
          }
        ],
        "tags": {
          "Name": "crossplane-public-route-table-single-us-east-2",
          "crossplane-kind": "routetable.ec2.aws.jet.crossplane.io",
          "crossplane-name": "crossplane-public-route-table-single-us-east-2",
          "crossplane-providerconfig": "default"
        },
        "vpc_id": "vpc-06c9890daaaefcc73"
      }
    }
  },
  "terraform": {
    "required_providers": {
      "aws": {
        "source": "hashicorp/aws",
        "version": "3.56.0"
      }
    }
  }
}
terraform.tfstate:
{
  "version": 4,
  "terraform_version": "1.0.5",
  "serial": 5,
  "lineage": "2c4571c9-a887-4b95-a86c-362b4f5ed2d7",
  "outputs": {},
  "resources": [
    {
      "mode": "managed",
      "type": "aws_route_table",
      "name": "crossplane-public-route-table-single-us-east-2",
      "provider": "provider[\"registry.terraform.io/hashicorp/aws\"]",
      "instances": [
        {
          "status": "tainted",
          "schema_version": 0,
          "attributes": {
            "arn": "arn:aws:ec2:us-east-2:738718428379:route-table/rtb-00666219825d23a0e",
            "id": "rtb-00666219825d23a0e",
            "owner_id": "738718428379",
            "propagating_vgws": [],
            "route": [],
            "tags": {
              "Name": "crossplane-public-route-table-single-us-east-2",
              "crossplane-kind": "routetable.ec2.aws.jet.crossplane.io",
              "crossplane-name": "crossplane-public-route-table-single-us-east-2",
              "crossplane-providerconfig": "default"
            },
            "tags_all": {
              "Name": "crossplane-public-route-table-single-us-east-2",
              "crossplane-kind": "routetable.ec2.aws.jet.crossplane.io",
              "crossplane-name": "crossplane-public-route-table-single-us-east-2",
              "crossplane-providerconfig": "default"
            },
            "vpc_id": "vpc-06c9890daaaefcc73"
          },
          "sensitive_attributes": [],
          "private": "<REMOVED>"
        }
      ]
    }
  ]
}
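The state above still records the route table, but its only instance is marked "tainted". Terraform plans a replacement (destroy, then create) for tainted instances, and the prevent_destroy lifecycle flag in main.tf.json forbids the destroy half, which would explain the "Instance cannot be destroyed" result. Below is a minimal Go sketch for spotting tainted instances in a workspace's terraform.tfstate. It assumes only the JSON fields visible in the dump (resources[].type, name, instances[].status); the function name taintedInstances is hypothetical and not part of the provider.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// tfState mirrors only the parts of a version 4 state file used here.
type tfState struct {
	Resources []struct {
		Type      string `json:"type"`
		Name      string `json:"name"`
		Instances []struct {
			Status string `json:"status"`
		} `json:"instances"`
	} `json:"resources"`
}

// taintedInstances returns "type.name" for every resource that has at
// least one instance whose status is "tainted".
func taintedInstances(raw []byte) ([]string, error) {
	var st tfState
	if err := json.Unmarshal(raw, &st); err != nil {
		return nil, err
	}
	var out []string
	for _, r := range st.Resources {
		for _, inst := range r.Instances {
			if inst.Status == "tainted" {
				out = append(out, r.Type+"."+r.Name)
				break // one tainted instance is enough to report the resource
			}
		}
	}
	return out, nil
}

func main() {
	// Trimmed-down copy of the route table state shown above.
	state := []byte(`{
  "version": 4,
  "resources": [
    {
      "mode": "managed",
      "type": "aws_route_table",
      "name": "crossplane-public-route-table-single-us-east-2",
      "instances": [{"status": "tainted", "schema_version": 0}]
    }
  ]
}`)
	tainted, err := taintedInstances(state)
	if err != nil {
		panic(err)
	}
	fmt.Println(tainted)
}
```

Running it over every /tmp/<uuid> workspace would show which managed resources are stuck this way rather than silently duplicated.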
Another example: crossplane-private-subnet-single-us-west-1 has the following in its workspace /tmp/0e47f8e8-3314-48ab-8c22-ba655c8b376a:
Running terraform plan ->
Terraform will perform the following actions:
  # aws_subnet.crossplane-private-subnet-single-us-west-1 will be created
  + resource "aws_subnet" "crossplane-private-subnet-single-us-west-1" {
      + arn                             = (known after apply)
      + assign_ipv6_address_on_creation = false
      + availability_zone               = "us-west-1a"
      + availability_zone_id            = (known after apply)
      + cidr_block                      = "192.168.1.0/24"
      + id                              = (known after apply)
      + ipv6_cidr_block_association_id  = (known after apply)
      + map_public_ip_on_launch         = false
      + owner_id                        = (known after apply)
      + tags                            = {
          + "Name"                      = "crossplane-private-subnet-single-us-west-1"
          + "crossplane-kind"           = "subnet.ec2.aws.jet.crossplane.io"
          + "crossplane-name"           = "crossplane-private-subnet-single-us-west-1"
          + "crossplane-providerconfig" = "default"
        }
      + tags_all                        = {
          + "Name"                      = "crossplane-private-subnet-single-us-west-1"
          + "crossplane-kind"           = "subnet.ec2.aws.jet.crossplane.io"
          + "crossplane-name"           = "crossplane-private-subnet-single-us-west-1"
          + "crossplane-providerconfig" = "default"
        }
      + vpc_id                          = "vpc-02e2b1959126ffbe6"
    }
Plan: 1 to add, 0 to change, 0 to destroy.
main.tf.json:
{
  "provider": {
    "aws": {
      "access_key": "<REMOVED>",
      "region": "us-west-1",
      "secret_key": "<REMOVED>",
      "token": ""
    }
  },
  "resource": {
    "aws_subnet": {
      "crossplane-private-subnet-single-us-west-1": {
        "availability_zone": "us-west-1a",
        "cidr_block": "192.168.1.0/24",
        "lifecycle": {
          "prevent_destroy": true
        },
        "map_public_ip_on_launch": false,
        "tags": {
          "Name": "crossplane-private-subnet-single-us-west-1",
          "crossplane-kind": "subnet.ec2.aws.jet.crossplane.io",
          "crossplane-name": "crossplane-private-subnet-single-us-west-1",
          "crossplane-providerconfig": "default"
        },
        "vpc_id": "vpc-02e2b1959126ffbe6"
      }
    }
  },
  "terraform": {
    "required_providers": {
      "aws": {
        "source": "hashicorp/aws",
        "version": "3.56.0"
      }
    }
  }
}
terraform.tfstate:
{
  "version": 4,
  "terraform_version": "1.0.5",
  "serial": 5,
  "lineage": "0e47f8e8-3314-48ab-8c22-ba655c8b376a",
  "outputs": {},
  "resources": []
}
terraform.tfstate.backup:
{
  "version": 4,
  "terraform_version": "1.0.5",
  "serial": 4,
  "lineage": "0e47f8e8-3314-48ab-8c22-ba655c8b376a",
  "outputs": {},
  "resources": [
    {
      "mode": "managed",
      "type": "aws_subnet",
      "name": "crossplane-private-subnet-single-us-west-1",
      "provider": "provider[\"registry.terraform.io/hashicorp/aws\"]",
      "instances": [
        {
          "schema_version": 1,
          "attributes": {
            "arn": "arn:aws:ec2:us-west-1:738718428379:subnet/subnet-06248409cd5c3c002",
            "assign_ipv6_address_on_creation": false,
            "availability_zone": "us-west-1a",
            "availability_zone_id": "usw1-az1",
            "cidr_block": "192.168.1.0/24",
            "customer_owned_ipv4_pool": "",
            "id": "subnet-06248409cd5c3c002",
            "ipv6_cidr_block": "",
            "ipv6_cidr_block_association_id": "",
            "map_customer_owned_ip_on_launch": false,
            "map_public_ip_on_launch": false,
            "outpost_arn": "",
            "owner_id": "738718428379",
            "tags": {
              "Name": "crossplane-private-subnet-single-us-west-1",
              "crossplane-kind": "subnet.ec2.aws.jet.crossplane.io",
              "crossplane-name": "crossplane-private-subnet-single-us-west-1",
              "crossplane-providerconfig": "default"
            },
            "tags_all": {
              "Name": "crossplane-private-subnet-single-us-west-1",
              "crossplane-kind": "subnet.ec2.aws.jet.crossplane.io",
              "crossplane-name": "crossplane-private-subnet-single-us-west-1",
              "crossplane-providerconfig": "default"
            },
            "timeouts": null,
            "vpc_id": "vpc-02e2b1959126ffbe6"
          },
          "sensitive_attributes": [],
          "private": "<REMOVED>"
        }
      ]
    }
  ]
}
Even weirder is the status of the object:
message: "create failed: cannot apply: apply failed: error creating subnet: InvalidParameterValue:
  Value (us-west-1a) for parameter availabilityZone is invalid. Subnets can currently
  only be created in the following availability zones: us-east-2a, us-east-2b,
  us-east-2c.\n\tstatus code: 400, request id: ac2f1e0e-5f96-4a2a-85f8-c7f94653d43d:
  : File name: main.tf.json"
reason: ReconcileError
General logs: I put the provider into debug mode and restarted it before I began applying these objects. That clean log is attached to this comment: gh-issue.log
My conclusion: I can only reproduce this with multiple copies of a resource. The timing also varies: sometimes things fall out of sync very quickly, other times it takes a couple of hours. My best guess is that some variable is shared when it shouldn't be, causing some of the concurrent reconciles to be fed incorrect values. How long that takes to show up just depends on how lucky you are.
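To make the "shared variable" hypothesis concrete, here is a hypothetical Go sketch, not the provider's code: each reconcile first stores its region and later reads it back. When the storage is shared, one unlucky interleaving (replayed deterministically here instead of with real goroutines) makes reconcile A use reconcile B's region, the same kind of cross-talk as a resource ending up in the wrong region. The names config, reconcile, newSharedReconcile, newIsolatedReconcile and simulate are all invented for illustration.

```go
package main

import "fmt"

// config is a hypothetical per-request setting, e.g. the AWS region.
type config struct{ Region string }

// reconcile is split into two phases so the interleaving can be replayed
// deterministically: prepare stores the config, apply reads it back.
type reconcile struct {
	prepare func()
	apply   func() string
}

// newSharedReconcile stores its region in a config shared by every
// reconcile (the suspected bug).
func newSharedReconcile(shared *config, region string) reconcile {
	return reconcile{
		prepare: func() { shared.Region = region },
		apply:   func() string { return shared.Region },
	}
}

// newIsolatedReconcile keeps its own copy, so other reconciles cannot
// overwrite it between prepare and apply.
func newIsolatedReconcile(region string) reconcile {
	var own config
	return reconcile{
		prepare: func() { own.Region = region },
		apply:   func() string { return own.Region },
	}
}

// simulate replays the unlucky order (A prepares, B prepares, A applies,
// B applies) and returns the regions A and B actually used.
func simulate(isolated bool) (string, string) {
	var a, b reconcile
	if isolated {
		a, b = newIsolatedReconcile("us-west-2"), newIsolatedReconcile("us-west-1")
	} else {
		shared := &config{}
		a, b = newSharedReconcile(shared, "us-west-2"), newSharedReconcile(shared, "us-west-1")
	}
	a.prepare()
	b.prepare()
	return a.apply(), b.apply()
}

func main() {
	a, b := simulate(false)
	fmt.Println("shared:  ", a, b) // A asked for us-west-2
	a, b = simulate(true)
	fmt.Println("isolated:", a, b)
}
```

Real goroutines would hit this interleaving only some of the time, which fits the observation that how long it takes depends on luck.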
We also see resources that are READY True / SYNCED True and then flap between False and True :/ but we have no idea how to debug it.
That flip between True and False seems to occur when the resource is duplicated in AWS. The provider recreates it and is happy until the next time it reconciles.
Additional evidence of improper sharing: with the above configuration, a us-west-2 VPC was created in us-west-1.
Currently investigating the code base.
I have a similar issue with a Fargate resource. I use the following stack:
crossplane-1.9
NAME               INSTALLED   HEALTHY   PACKAGE                            AGE
aws-jet-provider   True        True      crossplane/provider-jet-aws:main   7d23h
I use this composition: https://github.com/NatzkaLabsOpenSource/managed-kubernetes
Everything is created successfully, including the FargateProfile, but Crossplane reports the following:
Events:
  Type     Reason                           Age                   From                                                             Message
  ----     ------                           ----                  ----                                                             -------
  Warning  CannotResolveResourceReferences  40m (x9 over 43m)     managed/eks.aws.jet.crossplane.io/v1alpha2, kind=fargateprofile  cannot resolve references: mg.Spec.ForProvider.PodExecutionRoleArn: referenced field was empty (referenced resource may not yet be ready)
  Warning  CannotResolveResourceReferences  36m (x6 over 39m)     managed/eks.aws.jet.crossplane.io/v1alpha2, kind=fargateprofile  cannot resolve references: mg.Spec.ForProvider.SubnetIds: referenced field was empty (referenced resource may not yet be ready)
  Warning  CannotInitializeManagedResource  25m (x3 over 33m)     managed/eks.aws.jet.crossplane.io/v1alpha2, kind=fargateprofile  Operation cannot be fulfilled on fargateprofiles.eks.aws.jet.crossplane.io "xpjeteks": the object has been modified; please apply your changes to the latest version and try again
  Warning  CannotCreateExternalResource     3m23s (x20 over 25m)  managed/eks.aws.jet.crossplane.io/v1alpha2, kind=fargateprofile  (combined from similar events): cannot apply: apply failed: error creating EKS Fargate Profile (xpjeteks:xpjeteks): ResourceInUseException: A Fargate Profile already exists with this name in this cluster.
  {
    RespMetadata: {
      StatusCode: 409,
      RequestID: "4d8d3a43-8a4e-4eaf-a41d-49d212c61db9"
    },
    Message_: "A Fargate Profile already exists with this name in this cluster."
  }: : File name: main.tf.json
I see that the tfstate file is empty:
{
  "version": 4,
  "terraform_version": "1.0.5",
  "serial": 2,
  "lineage": "75e5074d-89ed-47dc-8dd3-cda95b75d0c3",
  "outputs": {},
  "resources": []
}
and the backup file is correct:
{
  "version": 4,
  "terraform_version": "1.0.5",
  "serial": 1,
  "lineage": "75e5074d-89ed-47dc-8dd3-cda95b75d0c3",
  "outputs": {},
  "resources": [
    {
      "mode": "managed",
      "type": "aws_eks_fargate_profile",
      "name": "xpjeteks",
      "provider": "provider[\"registry.terraform.io/hashicorp/aws\"]",
      "instances": [
        {
          "schema_version": 0,
          "attributes": {
            "cluster_name": "xpjeteks",
            "fargate_profile_name": "xpjeteks",
            "id": "xpjeteks:xpjeteks",
            "pod_execution_role_arn": "arn:aws:iam::0000000000:role/xpjeteks-fargateprofile",
            "selector": [
              {
                "namespace": "default"
              }
            ],
            "subnet_ids": [
              "subnet-abcdefghijklmnop001",
              "subnet-abcdefghijklmnop002",
              "subnet-abcdefghijklmnop003"
            ],
            "tags": {
              "crossplane-kind": "fargateprofile.eks.aws.jet.crossplane.io",
              "crossplane-name": "xpjeteks",
              "crossplane-providerconfig": "aws-jet-provider"
            }
          },
          "sensitive_attributes": []
        }
      ]
    }
  ]
}
Resource status is always False/False:
NAME                                                READY   SYNCED   EXTERNAL-NAME   AGE
fargateprofile.eks.aws.jet.crossplane.io/xpjeteks   False   False    xpjeteks        43m
@andrzej-natzka Trying to narrow this down. Is the aws-jet-provider the only Crossplane Provider in your cluster?
We are running provider-aws, provider-jet-aws, provider-kubernetes, provider-helm, provider-gitlab, provider-zpa and provider-jet-pagerduty, if it helps.
In my last example I had the following providers:
NAME                  INSTALLED   HEALTHY   PACKAGE                                 AGE
aws-jet-provider      True        True      crossplane/provider-jet-aws:main        8d
aws-provider          True        True      crossplane/provider-aws:v0.29.0         6d7h
helm-provider         True        True      crossplane/provider-helm:v0.10.0        6h8m
kubernetes-provider   True        True      crossplane/provider-kubernetes:v0.4.0   6h8m
However, when I first found the problem, I had only AWS-Jet-classic installed.
Hi,
We have the same issue, and we think the problem is somewhere in terraform-provider-aws.
Observe() calls workspace.Refresh(), which in fact runs terraform apply -refresh-only -auto-approve -input=false -lock=false -json here.
We've added some debug messages to Refresh(): before terraform apply, the resources field in the tfstate file is not empty, and after terraform apply it is.
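A before/after check like the one described can be sketched in Go: count the entries in the top-level resources array of terraform.tfstate before and after the refresh, and flag the case where a tracked resource disappears. The helper names resourceCount and lostResources are hypothetical; the only part of the state format assumed is the resources field seen in the dumps above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// resourceCount returns len(resources) from a Terraform JSON state file.
func resourceCount(state []byte) (int, error) {
	var st struct {
		Resources []json.RawMessage `json:"resources"`
	}
	if err := json.Unmarshal(state, &st); err != nil {
		return 0, err
	}
	return len(st.Resources), nil
}

// lostResources reports whether a refresh emptied a state that previously
// tracked at least one resource (the symptom reported in this thread).
func lostResources(before, after []byte) (bool, error) {
	b, err := resourceCount(before)
	if err != nil {
		return false, err
	}
	a, err := resourceCount(after)
	if err != nil {
		return false, err
	}
	return b > 0 && a == 0, nil
}

func main() {
	// Same shape as the subnet workspace above: the backup (before) has one
	// resource, the current state (after) has none.
	before := []byte(`{"version": 4, "serial": 4, "resources": [{"type": "aws_subnet"}]}`)
	after := []byte(`{"version": 4, "serial": 5, "resources": []}`)
	lost, err := lostResources(before, after)
	if err != nil {
		panic(err)
	}
	fmt.Println("resources lost during refresh:", lost)
}
```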
Our issue in terraform-provider-aws: https://github.com/hashicorp/terraform-provider-aws/issues/26021
Has anyone been able to reproduce this when running the provider out of a container? Running make run in this repository against our cluster, we have ~50 resources that have been in sync for days. As soon as we switched to the container, things started duplicating and falling out of sync. The Terraform version is the same in both setups (custom provider build, and local machine with Terraform 1.2.6).
The container uses NewSharedProvider, while make run doesn't. The container is built with the TERRAFORM_NATIVE_PROVIDER_PATH environment variable, which enables the shared provider instead of distinct ones. The errors we see seem to indicate that NewSharedProvider isn't goroutine-safe and is sharing memory. That check happens in the provider's main.go. As a workaround, you can apply the following ControllerConfig, which disables that sharing. With this ControllerConfig, the provider's resources are stable.
apiVersion: pkg.crossplane.io/v1alpha1
kind: ControllerConfig
metadata:
  name: aws-jet
spec:
  args:
    - --debug
    - --terraform-native-provider-path
    - ""
(Edited to fix the example ControllerConfig.)
Looks like it is working in our setup: all resources are now True/True.
Unfortunately it's not working for me. I still have False/False for the fargateprofile, with these messages:
Message_: "A Fargate Profile already exists with this name in this cluster."
}: : File name: main.tf.json
My ControllerConfig:
apiVersion: pkg.crossplane.io/v1alpha1
kind: ControllerConfig
metadata:
  name: aws-jet-config
spec:
  args:
    - --debug
    - --terraform-native-provider-path
    - ""
---
apiVersion: pkg.crossplane.io/v1
kind: Provider
metadata:
  name: aws-jet-provider
spec:
  package: crossplane/provider-jet-aws:main
  controllerConfigRef:
    name: aws-jet-config
What happened?
What's interesting here is that terraform.tfstate looks empty in this case for the securitygrouprule.
We see the same for the fargateprofile:
How can we reproduce it?
What environment did it happen in?
Crossplane version:
https://github.com/hashicorp/terraform-provider-aws/issues/25965