crossplane / upjet

A code generation framework and runtime for Crossplane providers
Apache License 2.0

Investigate duplicate resource error from Jet AWS #220 #58

Closed muvaf closed 2 years ago

muvaf commented 2 years ago

What happened?

A few users have reported this bug that seems to be a fundamental issue with Terrajet: https://github.com/crossplane-contrib/provider-jet-aws/issues/220

We should check whether the Upjet-based official providers are subject to this. I think we should fix it in both Terrajet and Upjet.

How can we reproduce it?

Details in https://github.com/crossplane-contrib/provider-jet-aws/issues/220

sergenyalcin commented 2 years ago

I investigated this issue. Let me share my observations.

I started by looking at the shared gRPC server side because of the following comment: https://github.com/crossplane-contrib/provider-jet-aws/issues/220#issuecomment-1213269102

That comment suggests a workaround: disabling the shared gRPC server. Some people watching the issue reported that the workaround works. But a few days later, I saw the following comment: https://github.com/crossplane-contrib/provider-jet-aws/issues/220#issuecomment-1219534554

Its author says the workaround does not work for them, which made me reconsider the starting point of my investigation.

To be honest, my initial impression of this issue was biased by that first comment. Had I approached the investigation objectively from the start, I would have said that the root cause is related to Sync/Async usage. I have observed this type of error in resources that are Sync and take a long time to create or delete, and also when I put a heavy load on the cluster so that provisioning/deletion takes a long time. When we changed such a resource to Async, the problem went away.

In the root issue, there are two resources: FargateProfile (eks) and SecurityGroup (ec2). For both of them we are using the Sync functionality.

A short time ago, while testing eks resources, I observed the same issue for FargateProfile and resolved it by changing this resource's behavior from Sync to Async. (The shared gRPC server has been enabled in my local environment for a long time.)

So I think we can resolve this type of problem by changing the behavior from Sync to Async.

I will also open a PR to address this problem for the SecurityGroup resource.
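
For readers unfamiliar with the Sync/Async switch, here is a rough, self-contained sketch of what such a change looks like as a per-resource configuration override. As far as I know, Terrajet and Upjet expose this as a `UseAsync` field on the resource configuration, but the `Resource`, `Provider`, and `Configurator` types below are hypothetical stand-ins written for illustration, not the real packages. The Terraform resource names `aws_eks_fargate_profile` and `aws_security_group` are assumed to be the ones behind FargateProfile and SecurityGroup.

```go
package main

import "fmt"

// Resource is a hypothetical stand-in for the per-resource configuration
// struct of the framework; only the field relevant here is modelled.
type Resource struct {
	Name     string
	UseAsync bool // false means create/delete run synchronously in the reconcile
}

// Configurator mutates the configuration of a single resource.
type Configurator func(r *Resource)

// Provider is a hypothetical registry of per-resource configurators.
type Provider struct {
	configurators map[string][]Configurator
}

// NewProvider returns an empty registry.
func NewProvider() *Provider {
	return &Provider{configurators: map[string][]Configurator{}}
}

// AddResourceConfigurator registers an override for the named resource.
func (p *Provider) AddResourceConfigurator(name string, c Configurator) {
	p.configurators[name] = append(p.configurators[name], c)
}

// Configure builds the default (Sync) configuration for the named resource
// and then applies every registered override in registration order.
func (p *Provider) Configure(name string) *Resource {
	r := &Resource{Name: name}
	for _, c := range p.configurators[name] {
		c(r)
	}
	return r
}

// useAsync switches a resource from Sync to Async.
func useAsync(r *Resource) {
	r.UseAsync = true
}

func main() {
	p := NewProvider()
	// The two resources named in the root issue (assumed Terraform names).
	slow := []string{"aws_eks_fargate_profile", "aws_security_group"}
	for _, name := range slow {
		p.AddResourceConfigurator(name, useAsync)
	}
	for _, name := range slow {
		r := p.Configure(name)
		fmt.Printf("%s: UseAsync=%t\n", r.Name, r.UseAsync)
	}
}
```

Resources without an override keep the default Sync behavior in this sketch, so the change stays limited to the resources that are slow to create or delete.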