Closed stevendborrelli closed 1 month ago
Another example I have found in logs, you can see the reque-after
does not seem to be taking effect:
2024-09-12T11:29:43Z DEBUG provider-azure Successfully requested update of external resource {"controller": "managed/storage.azure.upbound.io/v1beta1, kind=account", "request": {"name":"uksouth-manager-dev-1-azurefile-csi-nfs"}, "uid": "56647f97-72a2-4e1f-ad7d-bdc7e2951e16", "version": "102331389", "external-name": "34a22711534795a757d4841c", "requeue-after": "2024-09-12T11:39:32Z"}
2024-09-12T11:29:44Z DEBUG provider-azure Successfully requested update of external resource {"controller": "managed/storage.azure.upbound.io/v1beta1, kind=account", "request": {"name":"uksouth-manager-dev-1-azurefile-csi-nfs"}, "uid": "56647f97-72a2-4e1f-ad7d-bdc7e2951e16", "version": "102331389", "external-name": "34a22711534795a757d4841c", "requeue-after": "2024-09-12T11:39:57Z"}
2024-09-12T11:29:46Z DEBUG provider-azure Successfully requested update of external resource {"controller": "managed/storage.azure.upbound.io/v1beta1, kind=account", "request": {"name":"uksouth-manager-dev-1-azurefile-csi-nfs"}, "uid": "56647f97-72a2-4e1f-ad7d-bdc7e2951e16", "version": "102331389", "external-name": "34a22711534795a757d4841c", "requeue-after": "2024-09-12T11:40:13Z"}
2024-09-12T11:29:48Z DEBUG provider-azure Successfully requested update of external resource {"controller": "managed/storage.azure.upbound.io/v1beta1, kind=account", "request": {"name":"uksouth-manager-dev-1-azurefile-csi-nfs"}, "uid": "56647f97-72a2-4e1f-ad7d-bdc7e2951e16", "version": "102331389", "external-name": "34a22711534795a757d4841c", "requeue-after": "2024-09-12T11:39:49Z"}
@stevendborrelli, Thanks for your report. Here are the takeaways followed by the details of my investigation:
Below output demonstrates the behavior in action. I omitted duplicate lines for brevity. In the real output, each line except the first one occurs twice, once for “Synced” and once for “Ready” status condition updates. And, if you see “Synced: True” at the beginning, don't be surprised. It's also because of crossplane-runtime not being async aware.
> k get key.v1beta1.keyvault.azure.upbound.io -w
NAME SYNCED READY EXTERNAL-NAME AGE
issue-439-key False False 3s
issue-439-key False False 56s
issue-439-key False False 58s
issue-439-key False False 59s
issue-439-key False False 60s
issue-439-key False False 61s
issue-439-key False False 62s
issue-439-key False False 64s
issue-439-key False False 66s
issue-439-key False False 68s
issue-439-key False False 72s
issue-439-key False False 78s
issue-439-key False False 90s
issue-439-key False False 112s
issue-439-key False False 2m34s
issue-439-key False False 3m35s
issue-439-key False False 4m35s
Even though the creation grace period is 30 seconds, it ends at 56 seconds in the output above. I haven't investigated the details of how it extends beyond 30 seconds, but given that reconciliation is not hard-real time and there are other variables, such as jitter, in effect, I'm not surprised. For reference, this block of the managed reconciler executes during this phase.
After the grace period, this block of the managed reconciler executes. As you can see, returned reconciliation result only has Requeue: true
, it doesn't contain RequeueAfter
. requeue-after
logs you see belongs to the phase before the first creation attempt finishes, and therefore doesn't take effect in async creation — again, another quirk of managed reconciler not being async aware.
After the grace period, at 56 seconds, we see that the durations between status condition updates increase. The intervals don't look like they would fit an exponential curve because of the lack of time resolution in the output. If you examine timestamped pod logs below, you'll see that the backoffs are exponential, starting with 5 ms and doubling. Backoffs in the order of milliseconds are difficult to observe above. When the backoff duration increases to the order of seconds, they are not integral numbers, but fractional like 1.28, 2.56, 5.12 seconds, which is also difficult to observe. Timestamps indicate that exponential backoff is working as expected.
Backoff duration is capped at 1 minute by crossplane-runtime, which is configured again in crossplane-runtime, which is called by the provider.
Let me know if you have further questions.
I'm going to go ahead and close this issue. @stevendborrelli if there's any further info that contradicts the explanation let us know.
What happened?
When trying to provision a resource in error on Azure (due to the issue fixed in https://github.com/crossplane/upjet/pull/435 we can't see the exact error), the provider constantly tries to create the resource.
With Azure provider 1.2.0, the requeue is increased up to 16m40s, but the object still reconciles a few times a minute:
With Azure provider 1.5.0, it requires at 0s but eventually settles in to checking on the error create every 30s.
How can we reproduce it?
Create any resource that has an error in provisioning: