Azure / azure-service-operator

Azure Service Operator allows you to create Azure resources using kubectl
https://azure.github.io/azure-service-operator/
MIT License
716 stars 194 forks source link

Bug: When `AZURE_SYNC_PERIOD` is defined as empty, no periodic resync is performed #3965

Closed nojnhuh closed 3 months ago

nojnhuh commented 4 months ago

Version of Azure Service Operator

v2.6.0

Describe the bug Per the documentation, when AZURE_SYNC_PERIOD is defined but empty in the aso-controller-settings Secret, the default of "1h" should be in effect. In practice, I observed no periodic resyncs within 90 minutes or so.

https://github.com/Azure/azure-service-operator/blob/9e2b792cc5fcfbbaa967330f17797c2284a9a33d/docs/hugo/content/guide/aso-controller-settings-options.md#L65-L71

Tracing where the config is consumed here: https://github.com/Azure/azure-service-operator/blob/9e2b792cc5fcfbbaa967330f17797c2284a9a33d/v2/internal/config/vars.go#L232-L236

To where it is used here: https://github.com/Azure/azure-service-operator/blob/9e2b792cc5fcfbbaa967330f17797c2284a9a33d/v2/internal/util/interval/interval_calculator.go#L156-L158

It seems to me like a defined empty string value will result in no resyncs.

To Reproduce Steps to reproduce the behavior:

The particular use case I was testing when I ran into this was a ManagedClustersAgentPool with autoscaling enabled. Autoscaling events affecting the count of the node pool were never being reflected in the status.count of the ASO resource.

Expected behavior Both the documentation and actual logic are aligned as to the behavior of the empty string. I'd slightly prefer the implementation be updated to match the docs than the other way around, such that an empty string implies the 1h default. Either way is acceptable to me though.

Screenshots If applicable, add screenshots to help explain your problem.

Additional context Add any other context about the problem here.

matthchr commented 4 months ago

Agree that this seems like a bug, although reading through the code it does seem like setting it to empty string to mean "no re-sync" may be intended:

https://github.com/Azure/azure-service-operator/blob/9e2b792cc5fcfbbaa967330f17797c2284a9a33d/v2/internal/config/vars.go#L53-L64

Specifically the last bit.

So this actually looks like a documentation bug to me, more than a behavior bug. Do you think that omitted (== default) being different than specified but empty is awkward/difficult to work with in your context? Can you expand more on the whys of that, if so?

matthchr commented 4 months ago

Maybe it would be better if we had 0s mean "do not re-sync", rather than drawing a distinction between omitted and set to empty string. WDYT @theunrepentantgeek and @super-harsh?

nojnhuh commented 4 months ago

It's only easier in CAPZ for empty string to mean "default" because we use envsubst to inject it like this:

stringData:
  AZURE_SYNC_PERIOD: ${AZURE_SYNC_PERIOD:=""}

I haven't found a clean way for us to omit the key entirely while still making it configurable for our users. That's definitely more of a CAPZ problem than an ASO problem though. I'd say to do whatever makes the most sense for ASO, just offered that preference in case there wasn't any other reason to prefer one over the other.

theunrepentantgeek commented 4 months ago

I don't like the idea of 0s meaning never - because 0s looks like a duration, but it's not. What about using a different sentinel value in the config, like never? We can turn it into something else internally.

matthchr commented 4 months ago

I don't like the idea of 0s meaning never - because 0s looks like a duration, but it's not. What about using a different sentinel value in the config, like never? We can turn it into something else internally.

We could do a negative duration (which is pretty common for indicating never), but I also don't love that. It might be better than customizing parsing though and/or treating omitted as different than specified-but-empty.