Open hasheddan opened 3 years ago
I'd prefer to start simple here, at least until we have more confidence that folks would benefit from different poll intervals for different individual managed resources (or kinds of managed resources). I personally suspect adding a --poll-interval
provider flag to each provider that sets WithPollInterval
for all of the provider's controllers would be a huge win that requires little more than a touch of plumbing.
If we did want to expose this at the ProviderConfig
level (which does seem like the logical next most flexible place to do this after the provider level), or even the managed resource level, I'd prefer to do that by updating the the return value of the ExternalConnector
, which is typically what first reads the managed resource and its provider config. For example:
// An ExternalConnection represents a connection to an external system
type ExternalConnection struct {
Client ExternalClient
PollInterval time.Duration
// Using a struct future proofs us in case we want to surface
// other things up to the managed.Reconciler in future.
}
// An ExternalConnecter produces a new ExternalClient given the supplied
// Managed resource.
type ExternalConnecter interface {
// Connect to the provider specified by the supplied managed resource.
Connect(ctx context.Context, mg resource.Managed) (ExternalConnection, error)
}
@negz putting it in ExternalConnection
seems good to me :+1: I would advocate for going ahead and doing that as I think a fair number of community members have already asked for this functionality at the more granular resource level, but I'll surface this to community and ask them to show support here if this functionality is desired.
We have a lot of different tools which are making API calls to poll our infrastructure and it is often a challenge to configure the api calls to prevent rate limiting. The ability to change the polling interval at the global level would be extremely helpful. (We would probably default to 5m.) It would be useful to change the interval at the service level. For example RDS instances for us are fairly static once created - there's no need for us to poll them every minute. Either way though, whichever method is provided, this would be great and would be a help. The one thing we would like, though, is to be able to set the polling limit using CRON instead of timer based. Ex. */10 * * * *
would give us a ton of flexibility and control over when the api requests are made.
In our use, we haven't come across the need to manage the poll interval at the resource level. Seems like controller-setup ends up being the place where the most knowledge of provider rate limiting can be applied. Provider flag or ProviderConfig both seem like good options for further customization outside controller setup.
Yep in cases where I've thought this would be useful it's at the ProviderConfig
or Provider
level, never at the individual resource level itself.
I like the ExternalConnection
approach as it would work for both of the use-cases I can think of - keeping within known API rate limits, and speeding up reconciliation of resources referencing other resources, but still configurable by the provider.
ProviderConfig level implementation seems like a great start and is something we really need. Obviously, the ability to override it at the managed resource level as the RDS instance example above would be also useful.
We were seeing some excessive calls being made and provider-level rate liming would be a great start to tackle this 👍
+1 to ProviderConfig
with a default that can be controllable via flag given in Provider
(slight preference) or ControllerConfig
.
We also have issues with the default poll interval (or rather shortWait
and longWait
). Since we have more than 2000 composites and managed resources a reconcile run of all of them takes longer than the configured longWait
(usually 1 minute). This means that the controller is constantly reconciling and hitting its rate limit. This massively delays reconcilation of new/changed resources (up to 6 minutes in some cases).
Also see https://github.com/crossplane/crossplane/issues/2256
So I'd definitely welcome the possibility to configure the provider's poll interval. Additionally it might make sense to also configure it for the core Crossplane reconcilers (especially the composite reconciler).
As commented here: https://github.com/crossplane-contrib/provider-sql/pull/36#issuecomment-817538531
I don't really see the difference between the poll interval and the already existing --sync
(SyncPeriod
)?
@srueg It does seem like there is a subtle difference here. Specifically:
SyncPeriod
seems to work by setting the ResyncPeriod
of the controller-runtime's cache. The cache is a bunch of informers that watch any type the client is interested in, with a re-list every ResyncPeriod
to make sure the watches didn't miss anything. So one difference is that setting SyncPeriod
to something like 60 seconds would cause the controller-runtime client to re-list every type it was concerned with every 60 seconds.SyncPeriod
doesn't easily support (I think we'd need to create a different client/cache for each controller to do so, rather than using the client returned by mgr.GetClient()
).As @hasheddan pointed out to me, controller-runtime recommends the approach we take with PollInterval
for our use case in which we want to poll things we cannot watch (i.e. external APIs) per https://github.com/kubernetes-sigs/controller-runtime/blob/a1e2ea2b17c3f64b1a060234863282cb330473c6/pkg/manager/manager.go#L127
That all said, I do think the relationship between PollInterval
and SyncPeriod
is confusing, especially because we typically expose the latter as a flag and not the former. I certainly did a double take thinking we'd accidentally reinvented a wheel when you asked this question. :)
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
What problem are you facing?
Currently, all controllers that use the managed reconciler have a default poll interval of 1 minute, which can be overridden during controller setup. However, this means that all objects reconciled by that controller are forced to use the same interval, which may not make sense (e.g. a production database may need to be checked for drift more frequently then a staging one).
How could Crossplane help solve your problem?
Crossplane users should be able to define the happy path poll interval for all of their resources. This could be accomplished by exposing
pollInterval
field embedded in every managed resource, then getting the value when returning withRequeueAfter
in the managed reconciler. Another option would be to expose the field on aProviderConfig
, which @muvaf has pointed out could make sense due to the fact that rate limiting behavior is typically scoped to a single account on providers, such as AWS.A third option would be to just define an additional method on the
ExternalClient
interface, potentially something likeGetPollInterval()
that would allow provider authors to determine how the poll interval is chosen. It could be a field on theProviderConfig
or managed resource, provided as an argument to controller entrypoint, or some combination of the three with different options taking precedence over others.To encourage consistency, it would be good for us to suggest a common implementation, likely by adding
pollInterval
fields to the embeddedProviderConfig
and managed resource structs, as well as supplying a defaultGetPollInterval()
that accepted both objects (and potentially a hardcoded default fallback) and then selected one at a chose precedence, while also allowing provider authors to override this behavior.