Unleash / unleash-client-dotnet

Unleash client SDK for .NET
Apache License 2.0
81 stars 39 forks source link

Feature toggles are fetched in a chron manner instead of when a feature flag is actually needed #198

Closed EriksonBahr closed 8 months ago

EriksonBahr commented 8 months ago

Is your feature request related to a problem? Please describe. during the startup of the project this library queries for all the toggles and cache it instead of querying the toggles whenever an actual request that requires a feature flag arrives (regular DI flow)

Describe the solution you'd like This library should allow configure the IUnleash to only query the feature flags when actually needed (without a task querying for it)

Describe alternatives you've considered Writing a simplified client myself

Additional context I can pick it up myself if you provide high-level guidance so the PR won`t get rejected or heavily updated due to design misalignments.

daveleek commented 8 months ago

Hello @EriksonBahr, and thank you for reaching out. This is actually an intentional design decision implemented across all of our server side SDKs. As a server side SDK, validating a feature flag should be super fast and not have to wait for a network call per feature flag evaluation. To achieve this we fetch all feature flag configurations from Unleash and store them in a local cache. Then, when a feature flag IsEnabled is called, we evaluate the feature flag locally using the given context and parameters and return the result super fast. You can read more about our thoughts on that in this blog entry, which goes into greater detail about our architecture: https://www.getunleash.io/blog/our-unique-architecture

We also have what we call Proxy SDKs (though not for .NET at the moment) intended for use in frontends and other types of non-server run apps that shouldn't evaluate locally. These SDKs will request validations from Unleash/Proxy/Edge services and return a list of enabled feature flags. Perhaps this is more aligned with your needs? Link to our Javascript Browser SDK: https://docs.getunleash.io/reference/sdks/javascript-browser (You'll find links to the other Proxy SDKs in the tree right next to it)

EriksonBahr commented 8 months ago

I think this is a bit overkill to evaluate at startup in favor of "faster" SDK. Perhaps I did not make clear my point, so I'll open a PR so we can discuss further (if that suits you).

sighphyre commented 8 months ago

Hey @EriksonBahr, I'm also not sure I understand what you're trying to achieve here. This flow is intentional in the Unleash SDKs. From what I gather from your message this would move us towards a network call during evaluating feature toggles which is going to cause a blocking call. Did I misunderstand something here?

Are you trying to reduce network traffic here?

EriksonBahr commented 8 months ago

Reducing traffic and overhead is one of the reasons, but in fact there can be a hiatus between the last chron-based fetch and the actual state of the feature flags in the server.

Example: My server started and fetched the flags (I'm not using them just yet) and after this a new flag is enabled in the unleash server, afterwards a request comes in and my service won't have this flag because of this hiatus.

A specific situation in our CI/CD triggered me to check the library source code and come up with #199 solution: we start our backend in containers and there is a racing condition between the test start and the feed of the feature flags.

daveleek commented 8 months ago

Hello @EriksonBahr, lets see if I can address your concerns

Reducing traffic and overhead is one of the reasons

Unleash and the SDKs use ETags and 304 Not Modified header responses with empty bodies, meaning Unleash will return a 304 and no body if the toggle configuration has not changed since the previous fetch.

but in fact there can be a hiatus between the last chron-based fetch and the actual state of the feature flags in the server.

Our main concern is around having well functioning production code. The poll cycle here is in place so that we can read from a cache in the SDK and not have to resolve data from a network call. This is core to our architecture; moving this is an async, network bound call would effectively break a huge number of user's current flows. Production toggles don't tend to change that much, but typically resolving a toggle needs to be fast and resilient. The default poll duration is fairly low, 30 seconds for this SDK, which leaves very little time between a change is made in Unleash UI to the feature being available in the SDK for evaluation. The interval can be changed during configuration to suit your needs of course.

A specific situation in our CI/CD triggered me to check the library source code and come up with https://github.com/Unleash/unleash-client-dotnet/pull/199 solution: we start our backend in containers and there is a racing condition between the test start and the feed of the feature flags.

This can be a nuisance in test code, since you'd need the toggles to have resolved in order to test your system constraints, for that we provide Synchronous Startup, which will force your SDK to resolve toggles before releasing control back to the caller.

EriksonBahr commented 8 months ago

Thanks for all the inputs.

I understood the design decisions of the lib, and the default polling behavior in order to speed up actual flag checks (and its pros and cons).

Our e2e services are actually fetching the codes too fast and storing in its cache (since they all use this library), before the actual feature flag seed mechanism kick in, creating the hiatus I mentioned where the library says there are no enabled feature flags at all. Fetching during startup won't solve it.

Would you guys consider my pull request if I make it properly configurable? Or is it a hard no-go?

sighphyre commented 8 months ago

Our e2e services are actually fetching the codes too fast and storing in its cache, before the actual feature flag seed mechanism kick

Hmm, I'm not quite sure what you mean here? From what I'm reading its sounds like you have some end to end tests going on here and by the time you're trying to execute the test code, the feature toggles haven't resolved yet. Is that correct?

Would you guys consider my pull request if I make it properly configurable? Or is it a hard no-go?

Unfortunately, this is probably a hard no. This would be incorrect behaviour for every use case outside of the one we're discussing. We are very interested in helping you get past your problem, but we generally don't accept pull requests that are only useful for a single user.

EriksonBahr commented 8 months ago

Hmm, I'm not quite sure what you mean here? From what I'm reading its sounds like you have some end to end tests going on here and by the time you're trying to execute the test code, the feature toggles haven't resolved yet. Is that correct?

Not correct...

Unfortunately, this is probably a hard no. This would be incorrect behaviour for every use case outside of the one we're discussing. We are very interested in helping you get past your problem, but we generally don't accept pull requests that are only useful for a single user.

I understand, thank you for your time.

sighphyre commented 8 months ago

Apologies we couldn't be more helpful! If you can help us understand your problem at a later point we'd be more than willing to try help.