Read feature flag values from an ETS table instead of from a GenServer

sascha-wolf commented 1 year ago

Is your feature request related to a problem? Please describe.

Currently, reading feature flag values always happens through a GenServer.call/3. From my understanding, a single process handles all read requests for a given SDK key. This creates a natural bottleneck.

If for example a feature flag is accessed in a hot code path, potentially from thousands of processes every single second, then this will invariably overwhelm the process that handles the feature flag requests, slowing down the system as a whole.

Describe the solution you'd like

Instead of relying on GenServer.call/3 to fetch feature flag values, I suggest using an ets table. In a nutshell, an ets table can be created on start - e.g. by the ConfigCat.Client process - potentially with the read_concurrency option enabled (I'd assume that reading is more frequent than writing).

Then feature flags can be read from the ets table, eliminating the need for most message passing and as such the natural bottleneck.
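To illustrate the idea, here is a minimal sketch of an owner process that creates the table and callers that read from it directly. The module name FlagCacheSketch, the table name and the put_all/1 function are hypothetical and not part of the SDK; how this would map onto ConfigCat.Client is an assumption.

```elixir
defmodule FlagCacheSketch do
  # Hypothetical module for illustration; not part of the ConfigCat SDK.
  use GenServer

  @table :flag_cache_sketch

  def start_link(opts \\ []) do
    GenServer.start_link(__MODULE__, opts, name: __MODULE__)
  end

  # Reads run in the caller's process; no message is sent to the owner.
  def get(key, default \\ nil) do
    case :ets.lookup(@table, key) do
      [{^key, value}] -> value
      [] -> default
    end
  end

  # Writes still go through the owning process, e.g. after a config refresh.
  # Note: this sketch only inserts or overwrites; it never deletes removed flags.
  def put_all(flags) when is_map(flags) do
    GenServer.call(__MODULE__, {:put_all, flags})
  end

  @impl true
  def init(_opts) do
    # :protected lets every process read while only the owner may write.
    table = :ets.new(@table, [:named_table, :set, :protected, read_concurrency: true])
    {:ok, table}
  end

  @impl true
  def handle_call({:put_all, flags}, _from, table) do
    :ets.insert(table, Map.to_list(flags))
    {:reply, :ok, table}
  end
end
```

With this shape, FlagCacheSketch.get("my_flag", false) can be called from any number of processes at once, and only refreshes are serialized through the owner.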

Describe alternatives you've considered

/

Additional context

This approach combines neatly with the "auto" and "manual" polling modes. It does get a bit more complicated when considering the "lazy" mode.

A naive implementation of the "lazy" mode would check the ets table first and - if the value is missing or expired - send a message to fetch the value. But in a high-load scenario this can mean that thousands of messages pour in at once when a value expires.

To solve this problem I suggest what I call an "optimistic lazy" approach, not unlike the stale-while-revalidate HTTP Cache-Control behaviour. In addition to reading values from the ets table, you'd keep track of the last access time for each key - potentially in another ets table, this time with write_concurrency enabled.

The "optimistic lazy" cache refresh implementation would then look at which keys are about to expire and refresh those that were accessed very recently, where "very recently" would be a matter of configuration with a sensible default (e.g. 5 seconds).
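A rough sketch of that selection logic, under the assumption that each key carries its own expiry timestamp. All names here (OptimisticLazySketch, the table names, keys_to_refresh/3) are hypothetical, and the tables are owned by whichever process calls init_tables/0, which a real implementation would want to be a supervised process.

```elixir
defmodule OptimisticLazySketch do
  # Hypothetical names throughout; a sketch, not SDK code.
  @values :optimistic_lazy_values
  @access :optimistic_lazy_access

  def init_tables do
    # Values are read far more often than written.
    :ets.new(@values, [:named_table, :set, :public, read_concurrency: true])
    # Access times are written on every read, from many processes.
    :ets.new(@access, [:named_table, :set, :public, write_concurrency: true])
    :ok
  end

  def put(key, value, expires_at_ms) do
    :ets.insert(@values, {key, value, expires_at_ms})
    :ok
  end

  # Returns the cached value (possibly slightly stale) and records the access time.
  def get(key, now_ms \\ System.monotonic_time(:millisecond)) do
    :ets.insert(@access, {key, now_ms})

    case :ets.lookup(@values, key) do
      [{^key, value, _expires_at_ms}] -> {:ok, value}
      [] -> :error
    end
  end

  # Keys that expire within expiry_window_ms and were read within recent_ms.
  def keys_to_refresh(now_ms, expiry_window_ms, recent_ms \\ 5_000) do
    for {key, _value, expires_at_ms} <- :ets.tab2list(@values),
        expires_at_ms - now_ms <= expiry_window_ms,
        recently_accessed?(key, now_ms, recent_ms),
        do: key
  end

  defp recently_accessed?(key, now_ms, recent_ms) do
    case :ets.lookup(@access, key) do
      [{^key, last_access_ms}] -> now_ms - last_access_ms <= recent_ms
      [] -> false
    end
  end
end
```

A periodic job could call keys_to_refresh/3 and refetch only the returned keys, so readers never block on a refresh and keys nobody reads are simply left to expire.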

kp-cat commented 1 year ago

Thanks for the suggestion @sascha-wolf, I'll discuss it with the dev team and get back to you.

kp-cat commented 1 year ago

Hey @sascha-wolf, thanks for opening the issue. We discussed it with the team. We have not experienced performance issues, and our customers have not reported any either. Before adding a performance tweak, we always try to learn how it will affect performance versus complexity. Do you have any measurements of the performance impact?

Would you mind creating a PR for this modification so that we can compare the performance difference between the new and the old version?

github-actions[bot] commented 10 months ago

This issue is marked stale because it has no activity in the last 3 weeks. The issue will be closed in one week. Please remove the stale flag to keep it open.

github-actions[bot] commented 10 months ago

This issue was closed due to no activity.
