flippercloud / flipper

🐬 Beautiful, performant feature flags for Ruby.
https://www.flippercloud.io/docs
MIT License
3.71k stars 414 forks source link

Preloading should be off by default #877

Closed hashwin closed 2 months ago

hashwin commented 2 months ago

We recently had a major production incident due to Flipper preloading. I think it's not a good design decision to turn this on by default because it can lead to a lot of overhead in every request. In our case it was adding 100ms to all requests, causing the entire app to slow down significantly and also end up in a state where requests totally time out, mainly due to the fact that we receive a lot of webhooks which are normally 5ms requests.

When we first introduced Flipper we didn't have a lot of features / actors, so we didn't see a difference in latency. But over time, we added features and individual actors that quietly and gradually increased this latency, and it was not at all apparent that this was related to flipper. We were only able to narrow it down after finding high redis usage and calls to "get_all" within flipper middleware.

I think it is an important feature to highlight in the front page of your documentation, turn if off by default, and let individual devs decide whether to turn it on or not. If it can be so dangerous and add so much overhead out of the box, it should definitely not be on by default. This is more true for people migrating from other feature gems who already have a lot of features / actors, and also those who are using Redis, because preloading is almost useless in that case, fetching keys from Redis is already quite quick.

jnunemaker commented 2 months ago

@hashwin well first I'm sorry to hear about the production issue. That sucks. The goal of flipper is to never have them happen.

Fun fact: preloading was added to prevent this type of problem and its worked great for most people. With preloading off, people would end up with 10's to 100's of queries per page (depending on how many features they had) and that caused a lot of latency.

I'm uncomfortable changing the default. I am up for better documentation. Do you have any suggestions as to where in the docs it might make sense and best be seen to discuss the trade offs of preloading?

We currently have this section on the Optimizations page. Perhaps we link to that from a few other spots? Any suggestions?

Closing to keep things tidy here but would love to keep the conversation going.