omnilinguist opened this issue 8 years ago
Hi @omnilinguist! In most production environments we expect users to store serialized PlanOut-language code (https://facebook.github.io/planout/blog/planout-language.html) and namespaces in some kind of database. To roll out a treatment to broader populations, one simply allocates more segments of the namespace to that experiment (namespaces should also live in a db). If I am understanding your bonus question correctly, this is trivially supported by universes. In general, experimenters should be able to instrument their code once to grab parameters from a namespace, after which no subsequent changes to the native code base are needed for any type of follow-on experiment (assuming your future experiments don't require you to change any application logic).
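For readers following along, here is a minimal self-contained sketch of the segment-allocation idea (Python stdlib only; `NUM_SEGMENTS`, `get_segment`, and `allocate` are illustrative names, not PlanOut's actual API): rolling out more broadly is just allocating more segments, with no client code changes.

```python
import hashlib

NUM_SEGMENTS = 100  # hypothetical namespace size

def get_segment(unit, namespace_name="my_namespace"):
    """Deterministically map a unit (e.g. a user id) to a segment,
    mimicking PlanOut-style hashing (illustrative, not the real code)."""
    digest = hashlib.sha1(f"{namespace_name}.{unit}".encode()).hexdigest()
    return int(digest, 16) % NUM_SEGMENTS

# segment -> experiment mapping; in production this lives in a database
segment_map = {}

def allocate(experiment_name, segments):
    """Assign free segments to an experiment. Widening the rollout is
    just allocating additional segments to the same experiment."""
    for s in segments:
        assert s not in segment_map, "segment already allocated"
        segment_map[s] = experiment_name

allocate("exp_v1", range(0, 10))    # launch at 10% of traffic
allocate("exp_v1", range(10, 30))   # later: widen to 30%

def experiment_for(unit):
    return segment_map.get(get_segment(unit))  # None -> default experience
```

Because the unit-to-segment hash never changes, users already in the experiment stay in it as more segments are allocated.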
If you haven't already you might want to also check out the PlanOut paper, which discusses management in more detail.
Re: databases -- we tried to leave it up to the developer to decide how they want to store stuff. The documentation and base APIs are written in a way that makes it easy to get started, but if you dig into the source code for the reference implementation you can see notes on how you'd want to store and cache things. In general, most assignment procedures can and should be done in an online fashion (because it is faster, more reliable, and more reproducible), so that the only things you need to store are the serialized experiments, the segment->experiment mappings for namespaces, and some metadata. In some cases it is valuable to be able to query external services (e.g. to get information for gating, clusters to be used in randomization, or contexts for use with a policy), and we just introduced an API for adding those external services that we might write about in a blog post soon :)
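As an entirely illustrative sketch of the "only store the serialized experiments" point: the dict below stands in for a database table, and the JSON shape is just one example of serialized PlanOut code, not a canonical schema.

```python
import json
from functools import lru_cache

# Stand-in for a database table of serialized experiments
# (names and JSON shape are illustrative):
EXPERIMENTS = {
    "button_color": json.dumps(
        {"op": "set", "var": "color",
         "value": {"op": "uniformChoice",
                   "choices": ["red", "blue"],
                   "unit": {"op": "get", "var": "userid"}}}),
}

@lru_cache(maxsize=None)
def load_experiment(name):
    """Deserialize once and cache; assignment itself stays online and
    stateless, so nothing per-user ever needs to be written back."""
    return json.loads(EXPERIMENTS[name])
```

The cache means repeated lookups of the same experiment return the same deserialized object; updating an experiment would just mean writing new JSON and invalidating the cache entry.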
@eytan, here's my next iteration of comments, in roughly the order I care about them (I am trying to process the framework as quickly as possible, but am just getting started, so please bear with me a bit):
My current assumption is that there would be some centralized service exposing something like `getTreatment("my_experiment", unit1, unit2, unit3...)` for clients (or even wrapping the fetching of the auxiliary arguments in a service), and that the experiment->planoutJson mappings would be stored in a db table as you mentioned. If there is no centralized service that wraps the experiment functionality, then every service will have to hit the db directly (perhaps with some caching as described below) and do all the experiment processing locally. If some of these requests also end up needing to make sideways calls to other services or datastores to get the additional parameters to pass into `assign()`, then this will all be decentralized.
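To make the centralized-service option concrete, here is a rough sketch of what such a wrapper might look like. Everything here is hypothetical: `EXPERIMENT_DB` stands in for the db table of serialized experiments, and the hash-based choice stands in for real evaluation by the PlanOut interpreter.

```python
import hashlib

# Hypothetical experiment registry; in production this would be a db table
# holding serialized PlanOut JSON, evaluated by the PlanOut interpreter.
EXPERIMENT_DB = {
    "my_experiment": {"param": "button_color", "choices": ["red", "blue"]},
}

def get_treatment(experiment_name, **units):
    """Single entry point for clients: they never construct experiment
    subclasses or interpreter instances themselves."""
    config = EXPERIMENT_DB[experiment_name]
    # Deterministic hash-based choice, standing in for interpreter evaluation.
    key = experiment_name + "." + ".".join(
        f"{k}={v}" for k, v in sorted(units.items()))
    idx = int(hashlib.sha1(key.encode()).hexdigest(), 16) % len(config["choices"])
    return {config["param"]: config["choices"][idx]}
```

A client would then call, e.g., `get_treatment("my_experiment", userid=42)` and receive a dict of parameter values, with all storage and assignment logic hidden behind the service boundary.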
When a client calls `getTreatment(...)`, does the code behind it have to somehow dynamically create not only the subclasses of `SimpleExperiment` but also the actual instances of them, and then call `get()` on them? Ideally all of this should be hidden from the consumer, so that all they need to do is call `getTreatment()` with a given experiment name plus additional arguments and get back one or more parameters (maybe in JSON format). The way you responded suggests that this is doable using the interpreter, so would it make sense to just wrap the interpreter in some interface (possibly in its own service) that abstracts all the experiment-table db access and experiment processing from the clients?

The way the `Interpreter` class is designed (specifically the fact that its constructor takes a `serialization` argument) naturally suggests caching a single `Interpreter` instance in memory per experiment configuration (as specified via PlanOut JSON): if an experiment is dynamically updated, we just need to make another `Interpreter` instance with the updated JSON; otherwise we can re-use the same instance and avoid the overhead of instance construction. Does that sound right? Also, would you happen to know from experience whether this setup would be performant at very high traffic?

I also have a question about the `WeightedChoice` operator. Let's say I have treatments "a", "b", "c" with initial weights of [0.1, 0.1, 0.8], and I want to extend this to [0.3, 0.3, 0.4]. However, this will result in the people who initially saw "b" now seeing "a", presumably because of the way that `WeightedChoice` maps the configuration to the hashes (the first 0.3 would capture the groups initially seeing "a" and "b", as well as the first 1/8 of the people initially seeing "c"). But I think we can kind of hack around this by setting the choices to ["a", "b", "a", "b", "c"] with weights [0.1, 0.1, 0.2, 0.2, 0.4]; here the people who originally saw "b" would continue to see "b" (let's say that "c" is a special case where it is OK for those users to see different treatments over time). Does this seem right to you? The only major concern is whether these separate entries with the same choices might cause any problems (it doesn't seem like they should, since in the end the treatment is the same, but just in case).

I expect to possibly have more questions about `Namespace`s later on, but that is not part of the initial version of what I am building. In any case, I am also trying to figure out some of this stuff as I go, but some expert pointers would be useful :)
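The duplicated-choices reasoning above can be checked empirically with a toy stand-in for the operator. This mimics the cumulative-weight-over-a-hash scheme described in the question; it is not PlanOut's actual `WeightedChoice` implementation.

```python
import hashlib

def weighted_choice(choices, weights, unit, salt="exp.param"):
    """Toy stand-in for PlanOut's WeightedChoice: hash the unit to a
    deterministic point in [0, sum(weights)), then walk the cumulative
    weights until the point is covered."""
    h = int(hashlib.sha1(f"{salt}.{unit}".encode()).hexdigest()[:13], 16)
    point = h / float(0x10000000000000) * sum(weights)  # uniform in [0, total)
    cum = 0.0
    for choice, w in zip(choices, weights):
        cum += w
        if point < cum:
            return choice
    return choices[-1]

units = [f"user{i}" for i in range(1000)]
orig = {u: weighted_choice(["a", "b", "c"], [0.1, 0.1, 0.8], u) for u in units}
# Duplicated-choices trick: the first 0.1 + 0.1 of the interval is unchanged,
# so everyone who originally saw "a" or "b" keeps seeing the same treatment.
stable = {u: weighted_choice(["a", "b", "a", "b", "c"],
                             [0.1, 0.1, 0.2, 0.2, 0.4], u) for u in units}
```

Because the total weight is 1.0 in both configurations, each unit hashes to the same point, and any point below 0.2 falls in an interval with an unchanged label; only original "c" users can move, which is exactly the property the hack relies on.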
To whom it may concern,
I found this project while investigating A/B testing frameworks, and while it seems to provide a great deal of the functionality I am looking for, one big design question seems to remain unanswered after reading the documentation and skimming part of the implementation (actually two, but they are related):
I am wondering how Facebook or other users get around these apparent limitations (I may think of more later), but apart from these, much of the rest of the system looks fairly clean yet sophisticated!