dannycoates opened this issue 9 years ago
@dannycoates, @kparlante - I'm thinking about a few things; it would be nice to make the correlation very explicit with minimal processing.
Imagine us, as the consumer, doing something like:
render: function () {
  this.logScreen('signin');
  var signInButtonText = able.choose('signInButtonText');
  $('.signInButton').text(signInButtonText);
  this.logScreenEvent('ab.signInButtonText.' + signInButtonText);
  this._signInButtonText = signInButtonText; // stash the choice for the click handler
  ...
},
...
onSignInButtonClick: function () {
  this.logScreenEvent('ab.success.signInButtonText.' + this._signInButtonText);
}
The downside is it's kind of sloppy. The upside is able doesn't need to change.
I am wondering if we could do something like:
render: function () {
this.logScreen('signin');
var signInButtonText = able.choose('signInButtonText');
$('.signInButton').text(signInButtonText);
...
},
...
onSignInButtonClick: function () {
able.success('signInButtonText');
}
Then, when we call able.results(), it would report something like:
{
  "signInButtonText": {
    "experiment": "myExperiment",
    "chosen": "come on in",
    "success": true
  }
}
In this case, able.results would only return results for variables for which able.choose was called.
This increases the scope of able, but reduces the amount of work on the consumer.
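As a strawman, the choose/success/results bookkeeping sketched above could look something like this (none of these names are real able API; the experiment lookup is faked for illustration):

```javascript
// Hypothetical sketch of the choose/success/results flow described above.
// The experiments map stands in for however able really resolves a variable
// to an experiment and a chosen value.
function makeAble(experiments) {
  var results = {}; // only holds variables that choose() was called for
  return {
    choose: function (variable) {
      var exp = experiments[variable]; // experiment that owns this variable
      results[variable] = {
        experiment: exp.name,
        chosen: exp.value,
        success: false
      };
      return exp.value;
    },
    success: function (variable) {
      if (results[variable]) results[variable].success = true;
    },
    results: function () {
      return results;
    }
  };
}
```

The point of the sketch is that results() naturally scopes itself to the variables the app actually consumed.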
@shane-tomlinson, @dannycoates: My gut is that whether or not an event fired is not going to be flexible enough to evaluate success/failure for many use cases. At first blush, I like Shane's proposal -- it leaves the decision to the code and the API looks pretty simple/straightforward/easy to understand. That said, I can imagine cases where the success or failure of the experiment is determined by code/events that happen in a different context (e.g. the user eventually validates their email), or a set of events/conditions that is ugly/painful for the code to track (e.g. success if some events happened in a particular order without others happening). IIRC Optimizely had a way to write an independent piece of code that could report results back on a particular experiment (e.g. query against some SQL database of metrics). Once we have the services pipeline ingesting a wide variety of sources at Mozilla, you could imagine Heka filters that reported back to able -- the trick would be for that filter to know which experiment was run.
Shane's proposal sounds like a reasonable place to start -- I can imagine having multiple mechanisms to do the choice to outcome correlation.
@shane-tomlinson I think you're correct that able needs to be directly involved in the feedback. I'm not sure we are (err, I am) ready to talk about the "mechanism" yet, but there are a couple of things about your sketch that I'd like to consider.
Having the app make a call to Able, as you've got in the click handler, is probably the right way to report measurements. Able is already a dependency that we've opted into with choose, so I don't see a reason we need to hide or offload the data collection part.
I see you've directly tied the event[1] to the choice for signInButtonText, which I assume was intentional. This links two things I've been trying to keep separate so far, variables and events, in a way that tightly couples them. My "vision" :gags: so far has been that apps import variables and export events through Able and that experiments do the opposite, while the subject is the common thread between them. Apps and experiments should be able to develop fairly independently. Linking them by name in the app doesn't necessarily break that (nothing prevents us from having a variable and an event named the same thing) but it conflates their separate purposes in a way that might be confusing to future work.
Another goal was that app changes that involve Able can be "left in" for future experiments. For example, able.choose('signInButtonText') doesn't make any reference to a specific experiment, so multiple experiments, simultaneously or over time, can change that variable without the app needing to change. I'd like to keep that same property with events. The sketch uses able.success('signInButtonText'), which seems to break that principle because it both assumes that the click event means success to every possible experiment and that 'signInButtonText' is always relevant to that event. Of course for this experiment it is, but I think the experiment alone should have the power to define those things. I think we can fix that with a very slight modification:
onSignInButtonClick: function () {
  able.sendReport('signInClicked');
  // able will correlate the subject and event to the proper experiment and choices
}
Now that I think about it, having a strawman to poke at makes it easier to think about the limitations and possibilities of analysis, so :beers: Overall I think your sketch is very close to the mechanism I want. If we can satisfy our analysis requirements with it I'll be very happy. @kparlante I've got another reply coming :)
[1] - events, measurements, stats, results... all synonyms in this context, we should pick one
Thanks @kparlante
My gut is that whether or not an event fired is not going to be flexible enough to evaluate success/failure for many use cases
I agree. My goal with this thread is to discover what data we need to collect/report so that some "other" system (or future Able) can do the analysis. Maybe events are the wrong word, but my idea in general is that as an experimenter I'm interested in measuring specific things at specific times, and from those measurements I can do my analysis :wave:
In this discussion so far, events combine both 'thing' and 'time'. An experimenter can specify 'time' (event name) to collect, but is stuck with whatever 'thing' can be measured based on what data is tied to the event; they can't make arbitrary measurements. Limitation or feature?
Anyway, an experiment can collect data from any number of events and they will get tied to the subject and variables chosen by Able. So I imagine the data stream would look something like this:
{
  event: 'signInClicked',
  time: 1422303764406,
  experiment: 'signInButtonTextMatters',
  subjectId: '6feaa3f2fba05421da38003a6dba8f7a',
  choices: {
    signInButtonText: 'Come on in'
  },
  data: {
    // maybe additional fields the event decides to report?
    // For example:
    // able.sendReport('signInClicked', { termsAccepted: app.termsAccepted })
    termsAccepted: true
  }
}
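For illustration, a record shaped like the one above could be assembled by something like this (all names here are hypothetical, not real able API; the choice logic is a stand-in):

```javascript
// Hypothetical sketch: how able might build the event record above from
// internal state. makeReporter, choose, and sendReport are illustrative
// names only, not real able API.
function makeReporter(experiment, subjectId) {
  var choices = {};
  return {
    choose: function (variable, options) {
      // stand-in for the real grouping logic: record whatever was chosen
      choices[variable] = options[0];
      return choices[variable];
    },
    sendReport: function (event, data) {
      return {
        event: event,
        time: Date.now(),
        experiment: experiment,
        subjectId: subjectId,
        choices: choices, // every choice made so far rides along with the event
        data: data || {}
      };
    }
  };
}
```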
I can imagine cases where the success or failure of the experiment is determined by code/events that happen in a different context [...]
If both contexts use the same subject, Able should be able, heh :), to make the correlation in many cases. More complicated scenarios may need some other help.
So, given data like above, can we do the analysis we need?
@dannycoates, @shane-tomlinson oh ic, yeah I think "event" terminology was confusing me. "able_event", "experiment_event"?
My "vision" :gags: so far has been that apps import variables and export events through Able and that experiments do the opposite, while the subject is the common thread between them. Apps and experiments should be able to develop fairly independently.
:+1: I like this. By subject do you mean the experiment name or the subjectId (presumably an identifier for the user, in this limited context)?
The sketch uses able.success('signInButtonText') which seems to break that principle because it both assumes that click event means success to every possible experiment and that the 'signInButtonText' is always relevant to that event. Of course for this experiment it is, but I think the experiment alone should have the power to define those things. I think we can fix that with a very slight modification:
:+1: To the reasoning and the proposed modification.
data: {
  // maybe additional fields the event decides to report?
  // For example:
  // able.sendReport('signInClicked', { termsAccepted: app.termsAccepted })
  termsAccepted: true
}
Yes, seems like the ability to pass in additional information is useful.
So, given data like above, can we do the analysis we need?
Well, what's not clear to me from this scenario is what gets logged for the people who do not click on the button. Should the client code call able.report() to create an "event" when the user sees the button? Or do we presume the user has seen the button because able.choose() was called, and something gets logged for that?
By subject do you mean the experiment name or the subjectId
subjectId, which would usually correlate 1-1 with userId for authenticated sessions or sessionId for unauthed sessions.
Should the client code call able.report() to create an "event" when the user sees the button? Or do we presume the user has seen the button because able.choose() was called, and something gets logged for that?
I think we definitely want able.choose to emit an implicit event to record the choice. I think beyond that it's probably up to the app devs and experimenters to figure out which events will work and if new ones should be added.
So,
what's not clear to me from this scenario is what gets logged for the people who do not click on the button
In this case we'd log the choice event and nothing else. Whether that's enough I don't know; if not, the experiment could track another event to close the loop, 'pageUnload' maybe?
I imagine it will take some time to come up with good practices for designing experiments. After a few we'll probably be able to streamline some things to reduce boilerplate.
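The implicit choice event could be sketched like this (makeChooser and emit are hypothetical names, not able API; the random pick stands in for the real grouping function):

```javascript
// Hypothetical sketch: choose() emitting an implicit 'choice' event so that
// subjects who never trigger any other event still show up in the stream.
function makeChooser(emit) {
  var chosen = {};
  return function choose(variable, options) {
    // stand-in for the real grouping logic
    var value = options[Math.floor(Math.random() * options.length)];
    chosen[variable] = value;
    // implicit event: records that this subject received this choice
    emit({ event: 'choice', variable: variable, value: value, time: Date.now() });
    return value;
  };
}
```

With this, "people who never click" are exactly the subjects with a choice event and no later events.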
@dannycoates
I think we definitely want able.choose to emit an implicit event to record the choice. I think beyond that it's probably up to the app devs and experimenters to figure out which events will work and if new ones should be added.
Agreed; that should work as long as the call to choose() was aligned with the user actually seeing the choice. One can imagine scenarios where that wasn't true, but presumably a separate event could be logged explicitly if necessary.
I imagine it will take some time to come up with good practices for designing experiments. After a few we'll probably be able to streamline some things to reduce boilerplate.
Agreed.
Anyhow, I like the overall direction. :+1:
I see you've directly tied the event[1] to the choice for signInButtonText, which I assume was intentional. This links two things I've been trying to keep separate so far, variables and events, in a way that tightly couples them. My "vision" :gags: so far has been that apps import variables and export events through Able and that experiments do the opposite, while the subject is the common thread between them.
I had to think, stop, think, stop, and then think some more about this. I think I see what you are trying to do - a combination of enabling full-on multivariate testing, keeping all logic related to experiments and their success/failure criteria out of the consumer code, and leaving experiment harness code in place to enable future experiments.
This is really powerful, and I can see the value for advanced testing. At the same time I'm very worried about complexity w.r.t. configuration and results for novices like myself trying to do a straight AB test with no other experiment interference.
For the common case, I'm trying really hard to convince myself the complexity is necessary, but I haven't been able to. Advanced events (events other than choice made and success) add a layer of indirection I'm not sold on the need for.
It seems like if one event affects multiple experiments (or variables), the same functionality can be achieved by reporting per-experiment/variable events.
So yes, the choice to directly tie the event to the choice for signInButtonText was intentional. For a straightforward AB test, it seems like one variable and one event can be intimately coupled. To me, this feels natural.
Another goal was that app changes that involve Able can be "left in" for future experiments.
For items that are frequently tested, yeah. For items that infrequently change, meh, seems like a smell similar to checking in commented out code. This is a bit orthogonal to how to do the correlation.
For example, able.choose('signInButtonText') doesn't make any reference to a specific experiment so multiple experiments, simultaneously or over time, can change that variable without the app needing to change. I'd like to keep that same property with events.
I can understand multiple experiments being able to define a value for the same variable - e.g., I imagine multiple experiments would be defined to test the best button text in 3 different languages. It's the events I'm not convinced of.
I think we definitely want able.choose to emit an implicit event to record the choice
I think this is the right way to go, I'm wondering how this will play out once results are gathered.
For an AB test we are comparing two or more variations of a single variable. If each variation is selected roughly an equal number of times, we can just count the total number of "success-like" events. No other events need to be counted.
Feature toggles where we want to count the % of people that make use of the new feature are a bit different. We'll have to count the total number of "success" events and divide that by the total number of implicit events. That seems fine.
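Both calculations can be sketched over a flat event stream (conversionByVariation is a hypothetical helper, not able API; it assumes each record carries a choices map as in the examples above):

```javascript
// Illustrative analysis over a flat event stream.
// Per variation: divide success events by implicit 'choice' events.
// This covers both the AB comparison and the feature-toggle usage rate.
function conversionByVariation(events, variable, successEvent) {
  var seen = {};    // variation value -> count of implicit choice events
  var success = {}; // variation value -> count of success events
  events.forEach(function (e) {
    var value = e.choices && e.choices[variable];
    if (value === undefined) return; // event not tied to this variable
    if (e.event === 'choice') seen[value] = (seen[value] || 0) + 1;
    if (e.event === successEvent) success[value] = (success[value] || 0) + 1;
  });
  var rates = {};
  Object.keys(seen).forEach(function (value) {
    rates[value] = (success[value] || 0) / seen[value];
  });
  return rates;
}
```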
:beers:
Nearly 3 months later...
@dannycoates - I've come around to the separation of the variables and events, as you have outlined. Now I have a lot of questions.
Where we left off:
module.exports = {
  name: 'signInButtonTextMatters',
  hypothesis: 'The sign in button text affects signins',
  startDate: '2015-01-01',
  subjectAttributes: ['lang'],
  independentVariables: ['signInButtonText'],
  eligibilityFunction: function (subject) {
    return /en-US/.test(subject.lang);
  },
  groupingFunction: function (subject) {
    return {
      signInButtonText: this.uniformChoice([
        this.defaults.signInButtonText,
        'Come on in'
      ])
    };
  },
  events: ['signInClicked']
};
{
  event: 'signInClicked',
  time: 1422303764406,
  experiment: 'signInButtonTextMatters',
  subjectId: '6feaa3f2fba05421da38003a6dba8f7a',
  choices: {
    signInButtonText: 'Come on in'
  },
  data: {
    termsAccepted: true
  }
}
Is choices an object to allow multiple choices to be reported by one experiment? Correlating multiple events to the same user requires joining events on subjectId. Querying by event or experiment is straightforward.
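That join on subjectId could be as simple as grouping the flat stream (groupBySubject is a hypothetical helper):

```javascript
// Sketch of the join described above: group a flat event stream by
// subjectId so each subject's events can be correlated.
function groupBySubject(events) {
  var bySubject = {};
  events.forEach(function (e) {
    (bySubject[e.subjectId] = bySubject[e.subjectId] || []).push(e);
  });
  return bySubject;
}
```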
I was thinking about the event stream in reverse, where the experiment is reported at the top level, and a stream of events are attached to it.
{
  experiment: 'signInButtonTextMatters',
  subjectId: '6feaa3f2fba05421da38003a6dba8f7a',
  choices: {
    signInButtonText: 'Come on in'
  },
  events: [
    {
      event: 'choice',
      time: 1422303754683
    },
    {
      event: 'signInClicked',
      time: 1422303764406,
      data: {
        termsAccepted: true
      }
    }
  ]
}
Organizing the results this way makes it easy to see a subject's entire event stream for a given experiment and say "In the signInButtonTextMatters experiment, X number of people saw the choice, Y number of people clicked the sign in button".
I suppose really, either format can be transformed into the other.
I think both formats have nice properties. The thing I like about the first one (the unbundled stream) is that each event can stand on its own so is more stream-like. The second is more compact, which would be better if we report events in bursts. Since they are equivalent we could do either depending on how we choose to transmit them. It seems like the bundled format fits how we're currently doing metrics.
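As a sanity check on that equivalence, flattening the bundled format into the per-event stream is a small transform (unbundle is a hypothetical helper):

```javascript
// Sketch: convert the bundled format (experiment at the top level with an
// events array) into the flat, per-event stream shown earlier.
function unbundle(bundled) {
  return bundled.events.map(function (e) {
    return {
      event: e.event,
      time: e.time,
      experiment: bundled.experiment,
      subjectId: bundled.subjectId,
      choices: bundled.choices, // every flat event repeats the choices
      data: e.data || {}
    };
  });
}
```

Going the other direction is the same idea in reverse: group flat events by (experiment, subjectId) and collect the per-event fields into an array.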
I started sketching something up last week https://github.com/dannycoates/abatar/commit/66231e8e78a23eef5929cc4454d358fd1afdd6a8
I should have something usable this week that we can play around with.
Nearly 3 months later...
(I didn't read this whole thread...)
More months later, we are thinking of removing able.report() from the content server and keeping track of the experiment states ourselves. There is also an option where, instead of removing it, we remove it from our DataDog tags and report able choose data as their own events, but I'm trying to find the value in that.
For A/B experiments to actually work we need a way to analyze the data, but before we can even do that we need a way to report the data.
Right now all that exists is able.report(), which will give you some data for each experiment you're enrolled in: the experiment name and the independentVariables:values that were chosen for each subject (usually only one). That's useful for knowing how many subjects got each value, but it's not enough to do anything. Somehow we need to link choices to events that are relevant to the experiment.
In the most naive way I think it would be nice to define which events my experiment is interested in tracking and then have the choices linked to those events.
Borrowing from Shane's example, I'd like to add events. So, whenever the signInClicked event fired, in addition to whatever it normally logs, the "report" for that experiment would get logged so that we can correlate the choices with the event. This means the experiment author will need to know what events are available to track, just as they need to know the independentVariables.
Gluing Able to whatever generates and logs these events will be an Issue for another day, but I'm wondering if a simple list of events is enough or do we need something more powerful?
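A simple list might be enough for the naive version: match each fired event name against every experiment's declared events list and attach that experiment's report (linkEvent and the experiment shape are hypothetical sketches, not able API):

```javascript
// Sketch of the naive linking described above: when a logged event name
// appears in an experiment's declared `events` list, attach that
// experiment's report (name + choices) to the event.
function linkEvent(eventName, experiments) {
  return experiments
    .filter(function (exp) { return exp.events.indexOf(eventName) !== -1; })
    .map(function (exp) {
      return { event: eventName, experiment: exp.name, choices: exp.choices };
    });
}
```

Anything fancier (ordering constraints, cross-context events) would need more than a name list, which is the open question.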
@kparlante @shane-tomlinson