Add some message history processing capabilities

zachll commented 9 months ago

Esteemer currently doesn't process message history.

Proposal:

If the current candidate message template matches the message sent last month, set the message recency count to 1 (month). Apply this factor to the score.

If a message template was selected and delivered via email the previous month, Esteemer should score it significantly lower than other templates, so that it is only selected if no other message templates are available.

The downvoting should happen without regard to differences in measures, but if it is for the same measure, it should be downvoted more.
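A minimal sketch of this proposal, assuming a flat penalty rather than the recency-based transform discussed later in the thread. The `Candidate` dataclass, the penalty constants, and the `BP01` measure name are hypothetical placeholders, not Esteemer's actual data model:

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical penalty sizes; the issue leaves real magnitudes to testing.
TEMPLATE_REPEAT_PENALTY = 10.0
SAME_MEASURE_EXTRA_PENALTY = 5.0


@dataclass
class Candidate:
    template_name: str
    measure: str
    score: float


def downvoted_score(cand: Candidate, last_template: Optional[str],
                    last_measure: Optional[str]) -> float:
    """Return the score after the proposed 'sent last month' downvote."""
    score = cand.score
    if last_template is not None and cand.template_name == last_template:
        # Same template as last month: downvote regardless of measure.
        score -= TEMPLATE_REPEAT_PENALTY
        if cand.measure == last_measure:
            # Same template for the same measure: downvote further.
            score -= SAME_MEASURE_EXTRA_PENALTY
    return score


def select(cands: List[Candidate], last_template: Optional[str],
           last_measure: Optional[str]) -> Candidate:
    """Pick the highest-scoring candidate after downvoting."""
    return max(cands, key=lambda c: downvoted_score(c, last_template, last_measure))


if __name__ == "__main__":
    cands = [
        Candidate("not top performer", "BP01", 3.0),
        Candidate("top performer", "BP01", 2.5),
    ]
    # Last month 'not top performer' was sent for BP01, so it loses here.
    print(select(cands, "not top performer", "BP01").template_name)
```

With a single candidate, the repeated template is still returned, matching the idea that it is only chosen when nothing else is available.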

Don't do this now, but down the road we need to consider exceptions to this downvoting, for cases where continued improvement is motivating and receiving the same message about the same measure is a good thing.

mackgalante commented 9 months ago

Point 1

How are we recording message history in the input messages? Is that being done by the consortium when they generate input_messages, or do we need infrastructure to implement this? (See issue #83.)

Point 2

If a message template was selected and delivered via email the previous month, Esteemer should score it significantly lower than other templates, so that it is only selected if no other message templates are available.

This likely requires manipulating both the weight in the MPM and the raw integers that we will generate from message and measure recency. The coefficient might need to be -5, -10, -30, or some other sufficiently large negative number; testing will tell us for sure. We can transform Message Recency with the following math so that the penalty is large when the message was sent most recently and decreases asymptotically, so the 'downvote' effect fades as time since the last send grows. Bear with me here.

Recall that the history portion of the algorithm looks something like:

$$\text{data component} + X_t\,(\text{Message Recency}) + X_m\,(\text{Measure Recency}) + X_n\,(\text{Number Received})$$

where the name of the last term is still tentative.

We can transform Message Recency (notated below as $X$) like this:

$$\text{Message Recency term} = X_t\, e^{-X}\, (X+1)^{-1} = \frac{X_t\, e^{-X}}{X+1}$$

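Why is this so complicated now? The $e^{-X}$ factor makes the penalty decay as $X$ grows, and the extra $(X+1)^{-1}$ factor steepens that decay (halving the penalty at $X = 1$). Using $X+1$ rather than plain $X$ in the denominator keeps the term finite at $X = 0$; with $X^{-1}$ alone, $X = 0$ would divide by zero and produce an unbounded 'downvote'.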
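As a sketch, the transform fits in a small Python function. The function name is hypothetical, and returning 0 when there is no history (`None`) is an assumed convention, since the formula itself gives the full $X_t$ at $X = 0$:

```python
import math
from typing import Optional


def message_recency_term(x: Optional[float], x_t: float) -> float:
    """Return X_t * e^(-X) / (X + 1).

    x   -- months since the template was last sent (X); None means no history
    x_t -- coefficient X_t; a negative value makes the term a downvote
    """
    if x is None:
        # Assumed convention: no history means the term is skipped.
        return 0.0
    if x < 0:
        raise ValueError("recency must be non-negative")
    return x_t * math.exp(-x) / (x + 1)
```

For example, `message_recency_term(1, -10)` is about -1.84 and `message_recency_term(2, -10)` is about -0.45.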

Examples

Now we can evaluate this for $X$ = Message Recency over the interval [0, ..., 4], using $X_t = -10$, to see how it impacts the overall rank:

Message recency = 0
- Rank = ... + $(-10)\,e^{-0}\,(0+1)^{-1}$ + ...
- Rank = ... + (-10) + ...
Message recency = 1
- Rank = ... + $(-10)\,e^{-1}\,(1+1)^{-1}$ + ...
- Rank = ... + (-1.84) + ...
Message recency = 2
- Rank = ... + $(-10)\,e^{-2}\,(2+1)^{-1}$ + ...
- Rank = ... + (-0.45) + ...
Message recency = 3
- Rank = ... + $(-10)\,e^{-3}\,(3+1)^{-1}$ + ...
- Rank = ... + (-0.12) + ...
Message recency = 4
- Rank = ... + $(-10)\,e^{-4}\,(4+1)^{-1}$ + ...
- Rank = ... + (-0.04) + ...
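The term can be checked numerically. This sketch simply evaluates $X_t e^{-X} (X+1)^{-1}$ with $X_t = -10$ for $X = 0, \dots, 4$ and prints each value rounded to two decimals:

```python
import math

X_T = -10.0  # coefficient used in the worked examples

# Evaluate X_t * e^(-X) * (X + 1)^(-1) for X = 0..4.
terms = [X_T * math.exp(-x) / (x + 1) for x in range(5)]

for x, term in enumerate(terms):
    print(f"Message recency = {x}: term = {term:.2f}")
```

The values rise monotonically toward zero, which is the fading 'downvote' the formula is meant to produce.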

Mull it over; I know it's tough to look at, but the implications are good! The more recently a template was sent (smaller $X$), the larger the penalty, and the penalty fades toward zero as $X$ grows. Note that the term equals the full $X_t$ at $X = 0$ rather than 0, so a template with no history should skip the term entirely instead of being assigned $X = 0$. We would only be evaluating this term once per candidate, so it's computationally cheap. We would likely want to use testing to determine a reasonable $X_t$ value, depending on the median expected rank that a selected candidate evaluates to: maybe a change of -0.1 will move the needle, or maybe $X_t$ needs to be -100 to move the overall rank appropriately.

Point 3

The downvoting should happen without regard to differences in measures, but if it is for the same measure, it should be downvoted more.

We can apply the same transform to measure recency to get a similar effect: more recent feedback on the same measure causes a stronger decrease in the overall rank. Because the message and measure recency terms are separate, they act independently: one downvotes repeated message templates, and the other downvotes repeated measures. A candidate that repeats both the template and the measure is therefore downvoted twice ('double' downvoting).
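A sketch of how the two independent terms combine, assuming both use the same transform and that a missing history value contributes nothing. The coefficients `x_t` and `x_m` default to -10 purely for illustration, the function names are hypothetical, and the Number Received term ($X_n$) is left out:

```python
import math
from typing import Optional


def recency_term(months_since: Optional[int], coeff: float) -> float:
    """coeff * e^(-X) / (X + 1); 0 when there is no history (assumed)."""
    if months_since is None:
        return 0.0
    return coeff * math.exp(-months_since) / (months_since + 1)


def history_adjusted_rank(data_component: float,
                          message_recency: Optional[int],
                          measure_recency: Optional[int],
                          x_t: float = -10.0,
                          x_m: float = -10.0) -> float:
    """Data component plus independent message and measure recency terms."""
    return (data_component
            + recency_term(message_recency, x_t)
            + recency_term(measure_recency, x_m))


if __name__ == "__main__":
    # Template sent last month, measure not in history: one downvote.
    print(history_adjusted_rank(5.0, 1, None))
    # Template and measure both repeated last month: 'double' downvote.
    print(history_adjusted_rank(5.0, 1, 1))
```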

mackgalante commented 8 months ago

Objectives:

1) Pull in the requisite data for calculations:
   - the current_feedback_month (month for which feedback is being generated), assigned as time 0 (t0)
   - Acceptable Candidates' measure(s)
   - Acceptable Candidates' message template name(s)
2) Make empty dicts for storing message and measure recency values.
3) Compare each row's 'month' to t0 and convert it to an integer representing its distance from t0.
   - Ex: if t0 = Dec, then Nov = t(-1), Oct = t(-2), etc.
4) Calculate message recency:
   - if acceptable_candidate['message_template_name'] matches a message_template_name value in the matrix, calculate how many months it has been since that match
   - ex: if it is December now and the candidate template is 'not top performer', calculate the months since the last 'not top performer' noted in the history
5) Calculate measure recency:
   - if acceptable_candidate['measure'] matches a measure in the matrix, calculate the months since that match
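A sketch of steps 2 to 5, assuming history rows are dicts with the same `month`, `message_template_name`, and `measure` keys as the candidates, and that months are `datetime.date` values. Unlike the t(-1), t(-2) labels in step 3, distances here are positive month counts so they can be fed directly into the recency transform. All function names and sample data are hypothetical:

```python
from datetime import date
from typing import Dict, Iterable, List, Optional, Tuple


def months_between(t0: date, month: date) -> int:
    """Whole months from `month` up to t0 (e.g. t0 = Dec, Nov -> 1)."""
    return (t0.year - month.year) * 12 + (t0.month - month.month)


def last_match_recency(history: Iterable[dict], key: str, value: str,
                       t0: date) -> Optional[int]:
    """Months since `value` last appeared under `key`; None if never."""
    distances = [months_between(t0, row["month"])
                 for row in history if row[key] == value]
    # Rows at or after t0 are ignored (assumption: history is past months only).
    past = [d for d in distances if d > 0]
    return min(past) if past else None


def compute_recency(candidates: List[dict], history: List[dict],
                    t0: date) -> Tuple[Dict[str, Optional[int]],
                                       Dict[str, Optional[int]]]:
    """Steps 2-5: fill message and measure recency dicts for each candidate."""
    message_recency: Dict[str, Optional[int]] = {}
    measure_recency: Dict[str, Optional[int]] = {}
    for cand in candidates:
        name = cand["message_template_name"]
        measure = cand["measure"]
        message_recency[name] = last_match_recency(
            history, "message_template_name", name, t0)
        measure_recency[measure] = last_match_recency(
            history, "measure", measure, t0)
    return message_recency, measure_recency


if __name__ == "__main__":
    t0 = date(2023, 12, 1)  # current_feedback_month (illustrative)
    history = [
        {"month": date(2023, 11, 1),
         "message_template_name": "not top performer", "measure": "BP01"},
        {"month": date(2023, 9, 1),
         "message_template_name": "top performer", "measure": "BP02"},
    ]
    candidates = [{"message_template_name": "not top performer",
                   "measure": "BP02"}]
    print(compute_recency(candidates, history, t0))
```

Here 'not top performer' was last sent one month before t0, and BP02 last appeared three months before t0.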

mackgalante commented 8 months ago

Per team meetings, I will be resolving this issue and opening a new issue to describe the remaining work for getting history processing online.

Display-Lab / precision-feedback-pipeline