Analyticsphere / metricsReportsRequests

Used to provide issue tracking for changes and additions to the Connect Metrics reporting.
MIT License
0 stars 0 forks source link

Metrics evaluating email success rates for BL survey completion #111

Closed brotzmanmj closed 6 months ago

brotzmanmj commented 8 months ago

We would like to know how effective our baseline survey email reminders are in getting participants to complete their baseline survey.

We are referring only to Survey Modules 1-4.

Emails will be those in the Notification Table that are Baseline Survey emails, contact attempts 1-6. Ignore contact attempts 7-10. We do not know if someone opened or clicked through the email as of yet, so we will assume that if the email was in Notification Table that is will count as sent and received.

We propose to analyze this among participants who are Recruitment Type= Active, Verification Status = Verified, with User Profile submitted between 1/1/23 and 10/1/23.

The essence of what we want to know is who completes their baseline survey: Before the first contact attempt Between contact attempts 1 and 2 Between contact attempts 2 and 3 etc, 3 and 4, 4 and 5, 5 and 6, and after 6. The denominator will change as you go, since the reminders stop after the survey is completed.

The analysis should first include all sites, and then stratified for KP participants and each of the other 5 sites.

In later analyses, we may look to differentiate among modules, and look at started vs submitted status, but for the first round, just look at this binary: all 4 modules submitted vs any of the 4 modules not submitted.

brotzmanmj commented 8 months ago

Attaching summary of the request from Hana, noting that this is just a draft and still needs review and comment from the team before being implemented. https://nih.app.box.com/file/1411703979654

KELSEYDOWLING7 commented 8 months ago

@brotzmanmj @rebexxxxxx Please see the updated report below, pending Analytics review. My personal vote would be to remove either table 3 or table 4, as well as table 5 , for redundancy. Baseline-Module-Completion-vs-Emails.pdf

@FrogGirl1123 when I searched in prod, there is only one person who is active and verified who completed Module 1 before they completed their user profile. It may get bit complicated to get the denominator if I change the way I'm pulling the data from BQ (where anything like 'BL Survey Reminder' is in the notification category) , so my preference would just be that this person wouldn't be included. Please let me know what you think.

This would mean we don't have a row of "Before the first contact attempt", so I have set the first row as "Completed Before Contact 2"

brotzmanmj commented 8 months ago

Hi @KELSEYDOWLING7 that first group should be completed module 1 before verification. It should not be possible to complete module 1 before submitting user profile.

brotzmanmj commented 8 months ago

Looking at Table 2, something about it isn't making sense to me. The last column, is that the number of people receiving the reminder? Meaning, if there were 10932 people and 5179 of them completed the BL survey before contact attempt number 2 then the percent completing the survey after just 1 contact attempt is 47.4%. Then only 5753 people get reminder 2, so the next row should have that as the denominator. Does that make sense?

brotzmanmj commented 8 months ago

Also, the label for the first row says 'completed before contact 2' but can you explain what that number 5690 means? Since the row includes columns for partial completion and no completion, I'm not sure what the 5690 represents. thanks.

HanaShiho commented 8 months ago

Isn't 5690 the number of participants who received the second reminder (i.e. those who didn't complete the entire BL survey after receiving the first reminder)?

I agree that Table 4 is redundant; people who haven't started are at 0 and people who have completed things are at 4. The partial completers are already stratified more accurately in Table 3. Table 5 is also almost identical to Table 2.

KELSEYDOWLING7 commented 8 months ago

@brotzmanmj Oh, I thought on the call we had discussed using only the user profile dates because active participants could finish all modules before being verified in theory. But there are 3,839 active participants with the user profile completion time matching that have completed at least Module 1 before verification so I'll adjust.

And for the 'completed before contact" question, it was requested that the row names be adjusted to that but I should have clarified. What I'm doing there is looking at their most recent attempt. So the participants in the 'completed before contact 2" row have a max attempt of 1, 'completed before contact 3" row have a max attempt of 2, ect.

brotzmanmj commented 8 months ago

Hi @KELSEYDOWLING7 the user profile submission dates define the population that will be included in the analysis but that's all. The first group for analysis should be any one who completed the entire baseline survey before 1st survey reminder message was sent.

Question: Table 1 shows 1012 with no surveys completed; Table 3 shows 1626 with zero surveys completed. Which is correct?

I still don't understand the concept of Table 2 but will defer to Nicole's review to guide the next round of reconfiguration of it.

FrogGirl1123 commented 8 months ago

@KELSEYDOWLING7, @HanaShiho, @rebexxxxxx and @brotzmanmj - the analytics group is meeting Tuesday morning to review the request and the example tables Kelsey's made so far and discuss an analytic plan. Let's put a pin in this exchange until after that meeting.

brotzmanmj commented 7 months ago

Hi @KELSEYDOWLING7 @HanaShiho @rebexxxxxx @FrogGirl1123 With Nicole's OK, we redrafted the table templates for this request based on a similar report that HP provided recently for another task that was easy to follow and consume. Please see revised tables here: https://nih.app.box.com/file/1418609517448 Let's start with this and have this replace the previous request to avoid confusion. After we review this new report, we may have some additions. Please let me know if any questions.

KELSEYDOWLING7 commented 7 months ago

@brotzmanmj Good morning, I'll get started on the updated request. In case it's of any use, please see the report from the previous Analytics discussion on Tuesday. I believe the Attrition table and Table 1 suffice for the first table. Baseline-Module-Completion-vs-Emails.pdf

The second table gets very tricky to generate as far as identifying individual people from the overall pool of contacts, the time frame of every module they've completed, and the timing of every contact we've sent. This will take me a bit.

brotzmanmj commented 7 months ago

Hi @KELSEYDOWLING7 I was thinking if you derived a variable as 'date of last module completed' among modules 1-4 that might be much easier for this analysis. You can disregard the distinction of 'some' modules completed for this analysis. That distinction is making this unnecessarily confusing.

For the new Table 1, we just need table of all sites the and number of participants who are included this analysis. Nothing about completion of surveys or attrition.

Thanks

KELSEYDOWLING7 commented 7 months ago

@brotzmanmj Ok thank you, yes I agree the 'Some' is confusing. But if that's the case, I wouldn't be creating this table as a way to determine how many modules were completed at each level of reminders. It would be showing where people completed the modules. For example if they completed the modules after the third email but before the fourth email they would only be counted in line 15, not in any previous lines regarding the first and second reminders. Is that ok?

brotzmanmj commented 7 months ago

@KELSEYDOWLING7 Yes, that's perfect!

KELSEYDOWLING7 commented 7 months ago

@brotzmanmj Ok great. Can you clarify what the difference between 'Eligible for Reminder Number' and 'Sent Reminder Number' are? To me those would be the same, it would just be whether or not they received that contact.

brotzmanmj commented 7 months ago

@KELSEYDOWLING7 Eligible for Reminder are the people who haven't completed the survey yet at the time that reminder is due. 'Sent the reminder' I'd like to get that number from the Notifications Table where we know who was sent the message.

brotzmanmj commented 7 months ago

In that sense, Eligible for Reminder is a subtraction, say for example, they were sent Contact 2 and they didn't complete their survey before Contact 3. That's how many people were eligible for contact 3. The notification table will tell us if they were actually sent Contact 3.

KELSEYDOWLING7 commented 7 months ago

(Apologize if you get two notifications for this- it looks like my original comment from a few hours ago was deleted) @brotzmanmj Thank you! I believe I have everything here. Please let me know how this looks before I stratify by Site

Baseline-Module-Completion-vs-Emails.pdf

brotzmanmj commented 7 months ago

Hi @KELSEYDOWLING7 we are getting closer! There are a couple of things we'll need to investigate before stratifying.

The gap between Eligible for Contact 2 and Sent Contact 2 is huge, way bigger than I would have expected. Only 59% of people who should have gotten Contact 2 were actually sent it, is that right? Can you double check that? I recall for a period of time, there were two versions of the survey reminders and I'm wondering if both of them were captured here. I'm not sure what else would account for a gap so big.

The other issue is I need to rethink the calculation for who is eligible for the next message. I know I put in the specs to calculate it as Sent - Survey Completed in each instance. But I think that's leaving out too many people given the gap mentioned above. Let's investigate the gap and then rethink this assumption.

KELSEYDOWLING7 commented 7 months ago

@brotzmanmj Good morning. Yes, based on the logic above that's correct. I have 'eligible for contact 2' as equal to the total population of the report minus those that completed all BL Modules before either before contact 1 or before contact 2, which would be the 10,953-(6,396+608)= 3,949. The 'Sent Contact 2' is anyone that has a reminder type (category) like "BL Survey Reminder". with a maximum attempt number of 2.

I've attached a csv with the list of the 48 different categories in the reminder table. Please let me know if any that don't have 'BL Survey Reminder' in them are in fact baseline survey reminders. I've highlighted the ones that are counted. Category types.csv

The only other thing I could think of it the timing factor, meaning there may not have been enough days passed verification for them to be eligible just yet for the reminder. I found some code regarding this on email reminder report from last year regarding this. Should this still apply?

Welcome to connect: Date of Verification + 0 days BL Reminder 1: Date of Verification + 4 days BL Reminder 1: Date of Verification + 5 days BL Reminder 1: Date of Verification + 16 days BL Reminder 1: Date of Verification + 25 days BL Reminder 1: Date of Verification + 40 days BL Reminder 1: Date of Verification + 87 days

brotzmanmj commented 7 months ago

Hi @KELSEYDOWLING7 I think it's an issue related to identifying the contact attempts on the CSV file. If you have a few minutes I can hop on a call and look at it with you.

The timing factor should not be an issue because we cut off the time period for inclusion in this analysis to October 2023 so everyone would have passed the dates for the reminders by now. But let's look at the CSV file first and then we can explore that more if needed.

brotzmanmj commented 7 months ago

@KELSEYDOWLING7 I have an idea that would make this easier. In addition to Category, can you pull the variable 'Contact Attempt' into the CSV file and resend? I believe we need to use both of those in combination to identify the contact attempt.

KELSEYDOWLING7 commented 7 months ago

@brotzmanmj Sorry if I wasn't clear. The Sent variable is already using both the category type and attempt number.

Are there other categories outside of the 17 highlighted that need to be included? I've confirmed my code searches for all 17.

brotzmanmj commented 7 months ago

Hi @KELSEYDOWLING7 I don't see any highlighted and the only column is I see in the CSV file is 'category'. Can you resend?

KELSEYDOWLING7 commented 7 months ago

@brotzmanmj I apologize! Yes, here is the corrected csv. There is only supposed to be one column, the category column. The 17 highlighted rows are the BL reminder emails that I'm including. The non-highlighted are not included and to me don't look like BL reminder emails, but wanted to double check

Category types.csv

KELSEYDOWLING7 commented 7 months ago

For whatever reason when I upload it here the highlight is not saving. But the included categories are the first 17 listed (rows 2-18)

KELSEYDOWLING7 commented 7 months ago

@brotzmanmj The 'Sent Contact (number)' is also the maximum number of contacts they have received. That may account for it as well. I will adjust that code and resend the new table shortly

brotzmanmj commented 7 months ago

@KELSEYDOWLING7 Rows 2, 3, and 18 don't specify what contact attempt they are in the CSV file, can you tell me how you have those assigned?

The 'Sent Contact' should be those people who were sent the contact attempt in question, as opposed to the maximum number they have received. Agree this might be accounting for the issue.

KELSEYDOWLING7 commented 7 months ago

@brotzmanmj I'm sorry my coffee obviously hadn't kicked in yesterday. My code for 'Sent Contact 2' is anyone that had 2 OR MORE contact attempts for BL Survey Reminders. So that's correct.

Do we have any other thoughts on what "Eligible' should be calculated as?

brotzmanmj commented 7 months ago

Hi @KELSEYDOWLING7 the code should be those who were sent contact 2. And there should be two messages for contact 2 (one for backlog and one for non backlog). Not 2 or more. Can you identify those in the Notifications?

KELSEYDOWLING7 commented 7 months ago

@brotzmanmj I think I'm getting confused at how to properly identify those who have had 0 BL Survey Reminder Emails, or why the number of reminders sent don't seem to follow a proper pattern.

I currently have the logic that: attempt=="1st contact" & category contains "BL Survey Reminder" -> 1st contact attempt=="2nd contact" & category contains "BL Survey Reminder" -> 2nd contact attempt=="3rd contact" & category contains "BL Survey Reminder" -> 3rd contact attempt=="4th contact" & category contains "BL Survey Reminder" -> 4th contact attempt=="5th contact" & category contains "BL Survey Reminder" -> 5th contact attempt=="6th contact" & category contains "BL Survey Reminder" -> 6th contact (exclude additional contacts) ELSE = 0 contacts

That produces: 7,646 that have received 0 contacts (but only 6,198 have their BL surveys completed) 924 have had 1 contact 901 have had 2 contacts 191 have had 3 contacts 92 have had 4 contacts 323 have had 5 contacts 876 have had 6 contacts

Is there any reason people would have more 5th and 6th contacts? I would think those numbers should have continued to decrease

brotzmanmj commented 7 months ago

@KELSEYDOWLING can you send a frequency table of the 'N' for each of the categories of reminders in the Notification table? And include the Attempt and Category in the output?

brotzmanmj commented 7 months ago

Something is not right because everyone who is verified should be sent Contact 1 (there should no verified participants with 0 contact attempts for BL survey)

KELSEYDOWLING7 commented 7 months ago

@brotzmanmj 0 contacts would mean only the welcome to connect email. If they finished all surveys before being verified, they wouldn't receive a reminder to complete the surveys. Unless I'm misunderstanding the timing of those email reminders

brotzmanmj commented 7 months ago

Hi, let's discuss on a call.

KELSEYDOWLING7 commented 7 months ago

@brotzmanmj Good morning, these numbers look a lot better after out conversation yesterday.

Baseline-Module-Completion-vs-Emails.pdf

Unfortunately while I confirmed that this group of participants did not have backlog contacts, there's something off with the contacts themselves. Either the categories aren't as clear as they should be or participants received double contacts of the same attempt.

There were 11,607 1st attempt BL Survey Reminder emails, but only 11,070 are to unique Connect IDs. There were 7,074 2nd attempt BL Survey Reminder emails, but only 7,053 are to unique Connect IDs. There were 6,237 2nd attempt BL Survey Reminder emails, but only 6,229 are to unique Connect IDs.

These are the categories of these participants in BQ. There are categories with specified attempts (ex: Verified, BL Survey Reminders New Pts 2nd contact ), and then a general category of BL survey reminders (Verified, BL Survey Reminders). Every single participant received a contact from the "Verified, BL Survey Reminders" category, while only 7,059 received a contact from a "Verified, BL Survey Reminders New" type category so I don't want to only pull data from that type of category. I also confirmed that the "Verified, BL Survey Reminders" category includes attempts 2-9 in GCP.

image

brotzmanmj commented 7 months ago

Hi, ok we are getting closer yet. A couple of things to check... @KELSEYDOWLING7 can you give us two or three Connect IDs (or tokens) of a participant that you believe may have gotten more than one of the same contact attempt? Say for the 2nd contact attempt, as an example? We can look them up in the SMDB and see what messages we think the system sent them.

I think every single participant received a contact from the "Verified, BL Survey Reminders" category because Contact Attempt 1 (the welcome to Connect + first survey reminder email) has always come from and continues to come from that category. Then, I think at some point we stopped using that category for messages 2-9 and switched to category "Verified, BL Survey Reminders New". If (and we'd need you to look at the Notification table data to confirm this) that switch over happened before our inclusion period for this analysis (Jan 1, 2023) then you are correct, we should use Contact 1 from "Verified, BL Survey Reminders" and contacts 2-9 only from "Verified, BL Survey Reminders New", because they would not have received 2-9 from the original "Verified, BL Survey Reminders" category. @depietrodeanna does this match your understanding?

Also @KELSEYDOWLING7 please remove the exclusion of "Those who have refused baseline surveys or who have withdrawn from the cohort are excluded". We should not exclude them from this analysis.

@KELSEYDOWLING7 There are 5 people who never got Contact 1? Can you send the Connect IDs (or tokens) of those 5 people and we will investigate.

depietrodeanna commented 7 months ago

@KELSEYDOWLING7 We split these reminders into three distinct groups: "Verified BL Survey Reminders" (first contact, welcome to Connect), "Verified BL Survey Reminders Backlog Pts" (contacts 2-9 for people verified before date all modules were loaded into PWA; only contacts 2-6 remain), and "Verified BL Survey Reminders New Pts" (contacts 2-9 were originally loaded, only contacts 2-6 remain. contacts 7-9 deleted to leave additional room for reminding via diff modalities). So @brotzmanmj is correct; everyone should have gotten the first contact from Verified BL Survey Reminders, and for the purposes of these analyses, you should only include Verified BL Survey Reminders New Pts contacts 2-6. Reminders 2-6 each have their own category as you saw (...New Pts 2nd contact, ...New Pts 3rd contact, etc. due to some historical issues with the way the SMDB handled categories).

As a side note, I think we should remove all the backlog pt reminder contacts 2-6 from prod since they are now defunct.

KELSEYDOWLING7 commented 7 months ago

@brotzmanmj I apologize for the confusion. Opt outs, withdrawals or refusals are already included back into the report, I had just forgotten to remove it from the first footnote. And I can confirm we have no backup contacts for this group.

It appears as though Connect ID 1213472191 received 77 first attempts, 1587864414 received 2 first attempts, and 1890083917 received 75 first attempts. Those are all from the category "Verified, BL Survey Reminders"

For the second contact, category "Verified, BL Survey Reminders New Pts 2nd contact", Connect ID 1213472191 received 3 second attempts, 1890083917 received 3 second attempts and 2698139899 received 2 second attempts.

The 5 individuals who never received contact 1 are: 9205085369 - only received contacts 4-6 5877351701  - only received contact 6 5576621091 - only received contact 6 8223596151 - only received contact 6 2679238707 - only received contact 6

brotzmanmj commented 7 months ago

Hi @KELSEYDOWLING7 well I was initially alarmed at the 77 attempts and the 75 attempts, but I checked the SMDB and I see only one first contact attempt of "Verified, BL Survey Reminders" sent to each of those participants 1213472191 and 1890083917 as well as one contact attempt for 1587864414. Same with the second contact attempts you listed... 1 only see one contact attempt each.

I also spot checked one person 5877351701 on the list of those who never received contact 1 and I see in the SMDB it looks like it was sent.

I'm not sure how to get to the bottom of this. What do you suggest?

Also, for the many emails you mentioned that look fake (have an email.com ending) can you tell me if those are all for UChicago participants?

KELSEYDOWLING7 commented 7 months ago

@brotzmanmj Interesting. Well that's good that its not a bigger contact issue, but in that case the BQ data isn't reliable for correct analysis...

And yes, they are all UC participants. They aren't excluded in this report

brotzmanmj commented 7 months ago

OK, that's good to know about the UC participants. About how many of those email.com addresses did you say there were?

For the reliability of the BQ data, can you meet with Warren to explore why this appears to the case in BQ? Can he compare to what he sees in Firestore?

KELSEYDOWLING7 commented 7 months ago

@brotzmanmj It looks like there are only 18 unique participant Connect IDs that are set to that.

And will do. Does the SMDB come directly from Firestore?

brotzmanmj commented 7 months ago

@KELSEYDOWLING7 That's a good question. I think it comes from BQ but Warren will know for sure.

KELSEYDOWLING7 commented 7 months ago

@brotzmanmj Based on the conversation with Warren I reran the report merging on token and see much better data. It does correctly show that all participants have received a first attempt.

Baseline-Module-Completion-vs-Emails.pdf

Only one participant received more than one attempt; CID 9868324677 received two 1st contacts for 'Verified, BL Survey Reminders'.

There is a small percentage that seems to have received a reminder despite having completed all modules . If you recall, we decided to round up the days the attempts had been sent the 'Sent after Contact number and before Contact number". I think that is where this small percentage of overlap is coming from.

KELSEYDOWLING7 commented 7 months ago

Sorry, small typo in the coding: Baseline-Module-Completion-vs-Emails.pdf

brotzmanmj commented 6 months ago

HI @KELSEYDOWLING7 this looks good! I think we are getting the accurate information. Here are a few notes to add to the report and then please post it on Box on Connect_ALL in this folder: https://nih.app.box.com/folder/239920058353 At some point in the future we may want to expand the date range and re-run it so please hold onto this code in a place where it can be revisited. Please name the file Baseline Survey Response by Email Attempts_Jan23-Oct23

On the first page of the report, please change the report title to 'Baseline Survey Response Analysis by Email Attempts for Time Period January to October 2023'

Notes to add:

KELSEYDOWLING7 commented 6 months ago

@brotzmanmj What tables should those last two footnotes go to?

brotzmanmj commented 6 months ago

@KELSEYDOWLING7 let's put those on Table 2

KELSEYDOWLING7 commented 6 months ago

@brotzmanmj Baseline-Module-Completion-vs-Emails.pdf

Here you go! The report pdf name will be edited on Box to 'Baseline Survey Response by Email Attempts_Jan23-Oct23' once I upload it