Analyticsphere / metricsReportsRequests

Used to provide issue tracking for changes and additions to the Connect Metrics reporting.
MIT License

Analysis of Response rates for KP Biospecimen Email reminders #125

Closed brotzmanmj closed 7 months ago

brotzmanmj commented 8 months ago

@KELSEYDOWLING7 This will be an analysis of the effectiveness of the email reminders we sent for KP biospecimen collection.

Detailed requirements are here: https://nih.app.box.com/file/1458684328095

Please put the drafts and interim counts, etc, of what you produce as you go into that same folder. Happy to discuss before you start and answer any questions.

KELSEYDOWLING7 commented 8 months ago

@brotzmanmj A csv of all categories from the notifications table has been uploaded to the box folder. I also included a second csv for the four categories I believe are a match to this request. If so, I have those frequencies ready as well.

KELSEYDOWLING7 commented 8 months ago

Some issues I found in the data that will skew our tables:

  1. These 51 participants all completed their first collection (on either day 8 or day 9), but still received their 2nd attempt (on day 10 or on day 11): 1035320602 1219776812 1265592260 1376043410 1573248190 1667168280 1766148780 2042835618 2207168726 2696662861 3103165654 3318536221 3336979563 3428081361 3815424995 3997975390 4025211136 4107676766 4163761255 4233517525 4911931785 4999224736 5128644268 5561849615 5930682401 6356470809 6600360528 6640357017 6764735983 6767656688 7000527955 7113238256 7249632787 7295527627 7547871163 7607246903 7664799111 7723637500 7740470630 8079712307 8155052123 8557170208 8606755338 8653368538 8760257482 9116633260 9199309622 9360644642 9491526206 9547047843 9982271542

  2. I also found 483 participants that had multiple, if not all, KP Biospecimen Reminders sent on the same day. Here are some examples. These people had all 5 sent on the same day: 1035256151 7837339421 3122850205 8637189742 9546442997 5013361428. These people had 2 sent on the same day: 9343797367 2152090344 3150640899
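As a rough illustration of the two checks described in the list above, here is a minimal sketch. The row layout, participant labels, and dates are all hypothetical; the real notifications table and collection fields are not shown in this thread.

```python
from collections import Counter
from datetime import date

# Hypothetical rows: (participant_id, contact_attempt, date_sent)
notifications = [
    ("A", 1, date(2024, 1, 1)),
    ("A", 2, date(2024, 1, 11)),
    ("B", 1, date(2024, 1, 1)),
    ("B", 2, date(2024, 1, 1)),
    ("C", 1, date(2024, 1, 1)),
    ("C", 2, date(2024, 1, 11)),
]
# Hypothetical first-collection dates; participants with no collection are absent.
collected_on = {"A": date(2024, 1, 9)}

# Issue 1: a reminder sent after the participant's collection was completed.
sent_after_collection = sorted({
    pid for pid, _attempt, sent in notifications
    if pid in collected_on and sent > collected_on[pid]
})

# Issue 2: two or more reminders sent to one participant on the same day.
per_day = Counter((pid, sent) for pid, _attempt, sent in notifications)
same_day = sorted({pid for (pid, _sent), n in per_day.items() if n > 1})

print(sent_after_collection)  # ['A']
print(same_day)               # ['B']
```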

brotzmanmj commented 8 months ago

Hi @KELSEYDOWLING7 can you add the Contact Attempt field to the list of notifications on box?

brotzmanmj commented 8 months ago

And for the 51 and the 483 participants, can you tell me what days these were sent? Or put a csv file on box if that would be easier? I need to know if this was an isolated incident where they all got sent at one time or a reoccurring issue.

KELSEYDOWLING7 commented 8 months ago

I think a csv file would be easier to review. The 483 I've uploaded here https://nih.app.box.com/file/1463923069939 . Unfortunately it seems to have been a reoccurring issue. And then the 51 are here https://nih.app.box.com/file/1463938628931

I also uploaded the Contact attempts field to the notifications list in that Box folder

brotzmanmj commented 8 months ago

Thanks Kelsey. I'll bring this up with the DevOps team. We need to get a handle on why this occurred and how to prevent it from occurring.

brotzmanmj commented 8 months ago

OK I sent an email about this. I think we have no choice however but to decide how to analyze the data we have. I see no way to proceed without excluding the 483 people. For the 51 people, we should include the people but exclude the message they received after their collection since it would have no possible impact on response. Does that make sense?
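The two exclusion rules proposed here could be applied along these lines. This is only a sketch under assumed data structures: `same_day_ids` stands in for the 483 participants, and every value is made up for illustration.

```python
from datetime import date

# Hypothetical inputs; none of these values come from the real data.
notifications = [
    ("A", 1, date(2024, 1, 1)),
    ("A", 2, date(2024, 1, 11)),   # sent after A's collection
    ("B", 1, date(2024, 1, 1)),
    ("B", 2, date(2024, 1, 1)),    # same-day duplicate
    ("C", 1, date(2024, 1, 1)),
]
collected_on = {"A": date(2024, 1, 9)}
same_day_ids = {"B"}  # stand-in for the participants with same-day reminders

def keep(row):
    pid, _attempt, sent = row
    if pid in same_day_ids:
        return False  # rule 1: drop the whole participant
    done = collected_on.get(pid)
    # rule 2: keep the participant, drop only messages sent after collection
    return done is None or sent <= done

analysis_rows = [row for row in notifications if keep(row)]
print([(pid, attempt) for pid, attempt, _ in analysis_rows])  # [('A', 1), ('C', 1)]
```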

KELSEYDOWLING7 commented 8 months ago

@brotzmanmj Of course. And yes, that makes sense. Also I should clarify that the 51 are from this group alone BUT those 483 are over the entirety of the KP baseline biospecimen invitations, not just from this group.

KELSEYDOWLING7 commented 8 months ago

Warren's comments on the email regarding the 24-48 hour lag make a lot of sense, and explain the data. We have an additional 29 participants that received reminder 3 after completion, 17 participants that received reminder 4 after completion and 14 participants that received reminder 5 after completion. I will exclude them too.

Similar to the BL reminder emails report, do you also want to exclude those who refused blood or urine, refused all future activities, withdrew, are deceased, or requested suspended contact?

brotzmanmj commented 8 months ago

Hi, sounds good about excluding the messages (but not the people) affected by the 24-48 hour lag from the analysis.

For the BL reminder emails report, we didn't exclude those people, did we? I think we just noted in a footnote that they exist and might account for the reason not everyone who was eligible for a message was actually sent a message. No, we wouldn't want to exclude them.

KELSEYDOWLING7 commented 8 months ago

Ah ok yes you're right I misread the footnote. Do you want similar footnotes in this report as well?

brotzmanmj commented 8 months ago

Yes that would be great. And I confirmed that the categories and contact attempts that you identified for this analysis are correct, so you can proceed with those.

KELSEYDOWLING7 commented 8 months ago

Great. I completed step 3 and confirmed that everything stays the same once the notification table is merged in. Of the 483 with the issue above, only 182 would have been in this report and are now excluded.

Please let me know if any footnotes need to be added or updated. KP-Biospecimen-Reminder-Analysis.pdf

brotzmanmj commented 8 months ago

This is great, thanks! Can you run Table 2 by KP site?

Couple of other questions... does this already incorporate a one-day shift for before-3pm versus after-3pm receipt of the draw order data?

Of the 7465, how many people (N, and %) still have no blood or urine collected as of yet? overall and by each KP site? (I can figure it out from calculation of Table 2 data but would be good to have it in a table).

For the people who gave samples after Contact 5, how long on average after contact 5 did they give samples?

KELSEYDOWLING7 commented 8 months ago

Oh, it did not include a 1 day shift. Are these notifications sent out 0, 10, 24, 39, 54 days after the draw order if they're verified before 3pm, and delayed one day if they're verified after 3pm? I didn't think the verification time would affect these date ranges.

Of the 7565, 1999 (26.8%) have not had a blood or urine collection. Do you want that figure in a footnote somewhere? Or at the end of table 2?

On average, those that gave samples after contact 5 gave their samples 114.3206 days after contact 5
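The thread quotes both 7465 and 7565 as the denominator for the 1999 participants with no collection. A quick arithmetic sketch shows what percentage each total would give; it does not settle which total is the intended one.

```python
def pct(part: int, total: int) -> float:
    """Percentage of `total` represented by `part`, rounded to one decimal."""
    return round(100 * part / total, 1)

no_collection = 1999
for total in (7465, 7565):
    print(total, pct(no_collection, total))
```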

brotzmanmj commented 8 months ago

Hi @KELSEYDOWLING7 these aren't based on verification date/time, they are based on draw order placed date/time. But I suspect there could still be some shift here based on the 3pm message send. There is also another factor at play here... the lag between the date the order was placed and when the site pushed that date to us. The best thing to do would be to look at the data and see what it tells us. If you look at the date/time the draw order was placed and the date/time the first message was sent, and the date/time the second message was sent, it should give us a pattern of what's happening, similar to how we discovered the shift for the survey messages.
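One way to look for the pattern suggested above is to tabulate the day lag between the draw order and the first message, split by whether the order was placed before 3pm. This is a sketch with hypothetical timestamps; the field names and the 3pm cutoff handling (local time, no time zone adjustment) are assumptions.

```python
from collections import Counter
from datetime import datetime

# Hypothetical (draw_order_placed, first_message_sent) pairs.
records = [
    (datetime(2024, 3, 4, 9, 30), datetime(2024, 3, 4, 15, 0)),
    (datetime(2024, 3, 4, 16, 10), datetime(2024, 3, 5, 15, 0)),
    (datetime(2024, 3, 5, 11, 0), datetime(2024, 3, 7, 15, 0)),
]

lag_counts = Counter()
for placed, sent in records:
    group = "before 3pm" if placed.hour < 15 else "3pm or later"
    lag_days = (sent.date() - placed.date()).days  # whole calendar days
    lag_counts[(group, lag_days)] += 1

for (group, lag_days), n in sorted(lag_counts.items()):
    print(f"{group}: {lag_days} day(s) -> {n}")
```

The same tabulation can be repeated for the second message to see whether the shift carries through the schedule.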

Of the 7565, 1999 (26.8%) have not had a blood or urine collection: at the end of Table 2 would be good

Is 114.3206 the median or the mean?

KELSEYDOWLING7 commented 8 months ago

Ok I'll do a little more digging into the data for any shifts.

Just as a preview, this is what the data/report looks like now. Please let me know if you'd prefer the n(%) of those who haven't donated yet to be in a footnote rather than the final KP-Biospecimen-Reminder-Analysis.pdf

I'm sorry I misread, of those that gave their sample after 5 contacts, the overall average/mean time between the draw order and the collection date was 114.6286 days. So that would be roughly 60 days after contact 5.
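The "roughly 60 days" figure follows from subtracting the contact 5 offset from the mean. The sketch below assumes the 0, 10, 24, 39, 54 day schedule quoted earlier in the thread; the per-participant values in the second half are hypothetical and only show why the mean and median can differ.

```python
from statistics import mean, median

# Days after the draw order for contacts 1-5, as quoted earlier in the thread.
CONTACT_OFFSETS = [0, 10, 24, 39, 54]

mean_order_to_collection = 114.6286  # figure reported in the thread
days_after_contact_5 = round(mean_order_to_collection - CONTACT_OFFSETS[-1], 1)
print(days_after_contact_5)  # 60.6

# Hypothetical values: one late collection pulls the mean above the median.
days = [60, 70, 80, 300]
print(mean(days), median(days))
```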

KELSEYDOWLING7 commented 7 months ago

@brotzmanmj I did notice that for many of those who had their blood/urine order set after 3pm, the notifications typically were delayed a day. However, there are also many that had a post-3pm blood order that did not have altered notification dates. And then those within the time windows Warren mentioned in his emails don't quite follow either pattern. This makes it very hard to stratify and properly portray on a table.

I removed the 483 that had multiple or all notifications sent on the same day, applied the daylight saving time adjustment, removed those that received notifications after the collection was already completed, and stratified for those who had pre-3pm draw orders versus post-3pm, but the tables aren't quite clean. KP-Biospecimen-Reminder-Analysis.pdf

brotzmanmj commented 7 months ago

@KELSEYDOWLING7 The thing about this that is difficult is that we aren't setting the date/time the order was placed. The date/time order was placed is the actual date and time the order was placed in the site's medical record system. Then they extract those data from their medical record system and move them to their study management system and then they push it to us via API. And then our system runs at 3pm each day, looks for any records meeting the criteria that have not been sent yet and sends them. And we don't retain data on when each draw order date was sent to us. So taking all this into consideration we are going to have to live with some uncertainty here. Let's see if we have 15 minutes today to talk through the best way to handle?
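To make the daily 3pm run concrete, here is a minimal sketch of when a record would first be picked up, given the moment it reaches the system. The function name is hypothetical and time zones are ignored. Since the arrival time is not retained, it can't be computed for real records, which is the source of the uncertainty described above.

```python
from datetime import datetime, time, timedelta

SEND_TIME = time(15, 0)  # daily 3pm run described in the thread

def next_send(received: datetime) -> datetime:
    """Return the first daily run at or after the moment a record arrives."""
    run = datetime.combine(received.date(), SEND_TIME)
    return run if received <= run else run + timedelta(days=1)

print(next_send(datetime(2024, 3, 4, 14, 59)))  # 2024-03-04 15:00:00
print(next_send(datetime(2024, 3, 4, 15, 1)))   # 2024-03-05 15:00:00
```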

KELSEYDOWLING7 commented 7 months ago

Finalized version with updated footnote, removal and post/pre 3pm stratification and removal of time change difference. Uploaded to box KP-Biospecimen-Reminder-Analysis.pdf

brotzmanmj commented 7 months ago

Hi @KELSEYDOWLING7 I was putting these numbers into the presentation and one issue... when I add up all of the samples collected at the various timepoints plus the number at the end of the table that says how many people have not given samples, it falls short of the total at the top of the tables (the number sent Contact Attempt 1). It seems like we're missing some people from the tables. Can you check?

KELSEYDOWLING7 commented 7 months ago

@brotzmanmj As far as the total people not adding up, it seems that the difference between those in row one (the total overall or by site) and those that either gave their sample or did not give their sample is equal to those that were eligible but not sent a reminder in each table. This makes sense, because the SENT columns are a set amount from the notifications table, and eligible = sent - (donated sample). So if there are some people that should have been sent an email but weren't, they're not counted in the eligible column. These people would be falling through the cracks. But that is part of the issue we'd have with this analysis because of the inconsistent notification timing for when we expected participants to have been sent reminders vs when they actually got them. Please let me know if that makes sense or if we need to hop on a quick call so I can give an example with the tables
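The reconciliation described above can be written as a simple check. The counts here are hypothetical, not taken from the report, and the function name is illustrative only.

```python
def unaccounted(sent_contact_1: int, gave_sample: int, no_sample: int) -> int:
    """People in the top row who appear in neither outcome group."""
    return sent_contact_1 - (gave_sample + no_sample)

# Hypothetical counts for one table.
gap = unaccounted(sent_contact_1=1000, gave_sample=700, no_sample=280)
eligible_not_sent = 20  # hypothetical "eligible but not sent a reminder" total

print(gap, gap == eligible_not_sent)  # 20 True
```

If the gap matches the eligible-but-not-sent total for every table, the shortfall is explained by unsent reminders rather than by dropped rows.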

KELSEYDOWLING7 commented 7 months ago

We also know that because of the notification delays, the "gave sample after contact x and before contact x+1" counts are also not fully correct because they expect the notifications to be delivered within the time frames we preset. More people probably completed samples within each of those rows but because the notifications were delayed they're not captured. And then we removed the people from the SENT rows that completed their samples and THEN got notifications after completion, so that throws the counts off further

brotzmanmj commented 7 months ago

@KELSEYDOWLING7 I thought we dropped people from the analysis if they did not get their email reminders in the timeframe we expected?

The people that got an extra reminder after they donated their samples should be retained and they should be already correctly accounted for in the table.

KELSEYDOWLING7 commented 7 months ago

@brotzmanmj We dropped the 483 people that got multiple reminders on the same day. Then for the 51 people that got extra reminders after they donated their samples, we removed the extra notification count from the sent rows.

However, I did find another problem. These four CIDs 1766922128 3613468175 3698144899 8417684909 have "Any clinical blood or urine sample received" (56605577) set to yes but have null values for both urine collection time (139245758) and blood collection time (982213346). It does appear that Table 2 (overall) is off by 4 so that makes sense.

I also found an issue in the code that wasn't properly handling people that had only blood or only urine collected or draw ordered and I fixed that issue. The counts, outside of the 4, look correct on Table 2 (All Sites). I'll check the remaining tables now KP-Biospecimen-Reminder-Analysis.pdf
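A sketch of the two checks mentioned in this comment: flagging records marked as having a sample but with no collection time, and taking the first collection time when only blood or only urine is present. Field names are illustrative stand-ins for the concept IDs quoted above, and the records are hypothetical.

```python
# Hypothetical records; dates are ISO strings so they compare chronologically.
participants = [
    {"id": "P1", "any_sample_received": True, "blood_time": "2024-01-09", "urine_time": None},
    {"id": "P2", "any_sample_received": True, "blood_time": None, "urine_time": None},
    {"id": "P3", "any_sample_received": False, "blood_time": None, "urine_time": None},
    {"id": "P4", "any_sample_received": True, "blood_time": "2024-02-02", "urine_time": "2024-01-30"},
]

def first_collection(p):
    """Earliest non-null collection time, or None if neither is recorded."""
    times = [t for t in (p["blood_time"], p["urine_time"]) if t is not None]
    return min(times) if times else None

# Flag set to yes, but no collection time to place the person in a group.
inconsistent = [
    p["id"] for p in participants
    if p["any_sample_received"] and first_collection(p) is None
]

print(inconsistent)                       # ['P2']
print(first_collection(participants[0]))  # 2024-01-09
print(first_collection(participants[3]))  # 2024-01-30
```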

KELSEYDOWLING7 commented 7 months ago

One of those participants is from KPCO so that count is off by one. One of those participants is from KPNW so that count is off by one. None are from Hawaii, that count is correct. Two are KPGA so that count is off by 2.

brotzmanmj commented 7 months ago

Hi, I think we should drop the 4 from the analysis that have null values for the collection times, since we can't properly place them in a group

KELSEYDOWLING7 commented 7 months ago

@brotzmanmj The percentages at the end aren't perfect but the counts are correct:

Kaiser Permanente Colorado had 685 with no samples and 2004 with samples
Kaiser Permanente Georgia had 254 with no samples and 697 with samples
Kaiser Permanente Hawaii had 153 with no samples and 518 with samples
Kaiser Permanente Northwest had 845 with no samples and 2301 with samples

KP-Biospecimen-Reminder-Analysis.pdf
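For reference, the per-site percentages can be recomputed directly from the counts quoted in the comment above. This is only a sketch of the arithmetic, not the code used to build the report.

```python
# Counts quoted in the comment above: (no samples, with samples).
site_counts = {
    "Kaiser Permanente Colorado": (685, 2004),
    "Kaiser Permanente Georgia": (254, 697),
    "Kaiser Permanente Hawaii": (153, 518),
    "Kaiser Permanente Northwest": (845, 2301),
}

for site, (no_sample, with_sample) in site_counts.items():
    total = no_sample + with_sample
    print(f"{site}: {no_sample}/{total} = {100 * no_sample / total:.1f}% with no samples")

total_none = sum(n for n, _ in site_counts.values())
total_all = sum(n + s for n, s in site_counts.values())
print(f"All sites: {total_none}/{total_all} = {100 * total_none / total_all:.1f}% with no samples")
```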