Analyticsphere / metricsReportsRequests

Used to provide issue tracking for changes and additions to the Connect Metrics reporting.

PROMIS Backlog Metrics Report: Duplicate Accounts #165

Open HanaShiho opened 5 days ago

HanaShiho commented 5 days ago

We put together a list of metrics we need to further assess the duplicate accounts. Since the 1st PROMIS reminder was sent to backlogged pts on Thursday, August 29, we are interested in looking at the below metrics from August 29, 2024 forward. Additionally, we are only interested in the specific duplicate type, “Participant already enrolled.” Onto some actual metrics:

Login Method Usage:
     Metric 1: Frequency of sign-in method on the original accounts
     Metric 1.1: Frequency of sign-in method on the duplicate accounts
     Metric 1.2: Using the crosswalk from each site, frequency of the within-person comparison of sign-in method on their original account as compared to their duplicate account. Example format, with these rows in a table and N and % for each:
     • Original account = email sign-in method, duplicate account = email sign-in method
     • Original account = phone sign-in method, duplicate account = phone sign-in method
     • Original account = email, duplicate account = phone
     • Original account = phone, duplicate account = email
     • etc. for the rest of the existing combinations
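To make the Metric 1.2 layout concrete, here is a minimal sketch of the within-person comparison. It assumes the crosswalk has already been joined to one sign-in method per account; the field names `original_method` and `duplicate_method` and the sample rows are made up for illustration and are not Connect variables.

```python
from collections import Counter


def signin_method_pairs(crosswalk):
    """Return {(original method, duplicate method): (N, percent)}."""
    counts = Counter(
        (row["original_method"], row["duplicate_method"]) for row in crosswalk
    )
    total = sum(counts.values())
    return {pair: (n, round(100 * n / total, 1)) for pair, n in counts.items()}


# Made-up rows, one per person on the crosswalk (hypothetical field names).
crosswalk = [
    {"original_method": "email", "duplicate_method": "email"},
    {"original_method": "email", "duplicate_method": "phone"},
    {"original_method": "phone", "duplicate_method": "email"},
    {"original_method": "email", "duplicate_method": "phone"},
]

table = signin_method_pairs(crosswalk)
for (orig, dup), (n, pct) in sorted(table.items()):
    print(f"Original account = {orig}, duplicate account = {dup}: N={n} ({pct}%)")
```

Only combinations that actually occur get a row, which matches "the rest of the existing combinations" in the request.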

Rate of Duplicate Account Creation:
     Metric 2: For each person on the crosswalk, calculate the time between original and duplicate account sign-in. Notes: Please provide as a distribution (min, max, median, mean, 25th and 75th percentiles). If we don’t have sign-in times for both accounts, use verification date/time for both accounts instead.
     Metric 2.1: For each person on the crosswalk, calculate the time (in months) from the date of the last survey they completed (not including PROMIS) to the sign-in time of their duplicate account. Notes: If we don’t have complete/accurate sign-in times, then calculate from the last survey submitted on their original account to the verification date/time of the duplicate account. Provide as a distribution (min, max, median, mean, 25th and 75th percentiles). This is a lower priority item.
     Metric 2.2: For a comparison to Metric 2.1, the time distribution for all verified participants who were verified prior to Dec 1, 2023, completed the PROMIS survey, and didn’t create duplicate accounts. Notes: The distribution for this group would similarly be calculated from the time of the last survey submitted to the time of PROMIS submission, excluding the people in the crosswalk. Provide as a distribution (min, max, median, mean, 25th and 75th percentiles).
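As a rough sketch of the distribution requested for Metrics 2 through 2.2, the summary could be computed as below. The month conversion (an average month of 30.4375 days) and the "inclusive" quartile method are assumptions chosen for illustration; the actual report may define months and percentiles differently.

```python
from datetime import datetime
from statistics import mean, median, quantiles

# Assumption: one month = average month length over a 4-year cycle.
DAYS_PER_MONTH = 30.4375


def months_between(start, end):
    """Signed time from start to end in months (negative if end is earlier)."""
    return (end - start).total_seconds() / 86400 / DAYS_PER_MONTH


def summarize(values):
    """Min, max, median, mean, 25th and 75th percentiles (needs >= 2 values)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "p25": q1,
        "median": median(values),
        "mean": mean(values),
        "p75": q3,
        "max": max(values),
    }


# Made-up example values, in months.
print(summarize([1, 2, 3, 4, 5]))
```

Because `months_between` is signed, a negative value simply means the second timestamp falls before the first, which is worth checking before summarizing.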

Additionally, if a figure would help to display the metrics 2.1 and 2.2, here are some to consider and we welcome your thoughts: A plot of the verification date on the y-axis and the duplication date on the x-axis. Alternatively, we can plot each month and label it with the months of the year and # of months since verification. The third option is to plot the time difference on one axis and compare it to another variable if one stands out over the course of your analysis.

Survey Completion Rates:      Metric 3: Among the duplicate account creators, the number and percent who completed the PROMIS survey in their original account (despite creating a duplicate account).
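For Metric 3, a minimal sketch of the count and percent might look like this. The flag `promis_completed_original` is a hypothetical field standing in for however PROMIS completion on the original account is actually recorded.

```python
def promis_completion_on_original(creators):
    """Return (number completed, total, percent) among duplicate account creators."""
    n_total = len(creators)
    n_done = sum(1 for person in creators if person["promis_completed_original"])
    pct = round(100 * n_done / n_total, 1) if n_total else 0.0
    return n_done, n_total, pct


# Made-up records, one per duplicate account creator (hypothetical field name).
creators = [
    {"promis_completed_original": True},
    {"promis_completed_original": False},
    {"promis_completed_original": True},
]
print(promis_completion_on_original(creators))
```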

HanaShiho commented 5 days ago

So far, HFH and HP have provided the list of crosswalks with relevant information. I compiled the list from these two sites into a spreadsheet. I'll update the spreadsheet as we get data from the remaining sites. You can find the spreadsheet here: https://nih.app.box.com/folder/285024251144

KELSEYDOWLING7 commented 4 days ago

@HanaShiho @brotzmanmj I have some concerns about the first few metrics.

I'm only seeing the variable RcrtSI_SignTime_v1r0 (335767902) for first sign in method, and each participant only has one value for RcrtSI_SignTime_v1r0. I'm not quite sure how or if I can find how many times they signed in. We also have the method they signed in RcrtPC_Acnt_SignIn_v1r0 (d_995036844), but again only one value per participant; I'm assuming it gets rewritten every time they log in, so I can't give any frequencies or information outside of the most current value we have for each account.

Metric 1 and Metric 1.1: I don't think I can do these.
Metric 1.2: If I can't do 1 and 1.1, I also can't give you the frequency of sign-in methods, and if they're overwritten I can only tell you the most recent sign-in method for the original and duplicate accounts.

Please let me know if I am missing something

brotzmanmj commented 4 days ago

Hi @KELSEYDOWLING7 you're correct, so what we need is the comparison between their original (verified) account and their duplicate account.

brotzmanmj commented 4 days ago

For RcrtPC_Acnt_SignIn_v1r0 (d_995036844), I agree I think this gets rewritten every time they log in so that's fine, we don't need the frequency of sign in methods. Maybe there is some confusion about the requirements.

KELSEYDOWLING7 commented 4 days ago

Ah ok I understand now, coffee has not fully hit me yet. My mistake!

brotzmanmj commented 4 days ago

No worries, I feared this would be a confusing request! It was hard to figure out how to word it. Let us know if you have any more questions or if you want to hop on a call to clarify.

KELSEYDOWLING7 commented 4 days ago

Thanks! While we wait for the other sites to send their data, please let me know how the metrics look for HP and HFH.

I can't come up with a short enough title for Table 2.2 to describe the conditionality, and the footnote doesn't seem to be populating. I'll keep working on that unless you have a table title in mind. I also chose for now to go with option 2 for the plot.

PROMIS-Backlog-Duplicates.pdf

brotzmanmj commented 2 days ago

Hi @KELSEYDOWLING7 this is really interesting, thanks.

For Tables 2, 2.1, 2.2, I'm wondering how it's possible that there are min values of 0 or negative months. Can you look into what is causing that? I'm wondering if we're not calculating these the way I think we are.

For the figure, I think this is not calculating what we thought... can you calculate the number of months from original account creation to duplicate account creation for each person and plot that?

KELSEYDOWLING7 commented 2 days ago

For Table 2.1, those with a negative time completed their last survey (not including PROMIS) on their original accounts, so the time of last survey completion is BEFORE the duplicate account creation.

Table 2.2 is in months, which I think gives a different feel than the count in days would; see below. Those with a min of 0 signed up and, of the modules they did complete, the last module was done the same day as sign-in.

[image]

I can create the plot the way you described, though please note that it would be a direct expansion of Table 2.

[image]

brotzmanmj commented 2 days ago

Got it. So for Table 2.1, we presume all people would have completed their baseline surveys on their original accounts before the dup account was created. Is that not the case for everyone? Maybe we need to reverse the calculation: date/time of creation of duplicate account minus date/time of the last baseline survey submitted on their original account? And if someone did not submit any baseline survey on their original account then the value would be missing.

In Table 2.2, the last baseline module (so not including PROMIS) was completed in their original account on the same date that they signed in to the duplicate account?

KELSEYDOWLING7 commented 2 days ago

There are 6 participants that finished their last BL survey (in the original account) after the duplicate account was created. Then there are 22 that finished their most recent module before duplicate account creation, and then there are 5 participants that have done no BL modules. No one did any BL module on their duplicate accounts. Because there are so few that completed their BL modules before duplicate account creation, we'll have mostly negative values if we flip it.

[image]

Table 2.2 are those that do not have duplicate accounts. They completed modules the same day as their sign in (only one account).

brotzmanmj commented 2 days ago

Got it. Please add a table before Table 2.1 that gives the counts you mentioned. And then Table 2.1 should contain only the participants that finished their most recent baseline module before duplicate account creation (N=22) and the calculation should be reversed so they will all be positive values. Let me know if any questions. The title should be 'Time from Last Baseline Survey Completion to Duplicate Account Sign-In in Months'. (drop the word Difference in the title)
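One way to sketch the counts table and the reversed Table 2.1 calculation is shown below. The category labels, the separate same-day category, the 30.4375-day month, and the sample timestamps are all assumptions for illustration, not the definitions used in the report.

```python
from collections import Counter
from datetime import datetime

DAYS_PER_MONTH = 30.4375  # assumption: average month length


def timing_category(last_baseline, duplicate_created):
    """Classify the most recent baseline module relative to duplicate creation."""
    if last_baseline is None:
        return "none"  # no baseline module completed on the original account
    if last_baseline.date() == duplicate_created.date():
        return "same_day"
    return "prior" if last_baseline < duplicate_created else "after"


def months_to_duplicate(last_baseline, duplicate_created):
    """Reversed calculation: duplicate creation minus last baseline completion."""
    delta = duplicate_created - last_baseline
    return delta.total_seconds() / 86400 / DAYS_PER_MONTH


# Made-up (last baseline completion, duplicate account creation) pairs.
participants = [
    (datetime(2023, 5, 1, 9, 0), datetime(2024, 9, 1, 9, 0)),
    (datetime(2024, 9, 3, 8, 0), datetime(2024, 9, 3, 17, 0)),
    (datetime(2024, 9, 10, 8, 0), datetime(2024, 9, 2, 8, 0)),
    (None, datetime(2024, 9, 4, 8, 0)),
]

counts = Counter(timing_category(bl, dup) for bl, dup in participants)
months = [
    months_to_duplicate(bl, dup)
    for bl, dup in participants
    if timing_category(bl, dup) == "prior"
]
print(dict(counts), months)
```

Restricting Table 2.1 to the "prior" group is what guarantees positive values; anyone with no baseline module is counted in the first table but left out of the time calculation.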

For Table 2.2, I think we don't have the right denominator, so this metric may not be needed. Let's get Table 2.1 squared away first and then we'll revisit.

KELSEYDOWLING7 commented 2 days ago

I accidentally counted the 2 that had completed their last BL survey on the same day as duplicate account creation as completed before. Note, the time differences in days are rounded to 1 decimal place only.

[image]

[image]

brotzmanmj commented 2 days ago

For Table 2.0, please change the title to 'Timeframe of Most Recent Baseline Survey Module Completed relative to Duplicate Account Creation', and then the rows should be 'Most Recent Baseline Module Completed prior to Duplicate Account Creation', etc. for each of those.

For Table 2.1 I feel like this isn't telling us much in this format. I'd like to try a categorical approach instead. Let me think about this til tomorrow