department-of-veterans-affairs / va.gov-team

Public resources for building on and in support of VA.gov. Visit complete Knowledge Hub:
https://depo-platform-documentation.scrollhelp.site/index.html

Spike: Investigate Participant_ID return rates #95067

Closed by matt4su 3 weeks ago

matt4su commented 4 weeks ago

Issue Description

Research the cause of lower-than-expected return rates for Participant_ID from user object data. Context: There are multiple ways to look up participant_id. We're now looking at the user object, and it does not appear to return the value as frequently as we would expect. For example, when we pass in additional user property values like Birthday and SSN, more values are returned than when passing in only ICN.

Related ticket: #93996

Time box to 3 effort


Tasks

Acceptance Criteria

TaiWilkin commented 3 weeks ago

Summary of results

Definitions

Investigation

ITF, InProgressForms, and SavedClaims


I collected InProgressForms begun between 9/15/24 and 10/15/24, SavedClaims whose form_start_date fell in the same range, and ITF metrics for the same dates. I then ran both the MPI search by attribute and the MPI search by ICN for each InProgressForm and SavedClaim with a user_account_id (representing a logged-in user).
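The comparison described above comes down to a per-search return rate over the collected rows. The following is a minimal sketch of that calculation, not the script used in the investigation: the row shape, the `pid_by_icn` and `pid_by_attributes` keys, and the sample values are all hypothetical, and the actual MPI calls are assumed to have been made beforehand.

```ruby
# Hypothetical row shape: one hash per InProgressForm or SavedClaim, holding
# the participant ID each search returned (nil when nothing was returned).

# Percentage of rows for which the given key holds a non-nil value.
def return_rate(rows, key)
  return 0.0 if rows.empty?

  found = rows.count { |row| !row[key].nil? }
  (100.0 * found / rows.size).round(1)
end

# Made-up rows for illustration only; not data from the investigation.
rows = [
  { user_account_id: 'a', pid_by_icn: '111', pid_by_attributes: nil },
  { user_account_id: 'b', pid_by_icn: nil,   pid_by_attributes: nil },
  { user_account_id: 'c', pid_by_icn: '333', pid_by_attributes: '333' },
  { user_account_id: 'd', pid_by_icn: '444', pid_by_attributes: nil }
]

puts "by ICN: #{return_rate(rows, :pid_by_icn)}%"
puts "by attributes: #{return_rate(rows, :pid_by_attributes)}%"
```

Running the same function over the InProgressForm rows and the SavedClaim rows separately gives the per-source rates compared in the sections below.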

InProgressForms

Participant IDs were returned from InProgressForms using MPI search by ICN at a rate similar to that found in ITF. This suggests that the User model is correctly returning the ICN and Participant ID when they are available in MPI via ICN search (with slight variation because the searches ran at different times and some claims had since been submitted).

The MPI search by attribute unexpectedly returned 0 results.

SavedClaims

In SavedClaims, Participant IDs were returned at a high rate for MPI search by ICN, and more of the UserAccounts had an ICN overall, when compared to InProgressForms and ITFs. This is likely because a Participant ID is created for a user who does not already have one when they submit a claim.

The highest return rate for ICN and Participant ID came from SavedClaims when searching by attribute: over 95% of claims had a Participant ID available via attribute search.

Differences between InProgressForms and SavedClaims

I hypothesize that the attribute search is very exact, and only returns ICN and Participant ID if a profile exists with that exact data. If the profile is created or updated when a claim is submitted (resulting in an exact match of the claim data being found in MPI), this would then return an ICN and Participant ID for any claim that has finished processing.

The fact that in some cases an ICN is available by attribute search and not by ICN search for the same saved claim indicates a possible issue in the matching of accounts or submission data, where a single user might have multiple ICNs or even multiple participant IDs. This is supported by the fact that, for the rows where both ICN and attribute search returned an ICN, the ICNs only matched 86.5% of the time; and for the rows where both ICN and attribute search returned a participant ID, the participant IDs only matched 84.4% of the time.
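The agreement figures above are computed only over rows where both searches returned a value. A minimal sketch of that calculation follows, assuming each row has been reduced to a hypothetical pair of `[value_from_icn_search, value_from_attribute_search]`; the sample pairs are made up and do not reproduce the 86.5% or 84.4% results.

```ruby
# Among pairs where both searches returned a value, the percentage that agree.
# Returns nil when no pair has both values, since the rate is then undefined.
def match_rate(pairs)
  both = pairs.select { |by_icn, by_attr| by_icn && by_attr }
  return nil if both.empty?

  matches = both.count { |by_icn, by_attr| by_icn == by_attr }
  (100.0 * matches / both.size).round(1)
end

# Made-up pairs for illustration only; not data from the investigation.
pairs = [
  ['1001', '1001'],
  ['1002', '9002'], # both searches returned a value, but the values disagree
  [nil, '1003'],    # only the attribute search returned a value; excluded
  ['1004', '1004'],
  ['1005', nil]     # only the ICN search returned a value; excluded
]

puts "match rate: #{match_rate(pairs)}%"
```

Excluding rows where either side is nil keeps missing results from being counted as mismatches, which separates "not found" from "found a different identifier".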

LOA and ICN

Reviewing the code, it appears that the User model is set to LOA3 if the UserAccount has an ICN, and LOA1 if not. LOA is therefore derived from the presence of an ICN and does not independently affect whether an ICN or Participant ID is returned.
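The relationship described above can be illustrated with a small sketch. This is not the User model's code: `derived_loa` is a hypothetical helper, and treating a blank string as "no ICN" is an assumption made for the example.

```ruby
# Hypothetical stand-in for the behaviour described above: the LOA is derived
# purely from whether an ICN is present (blank strings count as absent here,
# which is an assumption of this sketch).
def derived_loa(icn)
  icn_present = !icn.nil? && !icn.strip.empty?
  icn_present ? 3 : 1
end

puts derived_loa('1012345678V123456') # made-up ICN-shaped value
puts derived_loa(nil)
```

Because the LOA is a function of ICN presence alone, filtering or grouping rows by LOA would not reveal anything that the ICN column does not already show.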

Recommendations

Given that the MPI attributes search is ineffective when applied to an InProgressForm, it should not be used for ITF.

Given that the MPI ICN search is as effective as the current ITF search, and that the current ITF search uses the @current_user model (which has built-in error handling and support), I recommend continuing to use the @current_user model for ICN and participant_id.

It might be worth reaching out to MPI to discuss the mismatch between ICNs returned by the different access methods, but this should be low priority since it applies only to SavedClaims.