Overview
After digging into the issue, we drew the following conclusions from the data:
A youth (determined by a personal_id and organization_id) could have several enrollment_id values in dm$enrollment.
The filters applied in mod_filters.R were not taking into account the enrollment_id value.
In general, the different pages in the app used all enrollments from the filtered "clients" (a.k.a. youth). As a result, questions asked at the enrollment level (such as "Sexual Orientation") could produce results that look inconsistent to a user (e.g. the "# of Youth by Gender" chart showing a total of 6 youth while the "# of Youth by Sexual Orientation" chart on the same page shows a total of 10).
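A toy illustration of one way such a mismatch can arise. The data, the `client` table and the `gender` / `sexual_orientation` column names are made up for the example and are not taken from the app:

```r
# Hypothetical data: youth 1 has two enrollments with different answers.
client <- data.frame(
  personal_id     = c(1, 2),
  organization_id = c(10, 10),
  gender          = c("Female", "Male")
)
enrollment <- data.frame(
  personal_id        = c(1, 1, 2),
  organization_id    = c(10, 10, 10),
  enrollment_id      = c(100, 101, 102),
  sexual_orientation = c("Heterosexual", "Questioning", "Heterosexual")
)

# Youth-level question: one row per youth, so the chart total is the number of youth.
youth_by_gender <- table(client$gender)

# Enrollment-level question computed over ALL enrollments of the filtered youth:
# a youth with different answers across enrollments is counted once per answer.
per_answer <- unique(
  enrollment[, c("personal_id", "organization_id", "sexual_orientation")]
)
youth_by_orientation <- table(per_answer$sexual_orientation)

total_gender      <- sum(youth_by_gender)
total_orientation <- sum(youth_by_orientation)
```

Here the gender chart totals 2 youth while the sexual orientation chart totals 3, even though both are built from the same two youth.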
The changes applied in this PR improve the way data is filtered in the app. Our new goal is to end the "filter stage" with a set of personal_id, organization_id and enrollment_id that correspond to the "most recent" enrollment for a youth. See Details for a comprehensive definition of what we understand as the "most recent" enrollment for a youth. Additional changes to improve mod_filters.R were implemented, such as a refactor of input population when the app is launched.
We expect that this new approach to filtering data will simplify data processing in the different pages of the app.
List of Changes
Read enrollment_id column from each table in the database.
Refactor input population when the app is launched (unnecessary observers were removed)
Change project select input (now the project_id is sent to the server)
Improve filter module by changing filtering logic
Add enrollment_id column as a key in inner joins
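To illustrate why `enrollment_id` matters as a join key, here is a minimal base R sketch. The `selected` and `answers` data frames and the `answer` column are hypothetical, and the app's actual join code may look different:

```r
# Hypothetical output of the filter stage: one selected enrollment per youth.
selected <- data.frame(
  personal_id     = 1,
  organization_id = 10,
  enrollment_id   = 101
)

# Hypothetical enrollment-level table with one row per enrollment.
answers <- data.frame(
  personal_id     = c(1, 1, 2),
  organization_id = c(10, 10, 10),
  enrollment_id   = c(100, 101, 102),
  answer          = c("A", "B", "C")
)

# Joining on the youth keys only pulls in every enrollment of the youth.
loose <- merge(
  selected[, c("personal_id", "organization_id")],
  answers,
  by = c("personal_id", "organization_id")
)

# Adding enrollment_id to the keys keeps only the selected enrollment.
# merge() performs an inner join by default (all = FALSE).
strict <- merge(
  selected,
  answers,
  by = c("personal_id", "organization_id", "enrollment_id")
)
```

In this example `loose` has two rows (enrollments 100 and 101) while `strict` has only the row for enrollment 101.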
Details
The new process for filtering data follows these steps:
Start from enrollment data, which has one row per "person-organization" per enrollment.
Keep enrollments that correspond to selected projects.
Remove enrollments with an entry date after the last active date.
Remove enrollments that correspond to "person-organization"s who exited before the first active date (a "person-organization" who didn't exit is not removed).
If a "person-organization" has multiple enrollments, select one enrollment per "person-organization" following these rules:
Select the enrollment without an exit date (or the one with the most recent exit date if all enrollments have an exit date)
If the "person-organization" still has multiple enrollments, select the enrollment with the most recent entry date (i.e. use the latest entry date)
If the "person-organization" still has multiple enrollments, select the enrollment with the most recent date updated (i.e. use the latest date updated)
If the "person-organization" still has multiple enrollments, select the enrollment that has the highest enrollment_id
If "Limit to Head of Household" is checked, keep enrollments where "Relationship to Head of Household" is "Self (head of household)".
Keep enrollments that correspond to "person-organization"s with an age within the selected range ("person-organization"s with a missing age are kept when "Include Youth with Missing Ages" is checked).
Keep enrollments that correspond to "person-organization"s that have a selected gender.
Keep enrollments that correspond to "person-organization"s that have a selected ethnicity.
If "De-duplicate Youth Across Projects by SSN" is checked:
Keep enrollments that correspond to "person-organization"s with "Full SSN reported".
Remove enrollments that correspond to "person-organization"s that have a missing SSN. NOTE: There are instances in the data where "person-organization"s with "Full SSN reported" show a missing SSN.
Select one enrollment per SSN following these rules (same as above):
Select the enrollment without an exit date (or the one with the most recent exit date if all enrollments have an exit date)
If the SSN still has multiple enrollments, select the enrollment with the most recent entry date (i.e. use the latest entry date)
If the SSN still has multiple enrollments, select the enrollment with the most recent date updated (i.e. use the latest date updated)
If the SSN still has multiple enrollments, select the enrollment that has the highest enrollment_id
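The "most recent enrollment" rules used in the steps above (once per "person-organization", and again per SSN) can be sketched in base R as follows. This is an illustration only, not the code in mod_filters.R: the `select_most_recent()` helper, the `entry_date`, `exit_date`, `date_updated` and `ssn` column names, and the sample data are all assumptions, and the sketch assumes that entry dates and dates updated are never missing.

```r
# Hypothetical helper: keep one enrollment per group, applying the tie-breaks in order.
select_most_recent <- function(enrollment, group_cols) {
  # A missing exit date ranks above any real date, so open enrollments win.
  exit_rank <- ifelse(
    is.na(enrollment$exit_date), Inf, as.numeric(enrollment$exit_date)
  )
  # Sort descending by exit date, entry date, date updated, then enrollment_id.
  ord <- order(
    -exit_rank,
    -as.numeric(enrollment$entry_date),
    -as.numeric(enrollment$date_updated),
    -enrollment$enrollment_id
  )
  sorted <- enrollment[ord, , drop = FALSE]
  # After sorting, the first row of each group is the selected enrollment.
  sorted[!duplicated(sorted[, group_cols, drop = FALSE]), , drop = FALSE]
}

# Made-up data: both youth share the same placeholder SSN.
enrollment <- data.frame(
  personal_id     = c(1, 1, 1, 2, 2),
  organization_id = c(10, 10, 10, 10, 10),
  enrollment_id   = c(100, 101, 102, 200, 201),
  ssn             = rep("000-00-0001", 5),
  entry_date      = as.Date(c("2020-01-01", "2021-03-01", "2021-03-01",
                              "2019-05-01", "2020-02-01")),
  exit_date       = as.Date(c("2020-06-01", NA, NA,
                              "2019-09-01", "2020-08-01")),
  date_updated    = as.Date(c("2020-06-02", "2021-03-05", "2021-03-05",
                              "2019-09-02", "2020-08-02"))
)

# One enrollment per "person-organization".
picked <- select_most_recent(enrollment, c("personal_id", "organization_id"))

# Optional de-duplication: one enrollment per SSN, using the same rules.
deduped <- select_most_recent(picked, "ssn")
```

With this data, youth 1 ends up with enrollment 102 (two open enrollments tied on entry date and date updated, so the highest enrollment_id wins) and youth 2 with enrollment 201 (all enrollments exited, so the most recent exit date wins). De-duplicating by SSN then keeps only enrollment 102, because it has no exit date.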