abtassociates / eva

Eva is a HUD application to aid HMIS Leads with data analysis. It is an open-source project intended for local use by HMIS Administrators in Continuums of Care (CoCs) around the U.S. and its territories.
GNU Affero General Public License v3.0

Duplicate Entries DQ / ProjectTimeID #521

Open trevinflick opened 2 months ago

trevinflick commented 2 months ago

Describe the bug: We had an unusually high number of "Duplicate Entry" DQ issues in one of our projects. After some digging, it appears that this project was HMIS-participating, stopped, and then started again. This seems to be causing duplication errors in the data prep.

From 04_initial_data_prep.R:

quit_and_start_projects <- ProjectsInHMIS %>%
  get_dupes(ProjectID) %>% distinct(ProjectID)

if(nrow(quit_and_start_projects) > 0){
  QuitStarters <-  ProjectsInHMIS %>%
    filter(ProjectID %in% c(quit_and_start_projects)) %>%
    group_by(ProjectID) %>%
    arrange(OperatingStartDate) %>%
    mutate(ProjectTimeID = paste0(ProjectID, letters[row_number()])) %>%
    ungroup()

  ProjectsInHMIS <- ProjectsInHMIS %>%
    left_join(QuitStarters %>%
                select(ProjectID, ProjectTimeID, ParticipatingDateRange),
              by = c("ProjectID", "ParticipatingDateRange"))
}

It looks like ProjectTimeID isn't being created properly: quit_and_start_projects is a tibble rather than a vector of IDs, so the filter on ProjectID doesn't match the rows it should.

After I changed the filter line to filter(ProjectID %in% c(quit_and_start_projects$ProjectID)), the number of rows in QuitStarters went from 0 to 2 and the number of "Duplicate Entry" errors went from ~1500 to 58.
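
For anyone wanting to reproduce the difference outside Eva, here is a minimal sketch of the behaviour described above. It uses made-up data in place of ProjectsInHMIS and a count()-based stand-in for get_dupes() so that it only needs dplyr; the toy IDs and dates are assumptions for illustration, not Eva's real data or code.

```r
library(dplyr)

# Toy stand-in for ProjectsInHMIS (hypothetical data, not from Eva).
ProjectsInHMIS <- tibble(
  ProjectID = c("100", "100", "200", "200", "300"),
  OperatingStartDate = as.Date(c(
    "2020-01-01", "2022-06-01", "2019-03-01", "2023-01-01", "2021-01-01"
  ))
)

# Stand-in for get_dupes(ProjectID) %>% distinct(ProjectID):
# a one-column tibble of the IDs that appear more than once.
quit_and_start_projects <- ProjectsInHMIS %>%
  count(ProjectID) %>%
  filter(n > 1) %>%
  select(ProjectID)

# Original form: c() on a tibble yields a list of columns, not a vector of IDs.
matched_with_tibble <- ProjectsInHMIS %>%
  filter(ProjectID %in% c(quit_and_start_projects))

# Suggested form: compare against the ID column itself.
matched_with_vector <- ProjectsInHMIS %>%
  filter(ProjectID %in% quit_and_start_projects$ProjectID)

# Same ProjectTimeID step as in the issue, applied to the correctly filtered rows.
QuitStarters <- matched_with_vector %>%
  group_by(ProjectID) %>%
  arrange(OperatingStartDate) %>%
  mutate(ProjectTimeID = paste0(ProjectID, letters[row_number()])) %>%
  ungroup()

print(nrow(matched_with_tibble))
print(nrow(matched_with_vector))
print(QuitStarters$ProjectTimeID)
```

With two duplicated IDs in this toy data, the c(tibble) filter should match no rows, while the $ProjectID version keeps all four quit-and-start rows and gives each one a lettered ProjectTimeID.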

trevinflick commented 2 months ago

I think changing the first assignment like this will also work, depending on your preference:

quit_and_start_projects <- ProjectsInHMIS %>%
  get_dupes(ProjectID) %>% distinct(ProjectID) %>% pull(ProjectID)

alyssandrichik commented 2 months ago

Hi @trevinflick,

Thank you for reaching out about this! The Eva Team investigated your bug and agreed that we need to fix this! We have added it to our to-do list and will update you when it is implemented. We hope to get to it soon.

Additionally, thanks for your help identifying where something was going wrong in the code and providing some suggestions on how to fix it! We really appreciate it :)

Best, Eva Team

kiadso commented 1 week ago

Hi Trevin, it appears we have rewritten how initial_data_prep.R works in another branch, so this fix should be incorporated along with a lot of other changes we have coming. I will come back to this tomorrow to figure out whether we can get this change into dev before those other changes. Thank you so much for your patience!