This PR does a few different things. Yes, it's a little overloaded. Whatever!
Refactor
It starts with two refactor commits. The first, _Refactor SQLite out of get_orphanedprojects and its tests, was to switch the testing DB to Duck DB so we can refactor the date hacks out of get_orphaned_projects.
I did that so I could add dbplyr-centered performance enhancements to get_orphaned_projects in _Add speed improvements to get_orphanedprojects It nows runs against prod in about 2 seconds instead of 30+. The enhancements were multifold. Adding filters to project_ownership before joining it was a huge speed boost. The second thing I did was to combine filter statements wherever possible. In some cases this provides very large gains, you will see this appear more often if I need a boost to dbplyr speed.
FYI, I tested the shit out of the refactored code. not only does it pass the units tests, I iterated both the old and new code across 5 months of Prod data and got a perfect match of the output. The testing harness was interesting. Thsi is what I did:
# steal preamble from sequester_orphans
library(rcc.billing)
library(redcapcustodian)
library(RMariaDB)
library(DBI)
library(tidyverse)
library(lubridate)
library(dotenv)
library(sendmailR)
library(tableHTML)
init_etl("my_awesome_test")
rc_conn <- connect_to_redcap_db()
rcc_billing_conn <- connect_to_rcc_billing_db()
# Wrap a copy of the new code (not included here) with a function that reorders the parameter list for Purrr
# and appends the months, previous so I can tell which call (month) found what problems
get_orphaned_projects_for_iteration_new <- function(months_previous, rc_conn, rcc_billing_conn) {
get_orphaned_projects_new(
rc_conn = rc_conn,
rcc_billing_conn = rcc_billing_conn,
months_previous = months_previous) %>%
mutate(months_previous = months_previous)
}
# Do as above but for the old code (also not included here)
get_orphaned_projects_for_iteration_old <- function(months_previous, rc_conn, rcc_billing_conn) {
get_orphaned_projects_old(
rc_conn = rc_conn,
rcc_billing_conn = rcc_billing_conn,
months_previous = months_previous) %>%
mutate(months_previous = months_previous)
}
# make a vector of months to iterate over
months_previous <- c(
0,1,2,3,4
)
# Run the new and the old cold over the same vector of months
new <- purrr::map_dfr(months_previous, get_orphaned_projects_for_iteration_new, rc_conn, rcc_billing_conn)
old <- purrr::map_dfr(months_previous, get_orphaned_projects_for_iteration_old, rc_conn, rcc_billing_conn)
# verify the awesome
all_equal(new, old)
# Be happy :-)
Feature, not bug
I worked my tail off to get the speed improvements above so I could chase what I expected to be a nasty bug. It turns out the the bug was OK code not communicating things clearly to Taryn or me. it confused us both. Thus I created Add reason to project sequestration messages. It does these things:
Add reason to sequester_project params.
Rename project_ids param on sequester_project.
Update references to sequester_project params.
Update sequester_project for modern tidyselect syntax.
It also does this one completely unrelated thing that Sandra reminded me of just as I was committing this code:
Move to new CTSI-SvcBillingTeam@ad.ufl.edu address in sequester_unpaid_projects.
This PR does a few different things. Yes, it's a little overloaded. Whatever!
Refactor
It starts with two refactor commits. The first, _Refactor SQLite out of get_orphanedprojects and its tests, was to switch the testing DB to Duck DB so we can refactor the date hacks out of get_orphaned_projects.
I did that so I could add dbplyr-centered performance enhancements to get_orphaned_projects in _Add speed improvements to get_orphanedprojects It nows runs against prod in about 2 seconds instead of 30+. The enhancements were multifold. Adding filters to
project_ownership
before joining it was a huge speed boost. The second thing I did was to combine filter statements wherever possible. In some cases this provides very large gains, you will see this appear more often if I need a boost to dbplyr speed.FYI, I tested the shit out of the refactored code. not only does it pass the units tests, I iterated both the old and new code across 5 months of Prod data and got a perfect match of the output. The testing harness was interesting. Thsi is what I did:
Feature, not bug
I worked my tail off to get the speed improvements above so I could chase what I expected to be a nasty bug. It turns out the the bug was OK code not communicating things clearly to Taryn or me. it confused us both. Thus I created Add reason to project sequestration messages. It does these things:
It also does this one completely unrelated thing that Sandra reminded me of just as I was committing this code: