add get_ctsi_study_id_to_project_id_map

ljwoodley commented 7 months ago

Closes issue #208

ChemiKyle commented 7 months ago

Some of your results don't match those from Philip's example.

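library(dplyr)
library(REDCapR)
# Assumption: rcc.billing provides connect_to_rcc_billing_db() and the function under review
library(rcc.billing)

p1414_token <- "your_token_here"

service_requests <- redcap_read(
  redcap_uri = Sys.getenv("URI"),
  token = p1414_token
)$data

rcc_billing_conn <- connect_to_rcc_billing_db()
# Philip's example below uses the name rc_billing_conn
rc_billing_conn <- rcc_billing_conn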

result_l <- get_ctsi_study_id_to_project_id_map(service_requests, rcc_billing_conn)

# Read invoice line item records
extant_invoice_line_items <- tbl(rc_billing_conn, "invoice_line_item") |>
  collect()

# Get unique, modern CTSI Study IDs for each REDCap Project.
# Get them from extant_invoice_line_items. We need both the annual project
# billing line items and the service_request line items.
# We have to join the latter to the service request history to map
# service_request line items to the PIDs they relate to.
result_p <-
  bind_rows(
    extant_invoice_line_items |>
      filter(service_type_code == 1 & !is.na(ctsi_study_id)) |>
      arrange(desc(id)) |>
      select(id, service_type_code, service_identifier, ctsi_study_id) |>
      rename(project_id = service_identifier),
    extant_invoice_line_items |>
      filter(service_type_code == 2 & !is.na(ctsi_study_id)) |>
      arrange(desc(id)) |>
      select(id, service_type_code, service_identifier, ctsi_study_id) |>
      inner_join(
        service_requests |>
          select(record_id, project_id) |>
          mutate(
            project_id = as.character(project_id),
            record_id = as.character(record_id)
          ),
        by = c("service_identifier" = "record_id")
      ) |>
      # Keep id so the arrange(desc(id)) after bind_rows() can order these rows too
      select(id, service_type_code, project_id, ctsi_study_id)
  ) |>
  arrange(desc(id)) |>
  distinct(project_id, ctsi_study_id)

result_l
result_p

foo <- inner_join(result_l, result_p, by = "project_id")

foo |>
  filter(ctsi_study_id.x != ctsi_study_id.y)
# 32 rows

ljwoodley commented 7 months ago

That's because those project ids map to multiple study ids.

result_p |> 
  add_count(project_id) |> 
  filter(n > 1) |> 
  arrange(project_id)

[Screenshot, 2024-04-15 7:58:31 PM]

How should the duplicated project_ids be handled @pbchase?

pbchase commented 7 months ago

That's because those project ids map to multiple study ids. ... How should the duplicated project_ids be handled @pbchase?

Ah, now I understand. I have seen this before. The executive decision I made last time was to use the most modern ctsi_study_id for the project ID. I base this on my assumption that maybe it's more right the second time.
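A minimal sketch of that rule, assuming a larger line item `id` means a more recent record. The `line_items` tibble and the `latest_study_id` name are made up for illustration; they are not real billing data or part of the package.

```r
library(dplyr)

# Hypothetical line items: project 101 maps to two study IDs
line_items <- tibble(
  id = c(1, 2, 3, 4),
  project_id = c("101", "101", "102", "103"),
  ctsi_study_id = c(900, 901, 902, 902)
)

# Sort newest first, then keep the first row seen for each project_id
latest_study_id <- line_items |>
  arrange(desc(id)) |>
  distinct(project_id, .keep_all = TRUE) |>
  select(project_id, ctsi_study_id) |>
  arrange(project_id)

latest_study_id
```

Here project 101 ends up with study ID 901 because that row has the larger `id`.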

ljwoodley commented 7 months ago

ctsi study ids also map to multiple project ids. When this occurs the max project id is kept.

pbchase commented 7 months ago

ctsi study ids also map to multiple project ids. When this occurs the max project id is kept.

No, you should keep all of the project IDs that map to a single CTSI_Study_ID.
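One way to check that asymmetry on a result, sketched with made-up data: each project_id should appear once, while a ctsi_study_id may appear on several rows. The `study_map` tibble and the other names here are hypothetical.

```r
library(dplyr)

# Hypothetical map: study 902 is shared by projects 102 and 103
study_map <- tibble(
  project_id = c("101", "102", "103"),
  ctsi_study_id = c(901, 902, 902)
)

# Any project_id listed here would violate the one-study-per-project rule
duplicated_projects <- study_map |>
  count(project_id) |>
  filter(n > 1)

# A study ID with several projects is fine; all of them stay in the map
projects_per_study <- study_map |>
  count(ctsi_study_id, name = "n_projects")

duplicated_projects
projects_per_study
```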

pbchase commented 7 months ago

I love this enormous function, but I have one issue--there is no test. I spec'd it so it could be testable, but there is no test.

I'd like you to write the test. I have been writing the test for these functions but I need to pass the torch. To that end, I documented how I do it. It's a bit involved, but I find it liberating. I hope I did a decent job of documenting it. I hope you like the method.

Please read Unit tests with testthat, try to write a script that makes the test data from the real data, and try to write one test. Feel free to ask questions. I haven't had a lot of time to polish this. You are the first tester.
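For orientation only, a testthat test has roughly the shape below. This is a sketch under assumptions: `keep_latest_study_id()` is a hypothetical stand-in, not the function in this PR, and the input is typed inline rather than built from real data as the documented method describes.

```r
library(dplyr)
library(testthat)

# Hypothetical stand-in for the function under test; the real function
# also takes service requests and a database connection
keep_latest_study_id <- function(line_items) {
  line_items |>
    arrange(desc(id)) |>
    distinct(project_id, .keep_all = TRUE) |>
    select(project_id, ctsi_study_id) |>
    arrange(project_id)
}

test_that("each project_id keeps its most recent ctsi_study_id", {
  # Made-up input: project 101 has an older and a newer study ID
  line_items <- tibble(
    id = c(1, 2, 3),
    project_id = c("101", "101", "102"),
    ctsi_study_id = c(900, 901, 902)
  )
  expected <- tibble(
    project_id = c("101", "102"),
    ctsi_study_id = c(901, 902)
  )
  expect_equal(keep_latest_study_id(line_items), expected)
})
```

A real test for this PR would presumably swap the inline tibbles for the saved test data and call get_ctsi_study_id_to_project_id_map instead.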

ctsit / rcc.billing

add get_ctsi_study_id_to_project_id_map #212