ctsit / rcc.billing

Automated, data-driven service billing implemented on REDCap Custodian
https://ctsit.github.io/rcc.billing/
Apache License 2.0
0 stars 3 forks source link

Create an ETL `write_uf_fiscal_orgs_to_person_org.R` #125

Closed pbchase closed 1 year ago

pbchase commented 2 years ago

Write the ETL write_uf_fiscal_orgs_to_person_org.R to populate a new rcc.billing table person_org

person_org design

These columns are typically non-blank in the UF person data:

 $ useridalias     : chr "pbc"
 $ uf_gl_state     : chr "A"
 $ begindttm       : chr "2022-11-17 11:25:03"
 $ enddatetime     : chr "2022-11-17 11:25:03"
 $ uf_lockout_dttm : chr "2022-11-17 11:25:03"
 $ lastupddttm     : chr "2022-11-17 11:25:03"
 $ ufid            : chr "15395270"
 $ uf_status_flg   : chr "A"
 $ uf_primary_affl : chr "T"
 $ uf_display_nm   : chr "Chase,Philip B"
 $ uf_work_title   : chr "Assistant Director"
 $ uf_gender_cd    : chr "M"
 $ uf_birth_dt     : chr "-001-11-30"
 $ uf_domesticstat : chr "N"
 $ uf_hm_p_publish : chr "N"
 $ uf_hm_p_ctry_cd : chr "1"
 $ uf_hm_p_area_cd : chr "352"
 $ uf_hm_p_phone   : chr "8717900"
 $ uf_hm_p_invalid : chr "N"
 $ uf_uf_p_publish : chr "N"
 $ uf_uf_p_ctry_cd : chr "1"
 $ uf_uf_p_area_cd : chr "352"
 $ uf_uf_p_phone   : chr "5750705"
 $ uf_uf_p_invalid : chr "N"
 $ uf_hm_a_publish : chr "N"
 $ uf_hm_a_line1   : chr "4430 NW 13TH AVE"
 $ uf_hm_a_city    : chr "GAINESVILLE"
 $ uf_hm_a_state_cd: chr "FL"
 $ uf_hm_a_zip_cd  : chr "32605"
 $ uf_hm_a_ctry_cd : chr "US"
 $ uf_hm_a_invalid : chr "N"
 $ uf_uf_a_publish : chr "N"
 $ uf_uf_a_line1   : chr "4430 NW 13TH AVE"
 $ uf_uf_a_city    : chr "GAINESVILLE"
 $ uf_uf_a_state_cd: chr "FL"
 $ uf_uf_a_zip_cd  : chr "32605"
 $ uf_uf_a_ctry_cd : chr "US"
 $ uf_uf_a_invalid : chr "N"
 $ uf_uf_e_publish : chr "N"
 $ uf_uf_e_invalid : chr "N"
 $ uf_pref_lang    : chr "ENGLISH"
 $ uf_security_flg : chr "N"
 $ uf_protect_flg  : chr "N"
 $ uf_publish_flg  : chr "N"
 $ uf_deceased_flg : chr "N"
 $ uf_updt_src_id  : chr "IAM Coord/Admin"
 $ uf_updt_ts      : chr "2019-06-19 16:25:50"
 $ uf_email        : chr "pbc@ufl.edu"
 $ user_id         : chr "pbc"
 $ email           : chr "pbc@ufl.edu"

Use these columns in person_org:

rcc.ctsit::get_uf_person_data_by_gatorlink("pbc") %>%
  janitor::clean_names() %>%
  select(
    user_id,
    ufid,
    email,
    uf_display_nm,
    uf_work_title,
    uf_phone_country_code = uf_uf_p_ctry_cd,
    uf_phone_area_code = uf_uf_p_area_cd,
    uf_phone = uf_uf_p_phone,
  )

Cohort of interest

We will need to fetch and store that data for every redcap user and every project PI email address that does not have a matching email address amongst the primary email addresses in redcap_user_information

Adding UF fiscal org data

Get org data from the staff_departments table in data.vivo like this:

vconn <- connect_to_vivo_db()

staff_departments <- dplyr::tbl(vconn, "staff_departments")
staff_departments %>%
  filter(ufid == 15395270) %>%
  select(
     ufid,
     primary_uf_fiscal_org = ps_deptid
  )

staff_departments returns results very quickly. It might allow all 10k-ish ufids in one query using UFIDs from the previous query.

Adding the Department-level ID

Sandra wants to see the department-level id in the data we send her. Whereas we should store primary_uf_fiscal_org, we should also walk up the hierarchy to get the dept-level ID. I believe that is the 2nd-level id. e.g. 29680240 becomes 29680000. It is unclear of the can simply substitute place the last four digits with zeros and use that. Whatever course you choose, add this additional dept_id onto the person record as primary_uf_fiscal_org_2nd_level

Frequency

Run this ETL weekly.

pbchase commented 2 years ago

FYI, the org data appears to be out-of-date. @pbchase is working with @senrabc to get this data source to refresh faster. I say this not to discourage you from doing this, but to say press on regardless. Don't stop coding to tell me the data is bad 'cause I know it is bad. We will fix it.

pbchase commented 1 year ago

There is some concern that we are reading the correct tables. See https://wiki.ctsi.ufl.edu/books/datavivoufledu/page/overview to know where the good data lives

pbchase commented 1 year ago

In summary, https://wiki.ctsi.ufl.edu/books/datavivoufledu/page/overview says "use VIVO_DB_NAME=researcher_index_mdm". It should also say "and don't use WH_PRIMARY_DEPTID. Instead use staff_departments"

pbchase commented 1 year ago

Addressed by PR #177