Closed jdhoffa closed 7 months ago
@jacobvjk requesting your review here for some input already
and cc @cjyetman and @AlexAxthelm for visibility
Allowing a named character vector, e.g. join_id = c(lei_direct_loantaker = "lei"), might be an easy way to facilitate naming different columns in the loanbook and abcd datasets.
Yup, that's what I was thinking too!
I think it may make sense to only allow a single join column for now. Adding multiple requires some join priority logic, and I'd rather not introduce that complexity yet. Could be a future feature once this one is in the wild and proven not to be buggy.
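A minimal sketch of how a single named join_id could be unpacked and validated, in base R only. The helper name parse_join_id is hypothetical and is not a function in r2dii.match; it only illustrates the "one join column, optionally named" idea discussed above.

```r
# Hypothetical helper (not part of r2dii.match): split a `join_id` argument
# into the loanbook column name and the abcd column name.
parse_join_id <- function(join_id) {
  stopifnot(is.character(join_id))

  # Only a single join column is allowed, so no join priority logic is needed.
  if (length(join_id) != 1L) {
    stop("`join_id` must name exactly one join column.", call. = FALSE)
  }

  nm <- names(join_id)
  # An unnamed value means both datasets use the same column name.
  loanbook_col <- if (is.null(nm) || !nzchar(nm)) unname(join_id) else nm

  list(loanbook = loanbook_col, abcd = unname(join_id))
}

# Different column names in the two datasets
parse_join_id(c(lei_direct_loantaker = "lei"))

# Same column name in both datasets
parse_join_id("lei")
```

The name of the vector element is read as the loanbook column and the value as the abcd column, mirroring how a named by argument reads in dplyr joins.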
So there's a bunch of failing checks that I will look into later (I have a feeling it's because of changes in r2dii.data messing up our vignettes etc.)
But the core functionality seems to be there :-)
prioritize()
This seems to function now as expected, with dplyr::*_join-like syntax.
See reprex below:
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(r2dii.data)
library(r2dii.match)
# 441 points to the company Russo, Russo e Russo Group
abcd_demo <- dplyr::mutate(
  abcd_demo,
  lei = dplyr::case_when(
    company_id == "441" ~ "LEI123",
    TRUE ~ lei
  )
)
# C267 points to the company Russo s.r.l.
loanbook_demo <- dplyr::mutate(
  loanbook_demo,
  lei_direct_loantaker = dplyr::case_when(
    id_direct_loantaker == "C267" ~ "LEI123",
    TRUE ~ lei_direct_loantaker
  )
)
out <- match_name(
  loanbook_demo,
  abcd_demo,
  join_id = c(lei_direct_loantaker = "lei")
)
prioritized <- prioritize(out)
prioritized |>
  filter(id_direct_loantaker == "C267") |>
  select(id_direct_loantaker, lei_direct_loantaker, level, source, score, name, name_abcd)
#> # A tibble: 1 × 7
#>   id_direct_loantaker lei_direct_loantaker level    source score name  name_abcd
#>   <chr>               <chr>                <chr>    <chr>  <dbl> <chr> <chr>
#> 1 C267                LEI123               lei_dir… id jo…     1 Russ… Russo, R…
Created on 2024-03-12 with reprex v2.1.0
@cjyetman I've tagged you as a reviewer, understanding you may not have enough context to get exactly what is going on here (that's what @jacobvjk is there for), but I would appreciate a second pair of eyes on this 😊
This ID column could be, for example, lei or isin.
Some open questions I have before marking this as ready for review:
loanbook column name (e.g. lei_direct_loantaker), and another column indicating the abcd column name (e.g. lei)
loanbook with multiple identical LEIs, and an abcd with several name_company values that have the same LEI. Need to consider what to do in that case
See reprex:
Created on 2024-03-07 with reprex v2.1.0
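To make the duplicate-LEI question above concrete, here is a minimal sketch with made-up data (this is not the reprex from the PR). Base R merge() stands in for whatever join the implementation actually uses; the point is only that duplicated IDs on both sides multiply rows.

```r
# Toy data (invented for illustration): two loans share one LEI,
# and two abcd companies share that same LEI.
loanbook <- data.frame(
  id_direct_loantaker = c("C1", "C2"),
  lei_direct_loantaker = c("LEI123", "LEI123"),
  stringsAsFactors = FALSE
)
abcd <- data.frame(
  name_company = c("company a", "company b"),
  lei = c("LEI123", "LEI123"),
  stringsAsFactors = FALSE
)

# Join on the ID column, which has a different name in each dataset.
joined <- merge(
  loanbook, abcd,
  by.x = "lei_direct_loantaker", by.y = "lei"
)

# Every loan pairs with every company sharing the LEI: 2 x 2 rows.
nrow(joined)

# One possible guard (an assumption, not the PR's behaviour): detect
# duplicated IDs on the abcd side before joining.
has_duplicate_ids <- anyDuplicated(abcd$lei) > 0
has_duplicate_ids
```

Whether such matches should be kept, warned about, or rejected is exactly the open question; the sketch only shows how the situation arises.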
Closes #135