Closed by jdhoffa 4 years ago
Good point @jdhoffa,
The output has non-NA values towards the right of the tibble, but those values belong to new columns; all the old columns coming from the input loanbook are full of NA.
I'll identify where exactly the loanbook columns are joined back, but I suspect the ultimate fix will need your help. I wrote code around your work, mostly without stopping to reflect on whether the overall process is the best way to achieve our goal -- and I suspect we are moving around and renaming columns more than strictly necessary.
I think one way to go about cleaning up our mess is to meet live, fire up the debugger and step into each function together, trying to explain to each other what we are doing and why. The goal would be not to fix stuff on the fly but to create lots of tiny actionable issues, assign each to one of us, then work on them independently.
What do you think?
suppressPackageStartupMessages(
library(dplyr)
)
library(r2dii.dataraw)
#> Loading required package: r2dii.utils
library(r2dii.match)
match_name(loanbook_demo, ald_demo) %>%
  select_if(.predicate = ~ !all(is.na(.x)))
#> # A tibble: 1,350 x 10
#> id level sector sector_ald name name_ald alias alias_ald score source
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <chr>
#> 1 UP23 ultima… automo… automotive Aston… aston m… asto… astonmar… 1 loanb…
#> 2 UP23 direct… automo… automotive <NA> aston m… asto… astonmar… 1 loanb…
#> 3 UP23 interm… automo… automotive <NA> aston m… asto… astonmar… 1 loanb…
#> 4 UP25 ultima… automo… automotive Avtoz… avtozaz avto… avtozaz 1 loanb…
#> 5 UP25 direct… automo… automotive <NA> avtozaz avto… avtozaz 1 loanb…
#> 6 UP25 interm… automo… automotive <NA> avtozaz avto… avtozaz 1 loanb…
#> 7 UP36 ultima… automo… automotive Bogdan bogdan bogd… bogdan 1 loanb…
#> 8 UP36 direct… automo… automotive <NA> bogdan bogd… bogdan 1 loanb…
#> 9 UP36 interm… automo… automotive <NA> bogdan bogd… bogdan 1 loanb…
#> 10 UP52 ultima… automo… automotive Ch Au… ch auto chau… chauto 1 loanb…
#> # … with 1,340 more rows
# Notice the missing "!": this keeps only the columns that are all NA
match_name(loanbook_demo, ald_demo) %>%
  select_if(.predicate = ~ all(is.na(.x)))
#> # A tibble: 1,350 x 16
#> id_loan id_direct_loant… id_intermediate… id_ultimate_par… loan_size_outst…
#> <chr> <chr> <chr> <chr> <dbl>
#> 1 <NA> <NA> <NA> <NA> NA
#> 2 <NA> <NA> <NA> <NA> NA
#> 3 <NA> <NA> <NA> <NA> NA
#> 4 <NA> <NA> <NA> <NA> NA
#> 5 <NA> <NA> <NA> <NA> NA
#> 6 <NA> <NA> <NA> <NA> NA
#> 7 <NA> <NA> <NA> <NA> NA
#> 8 <NA> <NA> <NA> <NA> NA
#> 9 <NA> <NA> <NA> <NA> NA
#> 10 <NA> <NA> <NA> <NA> NA
#> # … with 1,340 more rows, and 11 more variables:
#> # loan_size_outstanding_currency <chr>, loan_size_credit_limit <dbl>,
#> # loan_size_credit_limit_currency <chr>, sector_classification_system <chr>,
#> # sector_classification_input_type <chr>,
#> # sector_classification_direct_loantaker <dbl>, fi_type <chr>,
#> # flag_project_finance_loan <chr>, name_project <lgl>,
#> # lei_direct_loantaker <lgl>, isin_direct_loantaker <lgl>
Created on 2020-01-08 by the reprex package (v0.3.0.9001)
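The two calls above split the output into columns with some data and columns that are entirely NA. As a minimal sketch of the same check that returns only the column names, here is a version on a made-up tibble. The object `out` is a hypothetical stand-in for the output of match_name(): its column names are borrowed from the output above and its values are invented.

```r
library(dplyr)

# Hypothetical stand-in for the output of match_name(); values are made up
out <- tibble(
  id_loan = c(NA_character_, NA_character_),
  loan_size_outstanding = c(NA_real_, NA_real_),
  id = c("UP23", "UP25"),
  score = c(1, 1)
)

# TRUE for each column that contains nothing but NA
is_all_na <- vapply(out, function(x) all(is.na(x)), logical(1))

# Names of the all-NA columns
names(out)[is_all_na]
#> [1] "id_loan"               "loan_size_outstanding"

# Assuming dplyr >= 1.0.0, where() gives the same selection as select_if()
select(out, where(~ all(is.na(.x))))
```

Listing the names makes it easier to compare the all-NA columns against names(loanbook_demo) than scanning a truncated tibble print.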
Yup, I'm happy to do this. I'll be working a little late today, so we could even do this today if you had time?
Adding to the weirdness ... this is one slice that exposes the bug:
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(r2dii.dataraw)
#> Loading required package: r2dii.utils
library(r2dii.match)
slice(loanbook_demo, 4:5) %>%
  match_name(ald_demo) %>%
  select_if(~ all(is.na(.x)))
#> # A tibble: 6 x 16
#> id_loan id_direct_loant… id_intermediate… id_ultimate_par… loan_size_outst…
#> <chr> <chr> <chr> <chr> <dbl>
#> 1 <NA> <NA> <NA> <NA> NA
#> 2 <NA> <NA> <NA> <NA> NA
#> 3 <NA> <NA> <NA> <NA> NA
#> 4 <NA> <NA> <NA> <NA> NA
#> 5 <NA> <NA> <NA> <NA> NA
#> 6 <NA> <NA> <NA> <NA> NA
#> # … with 11 more variables: loan_size_outstanding_currency <chr>,
#> # loan_size_credit_limit <dbl>, loan_size_credit_limit_currency <chr>,
#> # sector_classification_system <chr>, sector_classification_input_type <chr>,
#> # sector_classification_direct_loantaker <dbl>, fi_type <chr>,
#> # flag_project_finance_loan <chr>, name_project <lgl>,
#> # lei_direct_loantaker <lgl>, isin_direct_loantaker <lgl>
Created on 2020-01-08 by the reprex package (v0.3.0.9001)
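An earlier comment suspects the step where the loanbook columns are joined back. This is not a diagnosis of match_name(), only a sketch of how a join can produce this kind of symptom in general: if the key values no longer line up, left_join() keeps every row and fills the joined columns with NA. All objects here are made up for illustration, including the `rowid` key.

```r
library(dplyr)

# Hypothetical loanbook: a key column plus two data columns
loanbook <- tibble(
  rowid = 1:2,
  id_loan = c("L1", "L2"),
  loan_size_outstanding = c(100, 200)
)

# Hypothetical matched names whose key no longer lines up with the loanbook
matched <- tibble(
  rowid = c(11L, 12L),
  name_ald = c("aston martin", "avtozaz")
)

# No key matches, so every column coming from `loanbook` is filled with NA
joined <- left_join(matched, loanbook, by = "rowid")

joined$id_loan
#> [1] NA NA
joined$loan_size_outstanding
#> [1] NA NA
```

If something like this were the cause, anti_join(matched, loanbook, by = "rowid") would return the rows whose keys fail to match, which could be a quick thing to check while stepping through the functions.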
Closed along with #89
@maurolepore I'm just trying to get back into the flow of things after the break. I noticed that the high-level match_name() wrapper returns all NAs for me on a first (naive) run-through.
I know you're still rewriting the lower-level functions, so maybe this function won't be ready until that is finished, but I just wanted to bring this to your attention in case you expected it to be working currently.
Created on 2020-01-08 by the reprex package (v0.3.0)