alexsanjoseph / compareDF

R Tool to compare two data.frames
Other
93 stars 17 forks source link

compare_df fails if the first parameter, `df_new`, is already a data.table #39

Closed jabortell closed 3 years ago

jabortell commented 3 years ago

If df_new is already a data.table then compare_df fails:

library(compareDF)

data("results_2010", "results_2011")

# If `df_old` is already a data.table, there is no error.
results_2010_dt = data.table::as.data.table(results_2010)
compare_df(results_2011, results_2010_dt, "Student")

# When `df_new` is already a data.table, `compare_df` fails
results_2011_dt = data.table::as.data.table(results_2011)
compare_df(results_2011_dt, results_2010, "Student")
Error in UseMethod("mutate_") : 
  no applicable method for 'mutate_' applied to an object of class "character"

16: mutate_(.data, .dots = compat_as_lazy_dots(...))
15: mutate.default(., newold_type = "new")
14: mutate(., newold_type = "new")
13: function_list[[k]](value)
12: withVisible(function_list[[k]](value))
11: freduce(value, `_function_list`)
10: `_fseq`(`_lhs`)
9: eval(quote(`_fseq`(`_lhs`)), env, env)
8: eval(quote(`_fseq`(`_lhs`)), env, env)
7: withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
6: both_tables$df_new %>% mutate(newold_type = "new")
5: rbind(deparse.level, ...)
4: rbind(both_tables$df_old %>% mutate(newold_type = "old"), both_tables$df_new %>% 
       mutate(newold_type = "new"))
3: as.data.table(rbind(both_tables$df_old %>% mutate(newold_type = "old"), 
       both_tables$df_new %>% mutate(newold_type = "new")), key = group_col)
2: combined_rowdiffs_v2(both_tables, group_col)
1: compare_df(results_2011_dt, results_2010, "Student")

The issue appears to be line fnsCompare.R:65

both_tables$df_new = both_tables$df_new[, names(both_tables$df_old)]

If df_new is a data.table, then both_tables$df_new becomes the character vector names(both_tables$df_old), rather than a table with the same columns and data as df_old.

jabortell commented 3 years ago

I found that simply converting df_new and df_old to data.tables at the top of compare_df like so

data.table::setDT(df_old)
data.table::setDT(df_new)

and changing the line in my comment above to

both_tables$df_new = both_tables$df_new[, .SD, .SDcols = names(both_tables$df_old)]

was sufficient. I reran all tests, and they all succeeded. Note that both_tables$df_new[, names(both_tables$df_old), with = FALSE] is equivalent, but I think the intention is clearer through the .SD syntax.

If you'd like that I would submit a pull request I could do that.

alexsanjoseph commented 3 years ago

@jabortell - Sounds great - Please submit a PR, I'll take a look and merge. Thanks for contributing :)

jabortell commented 3 years ago

Done. See PR #40

alexsanjoseph commented 3 years ago

Merged and closing